
Notes on Hadoop Usage Issues

2015-06-25 23:48
1. The input files are compressed; how do I handle them?

According to [1]:

Hadoop checks the file extension to detect compressed files. The compression types supported by Hadoop are: gzip, bzip2, and LZO. You do not need to take any additional action to extract files using these types of compression; Hadoop handles it for you.

So all you have to do is write the logic as you would for a plain text file and pass in the directory that contains the .gz files as input.

The catch with gzip files is that they are not splittable: if each gzip file is 5 GB, each mapper will process the whole 5 GB file instead of working with the default block size.

In other words, for Hadoop MapReduce you do not need to do anything special when the input files are compressed; they are decompressed for you automatically.
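The non-splittability mentioned in the quote above can be demonstrated with plain `java.util.zip`, without any Hadoop code (a standalone sketch; the class and method names here are illustrative, not from the original post): a gzip stream can only be decoded starting from its first byte, so a mapper handed the middle of a `.gz` file has no valid place to start reading.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    // Gzip-compress `repeat` copies of `line` into a single byte[].
    static byte[] compress(String line, int repeat) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            for (int i = 0; i < repeat; i++) {
                gz.write(line.getBytes(StandardCharsets.UTF_8));
            }
        }
        return buf.toByteArray();
    }

    // Decompressing from byte 0 works: this is what the single mapper
    // does when it is handed the whole .gz file.
    static long countLinesFromStart(byte[] gzBytes) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzBytes)),
                StandardCharsets.UTF_8))) {
            return r.lines().count();
        }
    }

    // Decompressing from the middle fails: a second mapper given the
    // back half of the file has no gzip header to start from.
    static boolean midStreamReadable(byte[] gzBytes) {
        byte[] half = Arrays.copyOfRange(gzBytes, gzBytes.length / 2, gzBytes.length);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(half))) {
            while (in.read() != -1) { /* drain */ }
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = compress("some input line\n", 1000);
        System.out.println("lines read from start: " + countLinesFromStart(gz));
        System.out.println("mid-stream readable: " + midStreamReadable(gz));
    }
}
```

This is exactly why a splittable format (bzip2, or gzip behind a container like SequenceFile) is preferred for large inputs: with gzip, one file means one mapper.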

The output format, however, you have to configure yourself.
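One common way to configure compressed output (a sketch, not from the original post; the class names come from the standard `org.apache.hadoop.mapreduce` API, so verify them against your Hadoop version) is to enable it on the job's FileOutputFormat:

```java
// Sketch only: requires the Hadoop client libraries on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedOutputDriver {
    static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "compressed-output-example");
        // Ask the output format to gzip whatever the reducers write;
        // the output files then carry a .gz extension.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        return job;
    }
}
```

The same effect can be had without code changes by setting the job properties `mapreduce.output.fileoutputformat.compress=true` and `mapreduce.output.fileoutputformat.compress.codec` (property names as in Hadoop 2.x; older releases use the `mapred.output.compress` names).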

2. How do I read the compressed files a MapReduce job produces?

Command: hadoop fs -text filename > file.txt (redirects the decompressed content into a local file)

Meaning: the text command decompresses the file automatically, unlike cat, which would dump the raw compressed bytes.
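To do the same thing programmatically, the usual approach (a sketch under the assumption that you are on the standard `org.apache.hadoop.io.compress` API; not from the original post) is to let CompressionCodecFactory pick a codec from the file extension, which is essentially what `-text` does:

```java
// Sketch only: requires the Hadoop client libraries on the classpath.
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class ReadCompressedOutput {
    static InputStream open(Configuration conf, Path path) throws Exception {
        FileSystem fs = path.getFileSystem(conf);
        // Resolves .gz, .bz2, etc. by extension; returns null for plain files.
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
        return codec == null ? fs.open(path)
                             : codec.createInputStream(fs.open(path));
    }
}
```

The returned stream yields decompressed bytes either way, so downstream code does not need to care whether the job's output was compressed.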

References:

[1]http://stackoverflow.com/questions/26576985/mapreduce-in-java-gzip-input-files