Hadoop Usage Notes
2015-06-25 23:48
1. How do you handle compressed input files?
According to [1]:
Hadoop checks the file extension to detect compressed files. The compression types supported by Hadoop are: gzip, bzip2, and LZO. You do not need to take any additional action to extract files using these types of compression; Hadoop handles it for you.
So all you have to do write the logic as you would for a text file and pass in the directory which contains the .gz files as input.
But the issue with gzip files is that they are not splittable, imagine you have gzip files of each 5GB, then each mapper will process on the whole 5GB file instead of working with the default block size.
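In other words, when the input files of a Hadoop MapReduce job are compressed, you do not need to do anything: Hadoop decompresses them automatically. The cost is that a gzip file cannot be split, so a single mapper reads the whole file.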
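To make the splitting issue from [1] concrete, here is a rough back-of-the-envelope sketch. It assumes an HDFS block size of 128 MB (the actual value depends on the cluster configuration) and ignores the finer details of how Hadoop computes splits, so treat the numbers as an estimate only. The function name estimate_map_tasks is made up for this illustration.

```python
import math

# Assumed HDFS block size of 128 MB; the real value is a cluster setting.
BLOCK_SIZE = 128 * 1024 ** 2


def estimate_map_tasks(file_size, splittable, block_size=BLOCK_SIZE):
    """Roughly estimate how many map tasks one non-empty input file produces.

    Simplified model: a splittable file yields about one task per block,
    while a non-splittable file (such as gzip) is read by a single task.
    """
    if not splittable:
        return 1
    return math.ceil(file_size / block_size)


five_gb = 5 * 1024 ** 3
print(estimate_map_tasks(five_gb, splittable=False))  # gzip: 1 task
print(estimate_map_tasks(five_gb, splittable=True))   # splittable: 40 tasks
```

Under these assumptions, a 5 GB gzip file is processed by one mapper, while the same amount of splittable data would be spread over roughly 40 mappers.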
The output format, however, is something you have to configure yourself.
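One way to configure compressed output is to pass job properties on the command line. The sketch below only builds the -D arguments. It assumes the Hadoop 2.x property names (older releases used mapred.output.compress and mapred.output.compression.codec) and a job driver that goes through ToolRunner, so that -D options are actually picked up. The helper name output_compression_options is made up for this example.

```python
# Codec class shipped with Hadoop; BZip2Codec lives in the same package.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def output_compression_options(codec=GZIP_CODEC):
    """Build the -D arguments that turn on compressed job output.

    Assumes Hadoop 2.x property names and a driver that parses generic
    options (for example via ToolRunner).
    """
    props = {
        "mapreduce.output.fileoutputformat.compress": "true",
        "mapreduce.output.fileoutputformat.compress.codec": codec,
    }
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


# Print the options so they can be pasted into a "hadoop jar ..." command.
print(" ".join(output_compression_options()))
```

The printed options go between the main class name and the job's own arguments in the hadoop jar command line.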
2. How to read the compressed output files of a MapReduce job
Command: hadoop fs -text filename > file.txt (saves the decompressed content to a local file)
Meaning: the text command decompresses the file automatically.
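If the compressed part files have already been copied to the local disk as-is (for example with hadoop fs -get), they can also be read directly. The sketch below is an alternative to hadoop fs -text, not a description of how that command works internally: it picks a decompressor from the file extension and handles only .gz and .bz2. The function name read_text and the file name in the usage line are made up for this example.

```python
import bz2
import gzip

# Map file extensions to the matching standard-library opener.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def read_text(path, encoding="utf-8"):
    """Return the text of a local file, decompressing .gz/.bz2 by extension."""
    for ext, opener in OPENERS.items():
        if path.endswith(ext):
            with opener(path, "rt", encoding=encoding) as f:
                return f.read()
    # No known compression extension: treat it as plain text.
    with open(path, "r", encoding=encoding) as f:
        return f.read()
```

Usage (hypothetical file name): text = read_text("part-r-00000.gz")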
References:
[1]http://stackoverflow.com/questions/26576985/mapreduce-in-java-gzip-input-files