Combine small files to Sequence file
2015-02-22 11:05
239 查看
Combine small files to sequence file or avro files are a good method to feed hadoop.
Small files in hadoop will take more namenode memory resource.
SequenceFileInputFormat 是一种Key value 格式的文件格式。
Key和Value的类型可以自己实现其序列化和反序列化内容。
SequenceFile示例内容:
其默认的key,value之间的分隔符 是\001,这个与hive文件的存储格式是匹配的,这样也方便直接把这种文件加载到hive里面。
以下的代码仅供参考作用,真实的项目中使用的时候,可以做适当的调整,以更高地节约资源和满足项目的需要。
示例代码如下:
Small files in hadoop will take more namenode memory resource.
SequenceFileInputFormat 是一种Key value 格式的文件格式。
Key和Value的类型可以自己实现其序列化和反序列化内容。
SequenceFile示例内容:
其默认的key,value之间的分隔符 是\001,这个与hive文件的存储格式是匹配的,这样也方便直接把这种文件加载到hive里面。
以下的代码仅供参考作用,真实的项目中使用的时候,可以做适当的调整,以更高地节约资源和满足项目的需要。
示例代码如下:
package myexamples; import java.io.File; import java.io.IOException; import java.util.HashMap; import java.util.Map; import org.apache.commons.io.FileUtils; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.SequenceFile; import org.apache.hadoop.io.Text; public class localf2seqfile { /* * Local folder has a lot of txt file * we need handle it in map reduce * so we want to load these files to one sequence file * key: source file name * value: file content * */ static void write2Seqfile(FileSystem fs,Path hdfspath,HashMap<Text,Text> hm) { SequenceFile.Writer writer=null; try { writer=SequenceFile.createWriter(fs, fs.getConf(), hdfspath, Text.class, Text.class); for(Map.Entry<Text,Text> entry:hm.entrySet()) writer.append(entry.getKey(),entry.getValue()); } catch (IOException e) { e.printStackTrace(); }finally{ try{writer.close();}catch(IOException ioe){} } } static HashMap<Text,Text> collectFiles(String localpath) throws IOException { HashMap<Text,Text> hm = new HashMap<Text,Text>(); File f = new File(localpath); if(!f.isDirectory()) return hm; for(File file:f.listFiles()) hm.put(new Text(file.getName()), new Text(FileUtils.readFileToString(file))); return hm; } static void readSeqFile(FileSystem fs, Path hdfspath) throws IOException { SequenceFile.Reader reader = new SequenceFile.Reader(fs,hdfspath,fs.getConf()); Text key = new Text(); Text value = new Text(); while (reader.next(key, value)) { System.out.print(key +" "); System.out.println(value); } reader.close(); } public static void main(String[] args) throws IOException { args = "/home/hadoop/test/sub".split(" "); Configuration conf = new Configuration(); conf.set("fs.default.name", "hdfs://namenode:9000"); FileSystem fs = FileSystem.get(conf); System.out.println(fs.getUri()); Path file = new Path("/user/hadoop/seqfiles/seqdemo.seq"); if (fs.exists(file)) fs.delete(file,false); HashMap<Text,Text> hm = collectFiles(args[0]); write2Seqfile(fs,file,hm); readSeqFile(fs,file); fs.close(); } }
相关文章推荐
- HDFS: Using HDFS API to append to a SequenceFile
- The source files "*\A.cpp " and "*\A.cpp " are both configured to produce the output file "*\A.obj "
- Unable to load configuration. - bean - jar:file:/D:/javaProgramFiles/tomcat-6.0.20/webapps/zoyeo
- The source files...are both configured to produce the output file,The project cannot be built.
- Unable to load configuration. - bean - jar:file:/H:/Program Files/Apache Software Foundation/T
- Clean RTF files to any encoding text file type
- Local IIS 7.0 - CS0016: Could not write to output file / Microsoft.Net > Framework > v4.0.30319 > Temporary ASP.NET Files
- Cygwin编译自己定义OpenCV库报错:opencv_contrib: LOCAL_SRC_FILES points to a missing file
- How to create .lib file when you only have .dll and .h files
- Cygwin编译自定义OpenCV库报错:opencv_contrib: LOCAL_SRC_FILES points to a missing file
- Could not write to output file 'c:\Windows\Microsoft.NET ASP.NET Files\root\xx' -- 'Access is denied
- Cocos2d-x 3.12 ndk r12b ERROR: cocos_freet ype2_static:LOCAL_SRC_FILES points to a missing file
- ASP.NET makes uploading files from the client to the server a snap(UploadInterface.PostedFile.SaveAs)
- [.net] 关于CS0016: Could not write to output file ‘c:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\Temporary ASP.NET Files… ‘Access is denied.’ 的解决办法
- Linux audit files to see who made changes to a file
- Downloading files from a server to client, using ASP.Net, when file size is too big for MemoryStream using Generic Handlers (ashx)
- How to Output a List of Files to a File and Sort Them in Linux
- VC编译错误:The source files "*\A.cpp " and "*\A.cpp " are both configured to produce the output file "*\
- File Comparer - To compare two files and check whether they have the same content
- nginx编译安装报错src/os/unix/ngx_files.c: In function 鈔gx_write_chain_to_file?