Garbled header information in files generated when Flume collects data to HDFS
2015-11-20 17:39
I am running Flume-ng 1.5 with the configuration shown below. The files generated on HDFS always begin with the following garbled information:
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable??H謺NSA???y
The configuration file is as follows:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.fileHeader = true
#a1.sources.r1.deserializer.outputCharset=UTF-8
a1.sources.r1.spoolDir = /opt/personal/file/access
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://node143:9000/access/events
a1.sinks.k1.hdfs.filePrefix = access
a1.sinks.k1.hdfs.fileSuffix = .log
#a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
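To confirm that the generated files really are Hadoop SequenceFiles, you can inspect their first bytes; the file name below is only an illustrative example, since the HDFS sink generates timestamped names:
# Print the first bytes of one generated file (example path; actual names are timestamped)
hadoop fs -cat /access/events/access.1448012345678.log | head -c 64
# A SequenceFile starts with the magic "SEQ", a version byte, and the key/value class names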
Looking through the Flume documentation, I found that in HdfsEventSink the hdfs.fileType property defaults to SequenceFile, which explains the header: "SEQ" is the SequenceFile magic, followed by the key/value class names LongWritable and BytesWritable. Setting it to DataStream makes the collected files land on HDFS exactly as they were read; just add the line a1.sinks.k1.hdfs.fileType = DataStream.
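For completeness, here is a minimal sketch of the sink section with the fix applied, using the same agent and component names as above:
# Describe the sink (with the fix)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node143:9000/access/events
a1.sinks.k1.hdfs.filePrefix = access
a1.sinks.k1.hdfs.fileSuffix = .log
# Write events as a plain data stream instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.channel = c1
With fileType set to DataStream, the event bodies are written out verbatim, so the .log files now contain the original text rather than a SequenceFile header.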