
Garbled information at the start of files generated when Flume collects data to HDFS

2015-11-20 17:39
The Flume version is Flume-ng 1.5. The files generated on HDFS always begin with a garbled header such as:

SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable??H謺NSA???y
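
That prefix is simply the header of a Hadoop SequenceFile: the SEQ magic bytes followed by the key and value class names (LongWritable, BytesWritable) and further binary header bytes that render as garbage in a text viewer. One way to confirm this is to dump the first bytes of a generated file, for example (the actual file name will differ; substitute one of the files Flume created under /access/events):

hdfs dfs -cat /access/events/access.<timestamp>.log | head -c 100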

The configuration file is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.fileHeader = true
#a1.sources.r1.deserializer.outputCharset = UTF-8
a1.sources.r1.spoolDir = /opt/personal/file/access
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://node143:9000/access/events
a1.sinks.k1.hdfs.filePrefix = access
a1.sinks.k1.hdfs.fileSuffix = .log
#a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Looking through the Flume documentation, I found that in HdfsEventSink the hdfs.fileType property defaults to SequenceFile. Changing it to DataStream makes the collected files land on HDFS exactly as they were read. Add the line a1.sinks.k1.hdfs.fileType = DataStream.
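
With that single line added, the sink section of the configuration above would look like this (host, paths, and roll settings unchanged; only the hdfs.fileType line is new):

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://node143:9000/access/events
a1.sinks.k1.hdfs.filePrefix = access
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

With hdfs.fileType = DataStream, events are written as plain data rather than wrapped in a SequenceFile, so the generated files no longer start with the SEQ header.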