Flume Deployment and Startup

2016-05-27 10:51

Flume startup command
flume-ng agent --conf conf --conf-file conf/file.log --name agent1 -Dflume.root.logger=DEBUG,console
-c (--conf): path to Flume's conf directory
-f (--conf-file): the user-defined Flume configuration file
-n (--name): name of the agent as defined in that configuration file

log4j:WARN No appenders could be found for logger (org.apache.flume.node.PollingPropertiesFileConfigurationProvider).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The warnings above appear because the directory passed to --conf does not contain a log4j.properties file.
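Both startup pitfalls mentioned so far (a conf directory without log4j.properties, and a --name value that does not match the agent prefix used in the configuration file) can be checked before launching. The following is only a minimal sketch in Python: the helper names agent_names and preflight are hypothetical, and the parser assumes simple key=value lines with # comments, which is narrower than the full Java properties syntax.

```python
import os


def agent_names(config_text):
    """Collect agent names, i.e. the first dotted component of every key.

    Assumption: plain `key=value` lines and `#` comments only.
    """
    names = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        names.add(key.split(".", 1)[0])
    return names


def preflight(conf_dir, conf_file, agent_name):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # flume-ng looks for log4j.properties in the directory given by --conf.
    if not os.path.isfile(os.path.join(conf_dir, "log4j.properties")):
        problems.append("no log4j.properties in " + conf_dir)
    with open(conf_file, encoding="utf-8") as handle:
        names = agent_names(handle.read())
    # The value of --name must match the prefix used in the config file.
    if agent_name not in names:
        problems.append(
            "agent %r not defined in %s (found: %s)"
            % (agent_name, conf_file, ", ".join(sorted(names)) or "none")
        )
    return problems
```

For example, preflight("conf", "conf/file.log", "agent1") would report a problem if the file only defines keys starting with agent-1.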

Example: spooldir source

# Declare the source, channel, and sink. The agent is named agent-1 here, so start it with --name agent-1.
agent-1.channels=ch-1
agent-1.sources=src-1
agent-1.sinks=log-sink1

# Channel type
agent-1.channels.ch-1.type=memory

# Source: spooldir watches a directory. Drop log files into the flumelog2 directory and the logger sink prints their events.
agent-1.sources.src-1.type=spooldir
agent-1.sources.src-1.channels=ch-1
agent-1.sources.src-1.spoolDir=/home/hadoop/app/flumelog2
agent-1.sources.src-1.fileHeader=true

# Sink type: logger
agent-1.sinks.log-sink1.type=logger
agent-1.sinks.log-sink1.channel=ch-1
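The spooldir source expects a file to be complete and unchanged once it appears in the watched directory, and it does not accept a file name that has been used before. A common way to respect this is to stage the file elsewhere and move it in with a single rename. The sketch below assumes a hypothetical helper drop_into_spool, a staging directory on the same filesystem as the spool directory, and the default .COMPLETED suffix for processed files.

```python
import os
import shutil


def drop_into_spool(src_path, spool_dir, staging_dir):
    """Copy src_path to staging_dir, then rename it into spool_dir.

    Assumption: staging_dir and spool_dir are on the same filesystem, so the
    final os.rename makes the finished file appear in one step.
    """
    name = os.path.basename(src_path)
    target = os.path.join(spool_dir, name)
    # Refuse to reuse a name that is pending or was already processed
    # (processed files are renamed with the default ".COMPLETED" suffix).
    if os.path.exists(target) or os.path.exists(target + ".COMPLETED"):
        raise FileExistsError("file name already used in spool dir: " + name)
    staged = os.path.join(staging_dir, name)
    shutil.copyfile(src_path, staged)  # the slow copy happens outside spool_dir
    os.rename(staged, target)          # the file shows up fully written
    return target
```

With the configuration above, spool_dir would be /home/hadoop/app/flumelog2; the staging directory is a free choice as long as Flume does not watch it.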

Example: spoolDir --> hdfs

agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

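# spooldir watches this directory and picks up any new file placed in it.
# The timestamp interceptor adds the timestamp header that the %Y-%m-%d escape in hdfs.filePrefix relies on.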
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/home/hadoop/app/flumelog/log
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=i1
agent1.sources.source1.interceptors.i1.type=timestamp

# The sink takes events from channel1 and writes them to HDFS.
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://hdp-server01:9000/flume/log
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=Text
agent1.sinks.sink1.hdfs.rollInterval=0
agent1.sinks.sink1.hdfs.rollCount=0
agent1.sinks.sink1.hdfs.batchSize=1000
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d

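# The file channel persists events received from the source on disk. checkpointDir holds the checkpoint of the queue state; dataDirs holds the operation log together with the event data. (Both paths can be set freely.)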
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/home/hadoop/app/flumelog/log/logdfstmp/point
agent1.channels.channel1.dataDirs=/home/hadoop/app/flumelog/log/logdfstmp

Notes:

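1. data/log-ID: files of this kind hold the records of put, take, commit, and rollback operations together with the event data.

2. checkpoint/checkpoint: records where each event lives, that is, in which data file (logFileID) and at which offset.

3. checkpoint/inflightTakes: caches the takes of in-flight transactions; the file is rebuilt periodically. Layout: (1) a 16-byte checksum; (2) transactionID1 + eventsCount1 + eventPointer11 + eventPointer12 + ...; (3) transactionID2 + eventsCount2 + eventPointer21 + eventPointer22 + ...

4. checkpoint/inflightPuts: caches the puts of in-flight transactions; the file is rebuilt periodically. Layout: (1) a 16-byte checksum; (2) transactionID1 + eventsCount1 + eventPointer11 + eventPointer12 + ...; (3) transactionID2 + eventsCount2 + eventPointer21 + eventPointer22 + ...

5. checkpoint/checkpoint.meta: mainly stores the logFileIDs and the number of events belonging to each.

6. data/log-ID.meta: mainly records the next write position of log-ID and information such as logWriteOrderID.

7. Each data directory keeps no more than 2 data files.

8. putList and takeList are caches that hold the corresponding FlumeEventPointers, yet inflightTakes and inflightPuts cache the same information plus a little more, so the two overlap heavily. Why? My guess is that the former exist only in memory, while the latter are persisted on disk (although constantly rebuilt) and can therefore be used for recovery when Flume restarts.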
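The layout described for inflightTakes and inflightPuts (a 16-byte checksum followed by transactionID, eventsCount, and event pointers for each transaction) can be illustrated with a small round-trip sketch. This is not Flume's code. The field widths (8-byte big-endian integers), the use of MD5 for the checksum, and the packing of a pointer as fileID in the upper 32 bits and offset in the lower 32 bits are all assumptions made for illustration, and every function name here is hypothetical.

```python
import hashlib
import struct


def pack_pointer(file_id, offset):
    """Assumed pointer encoding: fileID in the high 32 bits, offset in the low."""
    return (file_id << 32) | offset


def unpack_pointer(value):
    return value >> 32, value & 0xFFFFFFFF


def encode_inflight(transactions):
    """transactions maps a transaction ID to a list of (file_id, offset)."""
    body = b""
    for txn_id, pointers in transactions.items():
        # transactionID followed by eventsCount (assumed 8 bytes each)
        body += struct.pack(">qq", txn_id, len(pointers))
        for file_id, offset in pointers:
            body += struct.pack(">q", pack_pointer(file_id, offset))
    # 16-byte checksum first (MD5 is an assumption), then the records
    return hashlib.md5(body).digest() + body


def decode_inflight(data):
    checksum, body = data[:16], data[16:]
    if hashlib.md5(body).digest() != checksum:
        raise ValueError("checksum mismatch")
    transactions = {}
    pos = 0
    while pos < len(body):
        txn_id, count = struct.unpack_from(">qq", body, pos)
        pos += 16
        values = struct.unpack_from(">%dq" % count, body, pos)
        pos += 8 * count
        transactions[txn_id] = [unpack_pointer(v) for v in values]
    return transactions
```

Decoding what encode_inflight produced returns the original mapping, and a corrupted byte makes decode_inflight raise ValueError, which is the role the checksum plays during recovery.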
