Big Data Enterprise Learning Series 05 -- A First Look at Flume
2017-12-20 13:41
1. Flume Architecture
<1>Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
<2>It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.
<3>It uses a simple, extensible data model that allows for online analytic applications (well suited to scenarios with high real-time requirements).
<4>Flume data flow model
<5>Roles in Flume
<6>Data transfer in Flume
<7>The three key components of Flume (source, channel, sink)
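The data flow model above moves events source → channel → sink, which is essentially a buffered pipeline. A loose shell analogy (hypothetical, no Flume involved, purely to show the shape of the flow):

```shell
# A source emits events, a channel buffers them, a sink delivers them.
# Here printf plays the source, cat the channel, sed the sink:
printf 'evt1\nevt2\n' | cat | sed 's/^/sink got: /'
# prints:
#   sink got: evt1
#   sink got: evt2
```

The real value of the channel is that it decouples the source's write rate from the sink's drain rate, which is what gives Flume its tunable reliability.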
2. Getting Started with Flume
<1>Unpack the archive, then configure flume-env.sh:
export JAVA_HOME=/opt/software/jdk1.7.0_67
<2>Common flume-ng commands
bin/flume-ng
Usage: bin/flume-ng <command> [options]...

commands:
  agent                     run a Flume agent

global options:
  --conf,-c <conf>          use configs in <conf> directory
  -Dproperty=value          sets a Java system property value

agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)
<3>Starting an agent
An agent is started using a shell script called flume-ng which is located in the bin directory of the Flume distribution. You need to specify the agent name, the config directory, and the config file on the command line:
bin/flume-ng agent --conf conf --name agent-test --conf-file test.conf
Now the agent will start running the sources and sinks configured in the given properties file.
<4>Installing telnet
* Install the rpm packages
rpm -ivh ./*.rpm
* Restart the xinetd service
/etc/rc.d/init.d/xinetd restart
<5>A simple example
* Create a1.conf under conf
* Write a1.conf (four steps: agent, sources, channels, sinks)
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
* Run
bin/flume-ng agent \
-c conf \
-n a1 \
-f conf/a1.conf \
-Dflume.root.logger=DEBUG,console
* Check that the listening port is up
netstat -nltp
* Start a client
telnet localhost 44444
3. Collecting Hive's Run Log with Flume
<1>Plan
* Collect the log: Hive writes its run log to
/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
and the source follows it with
tail -F
* Channel: memory
* Sink: HDFS, under
/user/beifeng/flume/hive-logs/
<2>To use the HDFS sink, the following jar packages need to be placed under flume/lib
<3>Writing the agent configuration file
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

### define agent ###
a2.sources = r2
a2.channels = c2
a2.sinks = k2

### define sources ###
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log

### define channels ###
a2.channels.c2.type = memory

### define sinks ###
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/hive.log
a2.sinks.k2.hdfs.fileType = DataStream
a2.sinks.k2.hdfs.batchSize = 10

### bind sources and sinks ###
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
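The exec source simply runs the configured command and turns each line the command writes to stdout into one Flume event (tail -F, unlike tail -f, keeps following the file across log rotation). A quick local simulation of that line-to-event mapping, using a made-up demo file and no Flume at all:

```shell
# Simulate what the exec source sees: each stdout line becomes one event.
# /tmp/hive.log.demo is a throwaway file standing in for hive.log.
printf '2017-12-20 INFO ok\n2017-12-20 WARN slow\n' > /tmp/hive.log.demo
tail -n 2 /tmp/hive.log.demo | while read -r line; do
  echo "event: $line"
done
# prints:
#   event: 2017-12-20 INFO ok
#   event: 2017-12-20 WARN slow
```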
<4>Run
bin/flume-ng agent \
-c conf \
-n a2 \
-f conf/a2.conf \
-Dflume.root.logger=DEBUG,console
4. Flume Project Architecture
5. A Hands-On Flume Example
<1>Writing the agent
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

### define agent ###
a3.sources = r3
a3.channels = c3
a3.sinks = k3

### define sources ###
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/datas
a3.sources.r3.ignorePattern = ^(.)*\\.txt$

### define channels ###
a3.channels.c3.type = file
a3.channels.c3.checkpointDir = /opt/datas/check_dir
a3.channels.c3.dataDirs = /opt/datas/flume_data

### define sinks ###
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/%Y%m%d
a3.sinks.k3.hdfs.useLocalTimeStamp = true

### bind sources and sinks ###
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
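Two details in this config are easy to miss. First, ignorePattern is a regex, and the doubled backslash in the properties file becomes a single backslash in the effective pattern, i.e. ^(.)*\.txt$ — so .txt files dropped into the spool directory are skipped. Second, the %Y%m%d escapes in hdfs.path are date escapes filled from the event timestamp, which is why useLocalTimeStamp=true is set. A quick sketch of both behaviors, using grep and date as stand-ins (the file names are made up):

```shell
# Effective regex after properties-file unescaping:
pattern='^(.)*\.txt$'
echo 'notes.txt'  | grep -Eq "$pattern" && echo 'notes.txt: ignored'
echo 'access.log' | grep -Eq "$pattern" || echo 'access.log: collected'
# prints:
#   notes.txt: ignored
#   access.log: collected

# %Y%m%d expands like the same strftime-style escape in date:
date +%Y%m%d
```

With this layout, each day's events land in their own HDFS directory, e.g. .../flume/20171220.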
<2>Test run
bin/flume-ng agent \
-c conf \
-n a3 \
-f conf/a3.conf \
-Dflume.root.logger=DEBUG,console