
Big Data Enterprise Learning, Part 05: Getting Started with Flume

2017-12-20 13:41

I. Flume architecture

<1>Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

<2>It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.

<3>It uses a simple extensible data model that allows for online analytic applications (that is, workloads with fairly strict real-time requirements).

<4>flume data flow model



<5>Roles in Flume



<6>Data transfer in Flume



<7>The three core elements of Flume
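The three elements are the source, the channel, and the sink: a source turns incoming data into events (a map of string headers plus a byte-array body), puts them into a channel, and a sink takes them out and delivers them. The toy Python model below is only a mental picture, not Flume's Java API; the class and function names (Event, ToyChannel, toy_source, toy_sink) are hypothetical, and real channels also roll back a take transaction when delivery fails, which this sketch does not model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy Flume event: a byte-array body plus a map of string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class ToyChannel:
    """Hypothetical bounded buffer standing in for a Flume channel."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._queue = deque()

    def put_batch(self, events):
        # All-or-nothing, like a put transaction: reject the whole batch if it does not fit.
        if len(self._queue) + len(events) > self.capacity:
            raise OverflowError("channel full")
        self._queue.extend(events)

    def take_batch(self, max_events: int):
        # Remove up to max_events events in FIFO order.
        batch = []
        while self._queue and len(batch) < max_events:
            batch.append(self._queue.popleft())
        return batch

    def __len__(self):
        return len(self._queue)


def toy_source(lines):
    """Turn raw text lines into events, roughly what a netcat or exec source does."""
    return [Event(body=line.encode("utf-8")) for line in lines]


def toy_sink(channel, batch_size, out):
    """Take up to batch_size events and 'deliver' their bodies to out (a list)."""
    batch = channel.take_batch(batch_size)
    out.extend(e.body.decode("utf-8") for e in batch)
    return len(batch)


if __name__ == "__main__":
    channel = ToyChannel(capacity=1000)
    delivered = []
    channel.put_batch(toy_source(["hello", "flume"]))
    toy_sink(channel, batch_size=100, out=delivered)
    print(delivered)  # ['hello', 'flume']
```

Because the channel sits between the two, a slow sink only fills the buffer instead of blocking the source, until the channel's capacity is reached.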



II. Getting started with Flume

<1>Unpack the distribution and configure flume-env.sh

export JAVA_HOME=/opt/software/jdk1.7.0_67


<2>Common flume commands

bin/flume-ng
Usage: bin/flume-ng <command> [options]...

commands:
agent                     run a Flume agent

global options:
--conf,-c <conf>          use configs in <conf> directory
-Dproperty=value          sets a Java system property value

agent options:
--name,-n <name>          the name of this agent (required)
--conf-file,-f <file>     specify a config file (required if -z missing)


<3>Start an agent

An agent is started using a shell script called flume-ng which is located in the bin directory of the Flume distribution. You need to specify the agent name, the config directory, and the config file on the command line:

bin/flume-ng agent --conf conf --name agent-test --conf-file test.conf


Now the agent will start running source and sinks configured in the given properties file.
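If you launch agents from scripts, it can help to build the command line from the options listed in the usage text above. This is a minimal sketch: flume_ng_command is a hypothetical helper, it assumes you run it from the Flume home directory, and it uses only --conf, --name, --conf-file, and -D properties. The agent name must match the prefix used in the configuration file (a1, a2, ...), otherwise Flume finds no components to start.

```python
import shlex


def flume_ng_command(agent_name, conf_file, conf_dir="conf", flume_home=".", java_props=None):
    """Build the argv for `bin/flume-ng agent` (hypothetical helper)."""
    cmd = [
        f"{flume_home}/bin/flume-ng", "agent",
        "--conf", conf_dir,        # -c: directory holding flume-env.sh and log4j.properties
        "--name", agent_name,      # -n: must match the property prefix in the config file
        "--conf-file", conf_file,  # -f: the agent's properties file
    ]
    for key, value in (java_props or {}).items():
        cmd.append(f"-D{key}={value}")  # e.g. flume.root.logger=DEBUG,console
    return cmd


if __name__ == "__main__":
    cmd = flume_ng_command("a1", "conf/a1.conf",
                           java_props={"flume.root.logger": "DEBUG,console"})
    print(shlex.join(cmd))
    # On a machine with Flume installed: subprocess.run(cmd, check=True)
```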

<4>Install telnet

*Install the RPM packages

rpm -ivh ./*.rpm


*Start the xinetd service

/etc/rc.d/init.d/xinetd restart


<5>A simple example

* Create a1.conf under conf

* Write a1.conf in four steps: agent, source, channel, sink

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


*Run

bin/flume-ng agent \
-c conf \
-n a1 \
-f conf/a1.conf \
-Dflume.root.logger=DEBUG,console


*Check that the agent is listening on the port

netstat -nltp


*Start a client

telnet localhost 44444
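If telnet or netstat is not available, both checks can be approximated in Python. This is a sketch under assumptions: is_listening and send_lines are hypothetical helpers, and they rely on the netcat source's default behavior of turning each newline-terminated line into one event and answering it with an "OK" line.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Rough stand-in for spotting the port in `netstat -nltp`: can we connect?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def send_lines(host, port, lines, timeout=5.0):
    """Send each line to a netcat source and collect one reply line per line sent."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                sock.sendall((line + "\n").encode("utf-8"))
                replies.append(reader.readline().rstrip("\n"))
    return replies


if __name__ == "__main__":
    if is_listening("localhost", 44444):
        print(send_lines("localhost", 44444, ["hello flume"]))
    else:
        print("nothing is listening on localhost:44444; is agent a1 running?")
```

Each line you send should then show up as an event in the logger sink's console output of agent a1.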


III. Collecting Hive runtime logs with Flume

<1>Approach

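* Source: collect the log

The Hive runtime log:

/opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log

Read it with tail -F (which follows the file by name, so it keeps working after the log is rotated)

* Channel: memory

* Sink: HDFS

/user/beifeng/flume/hive-logs/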
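The exec source simply runs the command and turns each line it prints into an event; note that the Flume user guide warns that the exec source cannot guarantee delivery if the channel fills up or the agent dies. To make the tail -F behavior concrete, here is a hypothetical polling sketch (read_new_lines is an illustrative name, not part of Flume). It is simplified: real tail -F also notices when the file name points at a new file, while this only detects that the file shrank.

```python
import os


def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for complete lines appended since offset."""
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return [], 0  # file missing mid-rotation: like tail -F, try again later
    if size < offset:
        offset = 0  # file was truncated or replaced by a smaller one
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Hand back only complete lines; a partial last line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

Calling it in a loop with the returned offset gives a crude follower; tail -f, by contrast, keeps reading the old file descriptor and goes silent once hive.log is rotated.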

<2>To use the HDFS sink, copy the following JAR files into flume/lib



<3>Write the agent configuration file (conf/a2.conf)

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'

### define agent#######
a2.sources = r2
a2.channels = c2
a2.sinks = k2

### define sources #####
a2.sources.r2.type=exec
a2.sources.r2.command=tail -F /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log

### define channels####
a2.channels.c2.type=memory

###define sinks ###
a2.sinks.k2.type=hdfs
a2.sinks.k2.hdfs.path=hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/hive.log
a2.sinks.k2.hdfs.fileType=DataStream
a2.sinks.k2.hdfs.batchSize=10

### bind sources and sinks###
a2.sources.r2.channels=c2
a2.sinks.k2.channel=c2
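A frequent mistake in these files is the binding lines: a source is bound with channels (plural, it may feed several channels) while a sink is bound with channel (singular, exactly one). The sketch below checks that wiring for one agent. Both functions are hypothetical helpers, and parse_properties is deliberately minimal: it ignores the line continuations and backslash escapes that real Java properties files support.

```python
def parse_properties(text):
    """Minimal key=value parser for Flume-style configs; skips comments and blanks."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of wiring problems for one agent (an empty list means OK)."""
    def names(kind):
        return props.get(f"{agent}.{kind}", "").split()

    sources, channels, sinks = names("sources"), names("channels"), names("sinks")
    problems = []
    # Every declared component needs a type.
    for kind, members in (("sources", sources), ("channels", channels), ("sinks", sinks)):
        for name in members:
            if f"{agent}.{kind}.{name}.type" not in props:
                problems.append(f"{kind[:-1]} {name} has no type")
    # Sources bind with `channels` (one or more names).
    for src in sources:
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} is not bound to any channel")
        problems += [f"source {src} uses undeclared channel {c}"
                     for c in bound if c not in channels]
    # Sinks bind with `channel` (exactly one name).
    for snk in sinks:
        ch = props.get(f"{agent}.sinks.{snk}.channel")
        if ch is None:
            problems.append(f"sink {snk} has no channel")
        elif ch not in channels:
            problems.append(f"sink {snk} uses undeclared channel {ch}")
    return problems


if __name__ == "__main__":
    with open("conf/a2.conf", encoding="utf-8") as f:  # path assumed from the run command
        print(check_wiring(parse_properties(f.read()), "a2"))
```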


<4>Run

bin/flume-ng agent \
-c conf \
-n a2 \
-f conf/a2.conf \
-Dflume.root.logger=DEBUG,console
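In a2.conf, hdfs.batchSize=10 sets how many events the HDFS sink takes from the channel in one transaction before it flushes them to HDFS and commits (the default is 100). Keep it no larger than the channel's transactionCapacity, which is 100 by default for the memory channel. Since a2.conf does not set hdfs.rollInterval, hdfs.rollSize, or hdfs.rollCount, the defaults (30 s, 1024 bytes, 10 events) apply, so expect many small files. The toy loop below only illustrates the batching; drain_in_batches is a hypothetical name and nothing here talks to HDFS.

```python
from collections import deque


def drain_in_batches(channel, batch_size):
    """Split queued events into per-transaction batches of at most batch_size."""
    transactions = []
    while channel:
        batch = []
        for _ in range(batch_size):  # take at most batch_size events per transaction
            if not channel:
                break
            batch.append(channel.popleft())
        # A real sink would append the batch to the open HDFS file, flush,
        # and only then commit the channel transaction.
        transactions.append(batch)
    return transactions


if __name__ == "__main__":
    events = deque(f"hive log line {i}" for i in range(25))
    print([len(b) for b in drain_in_batches(events, batch_size=10)])  # [10, 10, 5]
```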


IV. Flume project architecture











V. A hands-on Flume case





<1>Write the agent (conf/a3.conf)

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a3'

### define agent#######
a3.sources = r3
a3.channels = c3
a3.sinks = k3

### define sources #####
a3.sources.r3.type=spooldir
a3.sources.r3.spoolDir=/opt/datas
a3.sources.r3.ignorePattern=^(.)*\\.txt$

### define channels####
a3.channels.c3.type=file
a3.channels.c3.checkpointDir =/opt/datas/check_dir
a3.channels.c3.dataDirs =/opt/datas/flume_data

###define sinks ###
a3.sinks.k3.type= hdfs
a3.sinks.k3.hdfs.path=hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/%Y%m%d
a3.sinks.k3.hdfs.useLocalTimeStamp=true

### bind sources and sinks###
a3.sources.r3.channels=c3
a3.sinks.k3.channel=c3
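The spooling directory source picks up files dropped into /opt/datas, and after a file is fully ingested it is renamed with the .COMPLETED suffix (the default fileSuffix). Files must be complete and must not change once they are in the directory. In the .conf file, \\. is a properties-file escape for the regex \. (a literal dot), so ignorePattern skips every name ending in .txt. The sketch below approximates which names would be ingested; would_ingest is a hypothetical helper, it assumes hidden files are also skipped (as in Flume 1.x), and it does not model subdirectories.

```python
import re

# ignorePattern from a3.conf after properties-file unescaping.
IGNORE_PATTERN = re.compile(r"^(.)*\.txt$")
COMPLETED_SUFFIX = ".COMPLETED"  # spooldir's default fileSuffix


def would_ingest(file_name, ignore=IGNORE_PATTERN, completed_suffix=COMPLETED_SUFFIX):
    """Approximate the spooling directory source's per-file filter."""
    if file_name.startswith("."):
        return False  # hidden files, such as the .flumespool tracking directory
    if file_name.endswith(completed_suffix):
        return False  # already ingested and renamed
    if ignore.fullmatch(file_name):
        return False  # whole name matches ignorePattern (Java's Matcher.matches())
    return True


if __name__ == "__main__":
    for name in ["access.log", "notes.txt", "access.log.COMPLETED"]:
        print(name, would_ingest(name))
```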


<2>Test run

bin/flume-ng agent \
-c conf \
-n a3 \
-f conf/a3.conf \
-Dflume.root.logger=DEBUG,console
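With hdfs.useLocalTimeStamp=true, the HDFS sink fills the %Y%m%d escapes in hdfs.path from the agent host's clock, so each day's data lands in its own directory. Without it (or a timestamp interceptor), the time escapes need a timestamp header on every event, which the spooling directory source does not add, and the sink fails. The helper below is a hypothetical illustration that handles only %Y, %m, and %d; the real sink supports more escapes.

```python
from datetime import datetime


def expand_hdfs_path(template, when=None):
    """Expand %Y, %m and %d in an hdfs.path template using local time (sketch)."""
    when = when or datetime.now()  # local clock, as useLocalTimeStamp implies
    return (template.replace("%Y", f"{when:%Y}")
                    .replace("%m", f"{when:%m}")
                    .replace("%d", f"{when:%d}"))


if __name__ == "__main__":
    print(expand_hdfs_path(
        "hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/%Y%m%d"))
```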