您的位置:首页 > 其它

Flume-1.6.0部分源码分析续1

2016-06-01 10:54 141 查看

5、Source、Channel和Sink之间是靠什么联系在一起的呢?

上述三者之间的联系主要是基于:Transaction类。Channel采用了Transaction(事务)机制来保证数据的完整性,这里的事务和数据库中的事务概念类似,但并不是完全一致,其语义可以参考下面这个图:

 source
产生Event,通过“put”、“commit”操作将Event放到Channel中

sink
通过“take”操作从Channel中取出Event,进行相应的处理。

6、以MemoryChannel为例来说明Channel是怎么发挥中间缓冲作用的。

6.1 首先看一下MemoryChannel中比较重要的成员变量:

// lock to guard queue, mainly needed to keep it locked down during resizes
// it should never be held through a blocking operationprivate Object queueLock = new Object();
//queue为Memory Channel中存放Event的地方,这里用了LinkedBlockingDeque来实现
@GuardedBy(value = "queueLock")private LinkedBlockingDeque<Event> queue;
//下面的两个信号量用来做同步操作,queueRemaining表示queue中的剩余空间,queueStored表示queue中的使用空间
// invariant that tracks the amount of space remaining in the queue(with all uncommitted takeLists deducted)
// we maintain the remaining permits = queue.remaining - takeList.size()
// this allows local threads waiting for space in the queue to commit without denying access to the
// shared lock to threads that would make more space on the queueprivate Semaphore queueRemaining;
// used to make "reservations" to grab data from the queue.
// by using this we can block for a while to get data without locking all other threads out
// like we would if we tried to use a blocking call on queueprivate Semaphore queueStored;
//下面几个变量为配置文件中Memory Channel的配置项
// 一个事务中Event的最大数目private volatile Integer transCapacity;
// 向queue中添加、移除Event的等待时间private volatile int keepAlive;
// queue中,所有Event所能占用的最大空间
private volatile int byteCapacity;private volatile int lastByteCapacity;
// queue中,所有Event的header所能占用的最大空间占byteCapacity的比例private volatile int byteCapacityBufferPercentage;
// 用于标示byteCapacity中剩余空间的信号量private Semaphore bytesRemaining;
// 用于记录Memory Channel的一些指标,后面可以通过配置监控来观察Flume的运行情况
private ChannelCounter channelCounter;

然后重点说下MemoryChannel里面的MemoryTransaction,它是Transaction类的子类,从其文档来看,一个Transaction的使用模式都是类似的:

<span style="font-size:18px;">Channel ch = ...
Transaction tx = ch.getTransaction();
try {
tx.begin();
...
// ch.put(event) or ch.take()
...
tx.commit();
} catch (ChannelException ex) {
tx.rollback();
...
} finally {
tx.close();
}</span>

可以看到一个Transaction主要有、put、take、commit、rollback这四个方法,我们在实现其子类时,主要也是实现着四个方法。

Flume官方为了方便开发者实现自己的Transaction,定义了BasicTransactionSemantics,这时开发者只需要继承这个辅助类,并且实现其相应的、doPut、doTake、doCommit、doRollback方法即可,MemoryChannel就是继承了这个辅助类。

<span style="font-size:18px;">private class MemoryTransaction extends BasicTransactionSemantics {
//和MemoryChannel一样,内部使用LinkedBlockingDeque来保存没有commit的Event
private LinkedBlockingDeque<Event> takeList;
private LinkedBlockingDeque<Event> putList;
private final ChannelCounter channelCounter;
//下面两个变量用来表示put的Event的大小、take的Event的大小
private int putByteCounter = 0;
private int takeByteCounter = 0;

public MemoryTransaction(int transCapacity, ChannelCounter counter) {
//用transCapacity来初始化put、take的队列
putList = new LinkedBlockingDeque<Event>(transCapacity);
takeList = new LinkedBlockingDeque<Event>(transCapacity);

channelCounter = counter;
}

@Override
protected void doPut(Event event) throws InterruptedException {
//doPut操作,先判断putList中是否还有剩余空间,有则把Event插入到该队列中,同时更新putByteCounter
//没有剩余空间的话,直接报ChannelException
channelCounter.incrementEventPutAttemptCount();
int eventByteSize = (int)Math.ceil(estimateEventSize(event)/byteCapacitySlotSize);

if (!putList.offer(event)) {
throw new ChannelException(
"Put queue for MemoryTransaction of capacity " +
putList.size() + " full, consider committing more frequently, " +
"increasing capacity or increasing thread count");
}
putByteCounter += eventByteSize;
}

@Override
protected Event doTake() throws InterruptedException {
//doTake操作,首先判断takeList中是否还有剩余空间
channelCounter.incrementEventTakeAttemptCount();
if(takeList.remainingCapacity() == 0) {
throw new ChannelException("Take list for MemoryTransaction, capacity " +
takeList.size() + " full, consider committing more frequently, " +
"increasing capacity, or increasing thread count");
}
//然后判断,该MemoryChannel中的queue中是否还有空间,这里通过信号量来判断
if(!queueStored.tryAcquire(keepAlive, TimeUnit.SECONDS)) {
return null;
}
Event event;
//从MemoryChannel中的queue中取出一个event
synchronized(queueLock) {
event = queue.poll();
}
Preconditions.checkNotNull(event, "Queue.poll returned NULL despite semaphore " +
"signalling existence of entry");
//放到takeList中,然后更新takeByteCounter变量
takeList.put(event);

int eventByteSize = (int)Math.ceil(estimateEventSize(event)/byteCapacitySlotSize);
takeByteCounter += eventByteSize;

return event;
}

@Override
protected void doCommit() throws InterruptedException {
//该对应一个事务的提交
//首先判断putList与takeList的相对大小
int remainingChange = takeList.size() - putList.size();
//如果takeList小,说明向该MemoryChannel放的数据比取的数据要多,所以需要判断该MemoryChannel是否有空间来放
if(remainingChange < 0) {
// 1. 首先通过信号量来判断是否还有剩余空间
if(!bytesRemaining.tryAcquire(putByteCounter, keepAlive,
TimeUnit.SECONDS)) {
throw new ChannelException("Cannot commit transaction. Byte capacity " +
"allocated to store event body " + byteCapacity * byteCapacitySlotSize +
"reached. Please increase heap space/byte capacity allocated to " +
"the channel as the sinks may not be keeping up with the sources");
}
// 2. 然后判断,在给定的keepAlive时间内,能否获取到充足的queue空间
if(!queueRemaining.tryAcquire(-remainingChange, keepAlive, TimeUnit.SECONDS)) {
bytesRemaining.release(putByteCounter);
throw new ChannelFullException("Space for commit to queue couldn't be acquired." +
" Sinks are likely not keeping up with sources, or the buffer size is too tight");
}
}
int puts = putList.size();
int takes = takeList.size();
//如果上面的两个判断都过了,那么把putList中的Event放到该MemoryChannel中的queue中。
synchronized(queueLock) {
if(puts > 0 ) {
while(!putList.isEmpty()) {
if(!queue.offer(putList.removeFirst())) {
throw new RuntimeException("Queue add failed, this shouldn't be able to happen");
}
}
}
//清空本次事务中用到的putList与takeList,释放资源
putList.clear();
takeList.clear();
}
//更新控制queue大小的信号量bytesRemaining,因为把takeList清空了,所以直接把takeByteCounter加到bytesRemaining中。
bytesRemaining.release(takeByteCounter);
takeByteCounter = 0;
putByteCounter = 0;
//因为把putList中的Event放到了MemoryChannel中的queue,所以把puts加到queueStored中去。
queueStored.release(puts);
//如果takeList比putList大,说明该MemoryChannel中queue的数量应该是减少了,所以把(takeList-putList)的差值加到信号量queueRemaining
if(remainingChange > 0) {
queueRemaining.release(remainingChange);
}
if (puts > 0) {
channelCounter.addToEventPutSuccessCount(puts);
}
if (takes > 0) {
channelCounter.addToEventTakeSuccessCount(takes);
}

channelCounter.setChannelSize(queue.size());
}

@Override
protected void doRollback() {
//当一个事务失败时,会进行回滚,即调用本方法
//首先把takeList中的Event放回到MemoryChannel中的queue中。
int takes = takeList.size();
synchronized(queueLock) {
Preconditions.checkState(queue.remainingCapacity() >= takeList.size(), "Not enough space in memory channel " +
"queue to rollback takes. This should never happen, please report");
while(!takeList.isEmpty()) {
queue.addFirst(takeList.removeLast());
}
//然后清空putList
putList.clear();
}
//因为清空了putList,所以需要把putList所占用的空间大小添加到bytesRemaining中
bytesRemaining.release(putByteCounter);
putByteCounter = 0;
takeByteCounter = 0;
//因为把takeList中的Event回退到queue中去了,所以需要把takeList的大小添加到queueStored中
queueStored.release(takes);
channelCounter.setChannelSize(queue.size());
}

}</span>

MemoryChannel的逻辑相对简单,主要是通过MemoryTransaction中的putList、takeList与MemoryChannel中的queue打交道,这里的queue相当于持久化层,只不过放到了内存中,如果是FileChannel的话,会把这个queue放到本地文件中。下面表示了Event在一个使用了MemoryChannel的agent中数据流向是:

source ---> putList ---> queue ---> takeList ---> sink
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: