
A Throttling Scheme for HDFS Reads and Writes

2017-11-16 09:40

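Sometimes, when a few large jobs run on our cluster, they instantly saturate the data center's network bandwidth and cause jitter in some online services. That led us to consider throttling ordinary HDFS reads and writes. This article is therefore essentially a summary report covering the idea behind the scheme, its implementation, and the results.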

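Hadoop itself already throttles a few operations internally:

1. Balancer: data transfer while rebalancing blocks

2. FsImage: data transfer when uploading and downloading the image file

3. VolumeScanner: data reads during disk scans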

I. DataTransferThrottler

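This class throttles data transfers. The bandwidth (rate) it is configured with is shared by all threads that use the same instance.

Its core idea is to control the average transfer rate by capping the number of bytes that may be transferred per unit of time. If I/O runs too fast and exceeds the byte budget for the current period, the caller waits until the next period, when bandwidth becomes available again.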
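To make the "bytes per period" idea concrete, here is a minimal sketch of how a bytes-per-second cap could be turned into a per-period budget. It assumes the period is given in milliseconds; `ThrottleBudget` and its method are hypothetical names used only for illustration, not part of Hadoop.

```java
// Hypothetical helper for illustration only; not a Hadoop class.
final class ThrottleBudget {
    /** Bytes allowed in one period of periodMs milliseconds at bandwidthPerSec bytes/s. */
    static long bytesPerPeriod(long bandwidthPerSec, long periodMs) {
        return bandwidthPerSec * periodMs / 1000;
    }

    public static void main(String[] args) {
        long oneMiBPerSec = 1024L * 1024L;
        // Same cap, two different periods: the budget per period shrinks with the period.
        System.out.println(bytesPerPeriod(oneMiBPerSec, 500)); // prints 524288
        System.out.println(bytesPerPeriod(oneMiBPerSec, 100)); // prints 104857
    }
}
```

The cap itself is unchanged in both cases; only the granularity at which the budget is handed out differs.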

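import static org.apache.hadoop.util.Time.monotonicNow;

public class DataTransferThrottler {
  // The period (time window) over which the limit applies. For example, if only
  // 1024 bytes may be sent within 5 s, the period is 5 s.
  private final long period;          // period over which bw is imposed
  private final long periodExtension; // Max period over which bw accumulates.

  // Number of bytes that may be sent per period (1024 in the example above).
  private long bytesPerPeriod; // total number of bytes can be sent in each period

  // Start of the current period.
  private long curPeriodStart; // current period starting time

  // Bytes that can still be sent in the current period. For example, if 500 bytes
  // were sent in the first 2 s of the period, 524 bytes remain.
  private long curReserve; // remaining bytes can be sent in the period

  // Counterpart of curReserve: bytes already used. In the example above,
  // these are the 500 bytes sent in the first 2 s.
  private long bytesAlreadyUsed;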

  public synchronized void throttle(long numOfBytes, Canceler canceler) {
    if (numOfBytes <= 0) {
      return;
    }

    // Deduct the bytes just sent/received from the remaining budget.
    curReserve -= numOfBytes;
    // Track the bytes currently in use.
    bytesAlreadyUsed += numOfBytes;

    while (curReserve <= 0) {
      // If a canceler was supplied and it has been cancelled, stop throttling
      // and return immediately.
      if (canceler != null && canceler.isCancelled()) {
        return;
      }
      long now = monotonicNow();
      long curPeriodEnd = curPeriodStart + period;

      if (now < curPeriodEnd) {
        // Still inside the current period: wait for it to end so that a new
        // byte budget can be granted.
        // Wait for next period so that curReserve can be increased.
        try {
          wait(curPeriodEnd - now);
        } catch (InterruptedException e) {
          // Abort throttle and reset interrupted status to make sure other
          // interrupt handling higher in the call stack executes.
          Thread.currentThread().interrupt();
          break;
        }
      } else if (now < (curPeriodStart + periodExtension)) {
        // The period is over but we are still within the maximum extension:
        // grant another budget and move the period start to the end of the
        // previous period.
        curPeriodStart = curPeriodEnd;
        curReserve += bytesPerPeriod;
      } else {
        // discard the prev period. Throttler might not have
        // been used for a long time.
        // now is past curPeriodStart + periodExtension, so the throttler has
        // been idle for a long time: restart the period from now.
        curPeriodStart = now;
        curReserve = bytesPerPeriod - bytesAlreadyUsed;
      }
    }

    // Throttling for this call is done: remove its bytes from the in-use count.
    bytesAlreadyUsed -= numOfBytes;
  }
}
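To experiment with this logic outside Hadoop, the following stripped-down sketch follows the same three branches (wait, grant a new budget, reset after idling). `SimpleThrottler` and `timeTransferMs` are hypothetical names, the canceler is left out, the period is a constructor argument, and the 3x extension factor is an assumption made for this sketch.

```java
// Simplified, self-contained sketch for illustration; not Hadoop's DataTransferThrottler.
final class SimpleThrottler {
    private final long periodMs;          // length of one period in milliseconds
    private final long periodExtensionMs; // idle time after which the period is restarted
    private final long bytesPerPeriod;    // byte budget granted per period
    private long curPeriodStart;          // start of the current period (ms)
    private long curReserve;              // bytes still available in this period
    private long bytesAlreadyUsed;        // bytes of calls currently being throttled

    SimpleThrottler(long periodMs, long bytesPerPeriod) {
        this.periodMs = periodMs;
        this.periodExtensionMs = periodMs * 3; // assumption for this sketch
        this.bytesPerPeriod = bytesPerPeriod;
        this.curPeriodStart = nowMs();
        this.curReserve = bytesPerPeriod;
    }

    static long nowMs() {
        return System.nanoTime() / 1_000_000; // monotonic clock in milliseconds
    }

    synchronized void throttle(long numOfBytes) {
        if (numOfBytes <= 0) {
            return;
        }
        curReserve -= numOfBytes;
        bytesAlreadyUsed += numOfBytes;
        while (curReserve <= 0) {
            long now = nowMs();
            long curPeriodEnd = curPeriodStart + periodMs;
            if (now < curPeriodEnd) {
                try {
                    wait(curPeriodEnd - now); // budget exhausted: sleep until the period ends
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            } else if (now < curPeriodStart + periodExtensionMs) {
                curPeriodStart = curPeriodEnd; // next period starts where the last one ended
                curReserve += bytesPerPeriod;
            } else {
                curPeriodStart = now;          // idle for too long: start over from now
                curReserve = bytesPerPeriod - bytesAlreadyUsed;
            }
        }
        bytesAlreadyUsed -= numOfBytes;
    }

    /** Sends `chunks` chunks of `chunkSize` bytes through a fresh throttler and returns the elapsed ms. */
    static long timeTransferMs(long periodMs, long bytesPerPeriod, int chunks, int chunkSize) {
        long start = nowMs();
        SimpleThrottler throttler = new SimpleThrottler(periodMs, bytesPerPeriod);
        for (int i = 0; i < chunks; i++) {
            throttler.throttle(chunkSize);
        }
        return nowMs() - start;
    }

    public static void main(String[] args) {
        // 1000 bytes per 50 ms period, five 1000-byte chunks.
        System.out.println(timeTransferMs(50, 1000, 5, 1000) + " ms");
    }
}
```

Because the loop condition is `curReserve <= 0`, a chunk that uses up the budget exactly also waits for the period to end, so the five chunks in `main` should take roughly 250 ms or more.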

So the effective rate depends not only on the bandwidth cap but also on the period. With a shorter period, waits occur more often, so the average bandwidth ends up lower.

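Shortcomings:

1. The period defaults to 500 ms and is hard-coded; making it configurable would be better.

2. None of the scenarios above is at the job level. Ordinary readBlock and writeBlock operations are not throttled this way, so a job's data transfers can use up all the available bandwidth.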
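One way the second shortcoming could be addressed is to charge every ordinary read against a shared throttler. The sketch below only illustrates the shape of such a hook and is not the actual HDFS change: `ByteThrottler`, `ThrottledReadStream`, and `ThrottledReadDemo` are hypothetical names, and the throttler here merely counts bytes so that the example stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical callback; in HDFS this role would be played by a throttler's throttle method.
interface ByteThrottler {
    void throttle(long numOfBytes);
}

// Hypothetical wrapper that charges every successful read against the throttler.
final class ThrottledReadStream extends FilterInputStream {
    private final ByteThrottler throttler;

    ThrottledReadStream(InputStream in, ByteThrottler throttler) {
        super(in);
        this.throttler = throttler;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            throttler.throttle(1); // one byte was read
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            throttler.throttle(n); // charge only the bytes actually read
        }
        return n;
    }
}

final class ThrottledReadDemo {
    /** Reads all of `data` through the wrapper and returns how many bytes were charged. */
    static long chargedBytes(byte[] data, int bufSize) {
        AtomicLong charged = new AtomicLong();
        try (InputStream in =
                 new ThrottledReadStream(new ByteArrayInputStream(data), charged::addAndGet)) {
            byte[] buf = new byte[bufSize];
            while (in.read(buf) != -1) {
                // drain the stream; the wrapper does the accounting
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return charged.get();
    }

    public static void main(String[] args) {
        System.out.println(chargedBytes(new byte[10], 4)); // prints 10
    }
}
```

Replacing the counting callback with a blocking throttler shared by all readers would make the same wrapper enforce a bandwidth cap. The write path could be wrapped in the same way.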
