
Using QJM to Implement HDFS Automatic Active/Standby Failover (HA Automatic Failover): A Detailed Source Code Analysis

2014-07-25 10:27
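I recently looked into NameNode HA Automatic Failover. When the Active NN cannot serve requests because of a failure or some other reason, the NN in Standby state can switch to Active automatically, which is what makes the cluster truly highly available.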
[Figure: NN HA Automatic Failover architecture]


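Automatic failover relies on ZooKeeper and the ZKFC (ZKFailoverController) component. ZooKeeper records the NNs' state information, and ZKFC runs as a separate JVM process on every node that hosts an NN. Let's first look at how an NN starts. While the NN object is being instantiated, if hdfs-site.xml lists more than one NameNode ID under dfs.ha.namenodes.${dfs.nameservices}, the haEnabled property is set to true and the initial state is STANDBY: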
// HAUtil
public static boolean isHAEnabled(Configuration conf, String nsId) {
  // Map of nameservice ID -> (NN ID -> RPC address), built from
  // ${dfs.nameservices} and dfs.ha.namenodes.${dfs.nameservices}
  Map<String, Map<String, InetSocketAddress>> addresses =
      DFSUtil.getHaNnRpcAddresses(conf);
  if (addresses == null) return false;
  Map<String, InetSocketAddress> nnMap = addresses.get(nsId);
  // HA is enabled only if this nsId has more than one NN configured
  return nnMap != null && nnMap.size() > 1;
}

// NameNode: during startup, haEnabled is checked first; if true, the initial state is STANDBY
protected HAState createHAState() {
  return !haEnabled ? ACTIVE_STATE : STANDBY_STATE;
}
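As a rough illustration of the startup decision above, the following Java sketch models the configuration as a plain Map<String, String> instead of Hadoop's Configuration and simply counts the comma-separated IDs under dfs.ha.namenodes.<nsId>. The class HaStateSketch, its methods, and the nameservice name mycluster are hypothetical. The real DFSUtil.getHaNnRpcAddresses() also resolves each ID to an RPC address.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified model of HAUtil.isHAEnabled() + createHAState().
// It only counts NN IDs; it does not look up any RPC addresses.
class HaStateSketch {
    enum HAState { ACTIVE, STANDBY }

    static boolean isHAEnabled(Map<String, String> conf, String nsId) {
        String ids = conf.get("dfs.ha.namenodes." + nsId);
        if (ids == null) {
            return false;
        }
        // The property value is a comma-separated list, e.g. "nn1,nn2"
        List<String> nnIds = Arrays.stream(ids.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
        return nnIds.size() > 1;
    }

    // Mirrors createHAState(): with HA enabled, every NN starts as STANDBY
    static HAState initialState(Map<String, String> conf, String nsId) {
        return isHAEnabled(conf, nsId) ? HAState.STANDBY : HAState.ACTIVE;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.ha.namenodes.mycluster", "nn1,nn2"); // hypothetical nameservice
        System.out.println(initialState(conf, "mycluster")); // STANDBY
    }
}
```

This is why both NNs in a two-node HA setup come up as STANDBY and need either a manual command or ZKFC to promote one of them.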
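Suppose the cluster has two NNs. Both are in STANDBY state right after startup, so without Automatic Failover you have to switch one of them to ACTIVE manually with the following command:
hdfs haadmin -transitionToActive nn1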
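If ZKFC is configured and started, it automatically switches its local NameNode to ACTIVE. Which NN ends up ACTIVE depends on which node's ZKFC is started first, and there is no required start order between ZKFC and the NN. Below we look at what ZKFC does when it starts, assuming both configured NNs are already running. ZKFC's entry class is DFSZKFailoverController, which extends ZKFailoverController. We start from its main method: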
// DFSZKFailoverController
public static void main(String args[]) throws Exception {
  if (DFSUtil.parseHelpArgument(args,
      ZKFailoverController.USAGE, System.out, true)) {
    System.exit(0);
  }
  // Load the HDFS configuration
  GenericOptionsParser parser = new GenericOptionsParser(
      new HdfsConfiguration(), args);
  // Build a DFSZKFailoverController from the configuration:
  // an NNHAServiceTarget is instantiated from nsId/nnId and set on the zkfc;
  // the HAServiceTarget holds the network parameters for reaching that NN
  DFSZKFailoverController zkfc = DFSZKFailoverController.create(
      parser.getConfiguration());
  // run(): connect to ZooKeeper (the parent znode is formatted only when
  // started with -formatZK), instantiate the ActiveStandbyElector and
  // HealthMonitor, and start the RPC server
  System.exit(zkfc.run(parser.getRemainingArgs()));
}
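During startup, ZKFC starts two very important in-process components: HealthMonitor and ActiveStandbyElector. ZKFC subscribes to events from both, manages the NN's state, and is responsible for fencing. HealthMonitor periodically checks the NN's health; when a check fails (detected by catching an exception), it notifies ZKFailoverController of the change through callbacks. ActiveStandbyElector manages the NN's state in ZooKeeper, including creating znodes and watching them. When HealthMonitor first starts and the local NN is healthy, it sets off a chain of events that makes the local NN join the election and, if it wins, switch to ACTIVE. The code is as follows: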
// HealthMonitor
private void doHealthChecks() throws InterruptedException {
  while (shouldRun) {
    HAServiceStatus status = null;
    boolean healthy = false;
    try {
      // Query the NN's status and health through the RPC proxy
      status = proxy.getServiceStatus();
      proxy.monitorHealth();
      healthy = true;
    } catch (HealthCheckFailedException e) {
      LOG.warn("Service health check failed for " + targetToMonitor
          + ": " + e.getMessage());
      // Invoke the callbacks in the callbacks collection to notify ZKFC with an event
      enterState(State.SERVICE_UNHEALTHY);
    } catch (Throwable t) {
      // Transport-level failure: drop the proxy, report the NN as not
      // responding, and return so the caller can reconnect
      RPC.stopProxy(proxy);
      proxy = null;
      enterState(State.SERVICE_NOT_RESPONDING);
      Thread.sleep(sleepAfterDisconnectMillis);
      return;
    }

    if (status != null) {
      setLastServiceStatus(status);
    }
    if (healthy) {
      // The state starts as INITIALIZING; entering a different state triggers a transition
      enterState(State.SERVICE_HEALTHY);
    }

    Thread.sleep(checkIntervalMillis);
  }
}

// HealthMonitor
private synchronized void enterState(State newState) {
  // Fire the callbacks only when the new state differs from the previous one
  if (newState != state) {
    LOG.info("Entering state " + newState);
    state = newState;
    synchronized (callbacks) {
      for (Callback cb : callbacks) {
        cb.enteredState(newState);
      }
    }
  }
}

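// ZKFailoverController.HealthCallbacks (inner class)
@Override
public void enteredState(HealthMonitor.State newState) {
  setLastHealthState(newState);
  // Decide, based on the current state, whether this node may join the election
  recheckElectability();
}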
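The key property of enterState() is that it is edge-triggered: an NN that stays healthy fires SERVICE_HEALTHY once, not on every check interval, so recheckElectability() runs only on real transitions. Below is a minimal, hypothetical model of that behavior. HealthMonitorSketch and checkOnce() are invented names, and a boolean stands in for the real getServiceStatus()/monitorHealth() RPCs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stripped-down model of HealthMonitor.enterState() and its callbacks.
class HealthMonitorSketch {
    enum State { INITIALIZING, SERVICE_HEALTHY, SERVICE_UNHEALTHY, SERVICE_NOT_RESPONDING }

    interface Callback {
        void enteredState(State newState);
    }

    private State state = State.INITIALIZING;
    private final List<Callback> callbacks = new ArrayList<>();

    void addCallback(Callback cb) {
        synchronized (callbacks) {
            callbacks.add(cb);
        }
    }

    // Notify subscribers only when the state actually changes
    synchronized void enterState(State newState) {
        if (newState != state) {
            state = newState;
            synchronized (callbacks) {
                for (Callback cb : callbacks) {
                    cb.enteredState(newState);
                }
            }
        }
    }

    // Stand-in for one loop iteration of doHealthChecks()
    void checkOnce(boolean healthy) {
        enterState(healthy ? State.SERVICE_HEALTHY : State.SERVICE_UNHEALTHY);
    }

    public static void main(String[] args) {
        HealthMonitorSketch hm = new HealthMonitorSketch();
        List<State> seen = new ArrayList<>();
        hm.addCallback(seen::add);
        hm.checkOnce(true);
        hm.checkOnce(true);   // unchanged state: no callback
        hm.checkOnce(false);
        System.out.println(seen); // [SERVICE_HEALTHY, SERVICE_UNHEALTHY]
    }
}
```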

// ZKFailoverController
private void recheckElectability() {
  // Lock order: elector first, then the ZKFC itself
  synchronized (elector) {
    synchronized (this) {
      boolean healthy = lastHealthState == State.SERVICE_HEALTHY;

      // delayJoiningUntilNanotime is set when this node cedes the active role
      // (e.g. during a graceful failover); until it passes, do not rejoin
      long remainingDelay = delayJoiningUntilNanotime - System.nanoTime();
      if (remainingDelay > 0) {
        if (healthy) {
          LOG.info("Would have joined master election, but this node is " +
              "prohibited from doing so for " +
              TimeUnit.NANOSECONDS.toMillis(remainingDelay) + " more ms");
        }
        scheduleRecheck(remainingDelay);
        return;
      }

      switch (lastHealthState) {
      // SERVICE_HEALTHY: eligible, join the election
      case SERVICE_HEALTHY:
        elector.joinElection(targetToData(localTarget));
        break;
      // INITIALIZING: not eligible; quit without requiring fencing
      case INITIALIZING:
        elector.quitElection(false);
        break;

      // SERVICE_UNHEALTHY or SERVICE_NOT_RESPONDING: not eligible; quit and
      // mark that this node must be fenced by the next Active
      case SERVICE_UNHEALTHY:
      case SERVICE_NOT_RESPONDING:
        elector.quitElection(true);
        break;

      case HEALTH_MONITOR_FAILED:
        fatalError("Health monitor failed!");
        break;

      default:
        throw new IllegalArgumentException("Unhandled state:" + lastHealthState);
      }
    }
  }
}
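The switch in recheckElectability() is essentially a lookup table from health state to election action. The sketch below restates it as a pure function so the mapping can be checked in isolation. The Action enum and decide() are hypothetical names, and the caller is assumed to perform the actual joinElection()/quitElection() calls.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical pure-function restatement of ZKFailoverController.recheckElectability().
// Instead of calling the elector, it returns what the ZKFC would ask it to do.
class ElectabilitySketch {
    enum Health { INITIALIZING, SERVICE_HEALTHY, SERVICE_UNHEALTHY,
                  SERVICE_NOT_RESPONDING, HEALTH_MONITOR_FAILED }

    // Invented names for the possible outcomes
    enum Action { RECHECK_LATER, JOIN_ELECTION, QUIT_NO_FENCE, QUIT_NEEDS_FENCE, FATAL }

    static Action decide(Health lastHealthState, long delayJoiningUntilNanos, long nowNanos) {
        // While a cede period is in effect, only reschedule the check
        if (delayJoiningUntilNanos - nowNanos > 0) {
            return Action.RECHECK_LATER;
        }
        switch (lastHealthState) {
            case SERVICE_HEALTHY:
                return Action.JOIN_ELECTION;
            case INITIALIZING:
                return Action.QUIT_NO_FENCE;      // quitElection(false)
            case SERVICE_UNHEALTHY:
            case SERVICE_NOT_RESPONDING:
                return Action.QUIT_NEEDS_FENCE;   // quitElection(true)
            case HEALTH_MONITOR_FAILED:
                return Action.FATAL;
            default:
                throw new IllegalArgumentException("Unhandled state:" + lastHealthState);
        }
    }

    public static void main(String[] args) {
        long now = System.nanoTime();
        System.out.println(decide(Health.SERVICE_HEALTHY, now - 1, now));   // JOIN_ELECTION
        long cedeUntil = now + TimeUnit.SECONDS.toNanos(10);
        System.out.println(decide(Health.SERVICE_HEALTHY, cedeUntil, now)); // RECHECK_LATER
    }
}
```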

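// ActiveStandbyElector (called from joinElection(byte[] data), which saves data as appData)
private void joinElectionInternal() {
  Preconditions.checkState(appData != null,
      "trying to join election without any app data");
  if (zkClient == null) {
    if (!reEstablishSession()) {
      fatalError("Failed to reEstablish connection with ZooKeeper");
      return;
    }
  }

  createRetryCount = 0;
  wantToBeInElection = true;
  // Create an EPHEMERAL lock znode in ZK; ZK deletes it automatically when this
  // ZKFC's session ends (e.g. the ZKFC dies, or it quits because its NN failed)
  createLockNodeAsync();
}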

// ActiveStandbyElector
private void createLockNodeAsync() {
  // Asynchronous call: when it completes, ZK invokes the callback
  // processResult(int rc, String path, Object ctx, String name) on this object
  zkClient.create(zkLockFilePath, appData, zkAcl, CreateMode.EPHEMERAL,
      this, zkClient);
}

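// ActiveStandbyElector (StringCallback for the create() call above)
public synchronized void processResult(int rc, String path, Object ctx,
    String name) {
  Code code = Code.get(rc);
  if (isSuccess(code)) {
    // Lock created: try to make this node ACTIVE.
    // becomeActive() talks to the local NN over RPC and sets its state to
    // ACTIVE, so the local NN becomes the active NameNode
    if (becomeActive()) {
      // Watch the lock znode
      monitorActiveStatus();
    } else {
      reJoinElectionAfterFailureToBecomeActive();
    }
    return;
  }
  // The znode already exists: another NN is already active, so this node can
  // only serve as a hot standby, and its state is set to STANDBY
  if (isNodeExists(code)) {
    // createRetryCount == 0 means this was the first attempt, so the znode
    // belongs to another ZKFC; after a retry it may be our own from an earlier
    // lost attempt, which monitorActiveStatus() verifies
    if (createRetryCount == 0) {
      becomeStandby();
    }
    // Watch the znode written by the ACTIVE side; a change fires a ZK watch callback
    monitorActiveStatus();
    return;
  }
  // ... retry on connection loss, otherwise fatalError(...) (omitted) ...
}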

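// ActiveStandbyElector
private boolean becomeActive() {
  if (state == State.ACTIVE) {
    // already active
    return true;
  }
  try {
    // Read the previous ACTIVE NN's breadcrumb znode and fence that NN
    // (skipped if the breadcrumb holds our own data)
    Stat oldBreadcrumbStat = fenceOldActive();
    // Overwrite the breadcrumb (a persistent znode) with our own data
    writeBreadCrumbNode(oldBreadcrumbStat);

    // RPC to the local NN to switch it to ACTIVE
    appClient.becomeActive();
    state = State.ACTIVE;
    return true;
  } catch (Exception e) {
    // The caller quits and rejoins the election
    return false;
  }
}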
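The breadcrumb is what tells a new winner whom to fence: unlike the ephemeral lock, it is a persistent znode, so it survives the old Active's session. The hypothetical sketch below keeps the breadcrumb in a plain field and uses a Predicate<String> as a stand-in for the configured fencing methods (true means fencing succeeded). BreadcrumbSketch is not a Hadoop class; the real code reads and writes the znode through ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of the breadcrumb/fencing step in becomeActive().
class BreadcrumbSketch {
    private String breadcrumb;                       // null = breadcrumb znode absent
    private final List<String> fenced = new ArrayList<>();

    boolean becomeActive(String self, Predicate<String> fencer) {
        String old = breadcrumb;
        if (old != null && !old.equals(self)) {
            // A different NN was the last Active: fence it before proceeding
            if (!fencer.test(old)) {
                return false;                        // caller quits and rejoins
            }
            fenced.add(old);
        }
        breadcrumb = self;                           // record ourselves for the next winner
        // The real code would now call appClient.becomeActive() (RPC to the local NN)
        return true;
    }

    List<String> fencedNodes() { return fenced; }

    String currentBreadcrumb() { return breadcrumb; }

    public static void main(String[] args) {
        BreadcrumbSketch zk = new BreadcrumbSketch();
        zk.becomeActive("nn1", id -> true);          // first Active: nothing to fence
        zk.becomeActive("nn2", id -> true);          // nn1's breadcrumb is still there
        System.out.println(zk.fencedNodes() + " " + zk.currentBreadcrumb()); // [nn1] nn2
    }
}
```

One detail the sketch leaves out: in the real ActiveStandbyElector, an Active that quits cleanly via quitElection(false) deletes its own breadcrumb first, so the next winner has nothing to fence.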
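On the STANDBY side, ZKFC watches the znode that the ACTIVE side wrote to ZooKeeper. When that znode changes, ZooKeeper fires the watch callback and the STANDBY side rejoins the election, which completes the automatic failover. The code is as follows: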
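// ActiveStandbyElector
synchronized void processWatchEvent(ZooKeeper zk, WatchedEvent event) {
  Event.EventType eventType = event.getType();
  // ... connection-state events (EventType.None) are handled here and return (omitted) ...

  // A watch on the lock znode fired
  String path = event.getPath();
  if (path != null) {
    switch (eventType) {
    case NodeDeleted:
      // The lock is gone; if we were ACTIVE, step down to neutral first
      if (state == State.ACTIVE) {
        enterNeutralMode();
      }
      // Rejoin the election
      joinElectionInternal();
      break;
    case NodeDataChanged:
      // ZK watches are one-shot, so re-arm the watch
      monitorActiveStatus();
      break;
    default:
      monitorActiveStatus();
    }
    return;
  }
  // ... otherwise an unexpected watch error: fatalError(...) (omitted) ...
}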
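To see how the pieces fit together, here is a single-threaded simulation of the ephemeral-lock election: a successful create makes a node ACTIVE, NodeExists makes it STANDBY with a watch, and closing the Active's session deletes the lock and wakes the standby. FakeZk, Elector, and FailoverSimulation are hypothetical stand-ins, not ZooKeeper or Hadoop APIs, and the simulation has no fencing, retries, or threads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the ephemeral-lock election with one-shot watches.
// None of these classes are ZooKeeper or Hadoop classes.
class FailoverSimulation {
    enum State { NEUTRAL, ACTIVE, STANDBY }

    static class FakeZk {
        private String lockOwner;                      // session owning the ephemeral lock
        private final List<Elector> watchers = new ArrayList<>();

        // Like create(..., CreateMode.EPHEMERAL): fails if the node already exists
        boolean createEphemeral(String session) {
            if (lockOwner != null) {
                return false;                          // NodeExists
            }
            lockOwner = session;
            return true;
        }

        void watch(Elector e) {
            watchers.add(e);
        }

        // Ending a session drops its watches and deletes its ephemeral node
        void closeSession(String session) {
            watchers.removeIf(e -> e.name.equals(session));
            if (!session.equals(lockOwner)) {
                return;
            }
            lockOwner = null;
            List<Elector> fired = new ArrayList<>(watchers);
            watchers.clear();                          // watches are one-shot
            for (Elector e : fired) {
                e.onNodeDeleted();
            }
        }
    }

    static class Elector {
        final String name;                             // also used as the session id
        final FakeZk zk;
        State state = State.NEUTRAL;

        Elector(String name, FakeZk zk) {
            this.name = name;
            this.zk = zk;
        }

        // Rough analogue of createLockNodeAsync() + processResult()
        void joinElection() {
            state = zk.createEphemeral(name) ? State.ACTIVE : State.STANDBY;
            zk.watch(this);                            // analogue of monitorActiveStatus()
        }

        // Analogue of the NodeDeleted branch of processWatchEvent()
        void onNodeDeleted() {
            joinElection();
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        Elector nn1 = new Elector("nn1", zk);
        Elector nn2 = new Elector("nn2", zk);
        nn1.joinElection();                            // creates the lock: ACTIVE
        nn2.joinElection();                            // NodeExists: STANDBY
        zk.closeSession("nn1");                        // nn1's ZKFC dies, the lock vanishes
        System.out.println(nn2.state);                 // ACTIVE
    }
}
```

Note that nn1's state field still says ACTIVE at the end of the run, much as an NN whose ZKFC has died may still believe it is Active. That is the split-brain risk the breadcrumb-based fencing in becomeActive() is there to handle.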
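The walkthrough above traces the overall flow of automatic NN failover through the source code. Compared with manual failover, it greatly improves availability and also lightens the operational burden.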