
[ZooKeeper Notes 29] Fixing the bug where the ZooKeeper client logs the currently connected server address as null

2012-11-04 08:21
Please credit when reposting: @ni掌柜 nileader@gmail.com

Problem description

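Our company has run several data-center disaster-recovery drills. A typical drill simulates the loss of an entire data center by cutting off its network: machines inside that data center can still talk to each other, but not to anything outside it. During these drills we found a large number of log lines like the following in ZK client applications: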

An exception was thrown while closing send thread for session 0x for server null, unexpected error, closing socket connection and attempting

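Note the "for server null" part of this log line. When we first saw it, we were puzzled: normally this position should show the IP of a ZK server deployed in the isolated data center, but here it was null.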

The issue is also described here: https://issues.apache.org/jira/browse/ZOOKEEPER-1480

Locating the problem

Reading the ZooKeeper code in version 3.4.3 and earlier, I found that the problem lies in the following logging logic:

} catch (Throwable e) {
    if (closing) {
        if (LOG.isDebugEnabled()) {
            // closing so this is expected
            LOG.debug("An exception was thrown while closing send thread for session 0x"
                    + Long.toHexString(getSessionId())
                    + " : " + e.getMessage());
        }
        break;
    } else {
        // this is ugly, you have a better way speak up
        if (e instanceof SessionExpiredException) {
            LOG.info(e.getMessage() + ", closing socket connection");
        } else if (e instanceof SessionTimeoutException) {
            LOG.info(e.getMessage() + RETRY_CONN_MSG);
        } else if (e instanceof EndOfStreamException) {
            LOG.info(e.getMessage() + RETRY_CONN_MSG);
        } else if (e instanceof RWServerFoundException) {
            LOG.info(e.getMessage());
        } else {
            LOG.warn(
                    "Session 0x"
                    + Long.toHexString(getSessionId())
                    + " for server "
                    + clientCnxnSocket.getRemoteSocketAddress()
                    + ", unexpected error"
                    + RETRY_CONN_MSG, e);
        }

As you can see, the log message obtains the address of the current server by calling clientCnxnSocket.getRemoteSocketAddress(). Let's look at that method:

/**
 * Returns the address to which the socket is connected.
 * @return ip address of the remote side of the connection or null if not connected
 */
@Override
SocketAddress getRemoteSocketAddress() {
    // a lot could go wrong here, so rather than put in a bunch of code
    // to check for nulls all down the chain let's do it the simple
    // yet bulletproof way
    try {
        return ((SocketChannel) sockKey.channel()).socket()
                .getRemoteSocketAddress();
    } catch (NullPointerException e) {
        return null;
    }
}

This implementation (from ClientCnxnSocketNIO) in turn calls java.net.Socket.getRemoteSocketAddress() from the JDK:

/**
 * Returns the address of the endpoint this socket is connected to, or
 * <code>null</code> if it is unconnected.
 * @return a <code>SocketAddress</code> reprensenting the remote endpoint of this
 *         socket, or <code>null</code> if it is not connected yet.
 * @see #getInetAddress()
 * @see #getPort()
 * @see #connect(SocketAddress, int)
 * @see #connect(SocketAddress)
 * @since 1.4
 */
public SocketAddress getRemoteSocketAddress() {
    if (!isConnected())
        return null;
    return new InetSocketAddress(getInetAddress(), getPort());
}

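This pins down the cause. When the connection to a server breaks abnormally or cannot be established at all (for example, when a drill cuts off a data center's network), the underlying socket is either not in a connected state, in which case Socket.getRemoteSocketAddress() returns null, or the channel can no longer be reached, in which case the NullPointerException branch above returns null. Either way, the client ends up logging "for server null".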
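The null comes straight from the JDK behavior quoted above. The following minimal, self-contained sketch reproduces it; the class name RemoteAddressDemo is made up for illustration, and it uses a loopback ServerSocket instead of a real ZooKeeper server. A socket that has not completed connect() reports null, while a connected one reports its peer:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;

public class RemoteAddressDemo {

    // Same call the client's log line relies on: ask the socket for its peer.
    static SocketAddress remoteOf(Socket socket) {
        return socket.getRemoteSocketAddress();
    }

    public static void main(String[] args) throws IOException {
        // connect() was never completed, so isConnected() is false
        // and getRemoteSocketAddress() returns null.
        try (Socket unconnected = new Socket()) {
            System.out.println("unconnected: " + remoteOf(unconnected));
        }

        // A socket connected to a local listener reports the peer address.
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()));
            System.out.println("connected:   " + remoteOf(client));
        }
    }
}
```

The first line printed is "unconnected: null", which is exactly the value that ends up in the "for server ..." part of the warning.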

Solution

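This log output matters a great deal to developers: during troubleshooting it tells you exactly which server was involved, but once it prints null you have nothing to go on. I made some changes so that, when a problem occurs, the client logs the IP of the server it was dealing with. The patch can be downloaded here: https://github.com/downloads/nileader/taokeeper/getCurrentZooKeeperAddr_for_3.4.3.patch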

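First, add two methods to the org.apache.zookeeper.client.HostProvider interface: one returns the index, within the address list, of the address currently in use, and the other returns the full address list. For how the ZooKeeper client builds and shuffles its address list, see my article "How the ZooKeeper client randomizes its server address list" (《ZooKeeper客户端地址列表的随机原理》).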

public interface HostProvider {
    // ...

    /**
     * Get the index of the server address that is currently being connected to or is connected.
     * @see ZOOKEEPER-1480:https://issues.apache.org/jira/browse/ZOOKEEPER-1480
     */
    public int getCurrentIndex();

    /**
     * Get all server addresses configured for this ZooKeeper client.
     * @return List
     * @see ZOOKEEPER-1480:https://issues.apache.org/jira/browse/ZOOKEEPER-1480
     */
    public List<InetSocketAddress> getAllServerAddress();
}
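To illustrate how a provider could back these two methods, here is a minimal sketch. SimpleHostProvider is a hypothetical, simplified class, not ZooKeeper's StaticHostProvider: it shuffles the configured list once, hands out addresses round-robin from next(), and remembers the index of the entry it handed out last. The host names are placeholders.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified provider for illustration; not ZooKeeper's StaticHostProvider.
public class SimpleHostProvider {

    private final List<InetSocketAddress> serverAddresses;
    // -1 means next() has not been called yet, so no server has been picked.
    private int currentIndex = -1;

    public SimpleHostProvider(List<InetSocketAddress> addresses, long seed) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one server address is required");
        }
        List<InetSocketAddress> copy = new ArrayList<>(addresses);
        // Shuffle once so that different clients spread across the ensemble.
        Collections.shuffle(copy, new Random(seed));
        this.serverAddresses = Collections.unmodifiableList(copy);
    }

    // Round-robin over the shuffled list and record which entry is in use.
    public InetSocketAddress next() {
        currentIndex = (currentIndex + 1) % serverAddresses.size();
        return serverAddresses.get(currentIndex);
    }

    // The two methods described above.
    public int getCurrentIndex() {
        return currentIndex;
    }

    public List<InetSocketAddress> getAllServerAddress() {
        return serverAddresses;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> servers = Arrays.asList(
                InetSocketAddress.createUnresolved("zk1.example", 2181),
                InetSocketAddress.createUnresolved("zk2.example", 2181),
                InetSocketAddress.createUnresolved("zk3.example", 2181));
        SimpleHostProvider provider = new SimpleHostProvider(servers, 42L);
        for (int attempt = 0; attempt < 4; attempt++) {
            provider.next();
            // Even if this connection attempt fails, index + list still name the target.
            int index = provider.getCurrentIndex();
            System.out.println("attempt " + attempt + ": index " + index
                    + " -> " + provider.getAllServerAddress().get(index));
        }
    }
}
```

Because the index is updated before any connection attempt is made, it identifies the target server even when the socket never gets connected.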

Second, change the logging logic in the org.apache.zookeeper.ClientCnxn class:

/**
 * Get the address of the ZooKeeper server that the client is connected or connecting to.<br>
 * Note: returns null if the address cannot be determined.
 */
private InetSocketAddress getCurrentZooKeeperAddr() {
    try {
        if (null == hostProvider) {
            return null;
        }
        List<InetSocketAddress> addrs = hostProvider.getAllServerAddress();
        if (null == addrs) {
            return null;
        }
        int index = hostProvider.getCurrentIndex();
        // Only look up a valid index; anything else means "unknown".
        if (index >= 0 && index < addrs.size()) {
            return addrs.get(index);
        }
        return null;
    } catch (Exception e) {
        return null;
    }
}
// ...
// get current ZK host to log
InetSocketAddress addr = getCurrentZooKeeperAddr();

LOG.warn(
        "Session 0x"
        + Long.toHexString(getSessionId())
        + " for server ip: " + addr + ", detail conn: "
        + clientCnxnSocket.getRemoteSocketAddress()
        + ", unexpected error"
        + RETRY_CONN_MSG, e);
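To see what the patched log line gains, here is a minimal stand-alone sketch. AddressSource is a hypothetical stand-in for the two HostProvider methods, currentAddr mirrors the idea of getCurrentZooKeeperAddr() using an explicit bounds check rather than exception handling, and warnMessage rebuilds the warning text without RETRY_CONN_MSG. When the socket reports null, the message still names the configured server.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.Arrays;
import java.util.List;

public class ZkAddrLogDemo {

    // Hypothetical stand-in for the two methods added to HostProvider.
    public interface AddressSource {
        int getCurrentIndex();
        List<InetSocketAddress> getAllServerAddress();
    }

    // Look the current address up by index; return null instead of throwing.
    static InetSocketAddress currentAddr(AddressSource source) {
        if (source == null) {
            return null;
        }
        List<InetSocketAddress> all = source.getAllServerAddress();
        int index = source.getCurrentIndex();
        if (all == null || index < 0 || index >= all.size()) {
            return null;
        }
        return all.get(index);
    }

    // Same text as the patched LOG.warn call, minus RETRY_CONN_MSG.
    static String warnMessage(long sessionId, InetSocketAddress addr, SocketAddress remote) {
        return "Session 0x" + Long.toHexString(sessionId)
                + " for server ip: " + addr + ", detail conn: " + remote
                + ", unexpected error";
    }

    public static void main(String[] args) {
        final List<InetSocketAddress> servers = Arrays.asList(
                InetSocketAddress.createUnresolved("zk1.example", 2181),
                InetSocketAddress.createUnresolved("zk2.example", 2181));
        AddressSource source = new AddressSource() {
            public int getCurrentIndex() { return 1; }
            public List<InetSocketAddress> getAllServerAddress() { return servers; }
        };
        // The socket has no peer address (null), but the provider still knows the target.
        System.out.println(warnMessage(0x1234L, currentAddr(source), null));
    }
}
```

The printed line contains both "zk2.example" and "detail conn: null", so the server involved can be identified even though the socket itself has nothing to report.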


This article comes from the blog "ni掌柜的IT专栏"; please keep this source attribution: http://nileader.blog.51cto.com/1381108/1049470