
Resolving a Hadoop SNN (SecondaryNameNode) Failure

2013-10-28 17:33
The cluster had been live for a while when I noticed the following errors in the SNN log:

2013-10-28 16:38:16,280 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://0.0.0.0:50070/getimage?getimage=1
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs cause:java.net.ConnectException: Connection refused
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
	at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
	at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
	at java.net.SocksSocketImpl.connect(Unknown Source)
	at java.net.Socket.connect(Unknown Source)
	at java.net.Socket.connect(Unknown Source)
	at sun.net.NetworkClient.doConnect(Unknown Source)
	at sun.net.www.http.HttpClient.openServer(Unknown Source)
	at sun.net.www.http.HttpClient.openServer(Unknown Source)
	at sun.net.www.http.HttpClient.<init>(Unknown Source)
	at sun.net.www.http.HttpClient.New(Unknown Source)
	at sun.net.www.http.HttpClient.New(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
	at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:404)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:393)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:393)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:494)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
	at java.lang.Thread.run(Unknown Source)


At the same time, the SNN's checkpoint directory was empty.

Solution:

Set the following in hdfs-site.xml:

<property>
  <name>dfs.http.address</name>
  <value>192.168.1.81:50070</value>
</property>

After startup, the SNN has to request the fsimage file from the NN over HTTP; because this address was not configured, it fell back to the default and the connection was refused.
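The failure mode can be sketched in a few lines. This is a simplified illustration, not Hadoop's actual code; the default value below mirrors the 1.x-era default for dfs.http.address, which is why the log shows the SNN opening a connection to http://0.0.0.0:50070.

```python
# Simplified sketch (not Hadoop source): how the SNN derives the fsimage
# download URL from dfs.http.address. With the property unset, the
# 1.x-era default of 0.0.0.0:50070 produces the doomed URL seen in the log.
def getimage_url(conf):
    addr = conf.get("dfs.http.address", "0.0.0.0:50070")  # default address
    return "http://%s/getimage?getimage=1" % addr

print(getimage_url({}))  # unconfigured: the URL that got "Connection refused"
print(getimage_url({"dfs.http.address": "192.168.1.81:50070"}))  # fixed
```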

After applying the configuration, the SNN log showed:

2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://192.168.1.81:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:797783056:0:1374228512000:1374228205935&newChecksum=06ed8d62329e47f1863a0230215c8317
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
	at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:435)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:499)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
	at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
	at java.lang.Thread.run(Unknown Source)


This is because the SNN had been down for a long time; right after restart its checkpoint state is stale, so the first checkpoint attempt can fail like this.

After a while, the log returned to normal, and the corresponding directories now contained the synced edit log and other files:

2013-10-28 17:06:10,858 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://192.168.1.81:50070/getimage?getedit=1
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file edits size 60832 bytes.
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 582.5425 MB
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^26 = 67108864 entries
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: recommended=67108864, actual=67108864
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2013-10-28 17:06:11,259 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync.
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2013-10-28 17:06:11,260 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-10-28 17:06:11,264 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 7369
2013-10-28 17:06:11,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 10
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: EOF of /opt/dfs/snn/current/edits, reached end of edit log Number of transactions found: 321. Bytes read: 60832
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /opt/dfs/snn/current/edits of size 60832 edits # 321 loaded in 0 seconds.
2013-10-28 17:06:11,388 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0 0 0
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,406 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,407 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,433 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 1162298 saved in 0 seconds.
2013-10-28 17:06:11,444 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=4, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,445 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 4, editlog=/opt/dfs/snn/current/edits
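The recovery can be confirmed directly on disk. A hypothetical spot check, using the checkpoint paths that appear in the log above: after a successful checkpoint, each directory should contain a current/edits file.

```python
# Hypothetical helper: verify that each SNN checkpoint directory now holds
# a current/edits file, as it should after a completed checkpoint cycle.
import os

def check_snn_dirs(dirs):
    # Returns a dict mapping each directory to True (populated) or False.
    return {d: os.path.exists(os.path.join(d, "current", "edits")) for d in dirs}

# The four checkpoint directories seen in the log above.
for d, ok in check_snn_dirs(
        ["/opt/dfs/snn", "/data1/dfs/snn",
         "/data2/dfs/snn", "/data3/dfs/snn"]).items():
    print(d, "OK" if ok else "missing checkpoint files")
```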


The matching property also needs to be added on the NN side, so that the two ends line up:

<property>
  <name>dfs.secondary.http.address</name>
  <value>secondarynamenode:50090</value>
</property>
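Why this matters is visible in the earlier FileNotFoundException: the putimage request carried machine=0.0.0.0, the default advertised address when dfs.secondary.http.address is unset. A simplified sketch of the URL construction (not Hadoop's real code; token value is a placeholder):

```python
# Simplified sketch (not Hadoop source): the SNN embeds its own HTTP
# address in the putimage request so the NN can fetch the merged image
# back from it. With dfs.secondary.http.address left at the 0.0.0.0:50090
# default, the advertised machine is 0.0.0.0, as seen in the failed URL.
def putimage_url(nn_http_addr, conf, token="TOKEN"):  # token: placeholder
    snn_addr = conf.get("dfs.secondary.http.address", "0.0.0.0:50090")
    host, port = snn_addr.split(":")
    return ("http://%s/getimage?putimage=1&port=%s&machine=%s&token=%s"
            % (nn_http_addr, port, host, token))

print(putimage_url("192.168.1.81:50070", {}))  # machine=0.0.0.0
print(putimage_url("192.168.1.81:50070",
                   {"dfs.secondary.http.address": "secondarynamenode:50090"}))
```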
