Resolving a Hadoop SecondaryNameNode (SNN) failure
2013-10-28 17:33
The cluster had been in production for a while when the following errors showed up in the SNN log:
2013-10-28 16:38:16,280 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://0.0.0.0:50070/getimage?getimage=1
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs cause:java.net.ConnectException: Connection refused
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2013-10-28 16:38:16,281 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.openServer(Unknown Source)
    at sun.net.www.http.HttpClient.<init>(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.http.HttpClient.New(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:404)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$4.run(SecondaryNameNode.java:393)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:393)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:494)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
    at java.lang.Thread.run(Unknown Source)
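Errors like this are easy to spot with a quick grep of the SNN log. A minimal sketch (the function name is ours; the log path varies by installation):

```shell
# Print the most recent checkpoint-related errors from an SNN log file.
# Matches both the doCheckpoint ERROR lines and the underlying cause.
check_snn_log() {
  grep -E 'ERROR .*SecondaryNameNode|Connection refused' "$1" | tail -n 5
}
```

Running it against the log above would surface the `Connection refused` lines immediately, without scrolling through the surrounding INFO noise.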
At the same time, the SNN's checkpoint directory was empty.
The fix: after startup the SNN has to fetch the fsimage from the NN over HTTP, and the connection was refused because the NN's HTTP address had never been configured. Add the following to hdfs-site.xml:
<property>
<name>dfs.http.address</name>
<value>192.168.1.81:50070</value>
</property>
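Before restarting, it is worth confirming the property actually landed in the file. A small sanity check (the function name is ours, not from the original post):

```shell
# Report whether dfs.http.address is present in a given hdfs-site.xml.
has_http_address() {
  if grep -q '<name>dfs.http.address</name>' "$1"; then
    echo configured
  else
    echo missing
  fi
}
```

A plain grep is enough here since Hadoop 1.x config files are flat name/value property lists; a stricter check could parse the XML.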
After applying the configuration, the SNN log showed:
2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2013-10-28 17:01:10,777 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://192.168.1.81:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:797783056:0:1374228512000:1374228205935&newChecksum=06ed8d62329e47f1863a0230215c8317
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:435)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:499)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:332)
    at java.lang.Thread.run(Unknown Source)
Because the SNN had been down for so long, its checkpoint state was out of step with the NN right after startup, so the first attempt failed. After a while the log returned to normal, and the checkpoint directories were populated with the synchronized edit log and fsimage files:
2013-10-28 17:06:10,858 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://192.168.1.81:50070/getimage?getedit=1
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file edits size 60832 bytes.
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 582.5425 MB
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^26 = 67108864 entries
2013-10-28 17:06:10,861 INFO org.apache.hadoop.hdfs.util.GSet: recommended=67108864, actual=67108864
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2013-10-28 17:06:11,259 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The dfs.support.append option is in your configuration, however append is not supported. This configuration option is no longer required to enable sync.
2013-10-28 17:06:11,259 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2013-10-28 17:06:11,260 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-10-28 17:06:11,264 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 7369
2013-10-28 17:06:11,348 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 10
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: EOF of /opt/dfs/snn/current/edits, reached end of edit log. Number of transactions found: 321. Bytes read: 60832
2013-10-28 17:06:11,386 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /opt/dfs/snn/current/edits of size 60832 edits # 321 loaded in 0 seconds.
2013-10-28 17:06:11,388 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0 0 0
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,394 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,400 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data1/dfs/snn/current/edits
2013-10-28 17:06:11,406 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,407 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data2/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,412 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 60832, editlog=/data3/dfs/snn/current/edits
2013-10-28 17:06:11,433 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 1162298 saved in 0 seconds.
2013-10-28 17:06:11,444 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log: position=4, editlog=/opt/dfs/snn/current/edits
2013-10-28 17:06:11,445 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate to 4, editlog=/opt/dfs/snn/current/edits
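Rather than re-reading the log on every restart, the recovery can be checked directly against the checkpoint directory. A minimal sketch, assuming the Hadoop 1.x layout seen in the log above, where a healthy checkpoint dir contains a non-empty fsimage and an edits file under current/ (the function name is ours):

```shell
# Report whether an SNN checkpoint "current" directory looks healthy:
# fsimage must exist and be non-empty, and an edits file must be present.
checkpoint_ok() {
  if [ -s "$1/fsimage" ] && [ -e "$1/edits" ]; then
    echo ok
  else
    echo empty
  fi
}
```

For example, `checkpoint_ok /opt/dfs/snn/current` should print `ok` once the sync above has completed, and `empty` in the broken state described at the start of this post.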
The matching setting should also be added on the NN side, so that the two daemons can reach each other in both directions:
<property>
<name>dfs.secondary.http.address</name>
<value>secondarynamenode:50090</value>
</property>
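With both properties set, the SNN builds its image-transfer URLs from `dfs.http.address`, and the NN uses `dfs.secondary.http.address` when it calls back. The request shape is visible in the logs above; a tiny helper makes it explicit (the function name is ours):

```shell
# Build the getimage URL the SNN requests from the NN's HTTP address,
# matching the "Opening connection to ..." lines in the SNN log.
getimage_url() {
  echo "http://$1/getimage?getimage=1"
}
```

Fetching that URL (e.g. with curl) from the SNN host is a quick way to verify connectivity before waiting for the next checkpoint interval.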