Hadoop series: A solution to the NameNode single point of failure
2009-07-28 17:50
Requirements
1. Two nodes to satisfy availability requirements.
2. High availability for the internal components of each node.
3. Redundant network architecture.
4. Replication of NameNode metadata.
5. Automatic failover with no human action required.
Feasibility of switching between Hadoop NameNodes
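1. The NameNode, as the master server, maintains its own metadata. The name.dir setting in the Hadoop configuration accepts multiple paths, and the NameNode automatically keeps the data under all of these paths in sync.
2. All of the NameNode's persistent data comes from the configuration files and the name.dir directory. As long as these two are identical on two machines, the state of the two NameNodes is identical as well.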
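The second point can be checked mechanically. The following is a minimal sketch, not part of Hadoop: it assumes the two name.dir copies are reachable as local paths and that the NameNode is stopped (or the copies are snapshots), so the files are not changing during the comparison. The function names dir_fingerprint and same_metadata are made up for this illustration.

```python
import hashlib
from pathlib import Path


def dir_fingerprint(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    fingerprint = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            fingerprint[path.relative_to(root).as_posix()] = digest
    return fingerprint


def same_metadata(dir_a, dir_b):
    """True when both directories hold the same files with the same contents."""
    return dir_fingerprint(dir_a) == dir_fingerprint(dir_b)
```

Calling same_metadata on the primary's name.dir and on the standby's copy returns True only if every file exists on both sides with identical bytes, which is the condition the argument above relies on.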
The concrete solution
Use ZooKeeper to manage switching between NameNodes
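The 4 ZooKeeper clients can be regarded as 4 znodes, but on the ZooKeeper server only one registered znode exists at any given time, which means that only one NameNode is working at any given time. When another ZooKeeper client finds that a znode is already registered, it blocks its own registration process. Each ZooKeeper client sets a timer and polls at a fixed interval to check whether the NameNode on its own machine is working properly. If it finds that the NameNode has crashed, it sends a request to the ZooKeeper server to delete the znode it registered. On receiving the request, the ZooKeeper server deletes the existing znode and notifies all ZooKeeper clients, waking their blocked processes. The remaining ZooKeeper clients then compete to register their own znode. The one that succeeds takes over from the original NameNode, and the ones that fail go back to blocking.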
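The switching protocol above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the Registry class stands in for the ZooKeeper server and is not the ZooKeeper client API, and all class and method names are hypothetical. In the simulation the first woken client always wins, whereas in a real cluster the winner depends on timing.

```python
class Registry:
    """In-memory stand-in for the ZooKeeper server: holds at most one znode."""

    def __init__(self):
        self.owner = None    # client whose znode is currently registered
        self.waiting = []    # clients blocked until the znode is deleted

    def try_register(self, client):
        """Register the client's znode if none exists; otherwise block it."""
        if self.owner is None:
            self.owner = client
            return True
        if client not in self.waiting:
            self.waiting.append(client)
        return False

    def delete(self, client):
        """Delete the client's znode, then wake all blocked clients."""
        if self.owner is not client:
            return
        self.owner = None
        woken, self.waiting = self.waiting, []
        for other in woken:
            other.compete()


class Client:
    """One ZooKeeper client running next to one NameNode."""

    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self.namenode_alive = True

    def compete(self):
        """Try to register; a client with a dead NameNode does not take part."""
        if not self.namenode_alive:
            return False
        return self.registry.try_register(self)

    def poll(self):
        """Timer callback: give up the znode if the local NameNode crashed."""
        if not self.namenode_alive and self.registry.owner is self:
            self.registry.delete(self)


registry = Registry()
clients = [Client(f"nn{i}", registry) for i in range(1, 5)]
for c in clients:
    c.compete()
print(registry.owner.name)         # nn1 is active, the other three block

clients[0].namenode_alive = False  # simulate a NameNode crash on nn1
clients[0].poll()                  # the next timer tick notices the crash
print(registry.owner.name)         # nn2 has taken over
```

A real deployment would more likely rely on ZooKeeper's ephemeral znodes and watches than on explicit delete requests, so that a machine that dies entirely also releases its znode.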
The DataNodes sit below a virtual IP layer, so changes above that layer are transparent to them.