
Setting up Hadoop 2.6.0 HDFS HA and YARN HA

Final result:

[hadoop@h41 ~]$ jps

12723 ResourceManager

12995 Jps

12513 NameNode

12605 DFSZKFailoverController

[hadoop@h42 ~]$ jps

12137 ResourceManager

12233 Jps

12009 DFSZKFailoverController

11930 NameNode

[hadoop@h43 ~]$ jps

12196 DataNode

12322 NodeManager

12435 Jps

11965 QuorumPeerMain

12050 JournalNode

[hadoop@h44 ~]$ jps

11848 QuorumPeerMain

11939 JournalNode

12309 Jps

12156 NodeManager

12032 DataNode

[hadoop@h45 ~]$ jps

12357 Jps

11989 JournalNode

11904 QuorumPeerMain

12204 NodeManager

12080 DataNode

Role assignment:

h41: NameNode, DFSZKFailoverController, ResourceManager
h42: NameNode, DFSZKFailoverController, ResourceManager
h43: DataNode, NodeManager, JournalNode, QuorumPeerMain
h44: DataNode, NodeManager, JournalNode, QuorumPeerMain
h45: DataNode, NodeManager, JournalNode, QuorumPeerMain
Note: In Hadoop 2.x, HDFS HA normally consists of two NameNodes, one in active state and one in standby state. The active NameNode serves client requests; the standby NameNode does not, and only synchronizes the active NameNode's state so that it can take over quickly if the active one fails.

Hadoop 2.0 officially offers two HDFS HA solutions: one based on NFS and one based on QJM (proposed by Cloudera, similar in principle to ZooKeeper). I use QJM here. The active and standby NameNodes synchronize metadata through a group of JournalNodes; an edit is considered written once it has reached a majority of the JournalNodes, so an odd number of JournalNodes is usually configured.

The VMs run Linux; I used Red Hat 5.5 32-bit (my instructor said Hadoop 2 requires a 64-bit OS, but in practice 32-bit also works; at least I have not run into any problems so far).

Part 1: Prepare the environment

Disable the firewall and SELinux (all VMs)

service iptables stop

chkconfig iptables off    (disable start at boot)

setenforce 0

vi /etc/selinux/config

SELINUX=disabled

Configure the hostname and the hosts file (all VMs)

vi /etc/sysconfig/network

Change HOSTNAME to the machine's own name, i.e. HOSTNAME=h41 through HOSTNAME=h45 on the respective hosts.

vi /etc/hosts    (the previous step does not seem strictly necessary, but this one is required; on all VMs you can delete the original contents and add the following)

192.168.8.41    h41

192.168.8.42    h42

192.168.8.43    h43

192.168.8.44    h44

192.168.8.45    h45

Synchronize the time on all machines

ntpdate 202.120.2.101    (this did not work for me; apparently you need to install ntpdate with yum first and have internet access. Reference articles: https://my.oschina.net/myaniu/blog/182959, http://www.cnblogs.com/liuyou/archive/2012/07/29/2614330.html and http://blog.csdn.net/lixianlin/article/details/7045321)

Instead I used the crudest method: on every VM, run date -s "2017-05-05 12:00:00" as root and then reboot the VM (a one-liner alternative is sketched below).
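A minimal sketch for pushing the same date to every node from one machine (an assumption, not part of the original steps: it needs root SSH access to each host, and you will be prompted for each root password unless root keys are set up; the timestamp is only an example):

for h in h41 h42 h43 h44 h45; do ssh root@$h 'date -s "2017-05-05 12:00:00"'; done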

Create the hadoop user and group (all VMs)

groupadd hadoop

useradd -g hadoop hadoop

passwd hadoop

Switch to the hadoop user (all VMs)

su - hadoop

Set up passwordless SSH login with key authentication [do this on every VM]

h41:

ssh-keygen -t rsa -P ''

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

chmod 700 ~/.ssh/

chmod 600 ~/.ssh/authorized_keys

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h42

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h43

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h44

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h45

h42:

ssh-keygen -t rsa -P ''

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

chmod 700 ~/.ssh/

chmod 600 ~/.ssh/authorized_keys

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h41

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h43

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h44

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@h45

... (repeat the above steps on h43, h44, and h45)

Verify: ssh 'hadoop@h42'    (each of h41 through h45 should be able to log in to the others without a password; a quick check is sketched below)
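A minimal sketch to confirm passwordless login from the current node to all five hosts (run as the hadoop user; each iteration should print the remote hostname without prompting for a password):

for h in h41 h42 h43 h44 h45; do ssh hadoop@$h hostname; done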

Create the storage directories (an SSH-based alternative is sketched after the commands)

mkdir -pv /home/hadoop/storage/hadoop/tmp

mkdir -pv /home/hadoop/storage/hadoop/name

mkdir -pv /home/hadoop/storage/hadoop/data

mkdir -pv /home/hadoop/storage/hadoop/journal

mkdir -pv /home/hadoop/storage/yarn/local

mkdir -pv /home/hadoop/storage/yarn/logs

mkdir -pv /home/hadoop/storage/hbase

mkdir -pv /home/hadoop/storage/zookeeper/data

mkdir -pv /home/hadoop/storage/zookeeper/logs

scp -r /home/hadoop/storage h42:/home/hadoop/

scp -r /home/hadoop/storage h43:/home/hadoop/

scp -r /home/hadoop/storage h44:/home/hadoop/

scp -r /home/hadoop/storage h45:/home/hadoop/
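Instead of copying the empty tree with scp, you could also create it directly on each node over the passwordless SSH set up above (a sketch; it assumes bash brace expansion is available on the remote hosts, which is the default on Red Hat):

for h in h42 h43 h44 h45; do ssh hadoop@$h 'mkdir -pv /home/hadoop/storage/{hadoop/{tmp,name,data,journal},yarn/{local,logs},hbase,zookeeper/{data,logs}}'; done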

Install JDK 1.7 and Hadoop and configure the environment variables. They can be set globally (edit /etc/profile) or for the current user (edit ~/.bashrc); here I configure them for the current user (I also set the HBase and Hive variables even though they are not needed yet).

On h41, switch to the root user

Install the JDK

[root@h41 usr]# tar -zxvf jdk-7u25-linux-i586.tar.gz

[root@h41 usr]# scp -r /usr/jdk1.7.0_25/ h42:/usr/
(these scp commands prompt for the root password of the target hosts)

[root@h41 usr]# scp -r /usr/jdk1.7.0_25/ h43:/usr/

[root@h41 usr]# scp -r /usr/jdk1.7.0_25/ h44:/usr/

[root@h41 usr]# scp -r /usr/jdk1.7.0_25/ h45:/usr/

On h41, switch back to the hadoop user

vi ~/.bashrc
export JAVA_HOME=/usr/jdk1.7.0_25
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
##java
export HADOOP_HOME=/home/hadoop/hadoop
export HIVE_HOME=/home/hadoop/hive
export HBASE_HOME=/home/hadoop/hbase
##hadoop hbase hive
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin


scp ~/.bashrc h42:~/.bashrc

scp ~/.bashrc h43:~/.bashrc

scp ~/.bashrc h44:~/.bashrc

scp ~/.bashrc h45:~/.bashrc

Make the environment variables take effect, and do this on every VM; otherwise the jps command will not work properly and ZooKeeper will ultimately fail to start (a quick sanity check is sketched after these commands).

[hadoop@h41 ~]$ source ~/.bashrc

[hadoop@h42 ~]$ source ~/.bashrc

[hadoop@h43 ~]$ source ~/.bashrc

[hadoop@h44 ~]$ source ~/.bashrc

[hadoop@h45 ~]$ source ~/.bashrc
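A quick sanity check on each node after sourcing (a sketch; hadoop version will only succeed once Hadoop has been unpacked in the next part):

java -version

echo $HADOOP_HOME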

Part 2: Deploy hadoop-2.6.0 NameNode HA and ResourceManager HA

Unpack and rename

tar -zxvf hadoop-2.6.0.tar.gz -C /home/hadoop

cd /home/hadoop

mv hadoop-2.6.0 hadoop

Configure the Hadoop environment variables [already done while preparing the environment, skipped]

Verify that Hadoop is installed correctly

hadoop version

Edit the Hadoop configuration files

vi /home/hadoop/hadoop/etc/hadoop/core-site.xml
Add:

<!-- Set the HDFS nameservice to gagcluster; this is the NameNode URI -->
(I was badly tripped up here by the author of http://www.it610.com/article/3334284.htm: the original article writes <value>hdfs://gagcluster:9000</value>, but the correct form drops the port, i.e. <value>hdfs://gagcluster</value>. Otherwise, once the cluster is up, running hadoop fs -mkdir /input fails with:
mkdir: Port 9000 specified in URI hdfs://gagcluster:9000 but host 'gagcluster' is a logical (HA) namenode and does not use port information.)
<property>
<name>fs.defaultFS</name>
<value>hdfs://gagcluster</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>

<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/storage/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>

<!-- Allow proxy requests from any host -->
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>

<!-- Allow proxy requests from any group -->
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>

<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>h43:2181,h44:2181,h45:2181</value>
</property>
vi /home/hadoop/hadoop/etc/hadoop/hdfs-site.xml

Add:

<!-- Exclude (blacklist) file, used to decommission Hadoop nodes -->
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop/etc/hadoop/exclude</value>
</property>

<!-- HDFS block size: 64 MB -->
<property>
<name>dfs.block.size</name>
<value>67108864</value>
</property>

<!-- HDFS nameservice gagcluster; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>gagcluster</value>
</property>

<!-- gagcluster has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.gagcluster</name>
<value>nn1,nn2</value>
</property>

<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.gagcluster.nn1</name>
<value>h41:9000</value>
</property>

<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.gagcluster.nn1</name>
<value>h41:50070</value>
</property>

<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.gagcluster.nn2</name>
<value>h42:9000</value>
</property>

<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.gagcluster.nn2</name>
<value>h42:50070</value>
</property>

<!-- Where the NameNode's shared edits are stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://h43:8485;h44:8485;h45:8485/gagcluster</value>
</property>

<!-- Client failover proxy provider -->
<property>
<name>dfs.client.failover.proxy.provider.gagcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- Fencing method -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>

<!-- sshfence needs passwordless SSH; location of the private key -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>

<!-- Local directory where each JournalNode stores its edits -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/storage/hadoop/journal</value>
</property>

<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

<!-- NameNode namespace (fsimage) storage directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/storage/hadoop/name</value>
</property>

<!-- DataNode data storage directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/storage/hadoop/data</value>
</property>

<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>

<!-- Enable WebHDFS access -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>

<!-- JournalNode HTTP and RPC addresses -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>

<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>

<property>
<name>ha.zookeeper.quorum</name>
<value>h43:2181,h44:2181,h45:2181</value>
</property>
vi /home/hadoop/hadoop/etc/hadoop/mapred-site.xml

Add:

<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<!-- MapReduce JobHistory Server address (default port 10020) -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>

<!-- MapReduce JobHistory Server web UI address (default port 19888) -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
vi /home/hadoop/hadoop/etc/hadoop/yarn-site.xml

Add:
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<!-- How many seconds to retain aggregated logs on HDFS (3 days) -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>259200</value>
</property>

<!-- Retry interval for reconnecting after losing contact with the RM -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>

<!-- Enable ResourceManager HA (default: false) -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>

<!-- Logical IDs of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>

<property>
<name>ha.zookeeper.quorum</name>
<value>h43:2181,h44:2181,h45:2181</value>
</property>

<!-- Enable automatic failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>h41</value>
</property>

<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>h42</value>
</property>

<!-- Set this to rm1 on namenode1 and rm2 on namenode2. Note: it is common to copy the finished config files to the other machines, but this value MUST be changed on the other YARN machine -->
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>

<!-- Enable RM state recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>

<!-- ZooKeeper connection address for the RM state store -->
<property>
<name>yarn.resourcemanager.zk-state-store.address</name>
<value>h43:2181,h44:2181,h45:2181</value>
</property>

<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<property>
<name>yarn.resourcemanager.zk-address</name>
<value>h43:2181,h44:2181,h45:2181</value>
</property>

<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>gagcluster-yarn</value>
</property>

<!-- How long the AM waits to reconnect when the scheduler is unreachable -->
<property>
<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
<value>5000</value>
</property>

<!-- rm1 addresses -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>h41:8132</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>h41:8130</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>h41:8188</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>h41:8131</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>h41:8033</value>
</property>

<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>h41:23142</value>
</property>

<!-- rm2 addresses -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>h42:8132</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>h42:8130</value>
</property>

<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>h42:8188</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>h42:8131</value>
</property>

<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>h42:8033</value>
</property>

<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>h42:23142</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/storage/yarn/local</value>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/storage/yarn/logs</value>
</property>

<property>
<name>mapreduce.shuffle.port</name>
<value>23080</value>
</property>

<!-- Failover proxy provider class -->
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>

<property>
<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
<value>/yarn-leader-election</value>
<description>Optional setting. The default value is /yarn-leader-election</description>
</property>


Configure the DataNode nodes (slaves file)

vi /home/hadoop/hadoop/etc/hadoop/slaves

h43
h44
h45


Create the exclude file, used later to decommission Hadoop nodes

touch /home/hadoop/hadoop/etc/hadoop/exclude

Sync the Hadoop directory to h42 through h45

scp -r /home/hadoop/hadoop h42:/home/hadoop/

scp -r /home/hadoop/hadoop h43:/home/hadoop/

scp -r /home/hadoop/hadoop h44:/home/hadoop/

scp -r /home/hadoop/hadoop h45:/home/hadoop/

On nn2 (h42), edit yarn-site.xml

Change this one property to:
<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
<description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>


Part 3: Deploy a fully distributed three-node ZooKeeper 3.4.5 cluster

Install ZooKeeper on three servers under the hadoop user (ZooKeeper is best run on an odd number of nodes; any three of my five machines would do, and I chose the VMs h43, h44, and h45).

h43 192.168.8.43

h44 192.168.8.44

h45 192.168.8.45

Unpack and rename (on h43)

tar xf zookeeper-3.4.5.tar.gz -C /home/hadoop/

mv /home/hadoop/zookeeper-3.4.5/ /home/hadoop/zookeeper

cd /home/hadoop/zookeeper

Edit the configuration file

vi /home/hadoop/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/home/hadoop/storage/zookeeper/data
dataLogDir=/home/hadoop/storage/zookeeper/logs
clientPort=2181
server.1=h43:2888:3888
server.2=h44:2888:3888
server.3=h45:2888:3888


Sync to the h44 and h45 nodes

scp -r /home/hadoop/zookeeper h44:/home/hadoop

scp -r /home/hadoop/zookeeper h45:/home/hadoop

Create the ZooKeeper data and log directories [already done while preparing the environment, skipped here]

Set the myid value on h43, h44, and h45 respectively (run one echo per node; a one-liner alternative is sketched after the commands):

echo 1 > /home/hadoop/storage/zookeeper/data/myid

echo 2 > /home/hadoop/storage/zookeeper/data/myid

echo 3 > /home/hadoop/storage/zookeeper/data/myid
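A minimal sketch that writes all three myid files from h43 over the passwordless SSH set up earlier (adjust the hostnames if your ZooKeeper nodes differ; the values must match server.1/2/3 in zoo.cfg):

for i in 1 2 3; do ssh hadoop@h4$((i+2)) "echo $i > /home/hadoop/storage/zookeeper/data/myid"; done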

###########################################################################################
First-time startup procedure for the Hadoop cluster

###########################################################################################

1. If the ZooKeeper cluster is not running yet, start ZooKeeper on each node first.

/home/hadoop/zookeeper/bin/zkServer.sh start    (remember to start it on every ZooKeeper machine)

/home/hadoop/zookeeper/bin/zkServer.sh status    (one leader, the rest followers)

Running jps should show the QuorumPeerMain process.

2. Then run the following on the primary NameNode node (h41) to create the HA znode (namespace) in ZooKeeper:

/home/hadoop/hadoop/bin/hdfs zkfc -formatZK

3. Start the JournalNode processes on h43, h44, and h45 with:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode

4. On the primary NameNode node, format the NameNode and JournalNode directories with ./bin/hadoop namenode -format:

/home/hadoop/hadoop/bin/hadoop namenode -format

Verify that it succeeded.

On a ZooKeeper node, run:

/home/hadoop/zookeeper/bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 0] ls /

[hadoop-ha, zookeeper]

[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha 

[gagcluster]

[zk: localhost:2181(CONNECTED) 2] quit

5. Start the NameNode process on the primary NameNode node:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode

6. On the standby NameNode node, run the first command below: it formats the standby NameNode's directory and copies the metadata over from the primary NameNode, without reformatting the JournalNode directories. Then start the standby NameNode process with the second command.

/home/hadoop/hadoop/bin/hdfs namenode -bootstrapStandby    [or simply scp -r /home/hadoop/storage/hadoop/name h42:/home/hadoop/storage/hadoop]

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode

7. Run the following on both NameNode nodes:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc

8. Start the DataNodes

Method 1:

Run the following on each DataNode node to start that node's DataNode individually (note the singular hadoop-daemon.sh; when I ran the plural hadoop-daemons.sh on h43, the DataNodes on h43, h44, and h45 all started at once, because that script starts the daemon on every host listed in slaves):

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start datanode

Method 2:

When there are many DataNode nodes, run the following once on the primary NameNode (nn1) to start all of them in one go:

/home/hadoop/hadoop/sbin/hadoop-daemons.sh start datanode

9. Start YARN (run on both namenode1 and namenode2)

/home/hadoop/hadoop/sbin/start-yarn.sh

Note:

Running this command on namenode2 prints messages such as "NodeManager already exists"; ignore them. The point is to start the ResourceManager on namenode2 so that it backs up the one on namenode1. (The ResourceManager can also be started on its own with /home/hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager.)

After startup you can open http://192.168.8.41:50070 and http://192.168.8.42:50070 in a browser and see that the NameNodes are Standby and Active respectively.

On namenode1, run ${HADOOP_HOME}/bin/yarn rmadmin -getServiceState rm1 (and rm2) to check that rm1 and rm2 are active and standby respectively; you can also check the status in a browser at http://192.168.8.41:8188.

Verify YARN:

Then I ran a small MapReduce job (a minimal wordcount run is sketched below):

(For details see my other article: a fix for MapReduce jobs failing to run on a freshly installed Hadoop 2.)

It runs successfully on both h41 and h42.
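For reference, one way to exercise YARN is the bundled wordcount example (a sketch, not the exact job I ran: words.txt is a hypothetical local text file, and the examples jar path assumes the standard hadoop-2.6.0 binary layout; the output file name may differ from the part-00000 shown below):

hadoop fs -mkdir /input

hadoop fs -put words.txt /input

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output

hadoop fs -cat /output/part-*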

[hadoop@h41 ~]$ hadoop fs -cat /output/part-00000

hadoop  1

hello   3

hive    1

world   1

[hadoop@h42 ~]$ ${HADOOP_HOME}/bin/yarn rmadmin -getServiceState rm1

active

[hadoop@h42 ~]$ ${HADOOP_HOME}/bin/yarn rmadmin -getServiceState rm2

standby

[hadoop@h41 ~]$ jps

12723 ResourceManager

14752 Jps

12513 NameNode

12605 DFSZKFailoverController

[hadoop@h41 ~]$ kill -9 12723

[hadoop@h42 ~]$ ${HADOOP_HOME}/bin/yarn rmadmin -getServiceState rm1

17/05/05 16:03:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/05 16:03:22 INFO ipc.Client: Retrying connect to server: h41/192.168.8.41:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From h42/192.168.8.42 to h41:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

[hadoop@h42 ~]$ ${HADOOP_HOME}/bin/yarn rmadmin -getServiceState rm2

active

Manually restart the killed ResourceManager on h41 (/home/hadoop/hadoop/sbin/yarn-daemon.sh start resourcemanager), then check its state again:

[hadoop@h42 ~]$ ${HADOOP_HOME}/bin/yarn rmadmin -getServiceState rm1

standby

Verify HDFS HA:

Then kill -9 the active NameNode.

Open http://192.168.8.41:50070 in a browser.

At this point the NameNode on h41 has become active.

Run the command again: hadoop fs -cat /output/part-00000

The file written earlier is still there!

Manually restart the killed NameNode:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode

Open http://192.168.8.42:50070 in a browser:

NameNode ‘h42’ (standby)
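You can also check the NameNode HA state from the command line instead of the browser (nn1 and nn2 are the NameNode IDs configured in hdfs-site.xml):

hdfs haadmin -getServiceState nn1

hdfs haadmin -getServiceState nn2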

Reference blogs:
http://www.aboutyun.com/forum.php?mod=viewthread&tid=10572&highlight=hadoop%2B%2B%2B6 http://www.it610.com/article/3334284.htm