
Hadoop 2.2.0 HDFS HA (Automatic Failover) Setup

2014-02-16 11:51
I have not used hadoop1 at work, and although I set up hadoop1 myself two years ago, my understanding of it never went deep, so setting up HDFS on hadoop2 this time still took quite a while. This article is mainly my own notes for organizing what I learned; corrections are welcome.

HA is a major improvement in hadoop2: it removes the NameNode as a single point of failure.

hadoop2 needs to be compiled from source on 64-bit machines: http://blog.csdn.net/w13770269691/article/details/16883663

Installation (note that the HDFS configuration in this guide is still hadoop1-style): http://blog.csdn.net/licongcong_0224/article/details/12972889

hadoop1 has a single namenode. In hadoop2, two namenodes form one nameservice that serves clients, and the two namenodes must share their edit data. There are currently two stable ways to do this: NFS and QJM. The configuration below is based on QJM.
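Because clients address the nameservice rather than an individual namenode, core-site.xml points the default filesystem at the nameservice ID. A minimal sketch, assuming the same `ns1` name used in the hdfs-site.xml below:

```xml
<!-- core-site.xml fragment: clients resolve hdfs://ns1 via the
     failover proxy provider, not via a fixed NN host:port -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1</value>
</property>
```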

Automatic failover is implemented with a ZooKeeper cluster: a zkfc process (essentially a ZooKeeper client) runs on each namenode host and monitors both NNs; when the active NN fails, the standby is automatically promoted to active.
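Before the first start with automatic failover enabled, the HA state has to be initialized in ZooKeeper and a zkfc started on each namenode host. Roughly (run from the Hadoop install directory; these are the standard Hadoop 2.2 daemon scripts):

```shell
# one-time: create the HA znode in the ZooKeeper quorum (run on one NN host)
bin/hdfs zkfc -formatZK

# on each namenode host: start the failover controller daemon
sbin/hadoop-daemon.sh start zkfc
```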

Reference documentation: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html

That documentation first describes manual-failover HA: the two NNs stay in sync, but when the active one dies, the standby must be switched to active by hand. See this article for the procedure: http://www.cnblogs.com/nb591/p/3535662.html (its startup order is: start the QJM journal nodes first, then one NN, then sync that NN's data over, then start the second NN, and finally the DNs; as the article points out, sbin/hadoop-daemons.sh start datanode starts all DNs at once, so there is no need to start them one by one).
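That startup order can be sketched as follows (a sketch using the standard Hadoop 2.2 daemon scripts; run each step on the appropriate host):

```shell
# 1. on each journal host: start the JournalNodes (QJM)
sbin/hadoop-daemon.sh start journalnode

# 2. on the first namenode host: format and start it
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode

# 3. on the second namenode host: copy the first NN's metadata, then start
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode

# 4. start all datanodes at once (note: daemons, plural)
sbin/hadoop-daemons.sh start datanode
```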

To add automatic failover on top of that, first install a ZooKeeper cluster: http://blog.csdn.net/shirdrn/article/details/7183503

The concrete configuration file is given below.

etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/dis/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/dis/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>dis1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>dis2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>dis1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>dis2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://dis1:8485;dis2:8485;dis3:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/dis/hadoop/jn</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/dis/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>dis1:2181,dis2:2181,dis3:2181</value>
  </property>
</configuration>

Once everything is up, check the relevant processes with jps, then kill -9 the active NN. bin/hdfs haadmin -getServiceState (followed by the namenode ID, nn1 or nn2) shows that the formerly standby NN has become active, and reads and other operations keep working normally. To restart the killed namenode, use sbin/hadoop-daemon.sh start namenode.
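The failover test above, end to end (a sketch; the PID and which NN is currently active will vary on your cluster, and nn1/nn2 are the IDs defined in hdfs-site.xml):

```shell
# on the active NN host: find the NameNode pid and kill it
jps | grep NameNode
kill -9 <namenode-pid>          # substitute the pid printed by jps

# confirm the standby was promoted (prints "active" or "standby")
bin/hdfs haadmin -getServiceState nn2

# bring the killed namenode back; it rejoins as standby
sbin/hadoop-daemon.sh start namenode
```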

In the end, the official documentation is still the most worthwhile read, but it feels written for people who have already used hadoop1; readers learning hadoop2 directly will need to spend some time on Hadoop basics first.