Fully Distributed Hadoop Cluster Setup
2017-09-08 10:21
1. Prepare the environment
Preparation includes:
a. Edit /etc/hosts on every machine and add the hostnames of all nodes.
b. Disable the firewall (see the separate note "CentOS_7 firewall operations").
c. Create the user hadoop.
d. Install a suitable JDK; 1.8 is used on all nodes here.
e. Set each machine's hostname.
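The steps above can be sketched as shell commands. This is a minimal sketch, run as root on every node; the 192.168.1.x addresses are illustrative placeholders, not from the article:

```shell
# a. Append the cluster host list (example addresses; substitute your own)
cat >> /etc/hosts <<'EOF'
192.168.1.11 hadoop.master1
192.168.1.12 hadoop.master2
192.168.1.21 hadoop.slave1
192.168.1.22 hadoop.slave2
192.168.1.23 hadoop.slave3
192.168.1.24 hadoop.slave4
192.168.1.25 hadoop.slave5
192.168.1.31 hive.master1
192.168.1.32 mysql.master1
EOF
# b. Stop and disable the firewall (CentOS 7)
systemctl stop firewalld && systemctl disable firewalld
# c. Create the hadoop user
useradd hadoop
# d. Check that the installed JDK is 1.8
java -version
# e. Set this machine's hostname (master1 shown; repeat per node)
hostnamectl set-hostname hadoop.master1
```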
2. Passwordless login
Set up passwordless SSH between the machines. Even with keys in place, the first login between every pair of machines must be confirmed interactively, which is tedious to do by hand, so use a script:
ssh_login.sh
#! /bin/bash
array[0]="hadoop.master1"
array[1]="hadoop.master2"
array[2]="hadoop.slave1"
array[3]="hadoop.slave2"
array[4]="hadoop.slave3"
array[5]="hadoop.slave4"
array[6]="hadoop.slave5"
array[7]="hive.master1"
array[8]="mysql.master1"
for data in "${array[@]}"
do
/usr/bin/expect <<-EOF
set timeout -1
#This copies the hadoop directory to every machine; finish configuring hadoop first, then run this step
spawn scp -r /home/hadoop/hadoop-2.7.2 hadoop@${data}:/home/hadoop/
expect {
"connecting (yes/no)" { send "yes\r"; exp_continue}
"y/n" { send "y\r"; exp_continue}
"password:" { send "123456\r"; exp_continue}
}
EOF
done
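The copy loop above assumes each remote machine already trusts the hadoop user's key. A minimal key-distribution sketch, run once as hadoop on each machine (ssh-copy-id prompts for the remote password the first time):

```shell
# Generate a key pair without a passphrase (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Push the public key to every machine in the cluster, including this one
for host in hadoop.master1 hadoop.master2 hadoop.slave1 hadoop.slave2 \
            hadoop.slave3 hadoop.slave4 hadoop.slave5 hive.master1 mysql.master1
do
  ssh-copy-id "hadoop@${host}"
done
```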
3. Build the ZooKeeper cluster
a. Extract zookeeper-3.4.6.tar.gz to /home/hadoop/zookeeper-3.4.6.
b. Edit /home/hadoop/zookeeper-3.4.6/conf/zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/work/data/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=hadoop.master1:2888:3888
server.2=hadoop.master2:2888:3888
server.3=hadoop.slave1:2888:3888
server.4=hadoop.slave2:2888:3888
server.5=hadoop.slave3:2888:3888
server.6=hadoop.slave4:2888:3888
server.7=hadoop.slave5:2888:3888
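One step zoo.cfg does not cover: each node also needs a myid file inside dataDir whose number matches its server.N line, or the quorum will not form. For example, on hadoop.master1 (server.1):

```shell
# dataDir from zoo.cfg above; write 2..7 on the other nodes to match server.N
ZK_DATA_DIR=/work/data/zookeeper
mkdir -p "$ZK_DATA_DIR"
echo 1 > "$ZK_DATA_DIR/myid"
cat "$ZK_DATA_DIR/myid"
```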
c. Copy zookeeper to all nodes.
d. Add a convenience start script /usr/bin/zkStart.sh:
#!/bin/bash
ssh hadoop@hadoop.master1 "zkServer.sh start"
ssh hadoop@hadoop.master2 "zkServer.sh start"
ssh hadoop@hadoop.slave1 "zkServer.sh start"
ssh hadoop@hadoop.slave2 "zkServer.sh start"
ssh hadoop@hadoop.slave3 "zkServer.sh start"
ssh hadoop@hadoop.slave4 "zkServer.sh start"
ssh hadoop@hadoop.slave5 "zkServer.sh start"
e. Add a convenience stop script /usr/bin/zkStop.sh:
#!/bin/bash
ssh hadoop@hadoop.master1 "zkServer.sh stop"
ssh hadoop@hadoop.master2 "zkServer.sh stop"
ssh hadoop@hadoop.slave1 "zkServer.sh stop"
ssh hadoop@hadoop.slave2 "zkServer.sh stop"
ssh hadoop@hadoop.slave3 "zkServer.sh stop"
ssh hadoop@hadoop.slave4 "zkServer.sh stop"
ssh hadoop@hadoop.slave5 "zkServer.sh stop"
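A status helper in the same style is useful for confirming that the ensemble elected a leader. This is a hypothetical companion script (e.g. /usr/bin/zkStatus.sh), not from the article:

```shell
#!/bin/bash
# Print each node's role (leader/follower) after the ensemble starts
for host in hadoop.master1 hadoop.master2 hadoop.slave1 hadoop.slave2 \
            hadoop.slave3 hadoop.slave4 hadoop.slave5
do
  echo "== ${host} =="
  ssh hadoop@${host} "zkServer.sh status"
done
```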
4. Hadoop configuration
a. Extract hadoop-2.7.2.tar.gz (64-bit build) to /home/hadoop/hadoop-2.7.2.
b. Configure the hadoop user's environment variables in .bashrc:
#set java environment
JAVA_HOME=/usr/local/java/jdk1.8.0_101
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH
#Set hadoop
HADOOP_HOME=/home/hadoop/hadoop-2.7.2
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#Set zookeeper
ZK_HOME=/home/hadoop/zookeeper-3.4.6
PATH=$PATH:$ZK_HOME/bin
#Set Hive
HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
PATH=$PATH:$HIVE_HOME/bin
export PATH
c. Edit ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh and add the following:
#Java home. This can sit near the top of the script; since it is already set in the user's environment variables it is arguably redundant here.
export JAVA_HOME=/usr/local/java/jdk1.8.0_101
#Directory for hadoop PID files. The default is under /tmp, which the system cleans periodically, so it is moved elsewhere.
export HADOOP_PID_DIR=/work/pids
d. Edit ${HADOOP_HOME}/etc/hadoop/slaves and list the hostnames of all slaves:
hadoop.slave1
hadoop.slave2
hadoop.slave3
hadoop.slave4
hadoop.slave5
e. Edit ${HADOOP_HOME}/etc/hadoop/core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
<description>Logical name of the HDFS HA cluster (nameservice ID)</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
<description>Buffer size for stream file I/O: 4 KB</description>
</property>
<property>
<name>fs.trash.interval</name>
<value>4320</value>
<description>Minutes that deleted files are kept in the trash (4320 = 3 days); 0 disables the trash</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/work/data/hadoop/tmp</value>
<description>Base for Hadoop's temporary directories; multiple directories are comma-separated; the data directory must be created manually</description>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop.master1:2181,hadoop.master2:2181,hadoop.slave1:2181,hadoop.slave2:2181,hadoop.slave3:2181,hadoop.slave4:2181,hadoop.slave5:2181</value>
<description>ZooKeeper quorum that manages HDFS HA</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
<description>Required when Hive accesses Hadoop as a proxy user</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
<description>Required when Hive accesses Hadoop as a proxy user</description>
</property>
</configuration>
f. Edit ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///work/data/hadoop/namenode</value>
<description>Keep this on a dedicated disk so that growing data cannot fill the system disk and stop the OS from booting</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///work/data/hadoop/datanode</value>
<description>Keep this on a dedicated disk so that growing data cannot fill the system disk and stop the OS from booting</description>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
<description>Enable permission checking on file operations (dfs.permissions is the deprecated name of this setting)</description>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
<description>Enable permission checking on file operations</description>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop.master1:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop.master1:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop.master2:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop.master2:50070</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>Enable automatic failover</description>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>Class that performs client failover to the active NameNode for mycluster</description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop.master1:8485;hadoop.master2:8485;hadoop.slave1:8485;hadoop.slave2:8485;hadoop.slave3:8485;hadoop.slave4:8485;hadoop.slave5:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/work/data/journaldata/jn</value>
<description>Local disk path where each JournalNode stores the NameNode edits it shares</description>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>10000</value>
<description>Timeout (ms) for the SSH fencing connection; guards against split-brain</description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>2048</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hadoop/hadoop-2.7.2/etc/hadoop/exclude</value>
<description>The exclude file lists datanodes that are being decommissioned</description>
</property>
</configuration>
g. Edit ${HADOOP_HOME}/etc/hadoop/mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
</configuration>
h. Edit ${HADOOP_HOME}/etc/hadoop/yarn-site.xml:
<?xml version="1.0"?>
<configuration>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- RM cluster identifier -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>mycluster</value>
</property>
<!-- Logical IDs of the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>nn1,nn2</value>
</property>
<!-- Automatic RM failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Log directories -->
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/logs/yarn-log</value>
</property>
<!-- RM state recovery after failure -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- RM host 1 -->
<property>
<name>yarn.resourcemanager.hostname.nn1</name>
<value>hadoop.master1</value>
</property>
<!-- RM host 2 -->
<property>
<name>yarn.resourcemanager.hostname.nn2</name>
<value>hadoop.master2</value>
</property>
<!-- RM state store: in-memory (MemStore) or ZooKeeper-backed (ZKStore) -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Persist RM state in the ZK ensemble -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop.master1:2181,hadoop.master2:2181,hadoop.slave1:2181,hadoop.slave2:2181,hadoop.slave3:2181,hadoop.slave4:2181,hadoop.slave5:2181</value>
</property>
<!-- Scheduler address that ApplicationMasters talk to -->
<property>
<name>yarn.resourcemanager.scheduler.address.nn1</name>
<value>hadoop.master1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.nn2</name>
<value>hadoop.master2:8030</value>
</property>
<!-- Address NodeManagers use to report to the RM -->
<property>
<name>yarn.resourcemanager.resource-tracker.address.nn1</name>
<value>hadoop.master1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.nn2</name>
<value>hadoop.master2:8031</value>
</property>
<!-- Address clients use to submit applications to the RM -->
<property>
<name>yarn.resourcemanager.address.nn1</name>
<value>hadoop.master1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.address.nn2</name>
<value>hadoop.master2:8032</value>
</property>
<!-- Address administrators use to send admin commands to the RM -->
<property>
<name>yarn.resourcemanager.admin.address.nn1</name>
<value>hadoop.master1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.nn2</name>
<value>hadoop.master2:8033</value>
</property>
<!-- RM web UI address for viewing cluster information -->
<property>
<name>yarn.resourcemanager.webapp.address.nn1</name>
<value>hadoop.master1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.nn2</name>
<value>hadoop.master2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>/home/hadoop/hadoop-2.7.2/etc/hadoop,
/home/hadoop/hadoop-2.7.2/share/hadoop/common/*,
/home/hadoop/hadoop-2.7.2/share/hadoop/common/lib/*,
/home/hadoop/hadoop-2.7.2/share/hadoop/hdfs/*,
/home/hadoop/hadoop-2.7.2/share/hadoop/mapreduce/*,
/home/hadoop/hadoop-2.7.2/share/hadoop/yarn/*</value>
</property>
<!-- Configurations for NodeManager -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>5632</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1408</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>5632</value>
</property>
</configuration>
i. Copy hadoop to all nodes.
j. Create the directory /work/data.
5. Start hadoop
a. Start zookeeper: zkStart.sh
b. Format HDFS:
hdfs namenode -format
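On an HA cluster a plain format is not enough: the JournalNodes must be running before the format, the second NameNode is initialized from the first rather than formatted, and the ZKFC znode must be created. A commonly used first-start sequence (standard Hadoop 2.x commands; the hostnames are this article's cluster):

```shell
# 1. Start a JournalNode on every host listed in dfs.namenode.shared.edits.dir
hadoop-daemon.sh start journalnode
# 2. Format and start the first NameNode (on hadoop.master1)
hdfs namenode -format
hadoop-daemon.sh start namenode
# 3. Initialize the standby from the active (on hadoop.master2)
hdfs namenode -bootstrapStandby
# 4. Create the HA znode in ZooKeeper (once, on either NameNode)
hdfs zkfc -formatZK
```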
c. Run the start script ${HADOOP_HOME}/sbin/start-all.sh (equivalent to start-dfs.sh followed by start-yarn.sh).
On success, jps on master1 shows:
27249 QuorumPeerMain (zookeeper)
27525 NameNode
27783 JournalNode
28184 ResourceManager
27964 DFSZKFailoverController
jps on slave1 shows:
25588 JournalNode
25333 QuorumPeerMain
25701 NodeManager
25480 DataNode
Issue encountered: the ResourceManager on master2 does not come up, because start-yarn.sh only starts the ResourceManager on the machine it runs on; as a workaround, run start-all.sh once more on master2.
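Rather than re-running start-all.sh, the standby ResourceManager can be started directly with the standard daemon script:

```shell
# Run on hadoop.master2 (shown here via ssh from master1)
ssh hadoop@hadoop.master2 "/home/hadoop/hadoop-2.7.2/sbin/yarn-daemon.sh start resourcemanager"
```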
Cluster planning:
No. | Components | Services | hostname | OS | CPU cores | RAM | Disk |
1 | Hadoop.Master1, zookeeper | NameNode, SecondaryNameNode, ResourceManager, JournalNode, QuorumPeerMain | hadoop.master1 | CentOS 7 | 8 | 16G | 200G |
2 | Hadoop.Master2, zookeeper | NameNode, SecondaryNameNode, ResourceManager, JournalNode, QuorumPeerMain | hadoop.master2 | CentOS 7 | 8 | 16G | 200G |
3 | Hadoop.Slave1, zookeeper | DataNode, NodeManager, QuorumPeerMain | hadoop.slave1 | CentOS 7 | 8 | 16G | 500G |
4 | Hadoop.Slave2, zookeeper | DataNode, NodeManager, QuorumPeerMain | hadoop.slave2 | CentOS 7 | 8 | 16G | 500G |
5 | Hadoop.Slave3, zookeeper | DataNode, NodeManager, QuorumPeerMain | hadoop.slave3 | CentOS 7 | 8 | 16G | 500G |
6 | Hadoop.Slave4, zookeeper | DataNode, NodeManager, QuorumPeerMain | hadoop.slave4 | CentOS 7 | 8 | 16G | 500G |
7 | Hadoop.Slave5, zookeeper | DataNode, NodeManager, QuorumPeerMain | hadoop.slave5 | CentOS 7 | 8 | 16G | 500G |
8 | hive | hive | hive.master1 | CentOS 7 | 8 | 16G | 500G |
9 | mysql | mysql | mysql.master1 | CentOS 7 | 8 | 16G | 500G |