
Hadoop 2 Installation Guide


1. Install the JDK

Install it on every machine.

2. Install SSH

Install it on every machine.
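The sshfence fencing method configured later (section 5.5.2) relies on passwordless SSH for the hadoop user between the masters, so it is worth setting up now. A minimal sketch, assuming the hostnames from the plan in step 4; repeat ssh-copy-id for every node:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa   # generate a passphrase-less key pair as the hadoop user
ssh-copy-id hadoop@master1                 # repeat for master1ha, master2, master2ha, slave1, slave2, slave3
ssh master1 hostname                       # should print the remote hostname without prompting for a password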

3. Install ZooKeeper

Install it on an odd number of machines.

In this setup it runs on: master1:2181, master1ha:2181, master2:2181
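For reference, a minimal zoo.cfg for this three-node ensemble might look like the following; dataDir is an assumption and should match your ZooKeeper installation. Each node additionally needs a myid file under dataDir containing its server number (1, 2, or 3).

# zoo.cfg (sketch; dataDir is an assumption)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/zookeeper/data
clientPort=2181
server.1=master1:2888:3888
server.2=master1ha:2888:3888
server.3=master2:2888:3888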

4. Configure hosts and plan the machines

Machine plan (zk placement follows the quorum master1, master1ha, master2 used throughout the configs):

master1: 192.168.56.151 (active namenode of cluster 1, RM, zk)

master1ha: 192.168.56.152 (standby namenode of cluster 1, jn, zk)

master2: 192.168.56.153 (active namenode of cluster 2, jn, RM, zk)

master2ha: 192.168.56.154 (standby namenode of cluster 2, jn)

slave1: 192.168.56.155 (datanode, nodemanager)

slave2: 192.168.56.156 (datanode, nodemanager)

slave3: 192.168.56.157 (datanode, nodemanager)

Configure /etc/hosts:

192.168.56.151 master1

192.168.56.152 master1ha

192.168.56.153 master2

192.168.56.154 master2ha

192.168.56.155 slave1

192.168.56.156 slave2

192.168.56.157 slave3

5. Install Hadoop 2

5.1. Upload

Upload the Hadoop 2 tarball with a file-transfer tool, or from the command line:

su - hadoop

cd /home/hadoop

rz -y

5.2. Extract

tar -zxvf hadoop-2.6.0.tar.gz

5.3. Rename

mv hadoop-2.6.0 hadoop

5.4. Configure environment variables

su - root

vi /etc/profile

Add:

export HADOOP_HOME=/home/hadoop/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile

su - hadoop
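As a quick check that the variables took effect in the hadoop user's shell (a sketch; the expected version assumes the 2.6.0 tarball):

hadoop version       # should report Hadoop 2.6.0
echo $HADOOP_HOME    # should print /home/hadoop/hadoop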

5.5. Edit the Hadoop 2 configuration files

Place the configuration files below under /home/hadoop/hadoop/etc/hadoop.

5.5.1. core-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Federation: viewfs presents a single client-side view over both clusters -->

<property>

<name>fs.defaultFS</name>

<value>viewfs:///</value>

</property>

<!-- Mount point served by the first namenode cluster -->

<property>

<name>fs.viewfs.mounttable.default.link./tmp</name>

<value>hdfs://hadoop-cluster1/tmp</value>

</property>

<!-- Mount point served by the second namenode cluster -->

<property>

<name>fs.viewfs.mounttable.default.link./tmp1</name>

<value>hdfs://hadoop-cluster2/tmp1</value>

</property>

</configuration>
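With fs.defaultFS set to viewfs:///, client paths resolve through the mount table above: /tmp is served by hadoop-cluster1, /tmp1 by hadoop-cluster2, and paths outside the mounted links are not visible. For example, once the cluster is running:

hadoop fs -ls /tmp     # resolves to hdfs://hadoop-cluster1/tmp
hadoop fs -ls /tmp1    # resolves to hdfs://hadoop-cluster2/tmp1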

5.5.2. hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Names (nameservices) of the two namenode clusters -->

<property>

<name>dfs.nameservices</name>

<value>hadoop-cluster1,hadoop-cluster2</value>

</property>

<!-- The two namenodes of the first cluster -->

<property>

<name>dfs.ha.namenodes.hadoop-cluster1</name>

<value>nn1,nn2</value>

</property>

<!-- RPC address of the first cluster's active namenode -->

<property>

<name>dfs.namenode.rpc-address.hadoop-cluster1.nn1</name>

<value>master1:9000</value>

</property>

<!-- RPC address of the first cluster's standby namenode -->

<property>

<name>dfs.namenode.rpc-address.hadoop-cluster1.nn2</name>

<value>master1ha:9000</value>

</property>

<!-- Web UI address of the first cluster's active namenode -->

<property>

<name>dfs.namenode.http-address.hadoop-cluster1.nn1</name>

<value>master1:50070</value>

</property>

<!-- Web UI address of the first cluster's standby namenode -->

<property>

<name>dfs.namenode.http-address.hadoop-cluster1.nn2</name>

<value>master1ha:50070</value>

</property>

<!-- Secondary namenode HTTP address for nn1 -->

<property>

<name>dfs.namenode.secondary.http-address.hadoop-cluster1.nn1</name>

<value>master1:9001</value>

</property>

<!-- Secondary namenode HTTP address for nn2 -->

<property>

<name>dfs.namenode.secondary.http-address.hadoop-cluster1.nn2</name>

<value>master1ha:9001</value>

</property>

<!-- Failover proxy provider class for the first cluster -->

<property>

<name>dfs.client.failover.proxy.provider.hadoop-cluster1</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<!-- The second namenode cluster is configured the same way as the first -->

<property>

<name>dfs.ha.namenodes.hadoop-cluster2</name>

<value>nn3,nn4</value>

</property>

<property>

<name>dfs.namenode.rpc-address.hadoop-cluster2.nn3</name>

<value>master2:9000</value>

</property>

<property>

<name>dfs.namenode.rpc-address.hadoop-cluster2.nn4</name>

<value>master2ha:9000</value>

</property>

<property>

<name>dfs.namenode.http-address.hadoop-cluster2.nn3</name>

<value>master2:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.hadoop-cluster2.nn4</name>

<value>master2ha:50070</value>

</property>

<property>

<name>dfs.namenode.secondary.http-address.hadoop-cluster2.nn3</name>

<value>master2:9001</value>

</property>

<property>

<name>dfs.namenode.secondary.http-address.hadoop-cluster2.nn4</name>

<value>master2ha:9001</value>

</property>

<property>

<name>dfs.client.failover.proxy.provider.hadoop-cluster2</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<!-- Local directory for namenode metadata -->

<property>

<name>dfs.namenode.name.dir</name>

<value>/home/hadoop/hadoop/namedir</value>

</property>

<!-- Shared edits (journal) location for the first cluster's active namenode -->

<property>

<name>dfs.namenode.shared.edits.dir.hadoop-cluster1.nn1</name>

<value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster1</value>

</property>

<!-- Shared edits location for the first cluster's standby; same as the active -->

<property>

<name>dfs.namenode.shared.edits.dir.hadoop-cluster1.nn2</name>

<value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster1</value>

</property>

<!-- Shared edits location for the second cluster's active namenode -->

<property>

<name>dfs.namenode.shared.edits.dir.hadoop-cluster2.nn3</name>

<value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster2</value>

</property>

<!-- Shared edits location for the second cluster's standby; same as the active -->

<property>

<name>dfs.namenode.shared.edits.dir.hadoop-cluster2.nn4</name>

<value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster2</value>

</property>

<!-- Local directory for datanode block storage -->

<property>

<name>dfs.datanode.data.dir</name>

<value>/home/hadoop/hadoop/datadir</value>

</property>

<!-- ZooKeeper quorum -->

<property>

<name>ha.zookeeper.quorum</name>

<value>master1:2181,master1ha:2181,master2:2181</value>

</property>

<!-- Fencing method -->

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<!-- ZooKeeper session timeout -->

<property>

<name>ha.zookeeper.session-timeout.ms</name>

<value>5000</value>

</property>

<!-- Enable automatic namenode failover -->

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<!-- Local directory for journalnode edits -->

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/home/hadoop/hadoop/jndir</value>

</property>

<!-- Replication factor -->

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<!-- Permission checking (dfs.permission is not a valid key; the Hadoop 2 name is dfs.permissions.enabled) -->

<property>

<name>dfs.permissions.enabled</name>

<value>false</value>

</property>

<!-- Enable WebHDFS -->

<property>

<name>dfs.webhdfs.enabled</name>

<value>true</value>

</property>

<!-- Enable file append -->

<property>

<name>dfs.support.append</name>

<value>true</value>

</property>

<!-- Temp directory -->

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hadoop/hadoop/tmp</value>

</property>

<!-- Hadoop proxy-user settings -->

<property>

<name>hadoop.proxyuser.hduser.hosts</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.hduser.groups</name>

<value>*</value>

</property>

<!-- SSH private key used by sshfence -->

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hadoop/.ssh/id_rsa</value>

</property>

</configuration>
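A quick sanity check that this file is being picked up on a node is hdfs getconf, which prints the effective value of a key (a sketch; run after the files are in place):

hdfs getconf -confKey dfs.nameservices                                 # expect: hadoop-cluster1,hadoop-cluster2
hdfs getconf -confKey dfs.namenode.rpc-address.hadoop-cluster1.nn1     # expect: master1:9000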

5.5.3. mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Execution framework; must be set to yarn to run MapReduce on YARN -->

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<!-- IPC address of the MapReduce job history server -->

<property>

<name>mapreduce.jobhistory.address</name>

<value>master1:10020</value>

</property>

<!-- Web UI address of the MapReduce job history server -->

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>master1:19888</value>

</property>

<!-- Directory where MapReduce stores system files -->

<property>

<name>mapred.system.dir</name>

<value>/home/hadoop/hadoop/hadoopmrsys</value>

<final>true</final>

</property>

<!-- Directory where MapReduce stores local scratch data -->

<property>

<name>mapred.local.dir</name>

<value>/home/hadoop/hadoop/hadoopmrlocal</value>

<final>true</final>

</property>

</configuration>
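The two jobhistory addresses above only answer once the history server daemon is running on master1; it is not started by start-yarn.sh. A sketch:

/home/hadoop/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver   # run on master1
# then browse to http://master1:19888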

5.5.4. yarn-site.xml

<?xml version="1.0"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<configuration>

<!-- Interval between attempts to reconnect to a lost RM -->

<property>

<name>yarn.resourcemanager.connect.retry-interval.ms</name>

<value>2000</value>

</property>

<!-- Enable ResourceManager HA (default: false) -->

<property>

<name>yarn.resourcemanager.ha.enabled</name>

<value>true</value>

</property>

<!-- Logical IDs of the ResourceManagers -->

<property>

<name>yarn.resourcemanager.ha.rm-ids</name>

<value>rm1,rm2</value>

</property>

<!-- ZooKeeper quorum -->

<property>

<name>ha.zookeeper.quorum</name>

<value>master1:2181,master1ha:2181,master2:2181</value>

</property>

<!-- Enable automatic failover -->

<property>

<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<!-- Hostname of rm1 -->

<property>

<name>yarn.resourcemanager.hostname.rm1</name>

<value>master1</value>

</property>

<!-- Hostname of rm2 -->

<property>

<name>yarn.resourcemanager.hostname.rm2</name>

<value>master2</value>

</property>

<!-- Set rm1 on master1 and rm2 on master2. Note: this file is usually copied verbatim to the other machines, but this value MUST be changed on the other YARN master -->

<property>

<name>yarn.resourcemanager.ha.id</name>

<value>rm1</value>

<description>If we want to launch more than one RM in single node, we need this configuration</description>

</property>

<!-- Enable RM state recovery -->

<property>

<name>yarn.resourcemanager.recovery.enabled</name>

<value>true</value>

</property>

<!-- ZooKeeper address for the RM state store -->

<property>

<name>yarn.resourcemanager.zk-state-store.address</name>

<value>master1:2181,master1ha:2181,master2:2181</value>

</property>

<property>

<name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

</property>

<property>

<name>yarn.resourcemanager.zk-address</name>

<value>master1:2181,master1ha:2181,master2:2181</value>

</property>

<property>

<name>yarn.resourcemanager.cluster-id</name>

<value>hadoop-cluster1-yarn</value>

</property>

<!-- How long the AM waits between attempts to reconnect to the scheduler -->

<property>

<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>

<value>5000</value>

</property>

<!-- rm1 settings -->

<!-- rm1 client IPC address -->

<property>

<name>yarn.resourcemanager.address.rm1</name>

<value>master1:8132</value>

</property>

<!-- rm1 scheduler IPC address -->

<property>

<name>yarn.resourcemanager.scheduler.address.rm1</name>

<value>master1:8130</value>

</property>

<!-- rm1 web UI address -->

<property>

<name>yarn.resourcemanager.webapp.address.rm1</name>

<value>master1:8188</value>

</property>

<!-- rm1 resource-tracker IPC address -->

<property>

<name>yarn.resourcemanager.resource-tracker.address.rm1</name>

<value>master1:8131</value>

</property>

<!-- rm1 admin IPC address -->

<property>

<name>yarn.resourcemanager.admin.address.rm1</name>

<value>master1:8033</value>

</property>

<!-- rm1 HA admin IPC address -->

<property>

<name>yarn.resourcemanager.ha.admin.address.rm1</name>

<value>master1:23142</value>

</property>

<!-- rm2 settings, analogous to rm1 -->

<property>

<name>yarn.resourcemanager.address.rm2</name>

<value>master2:8132</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address.rm2</name>

<value>master2:8130</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address.rm2</name>

<value>master2:8188</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address.rm2</name>

<value>master2:8131</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address.rm2</name>

<value>master2:8033</value>

</property>

<property>

<name>yarn.resourcemanager.ha.admin.address.rm2</name>

<value>master2:23142</value>

</property>

<!-- Auxiliary service name; must be mapreduce_shuffle when running MapReduce -->

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<!-- Handler class for mapreduce_shuffle -->

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<!-- Local directory for nodemanager intermediate data -->

<property>

<name>yarn.nodemanager.local-dirs</name>

<value>/home/hadoop/hadoop/nodemanagerlocal</value>

</property>

<!-- Local directory for nodemanager logs -->

<property>

<name>yarn.nodemanager.log-dirs</name>

<value>/home/hadoop/hadoop/nodemanagerlogs</value>

</property>

<property>

<name>mapreduce.shuffle.port</name>

<value>23080</value>

</property>

<!-- Client failover proxy provider class -->

<property>

<name>yarn.client.failover-proxy-provider</name>

<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>

</property>

<property>

<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>

<value>/yarn-leader-election</value>

</property>

</configuration>
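Once both ResourceManagers are running (section 5.8.15), the HA state of the rm-ids defined above can be checked from any configured node:

yarn rmadmin -getServiceState rm1    # expect: active (or standby)
yarn rmadmin -getServiceState rm2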

5.5.5. slaves

slave1

slave2

slave3

5.5.6. hadoop-env.sh

# The java implementation to use. Set JAVA_HOME to the installed JDK.

export JAVA_HOME=/usr/jdk

# The jsvc implementation to use. Jsvc is required to run secure datanodes.

#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.

for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do

if [ "$HADOOP_CLASSPATH" ]; then

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f

else

export HADOOP_CLASSPATH=$f

fi

done

# The maximum amount of heap to use, in MB. Default is 1000.

#export HADOOP_HEAPSIZE=

#export HADOOP_NAMENODE_INIT_HEAPSIZE=""

# Extra Java runtime options. Empty by default.

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

# Command specific options appended to HADOOP_OPTS when specified

export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)

export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges

export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored. $HADOOP_HOME/logs by default.

#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.

export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

# The directory where pid files are stored. /tmp by default.

# NOTE: this should be set to a directory that can only be written to by

# the user that will run the hadoop daemons. Otherwise there is the

# potential for a symlink attack.

export HADOOP_PID_DIR=${HADOOP_PID_DIR}

export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.

export HADOOP_IDENT_STRING=$USER

5.5.7. yarn-env.sh

# User for YARN daemons

export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}

# resolve links - $0 may be a softlink

export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"

# some Java parameters: set JAVA_HOME to the installed JDK

export JAVA_HOME=/usr/jdk

if [ "$JAVA_HOME" != "" ]; then

#echo "run java in $JAVA_HOME"

JAVA_HOME=$JAVA_HOME

fi

if [ "$JAVA_HOME" = "" ]; then

echo "Error: JAVA_HOME is not set."

exit 1

fi

JAVA=$JAVA_HOME/bin/java

JAVA_HEAP_MAX=-Xmx1000m

# For setting YARN specific HEAP sizes please use this

# Parameter and set appropriately

# YARN_HEAPSIZE=1000

# check envvars which might override default args

if [ "$YARN_HEAPSIZE" != "" ]; then

JAVA_HEAP_MAX="-Xmx""$YARN_HEAPSIZE""m"

fi

# Resource Manager specific parameters

# Specify the max Heapsize for the ResourceManager using a numerical value

# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set

# the value to 1000.

# This value will be overridden by an Xmx setting specified in either YARN_OPTS

# and/or YARN_RESOURCEMANAGER_OPTS.

# If not specified, the default value will be picked from either YARN_HEAPMAX

# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.

#export YARN_RESOURCEMANAGER_HEAPSIZE=1000

# Specify the JVM options to be used when starting the ResourceManager.

# These options will be appended to the options specified as YARN_OPTS

# and therefore may override any similar flags set in YARN_OPTS

#export YARN_RESOURCEMANAGER_OPTS=

# Node Manager specific parameters

# Specify the max Heapsize for the NodeManager using a numerical value

# in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set

# the value to 1000.

# This value will be overridden by an Xmx setting specified in either YARN_OPTS

# and/or YARN_NODEMANAGER_OPTS.

# If not specified, the default value will be picked from either YARN_HEAPMAX

# or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two.

#export YARN_NODEMANAGER_HEAPSIZE=1000

# Specify the JVM options to be used when starting the NodeManager.

# These options will be appended to the options specified as YARN_OPTS

# and therefore may override any similar flags set in YARN_OPTS

#export YARN_NODEMANAGER_OPTS=

# so that filenames w/ spaces are handled correctly in loops below

IFS=

# default log directory & file

if [ "$YARN_LOG_DIR" = "" ]; then

YARN_LOG_DIR="$HADOOP_YARN_HOME/logs"

fi

if [ "$YARN_LOGFILE" = "" ]; then

YARN_LOGFILE='yarn.log'

fi

# default policy file for service-level authorization

if [ "$YARN_POLICYFILE" = "" ]; then

YARN_POLICYFILE="hadoop-policy.xml"

fi

# restore ordinary behaviour

unset IFS

YARN_OPTS="$YARN_OPTS -Dhadoop.log.dir=$YARN_LOG_DIR"

YARN_OPTS="$YARN_OPTS -Dyarn.log.dir=$YARN_LOG_DIR"

YARN_OPTS="$YARN_OPTS -Dhadoop.log.file=$YARN_LOGFILE"

YARN_OPTS="$YARN_OPTS -Dyarn.log.file=$YARN_LOGFILE"

YARN_OPTS="$YARN_OPTS -Dyarn.home.dir=$YARN_COMMON_HOME"

YARN_OPTS="$YARN_OPTS -Dyarn.id.str=$YARN_IDENT_STRING"

YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"

YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"

if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then

YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"

fi

YARN_OPTS="$YARN_OPTS -Dyarn.policy.file=$YARN_POLICYFILE"

5.5.8. Create the directories

cd /home/hadoop/hadoop
mkdir -m 755 namedir
mkdir -m 755 datadir
mkdir -m 755 tmp
mkdir -m 755 jndir
mkdir -m 755 hadoopmrsys
mkdir -m 755 hadoopmrlocal
mkdir -m 755 nodemanagerlocal
mkdir -m 755 nodemanagerlogs

5.6. Copy to the other nodes

scp -r /home/hadoop/hadoop hadoop@192.168.56.152:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.153:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.154:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.155:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.156:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.157:/home/hadoop/hadoop

5.7. Modify yarn-site.xml for rm2

<?xml version="1.0"?>

<!--

Licensed under the Apache License, Version 2.0 (the "License");

you may not use this file except in compliance with the License.

You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software

distributed under the License is distributed on an "AS IS" BASIS,

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and

limitations under the License. See accompanying LICENSE file.

-->

<configuration>

<!--rm失联后重新链接的时间-->

<property>

<name>yarn.resourcemanager.connect.retry-interval.ms</name>

<value>2000</value>

</property>

<!--开启resource manager HA,默认为false-->

<property>

<name>yarn.resourcemanager.ha.enabled</name>

<value>true</value>

</property>

<!--配置resource manager -->

<property>

<name>yarn.resourcemanager.ha.rm-ids</name>

<value>rm1,rm2</value>

</property>

<!-- zookeeper地址 -->

<property>

<name>ha.zookeeper.quorum</name>

<value>master1:2181,master1ha:2181,master2:2181</value>

</property>

<!--开启故障自动切换-->

<property>

<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<!-- rm1的hostname -->

<property>

<name>yarn.resourcemanager.hostname.rm1</name>

<value>master1</value>

</property>

<!-- rm2的hostname -->

<property>

<name>yarn.resourcemanager.hostname.rm2</name>

<value>master2</value>

</property>

<!--在master1上配置rm1,在master2上配置rm2,注意:一般都喜欢把配置好的文件远程复制到其它机器上,但这个在YARN的另一个机器上一定要修改-->

<property>

<name>yarn.resourcemanager.ha.id</name>

<value>rm2</value>

<description>If we want to launch more than one RM in single node, we need this configuration</description>

</property>

<!--开启自动恢复功能-->

<property>

<name>yarn.resourcemanager.recovery.enabled</name>

<value>true</value>

</property>

<!--配置与zookeeper的连接地址-->

<property>

<name>yarn.resourcemanager.zk-state-store.address</name>

<value>master1:2181,master1ha:2181,master2:2181</value>

</property>

<property>

<name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

</property>

<property>

<name>yarn.resourcemanager.zk-address</name>

<value>master1:2181,master1ha:2181,master2:2181</value>

</property>

<property>

<name>yarn.resourcemanager.cluster-id</name>

<value>hadoop-cluster1-yarn</value>

</property>

<!--schelduler失联等待连接时间-->

<property>

<name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>

<value>5000</value>

</property>

<!--配置rm1-->

<!--rm1对外的IPC传输地址-->

<property>

<name>yarn.resourcemanager.address.rm1</name>

<value>master1:8132</value>

</property>

<!--rm1调度器对外的IPC传输地址-->

<property>

<name>yarn.resourcemanager.scheduler.address.rm1</name>

<value>master1:8130</value>

</property>

<!--rm1对外的WEB访问地址-->

<property>

<name>yarn.resourcemanager.webapp.address.rm1</name>

<value>master1:8188</value>

</property>

<!--rm1的resource-tracker对外的IPC传输地址-->

<property>

<name>yarn.resourcemanager.resource-tracker.address.rm1</name>

<value>master1:8131</value>

</property>

<!--rm1的Admin对外的IPC传输地址-->

<property>

<name>yarn.resourcemanager.admin.address.rm1</name>

<value>master1:8033</value>

</property>

<!--rm1的Admin的ha对外的IPC传输地址-->

<property>

<name>yarn.resourcemanager.ha.admin.address.rm1</name>

<value>master1:23142</value>

</property>

<!--配置rm2,同rm1-->

<property>

<name>yarn.resourcemanager.address.rm2</name>

<value>master2:8132</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address.rm2</name>

<value>master2:8130</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address.rm2</name>

<value>master2:8188</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address.rm2</name>

<value>master2:8131</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address.rm2</name>

<value>master2:8033</value>

</property>

<property>

<name>yarn.resourcemanager.ha.admin.address.rm2</name>

<value>master2:23142</value>

</property>

<!--附属服务名称,如果使用mapreduce,需将只配置为mapreduce_shuffle-->

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<!--mapreduce_shuffle的Handler类-->

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<!--nodemanager存放临时文件的本地目录-->

<property>

<name>yarn.nodemanager.local-dirs</name>

<value>/home/hadoop2/hadoop2/nodemanagerlocal</value>

</property>

<!--nodemanager存放日志的本地目录-->

<property>

<name>yarn.nodemanager.log-dirs</name>

<value>/home/hadoop2/hadoop2/nodemanagerlogs</value>

</property>

<property>

<name>mapreduce.shuffle.port</name>

<value>23080</value>

</property>

<!--故障处理类-->

<property>

<name>yarn.client.failover-proxy-provider</name>

<value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>

</property>

<property>

<name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>

<value>/yarn-leader-election</value>

</property>

</configuration>

5.8. Initialize the cluster

5.8.1. Start the ZooKeeper ensemble

Start ZooKeeper on the 2n-1 machines that run it.

Start: zkServer.sh start

Check:

Processes: jps

Status: zkServer.sh status

5.8.2. Format ZooKeeper to create the HA znodes

On the active namenode of the first cluster (master1), run:

/home/hadoop/hadoop/bin/hdfs zkfc -formatZK

On the active namenode of the second cluster (master2), run:

/home/hadoop/hadoop/bin/hdfs zkfc -formatZK
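To verify the format step, the ZooKeeper CLI can be pointed at the ensemble; each formatted nameservice should show up under /hadoop-ha (a sketch; the ls command is typed inside the zkCli shell):

zkCli.sh -server master1:2181
ls /hadoop-ha    # expect: [hadoop-cluster1, hadoop-cluster2]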

5.8.3. Start the JournalNode cluster

Start the JournalNodes on the machines where they were planned; in this setup they run on master1ha, master2, and master2ha, so start them on those three machines.

Start:

On master1ha:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode

On master2:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode

On master2ha:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode

Check the processes:

jps

5.8.4. Format the namenode on master1

Run on master1; clusterId is an ID shared by all nameservices in the federation.

/home/hadoop/hadoop/bin/hdfs namenode -format -clusterId hellokitty

5.8.5. Start the namenode on master1

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode

Check:

jps

5.8.6. On master1ha, sync the namenode metadata from master1

/home/hadoop/hadoop/bin/hdfs namenode -bootstrapStandby

5.8.7. Start the namenode on master1ha

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode

5.8.8. Make the master1 namenode active

On master1:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc

On master1ha:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
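After the zkfc daemons are up, one namenode per nameservice is elected active; the states can be confirmed with hdfs haadmin (the -ns flag selects a nameservice in a federated setup):

hdfs haadmin -ns hadoop-cluster1 -getServiceState nn1    # expect: active
hdfs haadmin -ns hadoop-cluster1 -getServiceState nn2    # expect: standby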

5.8.9. Format the namenode on master2 (second cluster)

/home/hadoop/hadoop/bin/hdfs namenode -format -clusterId hellokitty

5.8.10. Start the namenode on master2

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode

5.8.11. On master2ha, sync the namenode metadata from master2

/home/hadoop/hadoop/bin/hdfs namenode -bootstrapStandby

5.8.12. Start the namenode on master2ha

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode

5.8.13. Make the master2 namenode active

On master2:

hadoop-daemon.sh start zkfc

On master2ha:

hadoop-daemon.sh start zkfc

5.8.14. Start all the datanodes

On each of the three slave machines, run:

hadoop-daemon.sh start datanode
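To confirm that all three datanodes registered, dfsadmin can report against one nameservice; a sketch, where the -fs generic option picks the nameservice explicitly because the default filesystem here is viewfs:

hdfs dfsadmin -fs hdfs://hadoop-cluster1 -report    # expect: three live datanodes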

5.8.15. Start YARN

On master1, run:

start-yarn.sh

Verify by browsing to:

http://master1:8188

On master2, run:

yarn-daemon.sh start resourcemanager

Verify by browsing to:

http://master2:8188

5.8.16. Verify YARN

Upload a file:

hadoop fs -put /home/hadoop/hadoop/etc/hadoop/core-site.xml /tmp

List it:

hadoop fs -ls /tmp

Run wordcount:

hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/core-site.xml /tmp/out
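When the job completes, the word counts can be read back from the output directory (the part file name assumes the default single reducer):

hadoop fs -ls /tmp/out
hadoop fs -cat /tmp/out/part-r-00000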

5.9. Stop

5.9.1. On master1, run:

stop-dfs.sh

stop-yarn.sh

5.9.2. On master2, run:

stop-yarn.sh

5.9.3. Stop ZooKeeper

zkServer.sh stop

5.10. Start

5.10.1. Start ZooKeeper

zkServer.sh start

5.10.2. Start the journalnodes

On master1ha:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode

On master2:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode

On master2ha:

/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode

5.10.3. On master1, run:

start-dfs.sh

start-yarn.sh

5.10.4. On master2, run:

yarn-daemon.sh start resourcemanager

5.11. Monitoring

5.11.1. Monitor the namenode active/standby state

http://master1:50070

5.11.2. Monitor MapReduce jobs

http://master1:8188