Hadoop 2 Installation Guide
2015-10-04 08:40
1. Install the JDK
Install it on every machine.
2. Install SSH
Set it up on every machine.
3. Install ZooKeeper
Install it on any odd number of machines. Mine runs at: master1:2181, master1ha:2181, master2:2181
4. Configure hosts and plan the machines
Machine plan:
master1:   192.168.56.151 (active namenode, RM)
master1ha: 192.168.56.152 (standby namenode, jn, zk)
master2:   192.168.56.153 (active namenode, jn, RM, zk)
master2ha: 192.168.56.154 (standby namenode, jn, zk)
slave1:    192.168.56.155 (datanode, nodemanager)
slave2:    192.168.56.156 (datanode, nodemanager)
slave3:    192.168.56.157 (datanode, nodemanager)
192.168.56.151 master1
192.168.56.152 master1ha
192.168.56.153 master2
192.168.56.154 master2ha
192.168.56.155 slave1
192.168.56.156 slave2
192.168.56.157 slave3
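The entries above have to be present in /etc/hosts on every node. A minimal sketch for appending any missing entries (for safety it defaults to a scratch file, ./hosts.cluster; point HOSTS_FILE at /etc/hosts and run as root to apply it for real):

```shell
# Append the planned host entries to a hosts file, skipping any already present.
# HOSTS_FILE defaults to a scratch file so the sketch is safe to try out.
HOSTS_FILE=${HOSTS_FILE:-./hosts.cluster}
touch "$HOSTS_FILE"
while read -r ip name; do
  # -w: whole-word match, so "master1" does not falsely match "master1ha"
  grep -qw "$name" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
done <<'EOF'
192.168.56.151 master1
192.168.56.152 master1ha
192.168.56.153 master2
192.168.56.154 master2ha
192.168.56.155 slave1
192.168.56.156 slave2
192.168.56.157 slave3
EOF
```

The loop is idempotent: re-running it leaves an already populated file unchanged.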
5. Install Hadoop 2
5.1. Upload
Upload the Hadoop 2 archive with a transfer tool, or from the command line:
su - hadoop
cd /home/hadoop
rz -y
5.2. Unpack
tar -zxvf hadoop-2.6.0.tar.gz
5.3. Rename
mv hadoop-2.6.0 hadoop
5.4. Configure environment variables
su - root
vi /etc/profile
Add:
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
su - hadoop
5.5. Configure the Hadoop 2 configuration files
Upload the configuration files to /home/hadoop/hadoop/etc/hadoop.
5.5.1. core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Federation setup: viewfs acts as a unified view over the clusters -->
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs:///</value>
  </property>
  <!-- Mount point served by the first namenode cluster -->
  <property>
    <name>fs.viewfs.mounttable.default.link./tmp</name>
    <value>hdfs://hadoop-cluster1/tmp</value>
  </property>
  <!-- Mount point served by the second namenode cluster -->
  <property>
    <name>fs.viewfs.mounttable.default.link./tmp1</name>
    <value>hdfs://hadoop-cluster2/tmp1</value>
  </property>
</configuration>
5.5.2. hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <!-- 两套namenode集群的名字 --> <property> <name>dfs.nameservices</name> <value>hadoop-cluster1,hadoop-cluster2</value> </property> <!-- 第一套nemenode集群的两台机器 --> <property> <name>dfs.ha.namenodes.hadoop-cluster1</name> <value>nn1,nn2</value> </property> <!-- 第一套namenode的主的数据传输地址 --> <property> <name>dfs.namenode.rpc-address.hadoop-cluster1.nn1</name> <value>master1:9000</value> </property> <!-- 第一套namenode的备的数据传输地址 --> <property> <name>dfs.namenode.rpc-address.hadoop-cluster1.nn2</name> <value>master1ha:9000</value> </property> <!-- 第一套namenode的主的WEB地址 --> <property> <name>dfs.namenode.http-address.hadoop-cluster1.nn1</name> <value>master1:50070</value> </property> <!-- 第一套namenode的备的WEB地址 --> <property> <name>dfs.namenode.http-address.hadoop-cluster1.nn2</name> <value>master1ha:50070</value> </property> <!-- 第一套secondarynamenode的主的http地址 --> <property> <name>dfs.namenode.secondary.http-address.hadoop-cluster1.nn1</name> <value>master1:9001</value> </property> <!-- 第一套secondarynamenode的备的http地址 --> <property> <name>dfs.namenode.secondary.http-address.hadoop-cluster1.nn2</name> <value>master1ha:9001</value> </property> <!-- 第一套namenode的主备切换实现类 --> <property> <name>dfs.client.failover.proxy.provider.hadoop-cluster1</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> </property> <!-- 第二套secondarynamenode的配置和第一套一样 --> <property> <name>dfs.ha.namenodes.hadoop-cluster2</name> <value>nn3,nn4</value> </property> <property> <name>dfs.namenode.rpc-address.hadoop-cluster2.nn3</name> <value>master2:9000</value> </property> <property> <name>dfs.namenode.rpc-address.hadoop-cluster2.nn4</name> <value>master2ha:9000</value> </property> <property> <name>dfs.namenode.http-address.hadoop-cluster2.nn3</name> <value>master2:50070</value> </property> <property> <name>dfs.namenode.http-address.hadoop-cluster2.nn4</name> 
<value>master2ha:50070</value> </property> <property> <name>dfs.namenode.secondary.http-address.hadoop-cluster2.nn3</name> <value>master2:9001</value> </property> <property> <name>dfs.namenode.secondary.http-address.hadoop-cluster2.nn4</name> <value>master2ha:9001</value> </property> <property> <name>dfs.client.failover.proxy.provider.hadoop-cluster2</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> </property> <!-- namenode的本地文件夹 --> <property> <name>dfs.namenode.name.dir</name> <value>/home/hadoop/hadoop/namedir</value> </property> <!-- 第一套namenode的主的journal文件位置 --> <property> <name>dfs.namenode.shared.edits.dir.hadoop-cluster1.nn1</name> <value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster1</value> </property> <!-- 第一套namenode的备的journal文件位置,合主相同 --> <property> <name>dfs.namenode.shared.edits.dir.hadoop-cluster1.nn2</name> <value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster1</value> </property> <!-- 第二套namenode的主的journal文件位置 --> <property> <name>dfs.namenode.shared.edits.dir.hadoop-cluster2.nn3</name> <value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster2</value> </property> <!-- 第二套namenode的备的journal文件位置,合主相同 --> <property> <name>dfs.namenode.shared.edits.dir.hadoop-cluster2.nn4</name> <value>qjournal://master1ha:8485;master2:8485;master2ha:8485/cluster2</value> </property> <!-- 数据存放的文件夹 --> <property> <name>dfs.datanode.data.dir</name> <value>/home/hadoop/hadoop/datadir</value> </property> <!-- zookeeper地址 --> <property> <name>ha.zookeeper.quorum</name> <value>master1:2181,master1ha:2181,master2:2181</value> </property> <!-- ssh采用的方法 --> <property> <name>dfs.ha.fencing.methods</name> <value>sshfence</value> </property> <!-- zookeeper超时 --> <property> <name>ha.zookeeper.session-timeout.ms</name> <value>5000</value> </property> <!-- 是否namenode主备自动切换 --> <property> <name>dfs.ha.automatic-failover.enabled</name> <value>true</value> </property> <!-- journalnode文件夹 --> 
<property> <name>dfs.journalnode.edits.dir</name> <value>/home/hadoop/hadoop/jndir</value> </property> <!-- 备份数 --> <property> <name>dfs.replication</name> <value>2</value> </property> <!-- 权限 --> <property> <name>dfs.permission</name> <value>false</value> </property> <!-- 是否允许web页面访问hdfs --> <property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property> <!-- 是否支持追加 --> <property> <name>dfs.support.append</name> <value>true</value> </property> <!-- 临时文件夹 --> <property> <name>hadoop.tmp.dir</name> <value>/home/hadoop/hadoop/tmp</value> </property> <!-- hadoop代理用户配置 --> <property> <name>hadoop.proxyuser.hduser.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.hduser.groups</name> <value>*</value> </property> <!-- ssh私钥位置 --> <property> <name>dfs.ha.fencing.ssh.private-key-files</name> <value>/home/hadoop/.ssh/id_rsa</value> </property> </configuration> |
5.5.3. mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- The MapReduce runtime framework; set to yarn to run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- IPC address of the MapReduce job history server -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master1:10020</value>
  </property>
  <!-- Web address of the MapReduce job history server -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master1:19888</value>
  </property>
  <!-- MapReduce shared system directory -->
  <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/hadoop/hadoopmrsys</value>
    <final>true</final>
  </property>
  <!-- MapReduce local data directory -->
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/hadoop/hadoopmrlocal</value>
    <final>true</final>
  </property>
</configuration>
5.5.4. yarn-site.xml
<?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <!--rm失联后重新链接的时间--> <property> <name>yarn.resourcemanager.connect.retry-interval.ms</name> <value>2000</value> </property> <!--开启resource manager HA,默认为false--> <property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property> <!--配置resource manager --> <property> <name>yarn.resourcemanager.ha.rm-ids</name> <value>rm1,rm2</value> </property> <!-- zookeeper地址 --> <property> <name>ha.zookeeper.quorum</name> <value>master1:2181,master1ha:2181,master2:2181</value> </property> <!--开启故障自动切换--> <property> <name>yarn.resourcemanager.ha.automatic-failover.enabled</name> <value>true</value> </property> <!-- rm1的hostname --> <property> <name>yarn.resourcemanager.hostname.rm1</name> <value>master1</value> </property> <!-- rm2的hostname --> <property> <name>yarn.resourcemanager.hostname.rm2</name> <value>master2</value> </property> <!--在master1上配置rm1,在master2上配置rm2,注意:一般都喜欢把配置好的文件远程复制到其它机器上,但这个在YARN的另一个机器上一定要修改--> <property> <name>yarn.resourcemanager.ha.id</name> <value>rm1</value> <description>If we want to launch more than one RM in single node, we need this configuration</description> </property> <!--开启自动恢复功能--> <property> <name>yarn.resourcemanager.recovery.enabled</name> <value>true</value> </property> <!--配置与zookeeper的连接地址--> <property> <name>yarn.resourcemanager.zk-state-store.address</name> <value>master1:2181,master1ha:2181,master2:2181</value> </property> 
<property> <name>yarn.resourcemanager.store.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> </property> <property> <name>yarn.resourcemanager.zk-address</name> <value>master1:2181,master1ha:2181,master2:2181</value> </property> <property> <name>yarn.resourcemanager.cluster-id</name> <value>hadoop-cluster1-yarn</value> </property> <!--schelduler失联等待连接时间--> <property> <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name> <value>5000</value> </property> <!--配置rm1--> <!--rm1对外的IPC传输地址--> <property> <name>yarn.resourcemanager.address.rm1</name> <value>master1:8132</value> </property> <!--rm1调度器对外的IPC传输地址--> <property> <name>yarn.resourcemanager.scheduler.address.rm1</name> <value>master1:8130</value> </property> <!--rm1对外的WEB访问地址--> <property> <name>yarn.resourcemanager.webapp.address.rm1</name> <value>master1:8188</value> </property> <!--rm1的resource-tracker对外的IPC传输地址--> <property> <name>yarn.resourcemanager.resource-tracker.address.rm1</name> <value>master1:8131</value> </property> <!--rm1的Admin对外的IPC传输地址--> <property> <name>yarn.resourcemanager.admin.address.rm1</name> <value>master1:8033</value> </property> <!--rm1的Admin的ha对外的IPC传输地址--> <property> <name>yarn.resourcemanager.ha.admin.address.rm1</name> <value>master1:23142</value> </property> <!--配置rm2,同rm1--> <property> <name>yarn.resourcemanager.address.rm2</name> <value>master2:8132</value> </property> <property> <name>yarn.resourcemanager.scheduler.address.rm2</name> <value>master2:8130</value> </property> <property> <name>yarn.resourcemanager.webapp.address.rm2</name> <value>master2:8188</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.address.rm2</name> <value>master2:8131</value> </property> <property> <name>yarn.resourcemanager.admin.address.rm2</name> <value>master2:8033</value> </property> <property> <name>yarn.resourcemanager.ha.admin.address.rm2</name> <value>master2:23142</value> </property> 
<!--附属服务名称,如果使用mapreduce,需将只配置为mapreduce_shuffle--> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <!--mapreduce_shuffle的Handler类--> <property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <!--nodemanager存放临时文件的本地目录--> <property> <name>yarn.nodemanager.local-dirs</name> <value>/home/hadoop2/hadoop2/nodemanagerlocal</value> </property> <!--nodemanager存放日志的本地目录--> <property> <name>yarn.nodemanager.log-dirs</name> <value>/home/hadoop2/hadoop2/nodemanagerlogs</value> </property> <property> <name>mapreduce.shuffle.port</name> <value>23080</value> </property> <!--故障处理类--> <property> <name>yarn.client.failover-proxy-provider</name> <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value> </property> <property> <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name> <value>/yarn-leader-election</value> </property> </configuration> |
5.5.5. slaves
slave1
slave2
slave3
5.5.6. hadoop-env.sh
# The java implementation to use.,wilson:配置jdk环境变量 export JAVA_HOME=/usr/jdk # The jsvc implementation to use. Jsvc is required to run secure datanodes. #export JSVC_HOME=${JSVC_HOME} export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"} # Extra Java CLASSPATH elements. Automatically insert capacity-scheduler. for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do if [ "$HADOOP_CLASSPATH" ]; then export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f else export HADOOP_CLASSPATH=$f fi done # The maximum amount of heap to use, in MB. Default is 1000. #export HADOOP_HEAPSIZE= #export HADOOP_NAMENODE_INIT_HEAPSIZE="" # Extra Java runtime options. Empty by default. export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true" # Command specific options appended to HADOOP_OPTS when specified export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS" export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS" export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS" # The following applies to multiple commands (fs, dfs, fsck, distcp etc) export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" #HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS" # On secure datanodes, user to run the datanode as after dropping privileges export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER} # Where log files are stored. $HADOOP_HOME/logs by default. #export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER # Where log files are stored in the secure data environment. export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER} # The directory where pid files are stored. /tmp by default. 
# NOTE: this should be set to a directory that can only be written to by # the user that will run the hadoop daemons. Otherwise there is the # potential for a symlink attack. export HADOOP_PID_DIR=${HADOOP_PID_DIR} export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR} # A string representing this instance of hadoop. $USER by default. export HADOOP_IDENT_STRING=$USER |
5.5.7. yarn-env.sh
# User for YARN daemons export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn} # resolve links - $0 may be a softlink export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}" # some Java parameters:wilson,jdk环境变量 export JAVA_HOME=/usr/jdk if [ "$JAVA_HOME" != "" ]; then #echo "run java in $JAVA_HOME" JAVA_HOME=$JAVA_HOME fi if [ "$JAVA_HOME" = "" ]; then echo "Error: JAVA_HOME is not set." exit 1 fi JAVA=$JAVA_HOME/bin/java JAVA_HEAP_MAX=-Xmx1000m # For setting YARN specific HEAP sizes please use this # Parameter and set appropriately # YARN_HEAPSIZE=1000 # check envvars which might override default args if [ "$YARN_HEAPSIZE" != "" ]; then JAVA_HEAP_MAX="-Xmx""$YARN_HEAPSIZE""m" fi # Resource Manager specific parameters # Specify the max Heapsize for the ResourceManager using a numerical value # in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set # the value to 1000. # This value will be overridden by an Xmx setting specified in either YARN_OPTS # and/or YARN_RESOURCEMANAGER_OPTS. # If not specified, the default value will be picked from either YARN_HEAPMAX # or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two. #export YARN_RESOURCEMANAGER_HEAPSIZE=1000 # Specify the JVM options to be used when starting the ResourceManager. # These options will be appended to the options specified as YARN_OPTS # and therefore may override any similar flags set in YARN_OPTS #export YARN_RESOURCEMANAGER_OPTS= # Node Manager specific parameters # Specify the max Heapsize for the NodeManager using a numerical value # in the scale of MB. For example, to specify an jvm option of -Xmx1000m, set # the value to 1000. # This value will be overridden by an Xmx setting specified in either YARN_OPTS # and/or YARN_NODEMANAGER_OPTS. # If not specified, the default value will be picked from either YARN_HEAPMAX # or JAVA_HEAP_MAX with YARN_HEAPMAX as the preferred option of the two. 
#export YARN_NODEMANAGER_HEAPSIZE=1000 # Specify the JVM options to be used when starting the NodeManager. # These options will be appended to the options specified as YARN_OPTS # and therefore may override any similar flags set in YARN_OPTS #export YARN_NODEMANAGER_OPTS= # so that filenames w/ spaces are handled correctly in loops below IFS= # default log directory & file if [ "$YARN_LOG_DIR" = "" ]; then YARN_LOG_DIR="$HADOOP_YARN_HOME/logs" fi if [ "$YARN_LOGFILE" = "" ]; then YARN_LOGFILE='yarn.log' fi # default policy file for service-level authorization if [ "$YARN_POLICYFILE" = "" ]; then YARN_POLICYFILE="hadoop-policy.xml" fi # restore ordinary behaviour unset IFS YARN_OPTS="$YARN_OPTS -Dhadoop.log.dir=$YARN_LOG_DIR" YARN_OPTS="$YARN_OPTS -Dyarn.log.dir=$YARN_LOG_DIR" YARN_OPTS="$YARN_OPTS -Dhadoop.log.file=$YARN_LOGFILE" YARN_OPTS="$YARN_OPTS -Dyarn.log.file=$YARN_LOGFILE" YARN_OPTS="$YARN_OPTS -Dyarn.home.dir=$YARN_COMMON_HOME" YARN_OPTS="$YARN_OPTS -Dyarn.id.str=$YARN_IDENT_STRING" YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}" YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}" if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH" fi YARN_OPTS="$YARN_OPTS -Dyarn.policy.file=$YARN_POLICYFILE" |
5.5.8. Create the directories
mkdir -m 755 namedir
mkdir -m 755 datadir
mkdir -m 755 tmp
mkdir -m 755 jndir
mkdir -m 755 hadoopmrsys
mkdir -m 755 hadoopmrlocal
mkdir -m 755 nodemanagerlocal
mkdir -m 755 nodemanagerlogs
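The same directories can be created in one loop (a sketch; run it inside /home/hadoop/hadoop so the paths match the configs above):

```shell
# Create every local directory referenced by the configs above, mode 755.
# mkdir -p makes the loop safe to re-run if some directories already exist.
for d in namedir datadir tmp jndir hadoopmrsys hadoopmrlocal \
         nodemanagerlocal nodemanagerlogs; do
  mkdir -p -m 755 "$d"
done
```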
5.6. Copy to the other nodes
scp -r /home/hadoop/hadoop hadoop@192.168.56.152:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.153:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.154:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.155:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.156:/home/hadoop/hadoop
scp -r /home/hadoop/hadoop hadoop@192.168.56.157:/home/hadoop/hadoop
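The six copies can be scripted with a loop. A sketch (push_to_nodes is a hypothetical helper, not a Hadoop command); prefixing a wrapper command gives a dry run:

```shell
# Push the hadoop directory to every other node in the plan.
# Any arguments are used as a wrapper command, so `push_to_nodes echo`
# prints the six scp commands instead of running them.
push_to_nodes() {
  src=/home/hadoop/hadoop
  for ip in 192.168.56.152 192.168.56.153 192.168.56.154 \
            192.168.56.155 192.168.56.156 192.168.56.157; do
    "$@" scp -r "$src" "hadoop@$ip:$src"
  done
}
# push_to_nodes        # copy for real
# push_to_nodes echo   # dry run
```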
5.7. Edit yarn-site.xml on the rm2 machine
The file is identical to the yarn-site.xml above with one exception: yarn.resourcemanager.ha.id must name the local ResourceManager. On master2 change it to rm2 (this is the one value that must not be copied verbatim when the configured files are pushed to the other machines):
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm2</value>
  <description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
5.8. Initial setup
5.8.1. Start the ZooKeeper cluster
On each of the 2n-1 ZooKeeper machines, start it:
zkServer.sh start
Check:
Process: jps
Status: zkServer.sh status
5.8.2. Format ZooKeeper (creates the HA znodes in the ZooKeeper cluster)
On the active of the first namenode cluster (master1), run:
/home/hadoop/hadoop/bin/hdfs zkfc -formatZK
On the active of the second namenode cluster (master2), run:
/home/hadoop/hadoop/bin/hdfs zkfc -formatZK
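zkServer.sh status reports each node's role on a "Mode:" line. A small hypothetical helper (not part of ZooKeeper) to pull that field out of a captured status report:

```shell
# Extract the role (leader / follower / standalone) from `zkServer.sh status` output.
zk_mode() {
  printf '%s\n' "$1" | awk -F': ' '/^Mode/ {print $2}'
}
# Example: zk_mode "$(zkServer.sh status 2>&1)"
```

Exactly one node in the ensemble should report leader; the rest follower.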
5.8.3. Start the JournalNode cluster
Start the JournalNodes on the machines where the journal was installed. Mine are on master1ha, master2 and master2ha, so start them on those three machines.
On master1ha:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
On master2:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
On master2ha:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
Check the process:
jps
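jps lists running JVMs by class name, so a JournalNode shows up as a line ending in "JournalNode". A hypothetical helper that checks a captured jps listing:

```shell
# Report whether a given jps listing contains a JournalNode process.
has_journalnode() {
  printf '%s\n' "$1" | grep -q 'JournalNode$' && echo running || echo missing
}
# Example: has_journalnode "$(ssh master2 jps)"
```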
5.8.4. Format the namenode on master1
Run on master1 (clusterId is the ID of this cluster):
/home/hadoop/hadoop/bin/hdfs namenode -format -clusterId hellokitty
5.8.5. Start the namenode on master1
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
Check:
jps
5.8.6. On master1ha, sync the namenode data from master1 to master1ha
/home/hadoop/hadoop/bin/hdfs namenode -bootstrapStandby
5.8.7. Start the namenode on master1ha
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
5.8.8. Make the namenode on master1 active
On master1 run:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
On master1ha run:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start zkfc
5.8.9. Format the namenode on master2 (cluster 2)
/home/hadoop/hadoop/bin/hdfs namenode -format -clusterId hellokitty
5.8.10. Start the namenode on master2
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
5.8.11. On master2ha, sync the namenode data from master2 to master2ha
/home/hadoop/hadoop/bin/hdfs namenode -bootstrapStandby
5.8.12. Start the namenode on master2ha
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start namenode
5.8.13. Make master2 active
On master2 run:
hadoop-daemon.sh start zkfc
On master2ha run:
hadoop-daemon.sh start zkfc
5.8.14. Start all the datanodes
On each of the three slave machines run:
hadoop-daemon.sh start datanode
5.8.15. Start yarn
On master1 run:
start-yarn.sh
Verify: open in a browser
http://master1:8188
On master2 run:
yarn-daemon.sh start resourcemanager
Verify: open in a browser
http://master2:8188
5.8.16. Verify yarn
Upload a file:
hadoop fs -put /home/hadoop/hadoop/etc/hadoop/core-site.xml /tmp
Check:
hadoop fs -ls /tmp
Run wordcount:
hadoop jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/core-site.xml /tmp/out
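The job's result can be read back with hadoop fs -cat /tmp/out/part-r-00000. As a rough local cross-check of those counts, a sketch with standard tools (whitespace tokenization only, which differs slightly from the example job's tokenizer):

```shell
# Count word occurrences in a local file and print "word<TAB>count" lines,
# roughly mirroring the output format of the wordcount example job.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c \
    | awk '{print $2 "\t" $1}'
}
# Example: local_wordcount /home/hadoop/hadoop/etc/hadoop/core-site.xml
```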
5.9. Stop
5.9.1. On master1 run:
stop-dfs.sh
stop-yarn.sh
5.9.2. On master2 run:
stop-yarn.sh
5.9.3. Stop zookeeper
zkServer.sh stop
5.10. Start
5.10.1. Start zookeeper
zkServer.sh start
5.10.2. Start the journalnodes
On master1ha:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
On master2:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
On master2ha:
/home/hadoop/hadoop/sbin/hadoop-daemon.sh start journalnode
5.10.3. On master1 run:
start-dfs.sh
start-yarn.sh
5.10.4. On master2 run:
yarn-daemon.sh start resourcemanager
5.11. Monitoring
5.11.1. Monitor the namenode active/standby state
http://master1:50070
5.11.2. Monitor mapreduce jobs
http://master1:8188