
Setting Up a Hadoop 2.6.0 Cluster

2015-03-15 11:31

I. Hardware Configuration

Four servers: IBM System x3650 M4 (7915I51)

Product category: rack server
Form factor: 2U

CPU model: Xeon E5-2650
CPUs (standard): 1

Memory type: ECC DDR3
Memory capacity: 16GB

Disk interface: SATA/SAS
Disk capacity (standard): 2TB

Full specifications: http://detail.zol.com.cn/331/330619/param.shtml

One server acts as the master; the other three serve as slaves.

Services on the master: NameNode, SecondaryNameNode, ResourceManager

Services on the slaves: DataNode, NodeManager

The master and slave1 sit in rack 1; slave2 and slave3 sit in rack 2. For rack awareness, see "7) Configure rack awareness" below.

II. Cluster Setup and Configuration

1. SSH and ClusterShell configuration

ClusterShell is used to run the same command on multiple machines. SSH must be set up for passwordless login from the master to the slave nodes, so that commands such as start-dfs.sh and start-yarn.sh can be run from the master.

Step 1: Install ClusterShell on the master as the root user

1) Installation steps omitted; a minimal install sketch follows.
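Assuming the EPEL repository can be enabled on the master (an assumption; adjust to however packages are installed in your environment), ClusterShell can be installed with yum, e.g.:

[root@master ~]# yum install -y epel-release
[root@master ~]# yum install -y clustershell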

2) Configure /etc/clustershell/groups as follows:

master: master
slaves: slave[1-3]
hadoop: master @slaves


Notes:

master contains the hostname of the master node.

slaves contains the hostnames of the slave nodes.

The hadoop group contains the hostnames of every node in the cluster, i.e. both the master and the slaves.
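To sanity-check the group definitions, ClusterShell's nodeset utility can be used (a quick sketch; the expected results are shown in the comments):

[root@master ~]# nodeset -f @slaves      # folds to: slave[1-3]
[root@master ~]# nodeset -e @hadoop      # expands to: master slave1 slave2 slave3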

3) Edit /etc/hosts to map each hostname to its IP address:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.120	master	master.example.com
192.168.1.121	slave1	slave1.example.com
192.168.1.122	slave2	slave2.example.com
192.168.1.123	slave3	slave3.example.com


Step 2: Set up passwordless SSH login for root across the hadoop group

1) Generate a key pair

[root@master ~]# ssh-keygen -t rsa
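If you prefer to avoid the interactive prompts, the key pair can also be generated non-interactively (a sketch; -N "" means an empty passphrase):

[root@master ~]# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q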

2) Write an expect script, ~/bin/copy_id.exp, to drive the interactive ssh-copy-id command

#!/usr/bin/expect
# Copy the local root public key to the node passed as the first argument,
# answering the host-key and root password prompts automatically.
set node [lindex $argv 0]
spawn ssh-copy-id root@$node
expect {
    "Are you sure you want to continue connecting (yes/no)?" { send "yes\n"; exp_continue }
    "*password:" { send "redhat\n" }
}

expect eof
exit


3) Write a shell script, cluster_copy_id.sh, that runs copy_id.exp against every node

#!/bin/bash
cat /root/bin/hadoop.txt | while read node
do
    echo 'starting copy id to '${node}
    expect copy_id.exp $node
    echo 'finishing copy id to '${node}
done


Here, hadoop.txt contains all of the hostnames:

master
slave1
slave2
slave3


4) Run the shell script

[root@master ~]# chmod a+x bin/cluster_copy_id.sh

[root@master ~]# bin/cluster_copy_id.sh

After step 2, root can run the same command across the whole cluster with clush. Note that the -g option selects the group(s) to run on; the groups were defined in /etc/clustershell/groups as described above.
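For example, a quick check that clush reaches every node without a password (a sketch; any harmless command will do):

[root@master ~]# clush -g hadoop -b uptime    # -b merges identical output from different nodes
[root@master ~]# clush -g slaves date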

Step 3: Create the hadoop user on every node in the hadoop group

1) [root@master ~]# clush -g hadoop useradd hadoop

2) Write a shell script, bin/cluster_passwd_id.sh, to set the hadoop user's password (clush cannot handle interactive password prompts, so the script loops over the nodes and changes the password on each one)

#!/bin/bash
cat /root/bin/hadoop.txt | while read node
do
    echo 'starting change passwd to '${node}
    expect passwd.exp $node
    echo 'finishing change passwd to '${node}
done


3) Write an expect script, bin/passwd.exp, that answers the password prompts

#!/usr/bin/expect
# Set the hadoop user's password on the node passed as the first argument.
# The prompts below come from passwd on a Chinese-locale system; on an English
# locale they would read "New password:" and "Retype new password:".
set node [lindex $argv 0]
spawn ssh root@$node passwd hadoop
expect "新的 密码:"
send "hadoop\n"
expect "重新输入新的 密码:"
send "hadoop\n"
expect eof
exit


4) Run the shell script to change the hadoop user's password

[root@master ~]# chmod a+x bin/cluster_passwd_id.sh

[root@master ~]# bin/cluster_passwd_id.sh

5) Configure sudo for the hadoop user, so that commands requiring root privileges can be run as sudo COMMAND instead of switching users.

Edit /etc/sudoers and add:

hadoop	ALL=(ALL)	ALL
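Alternatively, instead of editing /etc/sudoers by hand on each node, a drop-in file can be created once and pushed to the slaves (a sketch, assuming the default #includedir /etc/sudoers.d directive is present):

[root@master ~]# echo 'hadoop ALL=(ALL) ALL' > /etc/sudoers.d/hadoop
[root@master ~]# chmod 440 /etc/sudoers.d/hadoop
[root@master ~]# clush -g slaves --copy /etc/sudoers.d/hadoop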


After step 3, every node has a hadoop user with the same password and with sudo privileges.

Step 4: Set up passwordless SSH login for the hadoop user across the hadoop group

Switch to the hadoop user and repeat the procedure from step 2; only the username and password differ, so it is not repeated here.

2. Install Hadoop

Step 1: Mount the NFS shared directory (192.168.1.113 is not part of the cluster; it provides the NFS service), which holds the software packages to be installed

[hadoop@master ~] sudo clush -g hadoop mkdir /mnt/hadoop-nfs

[hadoop@master ~] sudo clush -g hadoop mount -t nfs 192.168.1.113:/home/wangsch/download /mnt/hadoop-nfs/
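To make the mount persist across reboots, the export can also be appended to /etc/fstab on every node (a sketch; adjust the mount options to your needs):

[hadoop@master ~] sudo clush -g hadoop "echo '192.168.1.113:/home/wangsch/download /mnt/hadoop-nfs nfs defaults 0 0' >> /etc/fstab"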

Step 2: Install Java

1) Install across the hadoop group

[hadoop@master ~] sudo clush -g hadoop tar -xzf /mnt/hadoop-nfs/jdk-7u75-linux-i586.tar.gz -C /opt/

[hadoop@master ~] sudo clush -g hadoop chown -R hadoop:hadoop /opt/jdk1.7.0_75/

[hadoop@master ~] sudo clush -g slaves yum remove -y java-1.6.0-openjdk*
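A quick check that the JDK is unpacked on every node (a sketch):

[hadoop@master ~] sudo clush -g hadoop /opt/jdk1.7.0_75/bin/java -version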

Step 3: Install and configure Hadoop

1) Unpack the distribution

[hadoop@master ~] sudo tar -xvzf /mnt/hadoop-nfs/hadoop-2.6.0.tar.gz -C /opt

[hadoop@master ~] sudo chown -R hadoop:hadoop /opt/hadoop-2.6.0

2) Copy the hosts file to the slaves group

[hadoop@master ~] sudo clush -g slaves --copy /etc/hosts

3) Configure the slaves file by editing /opt/hadoop-2.6.0/etc/hadoop/slaves:

slave1
slave2
slave3


4) Configure Hadoop ($HADOOP_HOME/etc/hadoop/hadoop-env.sh)

# Set JAVA_HOME
export JAVA_HOME=/opt/jdk1.7.0_75

# GC options
export HADOOP_OPTS="$HADOOP_OPTS -XX:+UseParallelOldGC"

# NameNode and SecondaryNameNode heap size
export HADOOP_NAMENODE_OPTS="-Xmx2000M $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx2000M $HADOOP_SECONDARYNAMENODE_OPTS"

# DataNode heap size
export HADOOP_DATANODE_OPTS="-Xmx3000M $HADOOP_DATANODE_OPTS"

# PID directory
export HADOOP_PID_DIR=/data/hadoop-pids
export HADOOP_SECURE_DN_PID_DIR=/data/hadoop-pids


5) Configure YARN ($HADOOP_HOME/etc/hadoop/yarn-env.sh)

# ResourceManager heap size
export YARN_RESOURCEMANAGER_OPTS=-Xmx2000M

# NodeManager heap size
export YARN_NODEMANAGER_OPTS=-Xmx3000M

# GC options
YARN_OPTS="$YARN_OPTS -XX:+UseParallelOldGC"

# PID directory
YARN_PID_DIR=/data/hadoop-pids


6) Configure the *-site.xml files

common: /opt/hadoop-2.6.0/etc/hadoop/core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/tmp</value>
</property>
<property>
<name>topology.script.file.name</name>
<value>/opt/hadoop-2.6.0/rack-aware/rack_aware.py</value>
</property>
</configuration>


Create the file /opt/hadoop-2.6.0/etc/hadoop/masters:

master


hdfs: /opt/hadoop-2.6.0/etc/hadoop/hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/dfs/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/data/dfs/namesecondary</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!-- SecondaryNameNode configuration -->
<property>
<name>dfs.http.address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
</configuration>
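Note that the directories referenced above (dfs.namenode.name.dir, dfs.datanode.data.dir, dfs.namenode.checkpoint.dir, plus hadoop.tmp.dir from core-site.xml) must exist and be owned by the hadoop user before the NameNode is formatted; otherwise formatting fails (see issue 2 in the troubleshooting notes). They can be created up front:

[hadoop@master ~] sudo clush -g hadoop mkdir -p /data/dfs/name /data/dfs/data /data/dfs/namesecondary /data/tmp
[hadoop@master ~] sudo clush -g hadoop chown -R hadoop:hadoop /data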

mapreduce:/opt/hadoop-2.6.0/etc/hadoop/mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>


yarn:/opt/hadoop-2.6.0/etc/hadoop/yarn-site.xml

<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>


7) Configure rack awareness

Edit /opt/hadoop-2.6.0/etc/hadoop/core-site.xml:

<property>
<name>topology.script.file.name</name>
<value>/opt/hadoop-2.6.0/rack-aware/rack_aware.py</value>
</property>


Write the mapping from hostname/IP to rack ID.

[hadoop@master ~] mkdir /opt/hadoop-2.6.0/rack-aware

Create rack_aware.py in the rack-aware directory:

#!/usr/bin/python
# -*- coding:utf-8 -*-
import sys

# Map each hostname/IP to its rack ID; unknown hosts fall back to rack0.
rack = {"slave1": "rack1",
        "slave2": "rack2",
        "slave3": "rack2",
        "192.168.1.121": "rack1",
        "192.168.1.122": "rack2",
        "192.168.1.123": "rack2"
        }

if __name__ == "__main__":
    print "/" + rack.get(sys.argv[1], "rack0")


[hadoop@master ~] chown hadoop:hadoop /opt/hadoop-2.6.0/rack-aware/rack_aware.py

[hadoop@master ~] chmod a+x /opt/hadoop-2.6.0/rack-aware/rack_aware.py
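The mapping can be tested by hand before HDFS is (re)started; the expected output for the script above is shown in the comments:

[hadoop@master ~] /opt/hadoop-2.6.0/rack-aware/rack_aware.py 192.168.1.121    # /rack1
[hadoop@master ~] /opt/hadoop-2.6.0/rack-aware/rack_aware.py slave2           # /rack2
[hadoop@master ~] /opt/hadoop-2.6.0/rack-aware/rack_aware.py unknown-host     # /rack0 (default)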

8) Set the environment variables consistently

Edit ~/.bash_profile:

export JAVA_HOME=/opt/jdk1.7.0_75
export HADOOP_HOME=/opt/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin


Distribute the file:

[hadoop@master ~] sudo clush -g slaves --copy ~/.bash_profile

9) Copy the Hadoop installation to the other nodes

[hadoop@master ~] sudo clush -g slaves --copy /opt/hadoop-2.6.0

[hadoop@master ~] sudo clush -g slaves chown -R hadoop:hadoop /opt/hadoop-2.6.0

Step 4: Start Hadoop

1) Format the NameNode

[hadoop@master ~] cd /opt/hadoop-2.6.0

[hadoop@master hadoop-2.6.0]$ bin/hdfs namenode -format

2) Start HDFS

[hadoop@master hadoop-2.6.0]$ sbin/start-dfs.sh

3) Start YARN

[hadoop@master hadoop-2.6.0]$ sbin/start-yarn.sh

4) Stop the firewall (otherwise the web UIs cannot be reached)

[hadoop@master hadoop-2.6.0]$ sudo clush -g hadoop service iptables stop

[hadoop@master hadoop-2.6.0]$ sudo clush -g hadoop chkconfig iptables off

5) Check that the daemons are running

[hadoop@master hadoop-2.6.0]$ clush -g hadoop /opt/jdk1.7.0_75/bin/jps | sort

master: 14362 NameNode

master: 14539 SecondaryNameNode

master: 15285 ResourceManager

master: 15585 Jps

slave1: 5469 DataNode

slave1: 5801 NodeManager

slave1: 5932 Jps

slave2: 5005 DataNode

slave2: 5296 NodeManager

slave2: 5427 Jps

slave3: 4889 DataNode

slave3: 5196 NodeManager

slave3: 5327 Jps

6) Check HDFS

[hadoop@master hadoop-2.6.0]$ hdfs dfsadmin -report

15/03/18 17:21:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Configured Capacity: 61443870720 (57.22 GB)

Present Capacity: 52050423808 (48.48 GB)

DFS Remaining: 52050350080 (48.48 GB)

DFS Used: 73728 (72 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

-------------------------------------------------

Live datanodes (3):

Name: 192.168.1.123:50010 (slave3)

Hostname: slave3

Rack: /rack2

Decommission Status : Normal

Configured Capacity: 20481290240 (19.07 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 3131166720 (2.92 GB)

DFS Remaining: 17350098944 (16.16 GB)

DFS Used%: 0.00%

DFS Remaining%: 84.71%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Mar 18 17:21:15 CST 2015

Name: 192.168.1.122:50010 (slave2)

Hostname: slave2

Rack: /rack2

Decommission Status : Normal

Configured Capacity: 20481290240 (19.07 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 3131105280 (2.92 GB)

DFS Remaining: 17350160384 (16.16 GB)

DFS Used%: 0.00%

DFS Remaining%: 84.71%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Mar 18 17:21:17 CST 2015

Name: 192.168.1.121:50010 (slave1)

Hostname: slave1

Rack: /rack1

Decommission Status : Normal

Configured Capacity: 20481290240 (19.07 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 3131174912 (2.92 GB)

DFS Remaining: 17350090752 (16.16 GB)

DFS Used%: 0.00%

DFS Remaining%: 84.71%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Mar 18 17:21:17 CST 2015
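As a final smoke test, one of the bundled MapReduce examples can be submitted to YARN (a sketch; the examples jar ships with the Hadoop 2.6.0 binary distribution under share/hadoop/mapreduce):

[hadoop@master hadoop-2.6.0]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100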

-----------------------------------------------------------------------------------------------------------------------------------------------

III. Troubleshooting Notes

1. During formatting: KnownHostException: master.example.com

Running hostname returns master.example.com; master alone is the short name, so the fully qualified name must also be mapped. Update /etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.120	master	master.example.com
192.168.1.121	slave1	slave1.example.com
192.168.1.122	slave2	slave2.example.com
192.168.1.123	slave3	slave3.example.com


[hadoop@master hadoop-2.6.0] sudo clush -g slaves --copy /etc/hosts

2. During formatting: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory is in an inconsistent state: storage directory does not exist or is not accessible. Fix: create the configured directories on every node and give them to the hadoop user:

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/namesecondary

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/name

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/dfs/data

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop mkdir -p /data/tmp

[hadoop@master hadoop-2.6.0] sudo clush -g hadoop chown -R hadoop:hadoop /data

3. [root@master hadoop-2.6.0]# sbin/start-dfs.sh

15/03/17 15:58:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Starting namenodes on [Java HotSpot(TM) Client VM warning: You have loaded library /opt/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.

It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.

master]

-c: Unknown cipher type 'cd'

or: ssh: Could not resolve hostname or: Name or service not known

fix: ssh: Could not resolve hostname fix: Name or service not known

have: ssh: Could not resolve hostname have: Name or service not known

with: ssh: Could not resolve hostname with: Name or service not known

with: ssh: Could not resolve hostname with: Name or service not known

VM: ssh: Could not resolve hostname VM: Name or service not known

to: ssh: Could not resolve hostname to: Name or service not known

sed: -e expression #1, char 6: unknown option to `s'

that: ssh: Could not resolve hostname that: Name or service not known

will: ssh: Could not resolve hostname will: Name or service not known

stack: ssh: Could not resolve hostname stack: Name or service not known

recommended: ssh: Could not resolve hostname recommended: Name or service not known

Java: ssh: Could not resolve hostname Java: Name or service not known

library: ssh: Could not resolve hostname library: Name or service not known

disabled: ssh: Could not resolve hostname disabled: Name or service not known

link: ssh: Could not resolve hostname link: Name or service not known

you: ssh: Could not resolve hostname you: Name or service not known

You: ssh: Could not resolve hostname You: Name or service not known

Client: ssh: Could not resolve hostname Client: Name or service not known

stack: ssh: Could not resolve hostname stack: Name or service not known

guard.: ssh: Could not resolve hostname guard.: Name or service not known

the: ssh: Could not resolve hostname the: Name or service not known

Fix: add the following to ~/.bash_profile:

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

4. The SecondaryNameNode prompts for SSH authentication at startup. Fix:

1) Create the file /opt/hadoop-2.6.0/etc/hadoop/masters:

master


2) Edit hdfs-site.xml:

<property>
<name>dfs.http.address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>