
Installing a Distributed Hadoop 2.6.0 Cluster on Ubuntu 14.04

2015-05-16 19:41
I. Install vim

sudo apt-get install vim

II. Change the hostname (Master, Slave1, Slave2, ...)

sudo vim /etc/hostname
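Editing /etc/hostname only takes effect after a reboot. If you want the new name right away, the hostname command sets it for the current session (shown for the master node as an example):

sudo hostname Master   # temporary until reboot; /etc/hostname makes it permanent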

III. Edit the hosts file (hostname resolution)

sudo vim /etc/hosts
127.0.0.1       localhost
113.55.112.52    Master
113.55.112.7      Slave1
113.55.112.44    Slave2
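After saving the hosts file, a quick ping from each node confirms the names resolve to the expected addresses (a simple check using the hostnames above):

ping -c 3 Slave1
ping -c 3 Slave2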

IV. Create the hadoop group and the hadoop user

1. Create the hadoop group

sudo addgroup hadoop-group
2. Create the hadoop user and add it to the hadoop-group group
sudo adduser --ingroup hadoop-group hadoop
3. Give the hadoop user the same privileges as root
sudo vim /etc/sudoers
root   ALL=(ALL:ALL) ALL
hadoop ALL=(ALL:ALL) ALL
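Editing /etc/sudoers directly is risky; visudo opens the same file but checks the syntax before saving, so it is the safer way to add the hadoop line:

sudo visudo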
V. Install the JDK
1. Copy the JDK to the installation directory

1-1. Create a java directory under /usr/local

cd /usr/local
sudo mkdir java
1-2. Extract jdk-8u40-linux-i586.tar.gz into the target directory
sudo tar -xzvf jdk-8u40-linux-i586.tar.gz -C /usr/local/java

2. Configure the environment variables

2-1. Open the /etc/profile file

sudo vim /etc/profile

Note: /etc/profile is the system-wide configuration file, so changes to it apply to all users; ~/.bashrc is a per-user configuration file that affects only the current user.

2-2. Add the following variables

# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
# set Java environment
export JAVA_HOME=/usr/local/java/jdk1.8.0_40
export JRE_HOME=/usr/local/java/jdk1.8.0_40/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
2-3. Make the changes take effect
source /etc/profile
3. Check that the installation succeeded
java -version
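If the reported version is not the JDK just installed, check that the new variables were actually picked up by the shell (a quick sanity check):

echo $JAVA_HOME   # should print /usr/local/java/jdk1.8.0_40
which java        # should point into the JDK's bin directory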

VI. Install SSH

1. Update the apt package lists

sudo apt-get update

2. Install the openssh-server service

sudo apt-get install openssh-server
3. Set up passwordless SSH login
3-1. Generate an SSH key

ssh-keygen -t rsa -P ""
3-2. On Master, go into ~/.ssh/ and append id_rsa.pub to the authorized_keys file
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
or
cp id_rsa.pub authorized_keys

3-3. On each Slave, copy Master's authorized_keys into ~/.ssh/ and apply the following settings

chmod 600 authorized_keys   # set the file permissions to 600
chgrp hadoop-group authorized_keys   # change the file's group
chown hadoop authorized_keys   # change the file's owner
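One way to get Master's authorized_keys onto the slaves in the first place is scp, run on Master; this is a minimal sketch that assumes the hadoop user and the hostnames from the hosts file above (it will still prompt for a password, since passwordless login is not set up yet):

scp ~/.ssh/authorized_keys hadoop@Slave1:~/.ssh/
scp ~/.ssh/authorized_keys hadoop@Slave2:~/.ssh/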
3-4. Disable the firewall and reboot. After installing gufw, turn the firewall off for the Home, Office, and Public profiles, and also set ufw logging to off in the preferences.

sudo apt-get install gufw
sudo gufw
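If you prefer to skip the GUI, ufw itself can be disabled directly from the command line (an alternative to gufw with the same effect):

sudo ufw disable
sudo ufw status   # should report that the firewall is inactive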


3-5. Log in to localhost

ssh localhost
3-6. Run the exit command
exit

3-7. Log in to a Slave node

ssh Slave1
3-8. Run the exit command
exit


VII. Install Hadoop

1. Extract the archive into the home directory (~) and rename it to hadoop

sudo tar -xzvf hadoop-2.6.0.tar.gz -C ~/
sudo mv ~/hadoop-2.6.0 ~/hadoop
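Because the archive was extracted with sudo, the files may end up owned by root. Handing the tree to the hadoop user and group created earlier avoids permission problems later (a precaution, not part of the original steps):

sudo chown -R hadoop:hadoop-group ~/hadoop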
2. Edit the Hadoop configuration files in the ${HADOOP_HOME}/etc/hadoop/ directory
2-1. Set the Java installation directory in hadoop-env.sh

export JAVA_HOME=/usr/local/java/jdk1.8.0_40
2-2. Edit core-site.xml and add the following content
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
</configuration>

2-3. Edit hdfs-site.xml and add the following content

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
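The local directories referenced above and in core-site.xml (hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir) can be created up front so the daemons do not stumble over missing or root-owned paths (optional; HDFS also creates them on first start):

mkdir -p /home/hadoop/tmp /home/hadoop/dfs/name /home/hadoop/dfs/data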


2-4. Rename mapred-site.xml.template to mapred-site.xml and add the following content
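The rename itself can be done with mv, run from ${HADOOP_HOME}/etc/hadoop (use cp instead if you want to keep the template):

mv mapred-site.xml.template mapred-site.xml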
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
2-5. Edit yarn-site.xml and add the following content
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>

3. To make the hadoop command, start-all.sh, and the other scripts easy to run, add the following to /etc/profile on all nodes:

export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


4. Add Slave1 and Slave2 to the slaves file
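The slaves file sits in ${HADOOP_HOME}/etc/hadoop/ and simply lists one worker hostname per line, so with the hosts file above it contains:

Slave1
Slave2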

5. Copy hadoop to each node

sudo scp -r /home/hadoop/hadoop Slave1:/home/hadoop/
sudo scp -r /home/hadoop/hadoop Slave2:/home/hadoop/



6. Format HDFS on Master

bin/hdfs namenode -format

7. Start HDFS on Master. After this command runs, a dfs folder is created automatically under the home directory on each slave node.

sbin/start-dfs.sh

8. Start YARN

sbin/start-yarn.sh      

9. Once HDFS and YARN are up, running jps on Master should show the following processes:

10308 NameNode
10583 SecondaryNameNode
11255 Jps
10971 ResourceManager

10. Running jps on a slave should show the following processes:

19217 NodeManager
19474 Jps
18869 DataNode


11. If startup fails, check whether the clusterID in dfs/data/current/VERSION is the same on every node (including the master); if the IDs differ, change them so they match. If you want to re-format HDFS, delete the dfs folder on all nodes first.
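With the directories configured in hdfs-site.xml above, the IDs can be compared with grep (the name directory on Master, the data directory on the slaves):

grep clusterID /home/hadoop/dfs/name/current/VERSION   # on Master
grep clusterID /home/hadoop/dfs/data/current/VERSION   # on each slave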

VIII. Testing

1. Open http://localhost:50070 in a browser to view the Hadoop administration information

Overview 'Master:9000' (active)
Started:	Sun May 17 13:54:04 CST 2015
Version:	2.6.0, re3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled:	2014-11-13T21:10Z by jenkins from (detached from e349649)
Cluster ID:	CID-4cef2df9-1ae9-4409-9486-a71254487f7e
Block Pool ID:	BP-786922400-113.55.112.196-1431841994289
Datanode Information
In operation
Node	Last contact	Admin State	Capacity	Used	Non DFS Used	Remaining	Blocks	Block pool used	Failed Volumes	Version
slave1 (113.55.112.190:50010)	2	In Service	458.23 GB	24 KB	24.13 GB	434.1 GB	0	24 KB (0%)	0	2.6.0
Slave2 (113.55.112.44:50010)	2	In Service	458.23 GB	24 KB	23.96 GB	434.27 GB	0	24 KB (0%)	0	2.6.0
2. View DataNode information: bin/hdfs dfsadmin -report

hadoop@Master:~/hadoop$ bin/hdfs dfsadmin -report
Configured Capacity: 984037638144 (916.46 GB)
Present Capacity: 932398043136 (868.36 GB)
DFS Remaining: 932397993984 (868.36 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 113.55.112.190:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 492018819072 (458.23 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 25911414784 (24.13 GB)
DFS Remaining: 466107379712 (434.10 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.73%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun May 17 14:12:42 CST 2015

Name: 113.55.112.44:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 492018819072 (458.23 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 25728180224 (23.96 GB)
DFS Remaining: 466290614272 (434.27 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun May 17 14:12:42 CST 2015

hadoop@Master:~/hadoop$
3. Other dfsadmin commands:

3.1 Decommission the DataNode named datanodename

bin/hadoop dfsadmin -decommission datanodename
3.2 Put the cluster into safe mode
bin/hadoop dfsadmin -safemode enter
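The complementary command takes the cluster back out of safe mode:

bin/hadoop dfsadmin -safemode leave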
3.3 List all currently supported commands
bin/hadoop dfsadmin -help