
Setting Up a Three-Node Hadoop 2.2.0 Cluster

2014-01-30 19:45

Downloading Hadoop 2.2

Hadoop 2.2.0 can be downloaded directly from the official Apache website.

Download address: http://mirror.esocc.com/apache/hadoop/common/stable/

Preparing the Hadoop Cluster Environment

Here we set up a cluster consisting of three machines:

 

IP address       Username   Hostname   OS
192.168.1.105    hadoop     master     Ubuntu 64-bit
192.168.1.120    hadoop     slave2     CentOS 64-bit
192.168.1.104    hadoop     slave1     CentOS 64-bit

 

1. Configure master, slave1, and slave2 as follows.

1) Edit /etc/hostname with vim to set each machine's hostname.

2) Edit /etc/hosts with vim to add the IP-to-hostname mappings for all three machines (see the sample entries below).
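As a sketch, the /etc/hosts entries would look like the following on each machine, using the IP addresses and hostnames from the table above:

192.168.1.105   master
192.168.1.120   slave2
192.168.1.104   slave1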

Install the SSH service on master, slave1, and slave2 so that the nodes can log in to one another without passwords.

In slave1's ~/.ssh directory:

$ ssh-keygen
$ chmod 700 ~/.ssh/
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
$ scp authorized_keys hadoop@slave2:~/.ssh/

In slave2's ~/.ssh directory:

$ ssh-keygen
$ chmod 700 ~/.ssh/
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
$ scp authorized_keys hadoop@master:~/.ssh/

In master's ~/.ssh directory:

$ ssh-keygen
$ chmod 700 ~/.ssh/
$ cat id_rsa.pub >> authorized_keys
$ chmod 600 authorized_keys
$ scp authorized_keys hadoop@slave2:~/.ssh/
$ scp authorized_keys hadoop@slave1:~/.ssh/

After these steps, all three nodes can access one another without passwords.
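As a quick check (not part of the original steps), you can try logging in from master to each slave; if the key setup worked, no password prompt should appear:

$ ssh slave1
$ ssh slave2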

JDK installation: install the JDK on all three nodes (master, slave1, slave2), preferably at the same location on each; in this example it is installed under /opt/java/jdk.
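As a minimal sketch of the environment setup (the choice of ~/.bashrc is an assumption; only the install path /opt/java/jdk comes from this guide), each node could export:

export JAVA_HOME=/opt/java/jdk
export PATH=$JAVA_HOME/bin:$PATH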



Installing Hadoop 2.2

Since the configuration of every machine in a Hadoop cluster is basically the same, we first configure and deploy on the namenode and then copy the result to the other nodes. In effect, the installation steps below apply to every machine.

1. Extract the archive

Extract the hadoop-2.2.0.tar.gz downloaded in the first part into /home/u/ (substitute your own directory), or place the Hadoop build produced on a 64-bit machine there. To save space, you can then delete the archive or keep it elsewhere as a backup.

Note: the installation path must be the same on every machine!
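For example, assuming the archive sits in the hadoop user's home directory (a sketch; adjust the paths to your own layout):

$ tar -zxvf ~/hadoop-2.2.0.tar.gz -C ~/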

2. Hadoop configuration

Before configuring, create the required directories on the master's local file system.
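The original list of folders is not reproduced here; as one hedged example based on the hadoop.tmp.dir value in core-site.xml below, you might create the temporary directory ahead of time:

$ mkdir -p /home/hadoop/tmp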

Six configuration files are involved here:

~/hadoop-2.2.0/etc/hadoop/hadoop-env.sh

~/hadoop-2.2.0/etc/hadoop/slaves

~/hadoop-2.2.0/etc/hadoop/core-site.xml

~/hadoop-2.2.0/etc/hadoop/hdfs-site.xml

~/hadoop-2.2.0/etc/hadoop/mapred-site.xml

~/hadoop-2.2.0/etc/hadoop/yarn-site.xml

Any of these files that do not exist by default can be created by copying the corresponding template file.
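For example, mapred-site.xml does not exist by default in Hadoop 2.2 and can be created from its template (a minimal illustration):

$ cd ~/hadoop-2.2.0/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml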

Configuration file 1: hadoop-env.sh (slave1 and slave2 also need this change)

Modify the JAVA_HOME value: export JAVA_HOME=/opt/java/jdk

Configuration file 2: slaves (this file lists all the slave nodes)

Write the following content into it:

slave1

slave2

Configuration file 3: core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:8010</value>
    <description>The name of the default file system. A URI whose
      scheme and authority determine the FileSystem implementation. The
      uri's scheme determines the config property (fs.SCHEME.impl) naming
      the FileSystem implementation class. The uri's authority is used to
      determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>


Configuration file 4: hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
      The actual number of replications can be specified when the file is created.
      The default is used if replication is not specified in create time.</description>
  </property>
</configuration>


Configuration file 5: mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
    <description>The host and port that the MapReduce job tracker runs
      at. If "local", then jobs are run in-process as a single map
      and reduce task.</description>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
    <description>As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
    <description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).</description>
  </property>
</configuration>


Configuration file 6: yarn-site.xml: no configuration is needed for now (but see the sketch below).
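Although this guide leaves yarn-site.xml empty, a minimal sketch for running MapReduce on YARN across these nodes often looks like the following (these are standard YARN property names; the values here are assumptions, not part of the original setup):

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>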

3. Copy to the other nodes

From master, use scp (SSH must already be configured) to copy the configured Hadoop directory to the two slave nodes, slave1 and slave2:

$ scp -r ~/hadoop-2.2.0 hadoop@slave1:~/
$ scp -r ~/hadoop-2.2.0 hadoop@slave2:~/


4. Start and verify

 

1. Start Hadoop

Enter the installation directory:

cd ~/hadoop-2.2.0/

Format the namenode:

./bin/hdfs namenode -format

2. Start HDFS: ./sbin/start-dfs.sh

At this point the processes running on master are: NameNode, SecondaryNameNode

The processes running on slave1 and slave2 are: DataNode

3. Start YARN: ./sbin/start-yarn.sh

At this point the processes running on master are: NameNode, SecondaryNameNode, ResourceManager

The processes running on slave1 and slave2 are: DataNode, NodeManager
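A simple way to verify which daemons are running on each node is the JDK's jps tool, for example:

$ jps   # on master this should list NameNode, SecondaryNameNode and ResourceManager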

5. Check the cluster status: ./bin/hdfs dfsadmin -report

If running the command produces output like the following, the installation was successful:

 

hadoop@master:~/hadoop-2.2.0/bin$ ./hdfs dfsadmin -report
14/03/20 21:24:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 139443990528 (129.87 GB)
Present Capacity: 123374796800 (114.90 GB)
DFS Remaining: 123374747648 (114.90 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Live datanodes:
Name: 192.168.1.120:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 100111396864 (93.24 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 8277905408 (7.71 GB)
DFS Remaining: 91833466880 (85.53 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.73%
Last contact: Thu Mar 20 21:24:44 CST 2014

Name: 192.168.1.104:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 39332593664 (36.63 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 7791288320 (7.26 GB)
DFS Remaining: 31541280768 (29.38 GB)
DFS Used%: 0.00%
DFS Remaining%: 80.19%
Last contact: Thu Mar 20 21:24:43 CST 2014

 