
Setting Up a Hadoop Environment on Linux




The Three Run Modes

Standalone mode: simple to install and needs almost no configuration, but is useful only for debugging.

Pseudo-distributed mode: starts all five daemons (NameNode, DataNode, JobTracker, TaskTracker, and SecondaryNameNode) on a single node, simulating each node of a distributed cluster. [Given my machine's specs, this is the mode I use for learning.]

Fully distributed mode: a real Hadoop cluster, built from multiple nodes each playing its own role.

Installing and Configuring Pseudo-Distributed Mode

Download and Extract

Download and unpack the Hadoop tarball. Most tutorials at the moment use version 0.20.2, so that is the version I chose as well:

http://hadoop.apache.org/releases.html

http://archive.apache.org/dist/hadoop/core/

http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/

hadoop-0.20.2.tar.gz
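
If you are working over SSH, the tarball can be fetched straight from the archive (a minimal sketch using wget; the URL is the one listed above):

root@debian3:/usr/local# wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz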

root@debian3:/usr/local# tar zxvf hadoop-0.20.2.tar.gz

Edit the Configuration Files

Enter the extracted Hadoop directory and edit conf/hadoop-env.sh to set the JDK installation path (note that the configuration files moved to a different location as of version 0.23):

conf/hadoop-env.sh
# The java implementation to use. Required.

export JAVA_HOME=/usr/local/jdk1.6.0_38
For JDK installation, see the earlier post: Installing and Uninstalling the JDK on Debian (Ubuntu), part 1 of these VPS setup notes.
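
Before moving on, it is worth confirming that the path set in hadoop-env.sh really points at a working JDK (a quick check using the path from this article):

root@debian3:/usr/local/hadoop-0.20.2# /usr/local/jdk1.6.0_38/bin/java -version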

Next, edit the three core configuration files in the conf directory: core-site.xml, hdfs-site.xml, and mapred-site.xml.

core-site.xml
<configuration>

<!--

fs.default.name - the URI (protocol, hostname, and port) of the cluster's NameNode. Every machine in the cluster needs to know the NameNode's address; DataNodes register with it first so that their data can be used. Client programs contact the NameNode through this URI to obtain a file's block list.

-->

<property>

<name>fs.default.name</name>

<value>hdfs://192.168.1.102:9000</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/usr/local/hadoop-0.20.2/mytmp</value>

</property>

</configuration>
hdfs-site.xml
<configuration>

<!--

dfs.replication: the number of copies kept of each file block. For a real deployment it should be set to 3 (there is no hard upper limit, but more replicas add little benefit while consuming more space). Fewer than three replicas may hurt reliability (a failure could cause data loss).

dfs.data.dir: the directory in which a DataNode stores its blocks.

-->

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>/usr/local/hadoop-0.20.2/data</value>

</property>

</configuration>
mapred-site.xml
<configuration>

<!-- mapred.job.tracker - the host (or IP) and port of the JobTracker. -->

<property>

<name>mapred.job.tracker</name>

<value>192.168.1.102:9001</value>

</property>

</configuration>
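
The hadoop.tmp.dir and dfs.data.dir values above point at directories that a fresh extraction does not contain. Hadoop will normally create them on first use, but creating them up front makes any permission problem visible immediately (a sketch using the paths configured above):

root@debian3:/usr/local/hadoop-0.20.2# mkdir -p /usr/local/hadoop-0.20.2/mytmp /usr/local/hadoop-0.20.2/data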
Edit conf/masters and conf/slaves in the conf directory and add the IPs:

Add the master's IP to conf/masters: 192.168.1.102

Add the slave's IP to conf/slaves: 192.168.1.102
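
Both are plain text files with one IP per line, so the same thing can be done from the shell:

root@debian3:/usr/local/hadoop-0.20.2# echo 192.168.1.102 > conf/masters

root@debian3:/usr/local/hadoop-0.20.2# echo 192.168.1.102 > conf/slaves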

Configure SSH

Generate an SSH key pair so that ssh can reach 192.168.1.102 without a password:
root@debian3:/usr/local/hadoop-0.20.2# cd /root/

root@debian3:~# ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

11:25:02:a9:9b:70:2f:52:72:10:96:3e:a6:21:9f:89 root@debian3

The key's randomart image is:

+--[ RSA 2048]----+

|.o. .o. o.. |

|o. . . o |

|.. . . |

|=+= . |

|+X.* S |

|E B . |

| . . |

| |

| |

+-----------------+

root@debian3:~# cd .ssh

root@debian3:~/.ssh# ls

id_rsa id_rsa.pub known_hosts

id_rsa is the private key file; id_rsa.pub is the public key file.

root@debian3:~/.ssh# cp id_rsa.pub authorized_keys

root@debian3:~/.ssh# ls

authorized_keys id_rsa id_rsa.pub known_hosts
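
Before going further, verify that passwordless login actually works (the very first connection will still ask you to confirm the host key):

root@debian3:~/.ssh# ssh 192.168.1.102 date

If ssh still prompts for a password, make sure the permissions are strict enough: chmod 700 /root/.ssh and chmod 600 /root/.ssh/authorized_keys.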
Format HDFS
root@debian3:/usr/local/hadoop-0.20.2# bin/hadoop namenode -format

13/02/23 22:57:17 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG: host = debian3/127.0.1.1

STARTUP_MSG: args = [-format]

STARTUP_MSG: version = 0.20.2

STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

************************************************************/

13/02/23 22:57:18 INFO namenode.FSNamesystem: fsOwner=root,root

13/02/23 22:57:18 INFO namenode.FSNamesystem: supergroup=supergroup

13/02/23 22:57:18 INFO namenode.FSNamesystem: isPermissionEnabled=true

13/02/23 22:57:18 INFO common.Storage: Image file of size 94 saved in 0 seconds.

13/02/23 22:57:18 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.

13/02/23 22:57:18 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at debian3/127.0.1.1

************************************************************/
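
Note one detail in the log above: the image was saved under /tmp/hadoop-root/dfs/name, which is the built-in default (${hadoop.tmp.dir}/dfs/name with hadoop.tmp.dir at its default of /tmp/hadoop-${user.name}), not under the mytmp directory configured earlier. If you see the same thing, double-check that core-site.xml was saved before formatting, e.g.:

root@debian3:/usr/local/hadoop-0.20.2# grep -A 1 hadoop.tmp.dir conf/core-site.xml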
Start Hadoop

Start Hadoop with bin/start-all.sh:
root@debian3:/usr/local/hadoop-0.20.2# bin/start-all.sh

starting namenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-debian3.out

192.168.1.102: starting datanode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-debian3.out

192.168.1.102: starting secondarynamenode, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-debian3.out

starting jobtracker, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-debian3.out

192.168.1.102: starting tasktracker, logging to /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-debian3.out
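
The .out files named above only capture stdout; the daemons write their real logs next to them as .log files, which is the first place to look if a process fails to come up (path taken from the output above):

root@debian3:/usr/local/hadoop-0.20.2# tail logs/hadoop-root-namenode-debian3.log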
Check That the Daemons Started
root@debian3:/usr/local/hadoop-0.20.2# /usr/local/jdk1.6.0_38/bin/jps

9622 DataNode

9872 TaskTracker

9533 NameNode

9781 JobTracker

9711 SecondaryNameNode

9919 Jps
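
All five daemons are present (the sixth line is jps itself). Two further checks are worth doing: this release serves a NameNode web UI on port 50070 and a JobTracker UI on port 50030, and a small round trip through HDFS proves the cluster is actually usable (a minimal sketch, not in the original):

http://192.168.1.102:50070/

http://192.168.1.102:50030/

root@debian3:/usr/local/hadoop-0.20.2# bin/hadoop fs -mkdir /test

root@debian3:/usr/local/hadoop-0.20.2# bin/hadoop fs -put conf/core-site.xml /test

root@debian3:/usr/local/hadoop-0.20.2# bin/hadoop fs -ls /test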
Stop Hadoop

Stop Hadoop with bin/stop-all.sh:
root@debian3:/usr/local/hadoop-0.20.2# bin/stop-all.sh

stopping jobtracker

192.168.1.102: stopping tasktracker

stopping namenode

192.168.1.102: stopping datanode

192.168.1.102: stopping secondarynamenode
Original post: Setting Up a Hadoop Environment on Linux, by 领悟书生. Please credit the author and source when reposting: http://www.656463.com/article/377