
Hadoop Installation Tutorial

2015-10-26 10:43
*This article describes how to install CDH on CentOS. The versions used are:

OS: CentOS 7.0

Java: jdk1.7.0_79

Hadoop: hadoop-2.6.0-cdh5.4.7*

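1. Disable the firewall

-- Stop the firewall
systemctl stop firewalld.service

-- Keep firewalld from starting at boot
systemctl disable firewalld.service

-- Disable SELinux (set the following in /etc/selinux/config; it takes effect after a reboot)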
cat /etc/selinux/config
SELINUX=disabled
...
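
The SELinux setting above can also be changed without opening an editor. The following is a minimal sketch, assuming the file already contains a `SELINUX=` line; the function name `disable_selinux` and its optional file-path parameter are illustrative and not part of the original steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite the SELINUX= line of a SELinux config file.
# The path defaults to /etc/selinux/config; pass another path to try it on a copy.
disable_selinux() {
    local cfg="${1:-/etc/selinux/config}"
    # Replace whatever mode is set (enforcing/permissive) with disabled.
    # The pattern requires "=" right after SELINUX, so SELINUXTYPE= is untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Try it on a scratch copy first.
demo_cfg="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo_cfg"
disable_selinux "$demo_cfg"
cat "$demo_cfg"
```

On the real host, run `disable_selinux` as root with no argument. Running `setenforce 0` afterwards switches SELinux to permissive mode until the next reboot.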


2. Network settings

-- Configure hosts
cat /etc/hosts
192.168.10.51   hw001
192.168.10.52   hw002
192.168.10.53   hw003

-- Set the hostname (replace hw00x with this node's name)
cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hw00x

-- Restart networking
service network restart
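
The hosts entries have to be identical on all three machines, so it can help to add them with a script that is safe to run more than once. This is a minimal sketch; the function name `add_host` and its optional file parameter are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "IP name" to a hosts file unless the name is already mapped.
add_host() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # -w matches the name as a whole word, so hw001 does not match hw0011.
    if ! grep -qw "$name" "$file"; then
        printf '%s   %s\n' "$ip" "$name" >> "$file"
    fi
}

# Demonstrate on a scratch file; running twice must not duplicate entries.
demo_hosts="$(mktemp)"
for pass in 1 2; do
    add_host 192.168.10.51 hw001 "$demo_hosts"
    add_host 192.168.10.52 hw002 "$demo_hosts"
    add_host 192.168.10.53 hw003 "$demo_hosts"
done
cat "$demo_hosts"
```

On CentOS 7 the hostname can also be set with `hostnamectl set-hostname hw001`, which persists it in /etc/hostname; that command is not part of the original steps.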


3. Install Java

Version 1.7 or later is recommended. Skip this step if Java is already installed.

Java is installed here as root so that it is available to all users.

-- Remove the OpenJDK that ships with CentOS 7
yum autoremove java

-- Download the JDK: http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk7-downloads-1880260.html
-- Install it
rpm -ivh jdk-7u79-linux-x64.rpm

-- By default Java is installed in /usr/java/jdk1.7.0_79
-- Set the environment variables by appending the following to /etc/profile:
# java config.
export JAVA_HOME=/usr/java/jdk1.7.0_79
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH

-- Apply the environment variables
source /etc/profile

-- Check that Java was installed correctly
java -version


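4. Set up SSH trust between hosts

Create a hadoop user; only the hadoop user needs passwordless SSH between the hosts:

-- Create the hadoop user
useradd hadoop
passwd hadoop

-- Set up trust (replace <remote-host> with each of the other hosts, e.g. hw002)
su - hadoop
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub <remote-host>
ssh <remote-host>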
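
The key has to be copied to every host, so the step can be wrapped in a loop. This is a minimal sketch, assuming the three host names from the /etc/hosts example; the `HOSTS` and `DRY_RUN` variables and the function name are illustrative. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Host names come from the /etc/hosts example above; adjust to your cluster.
HOSTS="hw001 hw002 hw003"

# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
copy_key_to_all() {
    local host
    for host in $HOSTS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host"
        else
            ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@$host"
        fi
    done
}

copy_key_to_all
```

To actually copy the key, run the script as the hadoop user on each node with `DRY_RUN=0`; each host prompts for the hadoop password once.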


For more on setting up SSH trust, see: http://blog.csdn.net/cjfeii/article/details/47148803

5. Download the Hadoop package

wget http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.4.7.tar.gz
tar xzvf hadoop-2.6.0-cdh5.4.7.tar.gz -C /home/hadoop/


6. Set environment variables

-- Add the following to .bashrc or .bash_profile:
# hadoop config.
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0-cdh5.4.7
export PATH=$PATH:$HADOOP_HOME/bin


7. Edit the configuration files

-- Set JAVA_HOME in hadoop-2.6.0-cdh5.4.7/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/usr/java/jdk1.7.0_79


-- cat hadoop-2.6.0-cdh5.4.7/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hw001:8020</value>
<final>true</final>
</property>
</configuration>


-- cat hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>


Note: the directories /home/name and /home/data must exist, and the hadoop user must have access to them; otherwise an error is reported.
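
A quick way to confirm the directory requirement is to test each path as the hadoop user before starting HDFS. This is a minimal sketch; the function name `check_dirs` is illustrative, and the demo runs under a scratch prefix rather than the real paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create each directory if possible and report whether
# the current user can write to it. Returns non-zero if any check fails.
check_dirs() {
    local dir status=0
    for dir in "$@"; do
        mkdir -p "$dir" 2>/dev/null
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "ok: $dir"
        else
            echo "not writable: $dir"
            status=1
        fi
    done
    return $status
}

# Demonstrated under a scratch prefix; on the cluster use /home/name and /home/data.
prefix="$(mktemp -d)"
check_dirs "$prefix/name" "$prefix/data"
```

Because /home is normally writable only by root, the real directories likely need to be created by root first, for example `mkdir -p /home/name /home/data && chown -R hadoop:hadoop /home/name /home/data`, assuming the default hadoop group that `useradd` creates.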

-- cat slaves

hw002
hw003


-- cat mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>


-- cat yarn-site.xml

<?xml version="1.0"?>

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hw001</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

</configuration>
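
The original steps do not show how the other nodes receive the same installation and configuration. One plausible approach is to copy the whole directory from hw001 to the hosts listed in the slaves file. This is a sketch under that assumption; the variable names, the `DRY_RUN` switch, and the use of `scp` are illustrative choices, not the author's documented method.

```shell
#!/usr/bin/env bash
HADOOP_DIR=/home/hadoop/hadoop-2.6.0-cdh5.4.7
WORKERS="hw002 hw003"   # the hosts listed in the slaves file

# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
sync_install() {
    local host
    for host in $WORKERS; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp -r $HADOOP_DIR hadoop@$host:/home/hadoop/"
        else
            scp -r "$HADOOP_DIR" "hadoop@$host:/home/hadoop/"
        fi
    done
}

sync_install
```

Run it as the hadoop user on hw001 with `DRY_RUN=0` once SSH trust is in place.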


8. Start HDFS

-- Format the namenode:
bin/hadoop namenode -format

-- Start the daemons:
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode

-- Or start everything with a single command
sbin/start-dfs.sh


9. Start YARN

-- Start the YARN daemons:
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager

-- Or start everything with a single command
sbin/start-yarn.sh
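
After starting HDFS and YARN, the `jps` tool that ships with the JDK lists the running Java processes, which gives a quick check that the daemons came up. This is a minimal sketch; the function name `check_daemons` is illustrative, and which daemons belong on which host depends on how they were started.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given jps output and a list of daemon names, report
# any daemon that is not running. Returns non-zero if something is missing.
check_daemons() {
    local jps_output="$1" daemon missing=0
    shift
    for daemon in "$@"; do
        # -w keeps "NameNode" from matching "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return $missing
}

# On the master (hw001), if jps is on PATH:
if command -v jps >/dev/null 2>&1; then
    check_daemons "$(jps)" NameNode ResourceManager || echo "some daemons are not running"
fi
```

On hw002 and hw003 the equivalent check would be `check_daemons "$(jps)" DataNode NodeManager`.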


10. Open the web pages

-- HDFS web UI: http://192.168.10.51:50070/
-- YARN web UI: http://192.168.10.51:8088/

11. Run a test application

First, an example that computes Pi:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.7.jar pi 20 10
output:
...
Job Finished in 23.673 seconds
Estimated value of Pi is 3.12000000000000000000


Next, a wordcount example:

mkdir ./input
cp /etc/profile ./input
bin/hadoop fs -copyFromLocal input /input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.7.jar wordcount /input /output
bin/hadoop fs -ls /output
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2015-10-26 10:18 /output/_SUCCESS
-rw-r--r--   3 hadoop supergroup       1587 2015-10-26 10:18 /output/part-r-00000
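
The wordcount result file holds one tab-separated `word<TAB>count` pair per line. To look at the most frequent words, the file can be piped through `sort`. This is a minimal sketch; the function name `top_words` and the sample input are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "word<TAB>count" lines on stdin and print the N
# most frequent words (default 10), highest count first.
top_words() {
    sort -t "$(printf '\t')" -k2,2nr | head -n "${1:-10}"
}

# Small stand-in for the real part-r-00000 contents.
printf 'export\t4\nif\t8\nfi\t6\nunset\t2\n' | top_words 2
```

Against the real output, the equivalent would be `bin/hadoop fs -cat /output/part-r-00000 | top_words`.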


12. Installation complete.

ref:

http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-install/

http://blog.csdn.net/cjfeii/article/details/47148803

http://www.aboutyun.com/thread-9089-1-1.html

Building from source:

http://www.fanqi.org/hadoop-learning-notes-1-64-bit-ubuntu-next-to-recompile-the-hadoop-2-2-0-laundry-list/