apache hadoop-2.6.0-CDH5.4.1 Installation Notes
2015-05-17 17:26
1. Install Oracle Java 8
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo vi /etc/profile
# set java environment
export JAVA_HOME=/usr/lib/jvm/oracle-java8-installer
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
source /etc/profile
Verify that the Java environment is configured correctly:
java -version
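As an extra check (my addition, not in the original steps), confirm the variables from /etc/profile took effect:

echo $JAVA_HOME   # should print /usr/lib/jvm/oracle-java8-installer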
2. Cluster Role Assignment
Hostname | IP | Roles
---|---|---
Master | 10.5.0.196 | NameNode, ResourceManager, SecondaryNameNode
Slave1 | 10.5.0.231 | DataNode, NodeManager
Slave2 | 10.5.0.232 | DataNode, NodeManager
Slave3 | 10.5.0.233 | DataNode, NodeManager
Set the contents of /etc/hostname to Master (on Slave1 set it to Slave1, and so on).
Add the following lines to /etc/hosts:
10.5.0.196 Master
10.5.0.231 Slave1
10.5.0.232 Slave2
10.5.0.233 Slave3
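To confirm name resolution works before going further, a quick check (my addition):

ping -c 1 Slave1
ping -c 1 Slave2
ping -c 1 Slave3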
3. Create the hadoop User
useradd hadoop
passwd hadoop
Grant administrator (sudo) privileges:
vi /etc/sudoers
# User privilege specification
root   ALL=(ALL:ALL) ALL
hadoop ALL=(ALL:ALL) ALL
4. Configure Passwordless SSH Login on Master
sudo apt-get install openssh-server
sudo su hadoop
cd /home/hadoop
ssh-keygen -t rsa  # press Enter at every prompt to generate the key pair
cd .ssh
cp id_rsa.pub authorized_keys
or: cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Copy the public key to each Slave:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@Slave1
# or: scp authorized_keys hadoop@Slave1:/home/hadoop/.ssh/
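Passwordless login can then be verified from Master; each Slave should accept the key without prompting (verification step added here):

ssh hadoop@Slave1 hostname   # should print Slave1 with no password prompt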
5. Download and Install hadoop-cdh
wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.4.1.tar.gz
sudo mkdir -p /usr/local/cloudera
sudo tar -zxvf hadoop-2.6.0-cdh5.4.1.tar.gz -C /usr/local/cloudera
sudo chown hadoop:hadoop -R /usr/local/cloudera
sudo mkdir -p /usr/local/cloudera/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/cloudera/hadoop_tmp/hdfs/datanode
sudo chown hadoop:hadoop -R /usr/local/cloudera/hadoop_tmp
Configure ~/.bashrc
sudo vi $HOME/.bashrc
Append the following to the end of the file:
# HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/oracle-java8-installer
export HADOOP_HOME=/usr/local/cloudera/hadoop-2.6.0-cdh5.4.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
# HADOOP VARIABLES END
source ~/.bashrc
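With the new variables sourced, the hadoop command should be on the PATH; a quick sanity check (my addition):

echo $HADOOP_HOME   # should print /usr/local/cloudera/hadoop-2.6.0-cdh5.4.1
hadoop version      # should report 2.6.0-cdh5.4.1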
cd $HADOOP_HOME/etc/hadoop
sudo vi hadoop-env.sh
Set the value of JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/oracle-java8-installer
sudo vi slaves
Slave1
Slave2
Slave3
sudo vi masters
Master
6. Configure the XML Files
The files to edit are core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.
1. core-site.xml
<configuration>
  <!-- file system properties -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
2. hdfs-site.xml
The /hdfs/namenode and /hdfs/datanode directories must be created manually (see the mkdir commands in step 5).
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/cloudera/hadoop_tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/cloudera/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>
3. yarn-site.xml
<!-- Site specific YARN configuration properties -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8050</value>
  </property>
</configuration>
4. mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>Master:5431</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
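As an optional sanity check (my addition; on Ubuntu the xmllint tool comes from the libxml2-utils package), the four files can be verified to be well-formed XML before restarting anything:

sudo apt-get install libxml2-utils
cd $HADOOP_HOME/etc/hadoop
xmllint --noout core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml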
The basic configuration is now complete.
7. Sync the Hadoop Files to the Other Slave Nodes
sudo apt-get install rsync
sudo rsync -avxP /usr/local/cloudera/ hadoop@Slave1:/usr/local/cloudera/
sudo rsync -avxP /usr/local/cloudera/ hadoop@Slave2:/usr/local/cloudera/
sudo rsync -avxP /usr/local/cloudera/ hadoop@Slave3:/usr/local/cloudera/
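A spot check (my addition) that the sync reached each Slave:

ssh hadoop@Slave1 ls /usr/local/cloudera   # should list hadoop-2.6.0-cdh5.4.1 and hadoop_tmp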
8. Start and Verify Hadoop
1. Format the distributed file system
hadoop namenode -format
Output ending as follows indicates success:
15/05/15 22:48:01 INFO util.ExitUtil: Exiting with status 0
15/05/15 22:48:01 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master/10.5.0.196
************************************************************/
2. Start HDFS
./sbin/start-dfs.sh
3. Start YARN
./sbin/start-yarn.sh
4. Check Hadoop status through the web UI
Visit http://Master:50070
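Two command-line checks that the daemons actually came up (verification added here; jps ships with the JDK):

jps                    # Master should list NameNode, SecondaryNameNode, ResourceManager; Slaves list DataNode, NodeManager
hdfs dfsadmin -report  # should report the three DataNodes as live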
9. Problems Encountered During Installation
Cause: the native library files under hadoop-2.6.0-cdh5.4.1/lib/native do not exist. How this was discovered:
When running hadoop commands, the following warning appears:
15/05/17 10:46:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Enable debug logging:
export HADOOP_ROOT_LOGGER=DEBUG,console
Running a hadoop command then prints the following:
15/05/17 16:46:48 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
15/05/17 16:46:48 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
15/05/17 16:46:48 DEBUG util.NativeCodeLoader: java.library.path=/usr/local/cloudera/hadoop-2.6.0-cdh5.4.1/lib/native
15/05/17 16:46:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/05/17 16:46:48 DEBUG util.PerformanceAdvisory: Falling back to shell based
cd /usr/local/cloudera/hadoop-2.6.0-cdh5.4.1/lib/native
This confirms that the Hadoop native library files are missing.
Two ways to fix it:
1. Download the hadoop-2.6.0-cdh5.4.1-src.tar.gz source and compile it locally with Maven (see the BUILDING.txt file inside; this process takes a long time; my machine is 64-bit).
2. Download apache-hadoop-2.6.0-src.tar.gz, which contains the lib/native library files, and simply copy them over.
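Whichever route is taken, once lib/native is populated, Hadoop can report whether the native code actually loads (checknative is a standard Hadoop 2.x command; verification added here):

hadoop checknative -a   # the hadoop entry should show true, pointing at lib/native/libhadoop.so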
10. Hadoop Cluster Test
Run the WordCount program inside the $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.1.jar package; it counts the occurrences of each word.
1. Gather some text files
mkdir file
cp *.txt ./file  # copy the collected txt files into the file directory
2. Create an input folder in HDFS
hadoop fs -mkdir /input
3. Upload the txt files under the file directory to the input folder in HDFS
hadoop fs -put ./file/*.txt /input/
4. Check that the upload succeeded
hadoop fs -ls /input/
5. Run the wordcount program from the hadoop-mapreduce-examples-2.6.0-cdh5.4.1.jar package
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.1.jar wordcount /input/ /output/
6. Check the results
hadoop fs -ls /output
-rw-r--r-- 1 hadoop supergroup    0 2015-05-17 12:27 /output/wordcount/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 9190 2015-05-17 12:27 /output/wordcount/part-r-00000
hadoop fs -cat /output/part-r-00000
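Each line of part-r-00000 is a word followed by its count. To sample the output without printing the whole file (my addition):

hadoop fs -cat /output/part-r-00000 | head -n 10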
11. References
- How to install Apache Hadoop 2.6.0 in Ubuntu (Single node setup)
- How to install Apache Hadoop 2.6.0 in Ubuntu (Multi node/Cluster setup)