Setting up a Hadoop distributed cluster and a big data development environment (configuring HDFS, YARN, MapReduce, etc.)
I. The Hadoop cluster
1. Nodes
master:
master1: ip: 192.168.75.137
master2: ip: 192.168.75.138
slave:
slave1: ip: 192.168.75.139
slave2: ip: 192.168.75.140
Steps:
(1) Check the IP address
ifconfig
(2) Change the hostname
hostnamectl set-hostname <hostname>
(3) Add the hostname-to-IP mappings
vim /etc/hosts
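For the four nodes listed above, the mappings added to /etc/hosts on every node would be:

```
192.168.75.137 master1
192.168.75.138 master2
192.168.75.139 slave1
192.168.75.140 slave2
```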
(4) Check whether a .ssh directory already exists
[root@master1 ~]# ls -a
If it exists, remove the old keys first: rm -rf /root/.ssh
(5) Generate an SSH key pair
ssh-keygen -t rsa
(6) Distribute the public key
On the master, run:
scp id_rsa.pub root@master1:/root/
scp id_rsa.pub root@slave1:/root/
...and likewise for the remaining nodes (append each copied key on the target before sending the next one, or each scp will overwrite the previous id_rsa.pub)
(7) Append the key to authorized_keys
Needed on every master and slave:
cat id_rsa.pub>>.ssh/authorized_keys
(8) Test passwordless login
[root@master1 ~]# ssh slave1
Last login: Tue Jul 17 09:52:38 2018 from 192.168.75.1
[root@slave1 ~]#
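As an aside, steps (6) and (7) can be collapsed into one loop with ssh-copy-id, which copies the key and appends it to authorized_keys on the remote side in a single step. A minimal sketch using the host names from this guide (DRY_RUN=1 only prints the commands instead of contacting the hosts):

```shell
# Distribute the public key to every node in one pass; ssh-copy-id appends it
# to ~/.ssh/authorized_keys on the target, replacing the manual scp + cat.
DRY_RUN=1   # set to 0 to actually push the keys
cmds=""
for host in master1 master2 slave1 slave2; do
  cmd="ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host"
  cmds="$cmds$cmd
"
  [ "$DRY_RUN" = 1 ] || $cmd
done
printf '%s' "$cmds"
```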
2. Configure the Java environment variables
(1) vim /etc/profile
Append at the end:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
(2) Run source /etc/profile to apply the changes
(3) Verify
echo $PATH
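A check that is a bit more direct than reading echo $PATH by eye: re-apply the two exports from the profile (the JAVA_HOME path is the one set above) and test whether $JAVA_HOME/bin actually landed on PATH:

```shell
# Confirm that the profile additions put $JAVA_HOME/bin on the PATH
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre
export PATH=$PATH:$JAVA_HOME/bin
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) path_ok=yes ;;
  *)                    path_ok=no ;;
esac
echo "JAVA_HOME/bin on PATH: $path_ok"
```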
3. Building the cluster
(1) Configuration file path
Change into the cluster configuration directory: cd /etc/hadoop/conf (the HDP layout used throughout this guide; a plain tarball install would use $HADOOP_HOME/etc/hadoop instead)
(2) Add the slave nodes
[root@master1 conf]# vim /etc/hadoop/conf/slaves
Add:
master1
master2
slave1
slave2
(Listing master1 and master2 here means they run DataNodes as well, so the cluster has four DataNodes in total.)
(3) Configure the cluster's core-site.xml
[root@master2 ~]# vim /etc/hadoop/conf/core-site.xml
Add:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hdp/tmp</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master1:8020</value>
</property>
(4) Configure the cluster's hdfs-site.xml
[root@master1 conf]# vim hdfs-site.xml
Add:
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>4</value> <!-- must not exceed the number of DataNodes (4 in this cluster) -->
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop/hdfs/data</value>
</property>
</configuration>
(5) Create the local directories HDFS needs
[root@master1 ~]# mkdir /usr/hdp/tmp -p
[root@master1 ~]# mkdir /hadoop/hdfs/{data,name} -p
[root@master1 ~]# chown -R hdfs:hadoop /hadoop
[root@master1 ~]# chown -R hdfs:hadoop /usr/hdp/tmp
(6) Format the HDFS filesystem
On master1:
[root@master1 ~]# sudo -E -u hdfs hdfs namenode -format
(7) Start HDFS
Start the services on master1:
[root@master1 ~]# systemctl start hadoop-hdfs-namenode
[root@master1 ~]# systemctl start hadoop-hdfs-datanode
Start the services on master2:
[root@master2 ~]# systemctl start hadoop-hdfs-datanode
[root@master2 ~]# systemctl start hadoop-hdfs-secondarynamenode
Start the service on slave1 and slave2:
[root@slave1 ~]# systemctl start hadoop-hdfs-datanode
[root@slave2 ~]# systemctl start hadoop-hdfs-datanode
(8) Check the running daemons on each node with the jps command
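On master1, for example, jps should list the two daemons started above plus jps itself (the PIDs shown here are illustrative and will differ):

```
[root@master1 ~]# jps
2817 NameNode
2931 DataNode
3120 Jps
```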
4. Check the web UI
http://192.168.75.137:50070 (the NameNode web UI; 50070 is the Hadoop 2.x default port)
II. The big data development environment
1. Prepare the HDFS directories jobs run in
[root@master1 ~]# su - hdfs
-bash-4.2$ hadoop fs -mkdir /tmp
-bash-4.2$ hadoop fs -chmod -R 1777 /tmp
-bash-4.2$ hadoop fs -mkdir -p /var/log/hadoop-yarn
-bash-4.2$ hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
-bash-4.2$ hadoop fs -mkdir /user
-bash-4.2$ hadoop fs -mkdir /user/hadoop
-bash-4.2$ hadoop fs -mkdir /user/history
-bash-4.2$ hadoop fs -chmod 1777 /user/history
-bash-4.2$ hadoop fs -chown mapred:hadoop /user/history
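The mode 1777 applied to /tmp and /user/history above is the sticky bit on top of rwxrwxrwx: every user may create files there, but only a file's owner may delete it, which is exactly what a shared job directory needs. A quick local-filesystem illustration of the mode (the directory is a throwaway temp dir, not an HDFS path):

```shell
# Give a scratch directory HDFS-/tmp-style permissions and read them back
d=$(mktemp -d)
chmod 1777 "$d"
mode=$(stat -c '%a' "$d")   # octal mode; the leading 1 is the sticky bit
echo "mode: $mode"
rmdir "$d"
```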
2. Configure yarn-site.xml
[root@master1 conf]# vim yarn-site.xml
Add:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master2</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///hadoop/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/log/hadoop-yarn/containers</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CONF_DIR,
/usr/hdp/2.6.3.0-235/hadoop/*,
/usr/hdp/2.6.3.0-235/hadoop/lib/*,
/usr/hdp/2.6.3.0-235/hadoop-hdfs/*,
/usr/hdp/2.6.3.0-235/hadoop-hdfs/lib/*,
/usr/hdp/2.6.3.0-235/hadoop-yarn/*,
/usr/hdp/2.6.3.0-235/hadoop-yarn/lib/*,
/usr/hdp/2.6.3.0-235/hadoop-mapreduce/*,
/usr/hdp/2.6.3.0-235/hadoop-mapreduce/lib/*,
/usr/hdp/2.6.3.0-235/hadoop-httpfs/*,
/usr/hdp/2.6.3.0-235/hadoop-httpfs/lib/*
</value>
</property>
</configuration>
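One detail about the two scheduler limits above: YARN rounds every container request up to a multiple of yarn.scheduler.minimum-allocation-mb and caps it at yarn.scheduler.maximum-allocation-mb, which is why round values such as 512 and 2048 MB are the conventional choice. The rounding rule, sketched in shell arithmetic with assumed 512/2048 limits:

```shell
# Round a container request up to a multiple of the scheduler minimum,
# then clamp it to the scheduler maximum
min_mb=512; max_mb=2048
request_mb=600
alloc_mb=$(( (request_mb + min_mb - 1) / min_mb * min_mb ))
[ "$alloc_mb" -gt "$max_mb" ] && alloc_mb=$max_mb
echo "requested ${request_mb} MB, granted ${alloc_mb} MB"
```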
3. Configure mapred-site.xml
[root@master1 conf]# vim mapred-site.xml
Add:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>slave1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>slave1:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/etc/hadoop/conf/*,
/usr/hdp/2.6.3.0-235/hadoop/*,
/usr/hdp/2.6.3.0-235/hadoop-hdfs/*,
/usr/hdp/2.6.3.0-235/hadoop-yarn/*,
/usr/hdp/2.6.3.0-235/hadoop-mapreduce/*,
/usr/hdp/2.6.3.0-235/hadoop/lib/*,
/usr/hdp/2.6.3.0-235/hadoop-hdfs/lib/*,
/usr/hdp/2.6.3.0-235/hadoop-yarn/lib/*,
/usr/hdp/2.6.3.0-235/hadoop-mapreduce/lib/*
</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1280</value> <!-- container size; must be larger than the -Xmx heap above -->
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value> <!-- larger than the -Xmx heap, no larger than yarn.scheduler.maximum-allocation-mb -->
</property>
</configuration>
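The java.opts and memory.mb pairs above are linked: memory.mb is the physical-memory cap YARN enforces on the whole container process, so it must be comfortably larger than the -Xmx heap or the container will be killed. A common rule of thumb is heap ≈ 80% of the container size, sketched here with an assumed 1280 MB map container:

```shell
# Derive a -Xmx heap from a container size, leaving ~20% headroom for
# JVM overhead (metaspace, thread stacks, native buffers)
container_mb=1280                      # e.g. mapreduce.map.memory.mb
heap_mb=$(( container_mb * 8 / 10 ))
echo "container ${container_mb} MB -> -Xmx${heap_mb}M"
```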
4. Set up YARN's local directories
[root@master1 ~]# touch /etc/hadoop/conf/yarn-env.sh
[root@master1 ~]# mkdir -p /hadoop/yarn/local
[root@master1 ~]# chown yarn:yarn -R /hadoop/yarn/local
5. Start the services
Start the ResourceManager on master2:
[root@master2 ~]# systemctl start hadoop-yarn-resourcemanager
Visit the web UI at master2:8088
Start the JobHistory server on slave1 and slave2 (mapred-site.xml points clients at slave1):
[root@slave1 ~]# systemctl start hadoop-mapreduce-historyserver
[root@slave2 ~]# systemctl start hadoop-mapreduce-historyserver
Start a NodeManager on every node that runs a DataNode (here: master1, master2, slave1, and slave2), for example:
[root@slave2 ~]# systemctl start hadoop-yarn-nodemanager
6. Verify
http://master2:8088 (ResourceManager UI)
http://slave1:19888 (JobHistory UI)