
Hadoop 2.x Pseudo-Distributed Environment Setup

1. Plan the directory layout on the Linux system
2. Upload the required installation packages
3. Extract the JDK and configure the environment variables
$ tar -zxf jdk-7u67-linux-x64.tar.gz -C /opt/modules/
$ sudo vi /etc/profile    # the environment variable configuration file (system-wide)
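A minimal sketch of the lines to append at the end of /etc/profile, assuming the JDK was extracted to /opt/modules/jdk1.7.0_67 (the path used in step 5 below):

#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin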

Make the file take effect:
su - root
source /etc/profile
Verify: $ java -version
4. Extract and install Hadoop 2.5.0
If disk space is tight, the doc directory can be deleted; it contains only the official English documentation.
$ rm -rf ./doc/
5. Set the Java installation directory in:
etc/hadoop/hadoop-env.sh
etc/hadoop/yarn-env.sh
etc/hadoop/mapred-env.sh
export JAVA_HOME=/opt/modules/jdk1.7.0_67

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation

  export JAVA_HOME=/usr/java/latest

 

  # Assuming your installation directory is /usr/local/hadoop

  export HADOOP_PREFIX=/usr/local/hadoop

Try the following command:

  $ bin/hadoop

This will display the usage documentation for the hadoop script.

 

6. Modify the site-specific configuration files

Configure core-site.xml
Specify where the NameNode (master node) runs and its client RPC port:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01.com:8020</value>
</property>
Change the default temporary directory path hadoop.tmp.dir.
By default hadoop.tmp.dir points to /tmp/hadoop-${user.name} under the system temp directory.
It mainly stores the fsimage and log files; if the system temp directory is cleaned out, Hadoop can no longer find them.
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.5.0/data/tmp</value>
</property>
Configure etc/hadoop/slaves
Specify where the DataNode (slave node) processes run by listing them in the slaves file.

Note: a host listed in slaves acts as both a DataNode (DN) and a NodeManager (NM).
hadoop01.com
Configure etc/hadoop/hdfs-site.xml
Specify the number of replicas:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Configure the default block size (not recommended to set in a test environment):
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
 

7. Format the NameNode
$ bin/hdfs namenode -format
Tip: format only once; formatting repeatedly will cause errors.
If you must format again, or a re-format has failed, go to the directory configured as hadoop.tmp.dir, delete its contents, and then format again.
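A sketch of that re-format procedure, assuming hadoop.tmp.dir is /opt/modules/hadoop-2.5.0/data/tmp as configured in step 6:

$ rm -rf /opt/modules/hadoop-2.5.0/data/tmp/*
$ bin/hdfs namenode -format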
8. Start the related service processes
$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemon.sh start datanode
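To confirm both daemons are running, jps should list a NameNode and a DataNode process (the process IDs will differ):

$ jps
3281 NameNode
3375 DataNode
3450 Jps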
9. Access the web management UI in a browser on port 50070
hadoop01.com:50070
10. Test reading, writing, uploading and downloading on the HDFS file system:
$ bin/hdfs dfs -mkdir -p tmp/conf
$ bin/hdfs dfs -put etc/hadoop/core-site.xml /user/frank/tmp/conf
$ bin/hdfs dfs -cat /user/frank/tmp/conf/core-site.xml
$ bin/hdfs dfs -get /user/frank/tmp/conf/core-site.xml /home/frank/bf-site.xml
11. When errors occur, check the error messages in the log files first
Look under hadoop-2.5.0/logs/ and examine the files ending in .log.
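For example, to view the tail of the NameNode log (the exact file name depends on your user and host name, so this path is only illustrative):

$ tail -n 50 logs/hadoop-frank-namenode-hadoop01.com.log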
12. Configure YARN
etc/hadoop/yarn-site.xml
How the reducers fetch data (the shuffle service):
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Specify where the ResourceManager runs:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01.com</value>
</property>
13. Configure etc/hadoop/mapred-site.xml
Specify that MapReduce runs on YARN:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
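Note: the Hadoop 2.x distribution ships only a template for this file, so create mapred-site.xml from it before editing:

$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml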
 
Start YARN:
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemon.sh start nodemanager
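Running jps again should now also show a ResourceManager and a NodeManager process in addition to the HDFS daemons:

$ jps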
 
Package a MapReduce program as a jar and run it on YARN:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/frank/mapreduce/wordcount/input /user/frank/mapreduce/wordcount/output2
 
View the results:
$ bin/hdfs dfs -cat /user/frank/mapreduce/wordcount/output2/part*
Another commonly used viewing command is -text:
$ bin/hdfs dfs -text /user/frank/mapreduce/wordcount/output2/part*
The output path cannot be reused between runs; when running the job again, specify a new output path or the job will fail with an error.
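If you do want to reuse a path, remove the previous output directory first; a sketch using the path from the example above:

$ bin/hdfs dfs -rm -r /user/frank/mapreduce/wordcount/output2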
 

Log aggregation
Configuration file: etc/hadoop/yarn-site.xml
Enable the log aggregation feature:
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
Set how long the logs are kept on HDFS, typically 7 days (604800 seconds):
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
After the configuration is in place, restart the YARN service processes and then run a job again; the historical log information can then be viewed. The history server has to be restarted as well.
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh stop nodemanager
sbin/mr-jobhistory-daemon.sh stop historyserver
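Then start them again (a sketch using the same daemon scripts):

sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager
sbin/mr-jobhistory-daemon.sh start historyserver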
 
HDFS file permission checking
To disable permission checking on the HDFS file system, modify hdfs-site.xml:
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
After changing the configuration, restart the HDFS-related processes.
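For the pseudo-distributed setup above, that amounts to stopping and starting the NameNode and DataNode again, for example:

$ sbin/hadoop-daemon.sh stop datanode
$ sbin/hadoop-daemon.sh stop namenode
$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemon.sh start datanode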
 
To change Hadoop's static (web UI) user name, modify core-site.xml.
dr.who is the default static user name.
<property>
<name>hadoop.http.staticuser.user</name>
<value>frank</value>
</property>
Notes:
[Create the directory]
sudo mkdir /data
cd /
ls -all
[Change the owner]
sudo chown neworigin:neworigin /data/

[Copy]
cp /mnt/hgfs/BigData/第四天/jdk-8u121-linux-x64.tar.gz /data/
[Extract]
tar -xzvf jdk-8u121-linux-x64.tar.gz

[Show the current path]
pwd

[/etc/environment]

>sudo nano /etc/environment
JAVA_HOME=/data/jdk1.8.0_121
PATH="$PATH:/data/jdk1.8.0_121/bin"

>source /etc/environment

[Check the environment]
>java -version

Hadoop configuration
[Copy]
cp /mnt/hgfs/BigData/第四天/hadoop-2.7.0.tar.gz /data/
[Extract]
tar -xzvf hadoop-2.7.0.tar.gz
[etc/environment]
HADOOP_HOME=/data/hadoop-2.7.0
PATH=$PATH:/data/hadoop-2.7.0/bin:/data/hadoop-2.7.0/sbin

[Test]
>hadoop version

[Configuration files]
>cd /data/hadoop-2.7.0/etc/hadoop

Configuration file contents:

<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>

<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

<?xml version="1.0"?>
<!-- mapred-site.xml -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

Important Hadoop Daemon Properties

Example 10-1. A typical core-site.xml configuration file

<?xml version="1.0"?>
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode/</value>
</property>
</configuration>

Example 10-2. A typical hdfs-site.xml configuration file

<?xml version="1.0"?>
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/disk1/hdfs/name,/remote/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/disk1/hdfs/namesecondary,/disk2/hdfs/namesecondary</value>
</property>
</configuration>

Example 10-3. A typical yarn-site.xml configuration file

<?xml version="1.0"?>
<!-- yarn-site.xml -->
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>resourcemanager</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/disk1/nm-local-dir,/disk2/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>
</configuration>

Notes:
[Cluster setup]

3 virtual machines (s100, s101, s102)
s100 --   master
s101 --   slave1
s102 --   slave2

[Configure the network]
/etc/network/interfaces

[Configure host name mapping]
/etc/hosts

[Change the host name]
/etc/hostname

------------------
[Create the directory]
>sudo mkdir /data    (s100, s101, s102)
>sudo chown neworigin:neworigin /data

[Passwordless SSH login]

[s100, s101, s102]
>sudo apt-get install ssh
// install ssh
>rm -rf ~/.ssh

[s100]
>ssh-keygen -t rsa -f ~/.ssh/id_rsa
// generate the key pair on s100
>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

>ssh-copy-id s101
// copy s100's public key to s101
>ssh-copy-id s102
// copy s100's public key to s102

>ssh localhost
>exit
>ssh s101
>exit
>ssh s102
>exit

[s100]
cat ~/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEy23ilBVz3NmX5SniIBtxgLT/aFDCCxdc5eTApyjfXg4ISHYcfXsYxDAtqtW9SJQD7KIRvVmRn9hO4nA5MWQVAmPINP96bh7k1eDp8i+1ObKxTd1GXBAhG3dUg3Z7NqOjFBZCMJpwovsR6opajI02g5a27d6YAxZqbBP7RCzIgfuaVEuHqn2HtOA5f7A+eXcNpyb3bvJxmbMe4gUrPQtP+gIS9T13wBKK0EibojpQ52ZKEZUXJFMpX5EThymhBanSVe4KUr8/jmHGQRTMsQMqv2sPNRyL4Sq/C3KsneX4lJt8j8ubPZvzdMOiwQxdYFDn32qsp19BOjlioZpv2JkZ
neworigin@s100

[s101]
>cat ~/.ssh/authorized_keys
// shows the same "ssh-rsa AAAA... neworigin@s100" public key as on s100

[s102]
>cat ~/.ssh/authorized_keys
// shows the same "ssh-rsa AAAA... neworigin@s100" public key as on s100

[Run the same command on multiple hosts]
[/usr/local/bin/]
>sudo nano xcall
#!/bin/bash
# get the number of arguments
pcount=$#
if((pcount<1));then
echo no args;
exit;
fi

# run the given command on s100, s101 and s102 in turn
for((host=100;host<103;host=host+1));do
echo ------------s$host-----------------
ssh s$host $@
done
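Make the script executable, then do a quick usage check that runs one command on each of the three hosts (assuming passwordless SSH is already set up as above):

>sudo chmod a+x /usr/local/bin/xcall
>xcall hostname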

[Send files to other hosts]
[scp]
>scp -r /home/neworigin/Desktop/1.txt neworigin@s101:/home/neworigin/Desktop/

[rsync]
A remote synchronization tool, mainly used for backup and mirroring; it supports links, devices, etc.; it is fast because it avoids copying files with identical content; it does not support copying between two remote hosts.
>rsync -rvl /home/neworigin/Desktop/1.txt neworigin@s101:/home/neworigin/Desktop/

#!/bin/bash
pcount=$#
if((pcount<1));then
echo no args
exit
fi

p1=$1
fname=`basename $p1`
#echo $fname

pdir=`cd -P $(dirname $p1);pwd`
#echo $pdir

cuser=`whoami`
for((host=101;host<103;host=host+1));do
echo -------------s$host---------------
rsync -rvl $pdir/$fname $cuser@s$host:$pdir
done
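The script above is the xsync helper used later; a sketch of installing it next to xcall (the /usr/local/bin location is assumed) and of a trial run:

>sudo nano /usr/local/bin/xsync        (paste the script above)
>sudo chmod a+x /usr/local/bin/xsync
>xsync /home/neworigin/Desktop/1.txt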

[Passwordless sudo]
>sudo passwd
>su root
>sudo nano /etc/sudoers
neworigin       ALL=(ALL:ALL)   NOPASSWD:ALL

[Install the JDK]
>xsync /data/jdk/

[/etc/environment]
JAVA_HOME=/data/jdk1.8.0_121
PATH="$PATH:/data/jdk1.8.0_121/bin"

[Copy]
>cd /data/hadoop-2.7.0/etc
>cp -rf hadoop/ hadoop_tmp

[core-site.xml]
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://s100/</value>    
</property>
</configuration>

[hdfs-site.xml]
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop/hdfs/name</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop/hdfs/data</value>
</property>

<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/hadoop/hdfs/namesecondary</value>
</property>
</configuration>

[s100, s101, s102]
>sudo mkdir -p /hadoop/hdfs/name
>sudo mkdir -p /hadoop/hdfs/data
>sudo mkdir -p /hadoop/hdfs/namesecondary
 
[yarn-site.xml]
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>s100</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>16384</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>16</value>
</property>
</configuration>

[slaves]
s100
s101
s102

[Distribute]
>xsync /data/hadoop-2.7.0

[Configure the environment on s101 and s102]
HADOOP_HOME=/data/hadoop-2.7.0
PATH="$PATH:/data/hadoop-2.7.0/bin:/data/hadoop-2.7.0/sbin"

[Change directory ownership on s100, s101, s102]
>sudo chown neworigin:neworigin /hadoop -R 
>sudo chmod 777 /hadoop -R

[Start on s100]
>hdfs namenode -format
>start-all.sh
// start

>stop-all.sh
// stop
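To verify the cluster, a quick sketch using the xcall helper from earlier: s100 should show the NameNode and ResourceManager, and each host listed in slaves should show a DataNode and a NodeManager; the NameNode web UI is at http://s100:50070.

>xcall jps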