Hadoop 2.7.1 Cluster Setup

================

Environment and Software

================

One laptop running two Ubuntu virtual machines.

Virtualization: VMware Workstation 12 Pro

OS: Ubuntu 12 (English) x64

Two systems: master 10.11.12.45, user feng

slave 10.11.12.47, user feng

Hadoop: hadoop-2.7.1.tar.gz

JDK: java version "1.7.0_05"

Start the first virtual machine and perform the following steps on it.

================

I. Install JDK 1.7

================

1. Extract jdk-7u5-linux-x64.tar.gz under /opt

cd /opt

tar -zvxf jdk-7u5-linux-x64.tar.gz

Grant ownership to the current user feng:

sudo chown -R feng:root jdk1.7.0_05/

2. Edit /etc/profile

sudo vi /etc/profile

Append at the end:

export JAVA_HOME=/opt/jdk1.7.0_05

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

3. Run source /etc/profile to apply the changes.

Verify with: java -version

Expected output:

java version "1.7.0_05"

Java(TM) SE Runtime Environment (build 1.7.0_05-b06)

Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)

================

II. Extract hadoop-2.7.1.tar.gz and configure environment variables

================

1. Extract hadoop-2.7.1.tar.gz under /opt

cd /opt

sudo tar -zvxf hadoop-2.7.1.tar.gz

sudo chown -R feng:root hadoop-2.7.1/

2. Edit /etc/profile

sudo vi /etc/profile

Append at the end:

export HADOOP_PREFIX=/opt/hadoop-2.7.1

export PATH=$HADOOP_PREFIX/bin:$PATH

3. Run source /etc/profile to apply the changes.

4. Verify with: hadoop version

Expected output:

Hadoop 2.7.1

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a

Compiled by jenkins on 2015-06-29T06:04Z

Compiled with protoc 2.5.0

From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a

This command was run using /opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar

================

III. Increase the ulimit open files and nproc limits

================

Add these three lines to /etc/security/limits.conf:

--------------------------

feng - nofile 32768

feng soft nproc 32000

feng hard nproc 32000

--------------------------

Note: feng is the current user.

Also add this line to /etc/pam.d/common-session:

session required pam_limits.so

otherwise the settings in /etc/security/limits.conf will not take effect.

Finally, log out and log back in for the new limits to apply.
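As a quick check (not in the original write-up), after logging back in you can confirm that the new limits apply to the feng session:

ulimit -n    # should now print 32768 (max open files)

ulimit -u    # should now print 32000 (max user processes, i.e. nproc)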

================

IV. Clone the virtual machine

================

1. Shut down the virtual machine and copy its files.

The goal is to end up with two virtual machines: one master and one slave.

================

V. Set the hostname and /etc/hosts

================

1. Edit /etc/hostname

sudo vi /etc/hostname

Change ubuntu to master or slave.

After the change, check it with the hostname command.

2. Set the hostnames on both machines:

the master machine's hostname is master,

the slave machine's hostname is slave.

3. Edit /etc/hosts on both machines

sudo vi /etc/hosts

Both the master and slave virtual machines should look like this:

127.0.0.1 localhost

# 127.0.1.1 ubuntu

10.11.12.45 master

10.11.12.47 slave

Verify with ping master and ping slave.

================

VI. Set up passwordless SSH login

================

1. Both machines need passwordless ssh localhost.

Run the following two commands:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Afterwards, ssh localhost should no longer ask for a password.

2. master and slave also need to SSH into each other without a password.

a. master logs into slave without a password

1) On master, copy master's ~/.ssh/id_dsa.pub to slave with scp:

scp ~/.ssh/id_dsa.pub feng@slave:/home/feng/Downloads

2) On slave, append id_dsa.pub to authorized_keys:

cat ~/Downloads/id_dsa.pub >> ~/.ssh/authorized_keys

3) On slave, check the contents of ~/.ssh/authorized_keys:

more ~/.ssh/authorized_keys

4) From master, verify that you can SSH to slave without a password:

ssh slave

b. slave logs into master without a password

1) On slave, copy slave's ~/.ssh/id_dsa.pub to master with scp:

scp ~/.ssh/id_dsa.pub feng@master:/home/feng/Downloads

2) On master, append id_dsa.pub to authorized_keys:

cat ~/Downloads/id_dsa.pub >> ~/.ssh/authorized_keys

3) On master, check the contents of ~/.ssh/authorized_keys:

more ~/.ssh/authorized_keys

4) From slave, verify that you can SSH to master without a password:

ssh master

master and slave can now SSH into each other without passwords.
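If ssh still asks for a password after these steps, a common cause (assuming a default OpenSSH setup) is that the key files are too permissive; tightening the modes on both machines usually fixes it:

chmod 700 ~/.ssh

chmod 600 ~/.ssh/authorized_keys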

================

VII. Edit the Hadoop configuration files on master and slave

================

The Hadoop configuration files are modified the same way on master and slave (a tip for copying them from master to slave appears at the end of this section).

1. Back up the configuration directory before editing:

1) cd /opt/hadoop-2.7.1/etc

2) cp -R hadoop/ hadoop#bak

2. Edit etc/hadoop/core-site.xml

Add the following properties inside the <configuration> element:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/opt/fengwork/hadoop/tmp</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
</property>

Create the fengwork directory under /opt and set its owner and group:

sudo mkdir /opt/fengwork

sudo chown -R feng:root /opt/fengwork
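As an optional sanity check (hdfs getconf is part of the stock HDFS CLI), you can later confirm which values Hadoop actually picks up from core-site.xml:

hdfs getconf -confKey fs.defaultFS       # expect hdfs://master:9000

hdfs getconf -confKey hadoop.tmp.dir     # expect file:/opt/fengwork/hadoop/tmp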

3. Edit etc/hadoop/hdfs-site.xml

Add the following properties inside the <configuration> element:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/fengwork/hadoop/datalog</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/fengwork/hadoop/data</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master:9001</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

4. Edit etc/hadoop/yarn-site.xml

Add the following properties inside the <configuration> element:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>

5. Edit etc/hadoop/mapred-site.xml

cp mapred-site.xml.template mapred-site.xml

Then add the following properties inside the <configuration> element:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>

6. Edit etc/hadoop/slaves

Add:

master

slave

7. Edit etc/hadoop/hadoop-env.sh

Above the line export JAVA_HOME=${JAVA_HOME}, add:

JAVA_HOME=/opt/jdk1.7.0_05

so that it reads:

JAVA_HOME=/opt/jdk1.7.0_05

export JAVA_HOME=${JAVA_HOME}
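Rather than repeating all of the edits above on the slave by hand, the finished configuration directory can be pushed from master with scp, assuming Hadoop is installed at the same path on both machines:

scp -r /opt/hadoop-2.7.1/etc/hadoop/* feng@slave:/opt/hadoop-2.7.1/etc/hadoop/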

================

VIII. Hadoop commands

================

Before the first start, format the NameNode once:

$HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>

For example:

hdfs namenode -format hadoop_fengwork



/opt/hadoop-2.7.1/bin/hdfs namenode -format hadoop_fengwork
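A simple sanity check after formatting (assuming the dfs.namenode.name.dir set earlier, /opt/fengwork/hadoop/datalog) is to look at the freshly created metadata directory:

ls /opt/fengwork/hadoop/datalog/current    # should contain VERSION, an fsimage_* pair and seen_txid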

Start everything:

$HADOOP_PREFIX/sbin/start-all.sh
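As a quick check (not part of the original commands), jps from the JDK lists the running Java daemons on each node; with the configuration above, master should roughly show NameNode, SecondaryNameNode, ResourceManager, DataNode and NodeManager, and slave should show DataNode and NodeManager:

jps    # run on both master and slave after start-all.sh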

Stop everything:

$HADOOP_PREFIX/sbin/stop-all.sh

# Leave HDFS safe mode

$HADOOP_PREFIX/bin/hadoop dfsadmin -safemode leave

# Enter HDFS safe mode

$HADOOP_PREFIX/bin/hadoop dfsadmin -safemode enter

Check that the native libraries are set up correctly:

$HADOOP_PREFIX/bin/hadoop checknative -a

----------

Basic filesystem commands

----------

Create a directory:

$HADOOP_PREFIX/bin/hadoop fs -mkdir /usr

Upload a file:

$HADOOP_PREFIX/bin/hadoop fs -put ~/jdk-8u25-linux-x64.tar.gz /usr/feng

Download a file:

$HADOOP_PREFIX/bin/hadoop fs -get /usr/feng/jdk-8u25-linux-x64.tar.gz ~/Downloads/

***********

You can verify that the file was not corrupted or tampered with:

feng@master:~$ md5sum ~/Downloads/jdk-8u25-linux-x64.tar.gz

e145c03a7edc845215092786bcfba77e /home/feng/Downloads/jdk-8u25-linux-x64.tar.gz

feng@master:~$ md5sum ~/jdk-8u25-linux-x64.tar.gz

e145c03a7edc845215092786bcfba77e /home/feng/jdk-8u25-linux-x64.tar.gz

The two MD5 checksums match, so the file is intact.

***********

List files:

$HADOOP_PREFIX/bin/hadoop fs -ls /

List files recursively:

$HADOOP_PREFIX/bin/hadoop fs -ls -R /

Show the tail of a file:

$HADOOP_PREFIX/bin/hadoop fs -tail /feng/tmp/1462950193038

Show the full contents of a file (fine for small test files; avoid it for large files):

$HADOOP_PREFIX/bin/hadoop fs -cat /feng/tmp/1462950193038

See the official documentation for more commands:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
================

The hdfs fsck and hdfs classpath commands

================

1. Show the classpath:

$HADOOP_PREFIX/bin/hdfs classpath

Output:

/opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/*:/opt/hadoop-2.7.1/share/hadoop/common/*:/opt/hadoop-2.7.1/share/hadoop/hdfs:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/*:/opt/hadoop-2.7.1/share/hadoop/hdfs/*:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/*:/opt/hadoop-2.7.1/share/hadoop/yarn/*:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.7.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar

2. fsck

Usage:

hdfs fsck <path>
    [-list-corruptfileblocks |
    [-move | -delete | -openforwrite]
    [-files [-blocks [-locations | -racks]]]]
    [-includeSnapshots]
    [-storagepolicies] [-blockId <blk_Id>]

COMMAND_OPTION             Description
path                       Start checking from this path.
-delete                    Delete corrupted files.
-files                     Print out files being checked.
-files -blocks             Print out the block report.
-files -blocks -locations  Print out locations for every block.
-files -blocks -racks      Print out network topology for data-node locations.
-includeSnapshots          Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks    Print out list of missing blocks and files they belong to.
-move                      Move corrupted files to /lost+found.
-openforwrite              Print out files opened for write.
-storagepolicies           Print out storage policy summary for the blocks.
-blockId                   Print out information about the block.

For example, running:

hdfs fsck / -files -blocks

Connecting to namenode via http://master:50070/fsck?ugi=feng&files=1&blocks=1&path=%2F
FSCK started by feng (auth:SIMPLE) from /10.11.12.45 for path / at Mon May 16 14:00:56 CST 2016

/ <dir>

/usr <dir>

/usr/feng <dir>

/usr/feng/jdk-8u25-linux-x64.tar.gz 160872482 bytes, 2 block(s): OK

0. BP-85890032-10.11.12.45-1463366422938:blk_1073741825_1001 len=134217728 repl=1

1. BP-85890032-10.11.12.45-1463366422938:blk_1073741826_1002 len=26654754 repl=1

Status: HEALTHY

Total size: 160872482 B

Total dirs: 3

Total files: 1

Total symlinks: 0

Total blocks (validated): 2 (avg. block size 80436241 B)

Minimally replicated blocks: 2 (100.0 %)

Over-replicated blocks: 0 (0.0 %)

Under-replicated blocks: 0 (0.0 %)

Mis-replicated blocks: 0 (0.0 %)

Default replication factor: 1

Average block replication: 1.0

Corrupt blocks: 0

Missing replicas: 0 (0.0 %)

Number of data-nodes: 1

Number of racks: 1

FSCK ended at Mon May 16 14:00:56 CST 2016 in 4 milliseconds

The filesystem under path '/' is HEALTHY

================

Testing MapReduce

================

1. Get the WordCount code.

Source: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

package com.feng.test.mr.example;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1).
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

2. In the bin directory of the Eclipse Java project,

package the classes into a jar:

jar cf WordCount.jar com/feng/test/mr/example/WordCount*.class

3. Copy WordCount.jar to master:

scp WordCount.jar feng@master:/home/feng/Downloads

4. Run the MapReduce job:

hadoop jar ~/Downloads/WordCount.jar com.feng.test.mr.example.WordCount /usr/feng/news.txt /user/feng/output
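Once the job finishes, the result can be read straight from HDFS; with the output path /user/feng/output used above, each reducer writes a file named part-r-00000, part-r-00001, and so on:

hadoop fs -ls /user/feng/output

hadoop fs -cat /user/feng/output/part-r-00000    # one "word<TAB>count" pair per line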

=============================

To use the Eclipse Hadoop plugin:

On the Linux client, add the following to /etc/hosts:

10.11.12.45 master

10.11.12.47 slave

=============================

Job-related hadoop command-line tools:

1. List job information:

hadoop job -list

2. Kill a job:

hadoop job -kill job_id

3. View the history summary under the specified path:

hadoop job -history output-dir

4. Show more job details:

hadoop job -history all output-dir

5. Print the map and reduce completion percentages and all counters:

hadoop job -status job_id

6. Kill a task. Killed tasks are NOT counted against failed attempts:

hadoop job -kill-task <task-id>

7. Fail a task. Failed tasks ARE counted against failed attempts:

hadoop job -fail-task <task-id>
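Note that in Hadoop 2.x the hadoop job script reports itself as deprecated; the same subcommands should also work through the mapred front end, for example:

mapred job -list    # equivalent to hadoop job -list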

=============================

Package:

jar cf testsyf.jar com/feng/test/mr/easy/Test*.class

Upload:

scp testsyf.jar feng@master:/home/feng/Downloads

Run:

/opt/hadoop-2.7.1/bin/hadoop jar ~/Downloads/testsyf.jar com.feng.test.mr.easy.TestCount /feng/mr/ /user/fengtemp1

=============================

Web management pages:

http://master:8088 (YARN ResourceManager), http://master:19888 (MapReduce JobHistory), http://master:50070 (HDFS NameNode)