Setting Up a Hadoop 2.7.1 Cluster
2016-06-24 14:01
================
Environment and Software
================
One laptop running two Ubuntu virtual machines.
Virtualization: VMware Workstation 12 Pro
OS version: Ubuntu 12 en x64
Two systems: master 10.11.12.45, user feng
             slave  10.11.12.47, user feng
Hadoop: hadoop-2.7.1.tar.gz
JDK: java version "1.7.0_05"
Start the first virtual machine and perform the following steps.
================
I. Install JDK 1.7
================
1. Extract jdk-7u5-linux-x64.tar.gz under /opt:
cd /opt
tar -zvxf jdk-7u5-linux-x64.tar.gz
Grant ownership to the current user feng:
sudo chown -R feng:root jdk1.7.0_05/
2. Edit /etc/profile:
sudo vi /etc/profile
Append at the end:
export JAVA_HOME=/opt/jdk1.7.0_05
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
3. Run source /etc/profile to apply the changes.
Verify with java -version.
Expected output:
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b06)
Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)
================
II. Extract hadoop-2.7.1.tar.gz and Configure Environment Variables
================
1. Extract hadoop-2.7.1.tar.gz under /opt:
cd /opt
sudo tar -zvxf hadoop-2.7.1.tar.gz
sudo chown -R feng:root hadoop-2.7.1/
2. Edit /etc/profile:
sudo vi /etc/profile
Append at the end:
export HADOOP_PREFIX=/opt/hadoop-2.7.1
export PATH=$HADOOP_PREFIX/bin:$PATH
3. Run source /etc/profile to apply the changes.
4. Verify with hadoop version.
Expected output:
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
================
III. Raise the ulimit open file and nproc Limits
================
Add these three lines to /etc/security/limits.conf (feng is the current user):
--------------------------
feng - nofile 32768
feng soft nproc 32000
feng hard nproc 32000
--------------------------
Also add this line to /etc/pam.d/common-session:
session required pam_limits.so
Without it, the settings in /etc/security/limits.conf will not take effect.
Finally, log out and log back in for the new limits to apply.
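After logging back in, ulimit -n and ulimit -u show the new values from the shell. As an illustration only (not part of the original setup), the same limits can be read programmatically on Linux with Python's resource module:

```python
import resource

# Soft/hard limit on open file descriptors (should report 32768 after the change)
soft_nofile, hard_nofile = resource.getrlimit(resource.RLIMIT_NOFILE)

# Soft/hard limit on the number of processes (should report 32000 after the change)
soft_nproc, hard_nproc = resource.getrlimit(resource.RLIMIT_NPROC)

print("nofile:", soft_nofile, hard_nofile)
print("nproc:", soft_nproc, hard_nproc)
```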
================
IV. Clone the Virtual Machine
================
1. Shut down the system and copy the virtual machine files.
The goal is two virtual machines: one master and one slave.
================
V. Set the hostname and /etc/hosts
================
1. Edit /etc/hostname:
sudo vi /etc/hostname
Change ubuntu to master or slave.
After editing, check with the hostname command.
2. Set each machine's hostname:
the master machine's hostname to master,
the slave machine's hostname to slave.
3. Edit /etc/hosts on both machines:
sudo vi /etc/hosts
On both master and slave, the file should read:
127.0.0.1 localhost
# 127.0.1.1 ubuntu
10.11.12.45 master
10.11.12.47 slave
Verify with ping master and ping slave.
================
VI. Set Up Passwordless SSH Login
================
1. On both machines, enable passwordless ssh localhost.
Run these two commands:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Then ssh localhost should log in without asking for a password.
2. master and slave must also be able to ssh into each other without a password.
a. master logs in to slave without a password:
1) On master, copy the generated ~/.ssh/id_dsa.pub to slave with scp:
scp ~/.ssh/id_dsa.pub feng@slave:/home/feng/Downloads
2) On slave, append id_dsa.pub to authorized_keys:
cat ~/Downloads/id_dsa.pub >> ~/.ssh/authorized_keys
3) On slave, inspect the contents of ~/.ssh/authorized_keys:
more ~/.ssh/authorized_keys
4) From master, verify passwordless login to slave:
ssh slave
b. slave logs in to master without a password:
1) On slave, copy the generated ~/.ssh/id_dsa.pub to master with scp:
scp ~/.ssh/id_dsa.pub feng@master:/home/feng/Downloads
2) On master, append id_dsa.pub to authorized_keys:
cat ~/Downloads/id_dsa.pub >> ~/.ssh/authorized_keys
3) On master, inspect the contents of ~/.ssh/authorized_keys:
more ~/.ssh/authorized_keys
4) From slave, verify passwordless login to master:
ssh master
master and slave can now ssh into each other without passwords.
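Note that the cat >> step above blindly appends the key each time it is run, so repeating it leaves duplicate lines in authorized_keys. As a sketch of the idempotent version of that logic (a hypothetical helper, not from the original steps):

```python
def append_key_once(authorized_keys: str, pubkey: str) -> str:
    """Append pubkey (a single line) to the authorized_keys text only if absent."""
    existing = [line for line in authorized_keys.splitlines() if line.strip()]
    if pubkey.strip() in existing:
        return authorized_keys  # already present: no-op
    if authorized_keys and not authorized_keys.endswith("\n"):
        authorized_keys += "\n"
    return authorized_keys + pubkey.strip() + "\n"

# Example with placeholder key material
existing = "ssh-dss AAAA...master feng@master\n"
key = "ssh-dss BBBB...slave feng@slave"
once = append_key_once(existing, key)
twice = append_key_once(once, key)
assert once == twice  # the second append is a no-op
```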
================
VII. Edit the Hadoop Configuration Files on master and slave
================
The Hadoop configuration files are edited the same way on both master and slave.
1. Back up the configuration directory before editing it:
1) cd /opt/hadoop-2.7.1/etc
2) cp -R hadoop/ hadoop#bak
2. Edit etc/hadoop/core-site.xml.
Add the following properties under the <configuration> node:
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/fengwork/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
Create the fengwork directory under /opt and set its owner and group:
sudo mkdir /opt/fengwork
sudo chown -R feng:root /opt/fengwork
3. Edit etc/hadoop/hdfs-site.xml.
Add the following properties under the <configuration> node:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/fengwork/hadoop/datalog</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/fengwork/hadoop/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
4. Edit etc/hadoop/yarn-site.xml.
Add the following properties under the <configuration> node:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
5. Edit etc/hadoop/mapred-site.xml.
cp mapred-site.xml.template mapred-site.xml
Then add the following properties under the <configuration> node:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
6. Edit etc/hadoop/slaves.
Add:
master
slave
7. Edit etc/hadoop/hadoop-env.sh.
Above the line export JAVA_HOME=${JAVA_HOME}, add:
JAVA_HOME=/opt/jdk1.7.0_05
so that it reads:
JAVA_HOME=/opt/jdk1.7.0_05
export JAVA_HOME=${JAVA_HOME}
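The <property> blocks above are repetitive and easy to mistype when edited by hand on both machines. As an illustration only (not part of the original setup), a few lines of Python can render them from a dict:

```python
def to_properties(props: dict) -> str:
    """Render a dict as Hadoop <property> XML blocks."""
    blocks = []
    for name, value in props.items():
        blocks.append(
            "<property>\n"
            f"  <name>{name}</name>\n"
            f"  <value>{value}</value>\n"
            "</property>"
        )
    return "\n".join(blocks)

# The core-site.xml properties from this article
core_site = {
    "fs.defaultFS": "hdfs://master:9000",
    "hadoop.tmp.dir": "file:/opt/fengwork/hadoop/tmp",
    "io.file.buffer.size": "4096",
}
print(to_properties(core_site))
```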
================
VIII. Hadoop Commands
================
Before the first start, format the NameNode once:
$HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
For example:
hdfs namenode -format hadoop_fengwork
or
/opt/hadoop-2.7.1/bin/hdfs namenode -format hadoop_fengwork
Start everything:
$HADOOP_PREFIX/sbin/start-all.sh
Stop everything:
$HADOOP_PREFIX/sbin/stop-all.sh
# Leave HDFS safe mode
$HADOOP_PREFIX/bin/hadoop dfsadmin -safemode leave
# Enter HDFS safe mode
$HADOOP_PREFIX/bin/hadoop dfsadmin -safemode enter
Check the native library setup:
$HADOOP_PREFIX/bin/hadoop checknative -a
----------
Basic File Operations
----------
Create a directory:
$HADOOP_PREFIX/bin/hadoop fs -mkdir /usr
Upload a file:
$HADOOP_PREFIX/bin/hadoop fs -put ~/jdk-8u25-linux-x64.tar.gz /usr/feng
Download a file:
$HADOOP_PREFIX/bin/hadoop fs -get /usr/feng/jdk-8u25-linux-x64.tar.gz ~/Downloads/
***********
You can verify that the file was not corrupted in transit:
feng@master:~$ md5sum ~/Downloads/jdk-8u25-linux-x64.tar.gz
e145c03a7edc845215092786bcfba77e /home/feng/Downloads/jdk-8u25-linux-x64.tar.gz
feng@master:~$ md5sum ~/jdk-8u25-linux-x64.tar.gz
e145c03a7edc845215092786bcfba77e /home/feng/jdk-8u25-linux-x64.tar.gz
The two md5 sums match, so the file is intact.
***********
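The md5sum comparison above can also be scripted. A minimal sketch using Python's hashlib (the function and paths are illustrative, not from the original):

```python
import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks so
    large archives (like a JDK tarball) do not need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage sketch: compare the downloaded copy against the original, e.g.
# md5_of("/home/feng/Downloads/jdk-8u25-linux-x64.tar.gz") ==
# md5_of("/home/feng/jdk-8u25-linux-x64.tar.gz")
```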
List files:
$HADOOP_PREFIX/bin/hadoop fs -ls /
List files recursively:
$HADOOP_PREFIX/bin/hadoop fs -ls -R /
Show the tail of a file:
$HADOOP_PREFIX/bin/hadoop fs -tail /feng/tmp/1462950193038
Show an entire file (fine for small test files; avoid it on large ones):
$HADOOP_PREFIX/bin/hadoop fs -cat /feng/tmp/1462950193038
More commands are documented on the official site:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
================
The hdfs fsck and hdfs classpath Commands
================
1. Show the classpath:
$HADOOP_PREFIX/bin/hdfs classpath
Result:
/opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/*:/opt/hadoop-2.7.1/share/hadoop/common/*:/opt/hadoop-2.7.1/share/hadoop/hdfs:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/*:/opt/hadoop-2.7.1/share/hadoop/hdfs/*:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/*:/opt/hadoop-2.7.1/share/hadoop/yarn/*:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.7.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar
fsck
Usage:
hdfs fsck <path>
[-list-corruptfileblocks |
[-move | -delete | -openforwrite]
[-files [-blocks [-locations | -racks]]]
[-includeSnapshots]
[-storagepolicies] [-blockId <blk_Id>]
COMMAND_OPTION Description
path Start checking from this path.
-delete Delete corrupted files.
-files Print out files being checked.
-files -blocks Print out the block report
-files -blocks -locations Print out locations for every block.
-files -blocks -racks Print out network topology for data-node locations.
-includeSnapshots Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks Print out list of missing blocks and files they belong to.
-move Move corrupted files to /lost+found.
-openforwrite Print out files opened for write.
-storagepolicies Print out storage policy summary for the blocks.
-blockId Print out information about the block.
For example, run:
hdfs fsck / -files -blocks
Connecting to namenode via http://master:50070/fsck?ugi=feng&files=1&blocks=1&path=%2F
FSCK started by feng (auth:SIMPLE) from /10.11.12.45 for path / at Mon May 16 14:00:56 CST 2016
/ <dir>
/usr <dir>
/usr/feng <dir>
/usr/feng/jdk-8u25-linux-x64.tar.gz 160872482 bytes, 2 block(s): OK
0. BP-85890032-10.11.12.45-1463366422938:blk_1073741825_1001 len=134217728 repl=1
1. BP-85890032-10.11.12.45-1463366422938:blk_1073741826_1002 len=26654754 repl=1
Status: HEALTHY
Total size: 160872482 B
Total dirs: 3
Total files: 1
Total symlinks: 0
Total blocks (validated): 2 (avg. block size 80436241 B)
Minimally replicated blocks: 2 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Mon May 16 14:00:56 CST 2016 in 4 milliseconds
The filesystem under path '/' is HEALTHY
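The block layout in the fsck report is just the file size split at the default 128 MiB HDFS block size. A quick arithmetic check, using the sizes from the report above:

```python
# Default HDFS block size in Hadoop 2.x
BLOCK_SIZE = 128 * 1024 * 1024  # 134217728 bytes

# Size of /usr/feng/jdk-8u25-linux-x64.tar.gz per the fsck report
file_size = 160872482

# Number of blocks: ceiling division
num_blocks = -(-file_size // BLOCK_SIZE)

# Size of the final, partial block
last_block = file_size - (num_blocks - 1) * BLOCK_SIZE

print(num_blocks)   # 2, matching "2 block(s)"
print(last_block)   # 26654754, matching len= of the second block
```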
================
Testing MapReduce
================
1. Get the WordCount code.
Source: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
package com.feng.test.mr.example;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
2. In the bin directory of the Eclipse Java project, run the jar packaging command:
jar cf WordCount.jar com/feng/test/mr/example/WordCount*.class
3. Copy WordCount.jar to master:
scp WordCount.jar feng@master:/home/feng/Downloads
4. Run the MapReduce job:
hadoop jar ~/Downloads/WordCount.jar com.feng.test.mr.example.WordCount /usr/feng/news.txt /user/feng/output
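Before submitting to the cluster, the map/combine/reduce logic can be sanity-checked locally. A minimal Python analogue of the WordCount job above (an illustration, not a substitute for the MapReduce run; the sample input is from the MapReduce tutorial):

```python
from collections import Counter

def wordcount(lines):
    """Local analogue of TokenizerMapper + IntSumReducer: tokenize each
    line on whitespace, emit (word, 1), then sum the counts per word."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            counts[token] += 1
    return dict(counts)

result = wordcount(["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"])
print(result)  # {'Hello': 2, 'World': 2, 'Bye': 1, 'Hadoop': 2, 'Goodbye': 1}
```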
================
To use the Eclipse Hadoop plugin:
On the Linux client, add to /etc/hosts:
10.11.12.45 master
10.11.12.47 slave
=============================
Job-related Hadoop command-line tools:
1. List jobs:
hadoop job -list
2. Kill a job:
hadoop job -kill job_id
3. View the history summary under a given path:
hadoop job -history output-dir
4. View more details about a job:
hadoop job -history all output-dir
5. Print map and reduce completion percentages and all counters:
hadoop job -status job_id
6. Kill a task. Killed tasks do not count against failed attempts:
hadoop job -kill-task <task-id>
7. Fail a task. Failed tasks do count against failed attempts:
hadoop job -fail-task <task-id>
=============================
Package:
jar cf testsyf.jar com/feng/test/mr/easy/Test*.class
Upload:
scp testsyf.jar feng@master:/home/feng/Downloads
Run:
/opt/hadoop-2.7.1/bin/hadoop jar ~/Downloads/testsyf.jar com.feng.test.mr.easy.TestCount /feng/mr/ /user/fengtemp1
=============================
Admin web UIs:
http://master:8088 http://master:19888 http://master:50070