
Verifying a Hadoop Pseudo-Distributed Setup

2015-07-08 09:02
Start Hadoop and run the jps command; you should see six processes in total. First, a brief introduction to each process's role:

1) ResourceManager: the master of YARN (Yet Another Resource Negotiator)

2) SecondaryNameNode: the NameNode's assistant

3) NameNode: the master of HDFS, the "warehouse manager"

4) DataNode: the worker of HDFS, the "warehouse" itself, where the data blocks are actually stored

5) Jps: the jps tool's own process (jps is itself a Java program, so it shows up in its own listing)

6) NodeManager: the worker of YARN (Yet Another Resource Negotiator)

Running all of these processes on a single machine is not ideal, because they compete with each other for resources; in a real deployment they are spread across different machines.

To verify that HDFS works, try uploading a file. The commands for operating Hadoop are in the bin directory, while the sbin directory holds the scripts for starting and stopping Hadoop.
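
For reference, starting and stopping the daemons from sbin looks roughly like this (a sketch, assuming the standard Hadoop 2.4.1 layout used in this walkthrough):

[root@itcast01 ~]# cd /itcast/hadoop-2.4.1/sbin/
[root@itcast01 sbin]# ./start-dfs.sh      # starts NameNode, DataNode, SecondaryNameNode
[root@itcast01 sbin]# ./start-yarn.sh     # starts ResourceManager, NodeManager
[root@itcast01 sbin]# ./stop-yarn.sh      # stops the YARN daemons
[root@itcast01 sbin]# ./stop-dfs.sh       # stops the HDFS daemons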

Change into Hadoop's bin directory with the following commands:

[root@itcast01 ~]# cd /itcast/hadoop-2.4.1/bin/
[root@itcast01 bin]# ls
container-executor  hdfs      mapred.cmd               yarn
hadoop              hdfs.cmd  rcc                      yarn.cmd
hadoop.cmd          mapred    test-container-executor
You can see many scripts here. Long-time users are used to the hadoop command; nowadays the hdfs and yarn commands are available as well.

If you are unsure how to use a command, print its help. For example, to see how to use hadoop:

[root@itcast01 bin]# hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs                   run a generic filesystem user client
version              print the version
jar <jar>            run a jar file
checknative [-a|-h]  check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath            prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog            get/set the log level for each daemon
or
CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
From here you can run hadoop version to print the version information, and hadoop fs to work with the filesystem.
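
A few commonly used hadoop fs subcommands, as a quick reference sketch (hdfs dfs works the same way for HDFS paths; /demo and local.txt below are illustrative names, not part of this walkthrough):

hadoop fs -ls /                     # list a directory
hadoop fs -mkdir /demo              # create a directory
hadoop fs -put local.txt /demo/     # upload a local file
hadoop fs -get /demo/local.txt .    # download back to the local filesystem
hadoop fs -cat /demo/local.txt      # print a file's contents
hadoop fs -rm /demo/local.txt       # delete a file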

List the files on this HDFS instance; before uploading anything, there is not a single file or directory:

[root@itcast01 bin]# hadoop fs -ls hdfs://itcast01:9000/
[root@itcast01 bin]#


Upload the local file /root/install.log to HDFS, renaming it to log.txt, then list hdfs://itcast01:9000/ again. The log.txt file appears, which means a file has been uploaded from the local filesystem to HDFS.

[root@itcast01 bin]# hadoop fs -put /root/install.log hdfs://itcast01:9000/log.txt
[root@itcast01 bin]# hadoop fs -ls hdfs://itcast01:9000/
Found 1 items
-rw-r--r--   1 root supergroup      49448 2015-07-08 13:43 hdfs://itcast01:9000/log.txt
[root@itcast01 bin]#


You can also check through the HDFS web UI. Open 192.168.8.118:50070 in a browser to reach the HDFS management page, then go to Utilities -> Browse the file system; the file log.txt is listed there. The main columns are:

Permission: the file's permissions

Owner: the user that owns the file

Group: the group it belongs to

Size: the file size

Replication: the number of replicas. The value here is 1, as set in hdfs-site.xml; since this pseudo-distributed setup has only one machine, only one replica is kept (see the snippet below).
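
For reference, the replication factor mentioned above is configured in etc/hadoop/hdfs-site.xml; in a single-machine pseudo-distributed setup the relevant property typically looks like this (a sketch of that one property only):

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>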

Now try downloading this log.txt. Clicking log.txt -> Download redirects the browser to http://itcast01:50075/webhdfs/v1/log.txt?op=OPEN&namenoderpcaddress=itcast01:9000&offset=0, and the page fails to open: the redirected URL uses the hostname itcast01 instead of the IP address. To fix this on Windows, edit C:\Windows\System32\drivers\etc\hosts and append the line "192.168.8.118 itcast01" at the end. Make sure the file keeps no extension when you save it. After this change, clicking Download downloads the file.

Download a file from HDFS to the local machine:
[root@itcast01 bin]# hadoop fs -get hdfs://itcast01:9000/log.txt /home/123.txt         download the HDFS file to the local path /home/123.txt
[root@itcast01 bin]# cd /home
[root@itcast01 home]# ls
123.txt  lost+found  wec                                                                123.txt is here, so the download finished; use more 123.txt to inspect its contents
[root@itcast01 home]#


At this point, uploading, listing, and downloading files on HDFS all work, so HDFS is usable. Next, verify that YARN works as well.

Verify YARN through its management UI at 192.168.8.118:8088.

On this page, Active Nodes shows 1 active node. That node is a worker node, i.e. a NodeManager: YARN's workers are NodeManagers and its master is the ResourceManager.
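
As a quick cross-check from the shell, the yarn CLI can list the registered NodeManagers (a sketch; yarn node -list is available in Hadoop 2.x, though the exact output columns may vary):

[root@itcast01 ~]# yarn node -list        # should report one RUNNING node, itcast01's NodeManager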

Verifying MapReduce's word-count function

Linux's built-in wc command can already do this count; the session is shown below.

First, create a word.txt file:
[root@itcast01 ~]# vim word.txt
hello tom
hello jerry
hello tom
"word.txt" 3L, 32C written


Count how many words word.txt contains:

[root@itcast01 ~]# wc word.txt
3  6 32 word.txt                                              3 lines, 6 words, 32 characters


Hadoop can also do word-frequency counting, that is, count how many times each word appears.

Next, count these words with MapReduce. MapReduce was designed for statistics over massive data sets, and such data is supposed to live on HDFS, so first upload word.txt to HDFS:
[root@itcast01 ~]# hadoop fs -put /root/word.txt hdfs://itcast01:9000/word.avi              Linux files have no intrinsic format, so .avi and .txt here are both just text files
[root@itcast01 ~]#
Using MapReduce just to count word frequencies here feels slow, mainly because of the job-startup and file-reading overhead. MapReduce is built for massive data, and its advantages only show when the data is actually massive.

Spark, by contrast, computes in memory and can also run on YARN, which shows how capable YARN is.

Count the word frequencies with MapReduce; the commands and results follow.

[root@itcast01 ~]# wc word.txt                    Linux's built-in word-count command
3  6 32 word.txt
[root@itcast01 ~]# hadoop fs -put /root/word.txt hdfs://itcast01:9000/word.avi        upload the file to HDFS
[root@itcast01 ~]# cd /itcast/hadoop-2.4.1/share/hadoop/                              change into Hadoop's share directory
[root@itcast01 hadoop]# ls
common  hdfs  httpfs  mapreduce  tools  yarn
[root@itcast01 hadoop]# cd mapreduce/                                                 change into the mapreduce directory
[root@itcast01 mapreduce]# ls                                                         there are many JAR packages here
hadoop-mapreduce-client-app-2.4.1.jar
hadoop-mapreduce-client-common-2.4.1.jar
hadoop-mapreduce-client-core-2.4.1.jar
hadoop-mapreduce-client-hs-2.4.1.jar
hadoop-mapreduce-client-hs-plugins-2.4.1.jar
hadoop-mapreduce-client-jobclient-2.4.1.jar
hadoop-mapreduce-client-jobclient-2.4.1-tests.jar
hadoop-mapreduce-client-shuffle-2.4.1.jar
hadoop-mapreduce-examples-2.4.1.jar
lib
lib-examples
sources
[root@itcast01 mapreduce]# hadoop                                                     type hadoop and press Enter to see its usage hints
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs                   run a generic filesystem user client
version              print the version
jar <jar>            run a jar file
checknative [-a|-h]  check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath            prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog            get/set the log level for each daemon
or
CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
[root@itcast01 mapreduce]# hadoop jar                                                 check what arguments it needs
RunJar jarFile [mainClass] args...
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount
Usage: wordcount <in> <out>
[root@itcast01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount hdfs://itcast01:9000/word.avi hdfs://itcast01:9000/out        at this point, immediately open a second session and run jps; more on this below
15/07/08 21:12:25 INFO client.RMProxy: Connecting to ResourceManager at itcast01/192.168.8.118:8032
15/07/08 21:12:48 INFO input.FileInputFormat: Total input paths to process : 1
15/07/08 21:12:56 INFO mapreduce.JobSubmitter: number of splits:1
15/07/08 21:13:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1436259707754_0001
15/07/08 21:13:23 INFO impl.YarnClientImpl: Submitted application application_1436259707754_0001
15/07/08 21:13:23 INFO mapreduce.Job: The url to track the job: http://itcast01:8088/proxy/application_1436259707754_0001/
15/07/08 21:13:23 INFO mapreduce.Job: Running job: job_1436259707754_0001
15/07/08 21:14:48 INFO mapreduce.Job: Job job_1436259707754_0001 running in uber mode : false
15/07/08 21:14:48 INFO mapreduce.Job:  map 0% reduce 0%
15/07/08 21:16:06 INFO mapreduce.Job:  map 100% reduce 0%
15/07/08 21:16:56 INFO mapreduce.Job:  map 100% reduce 100%
15/07/08 21:16:59 INFO mapreduce.Job: Job job_1436259707754_0001 completed successfully
15/07/08 21:17:00 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=40
FILE: Number of bytes written=185833
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=126
HDFS: Number of bytes written=22
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=76880
Total time spent by all reduces in occupied slots (ms)=41905
Total time spent by all map tasks (ms)=76880
Total time spent by all reduce tasks (ms)=41905
Total vcore-seconds taken by all map tasks=76880
Total vcore-seconds taken by all reduce tasks=41905
Total megabyte-seconds taken by all map tasks=78725120
Total megabyte-seconds taken by all reduce tasks=42910720
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=56
Map output materialized bytes=40
Input split bytes=94
Combine input records=6
Combine output records=3
Reduce input groups=3
Reduce shuffle bytes=40
Reduce input records=3
Reduce output records=3
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=539
CPU time spent (ms)=5880
Physical memory (bytes) snapshot=320163840
Virtual memory (bytes) snapshot=1685929984
Total committed heap usage (bytes)=136122368
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=32
File Output Format Counters
Bytes Written=22
[root@itcast01 mapreduce]#


At this point the count has completed.

Next, check the result files:

[root@itcast01 mapreduce]# hadoop fs -ls hdfs://itcast01:9000/                            list the HDFS root to see its contents
Found 4 items
-rw-r--r--   1 root supergroup      49448 2015-07-08 13:43 hdfs://itcast01:9000/log.txt
drwxr-xr-x   - root supergroup          0 2015-07-08 21:16 hdfs://itcast01:9000/out
drwx------   - root supergroup          0 2015-07-08 21:12 hdfs://itcast01:9000/tmp
-rw-r--r--   1 root supergroup         32 2015-07-08 19:57 hdfs://itcast01:9000/word.avi
[root@itcast01 mapreduce]# hadoop fs -ls hdfs://itcast01:9000/out                         the output directory /out we specified earlier
Found 2 items
-rw-r--r--   1 root supergroup          0 2015-07-08 21:16 hdfs://itcast01:9000/out/_SUCCESS
-rw-r--r--   1 root supergroup         22 2015-07-08 21:16 hdfs://itcast01:9000/out/part-r-00000
[root@itcast01 mapreduce]# hadoop fs -cat hdfs://itcast01:9000/out/part-r-00000           part-r-00000 under /out is the result file
hello   3
jerry   1
tom     2
[root@itcast01 mapreduce]#


The counts are correct: hello appears 3 times, jerry 1 time, tom 2 times.

The job felt slow because this is pseudo-distributed mode: from start to finish it is still a single machine doing all the work, so no extra resources can be brought in and it stays slow.

While the wordcount job above was running, we opened another session to look at its jps output.

jps shows the status of running Java processes.

5423 NameNode
5832 ResourceManager
5515 DataNode
14498 RunJar        the hadoop jar command launched this Java program
5927 NodeManager
14604 Jps
5683 SecondaryNameNode


Resource allocation is handled by the ResourceManager process.

Task monitoring is handled by the MRAppMaster process.

I did not manage to catch the YarnChild process; it is the process in which the map and reduce tasks actually run, and it only exists while a task is executing, which is why it is easy to miss.
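
MRAppMaster and YarnChild are short-lived: they appear only while an application and its tasks are running. A sketch of how to catch them, run in a second terminal while the wordcount job above is still in flight (there will be no output once the job finishes):

[root@itcast01 ~]# jps | grep -E 'MRAppMaster|YarnChild'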