Running the official wordcount example on hadoop 2.7.3 in local mode
2017-01-06 18:38
Base environment: host OS: Windows 7
VM software: VirtualBox
Guest OS: CentOS 7
hadoop version: 2.7.3
This walkthrough runs hadoop in standalone (local) mode.
Reference:
hadoop docs
1 Installing hadoop
Install the Java environment: yum install java-1.8.0-openjdk
Download the hadoop tarball and install it:
mkdir ~/hadoop/
cd ~/hadoop/
# http://apache.fayea.com/hadoop/common/hadoop-2.7.3/
curl http://apache.fayea.com/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz -O
# If the download is interrupted, it can be resumed with the -C option:
ls -l
# -rw-rw-r--. 1 jungle jungle 165297920 Jan  6 13:10 hadoop-2.7.3.tar.gz
curl http://apache.fayea.com/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz -C 165297920 -O
# ** Resuming transfer from byte position 165297920 ...
# download checksum file
curl http://apache.fayea.com/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds -O
# check
cat hadoop-2.7.3.tar.gz.mds
md5sum hadoop-2.7.3.tar.gz
sha256sum hadoop-2.7.3.tar.gz
tar -zxf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop-local
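Comparing the digest printed by md5sum against the .mds file by eye is error-prone. A minimal helper for doing the comparison in the shell; verify_md5 is my own sketch, not a hadoop tool, and the expected value must still be copied out of the .mds file by hand:

```shell
# Compare a file's MD5 digest against an expected value (e.g. copied from the
# .mds file). verify_md5 is a hypothetical helper, not part of hadoop.
verify_md5() {
  file=$1
  expected=$2
  actual=$(md5sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual, expected $expected)"
  fi
}

# Usage: verify_md5 hadoop-2.7.3.tar.gz <md5-value-from-mds-file>
```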
2 Configuring the environment
Because this is local mode, very little configuration is needed; only environment variables are involved.

# java path
whereis java
java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java
ls -l /usr/bin/java
lrwxrwxrwx. 1 root root 22 Dec 30 12:26 /usr/bin/java -> /etc/alternatives/java
ls -l /etc/alternatives/java
lrwxrwxrwx. 1 root root 73 Dec 30 12:26 /etc/alternatives/java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/bin/java
Add the following three lines to ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre
export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-local
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
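Hard-coding the versioned jvm directory means the JAVA_HOME line breaks on every openjdk update. One alternative, assuming the symlink chain shown above, is to resolve the real java binary with readlink -f /usr/bin/java and strip the trailing /bin/java. A sketch of the string step, using the path from this machine:

```shell
# The /etc/alternatives chain resolves to .../jre/bin/java; stripping the
# trailing /bin/java with parameter expansion yields the JAVA_HOME to export.
# On a live system the first line would be: java_bin=$(readlink -f /usr/bin/java)
java_bin=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/bin/java
JAVA_HOME=${java_bin%/bin/java}
echo "$JAVA_HOME"
```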
Confirm that hadoop works:
hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /home/jungle/hadoop/hadoop-local/share/hadoop/common/hadoop-common-2.7.3.jar
3 Testing against the Linux filesystem
The tests below use the Linux filesystem directly, i.e. without any of the hadoop fs commands.

3.1 wordcount
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
An example program must be given as the first argument. Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  # ...
  wordcount: A map/reduce program that counts the words in the input files.
  # ...

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount
# Usage: wordcount <in> [<in>...] <out>
3.2 Preparing the data
mkdir -p dataLocal/input/
cd dataLocal/input/
echo "hello world, I am jungle. bye world" > file1.txt
echo "hello hadoop. hello jungle. bye hadoop." > file2.txt
echo "the great software is hadoop." >> file2.txt
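Wordcount's mapper tokenizes purely on whitespace (Java's StringTokenizer with its default delimiters), so punctuation stays attached to words. As an independent sanity check, the same whitespace-only counting can be sketched with standard shell tools on this input:

```shell
# Recreate the two test input files and count whitespace-separated tokens,
# mirroring wordcount's whitespace-only tokenization (punctuation attached).
mkdir -p dataLocal/input
echo "hello world, I am jungle. bye world" > dataLocal/input/file1.txt
echo "hello hadoop. hello jungle. bye hadoop." > dataLocal/input/file2.txt
echo "the great software is hadoop." >> dataLocal/input/file2.txt

cat dataLocal/input/*.txt | tr -s ' ' '\n' | sort | uniq -c
```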
3.3 Running the job
cd /home/jungle/hadoop/hadoop-local/
hadoop jar /home/jungle/hadoop/hadoop-local/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount dataLocal/input/ dataLocal/outout
# dataLocal/outout does not exist beforehand; it is created by the job
echo $?

ls -la dataLocal/outout/
total 12
drwxrwxr-x. 2 jungle jungle 84 Jan  6 16:53 .
drwxrwxr-x. 4 jungle jungle 31 Jan  6 16:53 ..
-rw-r--r--. 1 jungle jungle 82 Jan  6 16:53 part-r-00000
-rw-r--r--. 1 jungle jungle 12 Jan  6 16:53 .part-r-00000.crc
-rw-r--r--. 1 jungle jungle  0 Jan  6 16:53 _SUCCESS
-rw-r--r--. 1 jungle jungle  8 Jan  6 16:53 ._SUCCESS.crc

# result
cat dataLocal/outout/part-r-00000
I 1
am 1
bye 2
great 1
hadoop. 3
hello 3
is 1
jungle. 2
software 1
the 1
world. 2
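One gotcha when re-running: a MapReduce job aborts with FileAlreadyExistsException if its output directory already exists, so dataLocal/outout must be removed between runs. A tiny guard, sketched as a shell function of my own naming (clean_out is not a hadoop command):

```shell
# Remove a stale job output directory before re-running a job; MapReduce
# aborts with FileAlreadyExistsException if the directory already exists.
# clean_out is a hypothetical helper, not part of hadoop.
clean_out() {
  if [ -d "$1" ]; then
    echo "removing stale output dir: $1"
    rm -rf "$1"
  fi
}

# Usage: clean_out dataLocal/outout  (then re-run the hadoop jar command)
```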
3.4 The job log
The log reveals a number of common runtime parameters and settings:

17/01/06 16:53:26 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/01/06 16:53:26 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/01/06 16:53:26 INFO input.FileInputFormat: Total input paths to process : 2
17/01/06 16:53:26 INFO mapreduce.JobSubmitter: number of splits:2
17/01/06 16:53:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1147390429_0001
17/01/06 16:53:27 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/01/06 16:53:27 INFO mapreduce.Job: Running job: job_local1147390429_0001
17/01/06 16:53:27 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Waiting for map tasks
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.MapTask: Processing split: file:/home/jungle/hadoop/hadoop-local/dataLocal/input/file2.txt:0+70
17/01/06 16:53:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/01/06 16:53:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/01/06 16:53:27 INFO mapred.MapTask: soft limit at 83886080
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/01/06 16:53:27 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/01/06 16:53:27 INFO mapred.LocalJobRunner:
17/01/06 16:53:27 INFO mapred.MapTask: Starting flush of map output
17/01/06 16:53:27 INFO mapred.MapTask: Spilling map output
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufend = 114; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214356(104857424); length = 41/6553600
17/01/06 16:53:27 INFO mapred.MapTask: Finished spill 0
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_m_000000_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_m_000000_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.MapTask: Processing split: file:/home/jungle/hadoop/hadoop-local/dataLocal/input/file1.txt:0+37
17/01/06 16:53:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/01/06 16:53:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/01/06 16:53:27 INFO mapred.MapTask: soft limit at 83886080
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/01/06 16:53:27 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/01/06 16:53:27 INFO mapred.LocalJobRunner:
17/01/06 16:53:27 INFO mapred.MapTask: Starting flush of map output
17/01/06 16:53:27 INFO mapred.MapTask: Spilling map output
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufend = 65; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
17/01/06 16:53:27 INFO mapred.MapTask: Finished spill 0
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_m_000001_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_m_000001_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map task executor complete.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_r_000000_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2aa26fdb
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
17/01/06 16:53:27 INFO reduce.EventFetcher: attempt_local1147390429_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
17/01/06 16:53:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1147390429_0001_m_000000_0 decomp: 98 len: 102 to MEMORY
17/01/06 16:53:27 INFO reduce.InMemoryMapOutput: Read 98 bytes from map-output for attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 98, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->98
17/01/06 16:53:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1147390429_0001_m_000001_0 decomp: 68 len: 72 to MEMORY
17/01/06 16:53:27 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
	at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/01/06 16:53:27 INFO reduce.InMemoryMapOutput: Read 68 bytes from map-output for attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 68, inMemoryMapOutputs.size() -> 2, commitMemory -> 98, usedMemory ->166
17/01/06 16:53:27 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/01/06 16:53:27 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
	at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
17/01/06 16:53:27 INFO mapred.Merger: Merging 2 sorted segments
17/01/06 16:53:27 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 156 bytes
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merged 2 segments, 166 bytes to disk to satisfy reduce memory limit
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merging 1 files, 168 bytes from disk
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
17/01/06 16:53:27 INFO mapred.Merger: Merging 1 sorted segments
17/01/06 16:53:27 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 160 bytes
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_r_000000_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO mapred.Task: Task attempt_local1147390429_0001_r_000000_0 is allowed to commit now
17/01/06 16:53:27 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1147390429_0001_r_000000_0' to file:/home/jungle/hadoop/hadoop-local/dataLocal/outout/_temporary/0/task_local1147390429_0001_r_000000
17/01/06 16:53:27 INFO mapred.LocalJobRunner: reduce > reduce
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_r_000000_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_r_000000_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: reduce task executor complete.
17/01/06 16:53:28 INFO mapreduce.Job: Job job_local1147390429_0001 running in uber mode : false
17/01/06 16:53:28 INFO mapreduce.Job:  map 100% reduce 100%
17/01/06 16:53:28 INFO mapreduce.Job: Job job_local1147390429_0001 completed successfully
17/01/06 16:53:28 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=889648
		FILE: Number of bytes written=1748828
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=3
		Map output records=18
		Map output bytes=179
		Map output materialized bytes=174
		Input split bytes=256
		Combine input records=18
		Combine output records=14
		Reduce input groups=11
		Reduce shuffle bytes=174
		Reduce input records=14
		Reduce output records=11
		Spilled Records=28
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=43
		Total committed heap usage (bytes)=457912320
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=107
	File Output Format Counters
		Bytes Written=94
The two EBADF: Bad file descriptor warnings in the log can, according to reports online, be safely ignored.
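Several of the counters are easy to check by hand: Map input records=3 is the three input lines, and Map output records=18 is the number of whitespace-separated tokens. The token count can be confirmed with wc -w on the same three echo'd lines:

```shell
# Count whitespace-separated tokens in the three input lines; this should
# match the "Map output records=18" counter (7 + 6 + 5 tokens).
printf '%s\n' \
  "hello world, I am jungle. bye world" \
  "hello hadoop. hello jungle. bye hadoop." \
  "the great software is hadoop." | wc -w
# → 18
```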