Running the built-in wordcount example that ships with Hadoop
2015-07-29 16:12
Hadoop ships with a number of example programs, but how do you actually use them? This post walks through the wordcount program in Hadoop's bundled examples JAR:
1. Start Hadoop first.
2. Prepare the data: create a file with vim words and write:
hello tom
hello jerry
hello kitty
hello tom
hello bbb
3. Upload the data to HDFS:
hadoop fs -put words /user/guest/words.txt
4. Run the wordcount program from the bundled examples JAR. Running the JAR with no arguments lists the available example programs:
guest@master:/usr/hadoop/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.4.0.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
wordcount appears in the list above; now run it against the uploaded file:
hadoop jar hadoop-mapreduce-examples-2.4.0.jar wordcount /user/guest/words.txt /user/guest/wordcount
View the results (for example with hadoop fs -cat /user/guest/wordcount/part-r-00000; note the output directory must not already exist before the job runs):
bbb	1
hello	5
jerry	1
kitty	1
tom	2
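To make clear what the wordcount job actually computes, here is a minimal local sketch (plain Python, not Hadoop) of its three phases, map, shuffle, and reduce, run against the same sample data as above:

```python
from collections import defaultdict

# Sample input, matching the contents of the "words" file above.
lines = [
    "hello tom",
    "hello jerry",
    "hello kitty",
    "hello tom",
    "hello bbb",
]

# Map phase: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group the emitted values by key (the word).
grouped = defaultdict(list)
for word, one in pairs:
    grouped[word].append(one)

# Reduce phase: sum the grouped values for each word.
counts = {word: sum(ones) for word, ones in grouped.items()}

# The job's output file lists keys in sorted order, tab-separated,
# which is why "bbb" comes first in part-r-00000.
for word in sorted(counts):
    print(f"{word}\t{counts[word]}")
```

This prints the same five lines as the part-r-00000 file shown above (bbb 1, hello 5, jerry 1, kitty 1, tom 2).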