Running wordcount in Spark on a file read from HDFS
2014-11-30 16:09
1. Environment
Hadoop nodes: sg202 (NameNode, SecondaryNameNode), sg206 (DataNode), sg207 (DataNode), sg208 (DataNode)
Spark nodes: sg201 (Master), sg211 (Worker)
2. Read the file from HDFS and run wordcount
a. Log in to the Hadoop master node sg202 and upload the file to be word-counted to HDFS:
[root@sg202 hadoop-1.0.4]# hadoop fs -put /home/hadoop-1.0.4/README.txt input
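To confirm the upload succeeded before moving on, you can list the target directory (this is a sketch of an optional check, not a step from the original walkthrough; it assumes the same `input` directory as above):

```shell
# List the HDFS input directory; README.txt should appear in the output
hadoop fs -ls input
```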
b. Log in to the Spark Master node (sg201) and start the Spark shell:
[root@sg201 spark-0.7.3]# MASTER=spark://172.16.48.201:7077 ./spark-shell
c. Run wordcount:
scala> val file = sc.textFile("hdfs://172.16.48.202:9000/user/root/input/README.txt")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
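To see what each stage of that pipeline produces without a cluster, the same logic can be sketched on a plain Scala collection (the sample lines are made up; `groupMapReduce`, available since Scala 2.13, plays the role of Spark's `reduceByKey`):

```scala
object LocalWordCount {
  def main(args: Array[String]): Unit = {
    // Two hypothetical input lines standing in for the HDFS file
    val lines = List("hello spark", "hello hdfs")

    val counts = lines
      .flatMap(line => line.split(" "))             // split every line into words
      .groupMapReduce(word => word)(_ => 1)(_ + _)  // group by word, emit 1 per occurrence, sum

    println(counts) // e.g. Map(hello -> 2, spark -> 1, hdfs -> 1)
  }
}
```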
scala> count.collect()
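`collect()` pulls the result back to the driver and prints it in the shell. If you instead want to persist the counts, the RDD can be written back to HDFS with `saveAsTextFile` (the output path below is hypothetical, and the directory must not already exist):

```scala
// Write the (word, count) pairs back to HDFS as text files, one part file per partition
count.saveAsTextFile("hdfs://172.16.48.202:9000/user/root/output/wordcount")
```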