
Spark pairRDD Basic Operations (Part 3) — with a wordcount program

2017-01-06 11:50


Since a pairRDD is also an RDD (more precisely, a subclass of RDD), all the regular RDD operations are available on pairRDDs as well. Below is a combined example: first a filter operation, then a simple map/reduce-style computation built from mapValues and reduceByKey, and finally a small wordcount program.
This post mainly follows the book Learning Spark (O'Reilly).
OK, on to the code.
val a = sc.parallelize(Array((1, 2), (3, 4), (3, 6)))
a.collect().foreach(x => print(x + " "))
println()
// filter sees the whole (key, value) pair; keep pairs with key < 2 and value < 5
val b = a.filter {
  case (key, value) => value < 5 && key < 2
}
b.collect().foreach(x => print(x + " "))
println()
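For reference (the original post showed a screenshot instead), running these lines in spark-shell should print something close to:

(1,2) (3,4) (3,6)
(1,2)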

val c = sc.parallelize(Array(("panda", 0), ("pink", 3), ("pirate", 3), ("panda", 1), ("pink", 4)))
c.collect().foreach(x => print(x + " "))
println()
// pair each value with a count of 1: value becomes (value, 1)
val d = c.mapValues(x => (x, 1))
d.collect().foreach(x => print(x + " "))
println()
// per key, add up the values and add up the counts
val e = d.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))
e.collect().foreach(x => print(x + " "))
println()
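The (sum, count) pairs in e are one mapValues away from the per-key averages this pattern is usually building toward; a minimal sketch of that final step (not in the original post):

// divide sum by count to get the per-key average
val avg = e.mapValues { case (sum, count) => sum.toDouble / count }
avg.collect().foreach(x => print(x + " "))
println()
// e.g. (panda,0.5) (pink,3.5) (pirate,3.0); output order may vary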
// wordcount example
// read the input file from HDFS
val input = sc.textFile("hdfs://192.168.1.221:9000/wordcountinput/123")
input.collect().foreach(x => print(x + ","))
println()
// step-by-step version: first split each line into words
val words = input.flatMap(x => x.split(" "))
words.collect().foreach(x => print(x + ","))
println()

// map each word to a (word, 1) pair
val result1 = words.map(x => (x, 1))
result1.collect().foreach(x => print(x + " "))
println()

// sum the counts per word
val result2 = result1.reduceByKey((x, y) => x + y)
result2.collect().foreach(x => print(x + " "))
println()
// chained version: the same computation in a single expression
val result3 = input.flatMap(x => x.split(" ")).map(x => (x, 1)).reduceByKey((x, y) => x + y)
result3.collect().foreach(x => print(x + " "))
println()
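Learning Spark also mentions a shortcut for the case where the result is small enough to bring back to the driver: countByValue() returns a local Map[String, Long] directly, skipping the explicit map and reduceByKey. A minimal sketch (not in the original post):

// count words on the driver; only suitable for small result sets
val counts = input.flatMap(x => x.split(" ")).countByValue()
counts.foreach { case (word, n) => print(s"($word,$n) ") }
println()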


(The original post showed a screenshot of the spark-shell run output here; the image has not survived.)
The book explains this map/reduce pattern with a diagram; since that figure has not survived either, a short trace of the same idea follows.
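A hedged textual walk-through of what the book's diagram shows, using the c dataset from above:

// mapValues: attach a count of 1 to every value
("panda",0) -> ("panda",(0,1))     ("panda",1) -> ("panda",(1,1))
("pink",3)  -> ("pink",(3,1))      ("pink",4)  -> ("pink",(4,1))
("pirate",3) -> ("pirate",(3,1))
// reduceByKey: merge pairs with the same key, adding sums and adding counts
("panda",(0,1)) + ("panda",(1,1)) -> ("panda",(1,2))
("pink",(3,1))  + ("pink",(4,1))  -> ("pink",(7,2))
("pirate",(3,1))                  -> ("pirate",(3,1))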

Tags: scala, spark