
Implementing a reduceByWindow-like effect in Scala

2015-12-10 09:46
reduceByKey

The Spark approach:

package com.linewell.streaming

import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by ctao on 2015/12/9.
 */
object WordCount extends App {
  val conf = new SparkConf().setMaster("local").setAppName("wordcount")
  val sc = new SparkContext(conf)
  val seq = Seq("hello", "world", "hello", "scala", "scala", "good")
  // Map each word to a (word, 1) pair, then sum the counts per key.
  sc.parallelize(seq).map(x => (x, 1)).reduceByKey(_ + _).foreach(println)
  sc.stop()
}
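Run with a local master, this should print each distinct word with its count, e.g. (hello,2), (world,1), (scala,2), (good,1); the order of the pairs may vary with partitioning.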


The pure Scala approach:

val seq = Seq("hello", "world", "hello", "scala", "scala", "good")
// Group the (word, 1) pairs by key, then sum the values within each group.
for ((k, v) <- seq.map(x => (x, 1)).groupBy(_._1)) yield k -> v.map(_._2).sum
// The same computation written with map instead of a for comprehension.
seq.map(x => (x, 1)).groupBy(_._1).map(x => x._1 -> x._2.map(_._2).sum)
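For larger inputs, a single pass with foldLeft over an immutable Map avoids building the intermediate groups. A minimal sketch of that alternative:

val counts = seq.foldLeft(Map.empty[String, Int]) { (acc, word) =>
  // Increment the running count for this word, starting from 0.
  acc.updated(word, acc.getOrElse(word, 0) + 1)
}
counts.foreach(println)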


reduceByWindow

What follows is not exactly reduceByWindow either, but a similar merge over consecutive keys.

Source data: ("a", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("c", 1), ("c", 1), ("c", 1), ("a", 1), ("b", 1), ("b", 1)

Target data:

(a,3)

(b,2)

(c,3)

(a,1)

(b,2)

Requirement analysis:

Aggregate consecutive runs: when the same key appears in consecutive entries, merge them into a single pair; when the run is interrupted, the key shows up again as a separate key-value pair.

The pure Scala approach:

Use pattern matching over the fold's accumulator:

If the accumulator is empty (the first element), build a single-element list.

Otherwise, compare the key of the list's last element with the key of the incoming element: if they match, merge their values; if not, append the element as a new pair.

/**
 * Created by ctao on 2015/12/8.
 */
object Test extends App {
  val source = Seq(("a", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("c", 1), ("c", 1), ("c", 1), ("a", 1), ("b", 1), ("b", 1))

  source.foldLeft(List[(String, Int)]()) {
    // First element: start the accumulator with it.
    case (Nil, t) => List(t)
    // Same key as the last accumulated pair: merge the values;
    // otherwise append the element as a new pair.
    case (xs, j) => if (xs.last._1 == j._1) {
      xs.init :+ (xs.last._1, xs.last._2 + j._2)
    } else xs :+ j
  }.foreach(println)
}
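Because last, init, and :+ each traverse the whole accumulator, the solution above is quadratic in the input length. A minimal sketch of an equivalent variant that prepends in O(1) and reverses once at the end:

source.foldLeft(List.empty[(String, Int)]) {
  // Incoming key matches the head of the accumulator: merge the values.
  case ((k, v) :: tail, (key, value)) if k == key => (k, v + value) :: tail
  // Otherwise (including the empty accumulator) start a new run.
  case (acc, t) => t :: acc
}.reverse.foreach(println)

This prints the same five pairs, (a,3), (b,2), (c,3), (a,1), (b,2), while touching each accumulator cell only once.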
Tags: scala