spark--transform operator--mapPartitionsWithIndex
2017-07-18 21:52
```scala
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.mutable.ArrayBuffer

/**
 * Created by liupeng on 2017/6/15.
 */
object T_mapPartitionsWithIndex {
  System.setProperty("hadoop.home.dir", "F:\\hadoop-2.6.5")

  // Tags every element with the index of the partition it lives in.
  def fun_index(index: Int, iter: Iterator[String]): Iterator[String] = {
    val list = ArrayBuffer[String]()
    while (iter.hasNext) {
      val name: String = iter.next()
      val fs = index + ":" + name
      list += fs
      println(fs)
    }
    list.iterator
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapPartitionsWithIndex_test").setMaster("local")
    val sc = new SparkContext(conf)
    // Prepare some sample data.
    val names: List[String] = List("liupeng", "xuliuxi", "xiaoma")
    val nameRDD = sc.parallelize(names, 2)
    // Traverse each partition together with its index.
    // If you want to know which elements ended up in the same partition,
    // mapPartitionsWithIndex gives you the index of each partition.
    val nameWithPartitionIndex = nameRDD.mapPartitionsWithIndex(fun_index)
    println(nameWithPartitionIndex.count())
  }
}
```
Output:
0:liupeng
1:xuliuxi
1:xiaoma
3
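For comparison, the same transformation can also be written inline with an anonymous function instead of a named helper, which is the more common idiom. This is a minimal sketch assuming the same `SparkContext` and `nameRDD` as in the example above:

```scala
// Inline equivalent of fun_index: mapPartitionsWithIndex passes the
// partition index and an iterator over that partition's elements.
val tagged = nameRDD.mapPartitionsWithIndex { (index, iter) =>
  iter.map(name => index + ":" + name)
}
// collect() pulls the tagged elements back to the driver for printing.
tagged.collect().foreach(println)
```

Because `parallelize(names, 2)` splits the three-element list across two partitions, one partition (index 0) receives one element and the other (index 1) receives two, which matches the output shown above.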