您的位置：首页 > 其它

RDD基础学习-[2]RDD分区

2016-12-05 22:45 211 查看

简介

[1]coalesce:对RDD重新分区

def coalesce(numPartitions : scala.Int, shuffle : scala.Boolean = { /* compiledcode */ })(implicit ord : scala.Ordering[T](1)若减少分区,直接设置新的分区数即可(2)若增加分区个数,设置shuffle = true 应用:当将大数据集过滤处理后,分区中数据很小,可以减少分区数当将小文件保存到外部存储系统时,将分区数设置为1, 将文件保存在一个文件中当分区数小会造成CUP的浪费,适当增加分区

[2]repartition 和 coalesce 相似,只是将shuffle默认设置为true

def repartition(numPartitions : scala.Int)(implicit ord : scala.Ordering[T]package com.dt.spark.main.RDDLearn.RDDPartitionAPIimport org.apache.spark.{SparkConf, SparkContext}/*** Created by on 16/7/17.*///==========================================/*[1]coalesce:对RDD重新分区def coalesce(numPartitions : scala.Int, shuffle : scala.Boolean = { /* compiled code */ })(implicit ord : scala.Ordering[T](1)若减少分区,直接设置新的分区数即可(2)若增加分区个数,设置shuffle = true应用:当将大数据集过滤处理后,分区中数据很小,可以减少分区数当将小文件保存到外部存储系统时,将分区数设置为1, 将文件保存在一个文件中当分区数小会造成CUP的浪费,适当增加分区[2]repartition 和 coalesce 相似,只是将shuffle默认设置为truedef repartition(numPartitions : scala.Int)(implicit ord : scala.Ordering[T]*/object RDDPartitionAPI {def main(args: Array[String]) {val conf = new SparkConf()conf.setAppName("test")conf.setMaster("local")val sc = new SparkContext(conf)//==========================================/*构建RDD[1]从scala数据集构建RDD: parallelize()*/val listRDDExample = sc.parallelize(List("1","2","3"),2)//==========================================/*获取分区数*/val partitionsSzie = listRDDExample.partitions.sizeprintln(partitionsSzie)//2//==========================================/*coalesce:对RDD重新分区def coalesce(numPartitions : scala.Int, shuffle : scala.Boolean = { /* compiled code */ })(implicit ord : scala.Ordering[T](1)若减少分区,直接设置新的分区数即可(2)若增加分区个数,设置shuffle = true*/val rePartitionsSzie = listRDDExample.coalesce(1).partitions.sizeprintln(rePartitionsSzie)//1val rePartitionsSzie2 = listRDDExample.coalesce(3,true).partitions.sizeprintln(rePartitionsSzie2)//3sc.stop()}}

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签： scala

相关文章推荐

新的分享

章节导航