
Spark: Study Notes (24)

2015-04-11 23:39
1. Setting the master in code with setMaster:

Calling .setMaster("spark://192.168.30.129:7077") in the code may lead to out-of-memory errors.

2. Replacing the "/" operator:

scala> val x = new Rational(1, 2)
x: Rational = 1/2
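A minimal sketch of how such a class might supply its own "/" and print as n/d. The class body is an assumption modeled on the usual textbook Rational, not code from these notes, and RationalDivDemo is a hypothetical wrapper name.

```scala
object RationalDivDemo {
  // Hypothetical Rational: numerator/denominator kept in lowest terms.
  class Rational(n: Int, d: Int) {
    require(d != 0, "denominator must be non-zero")
    private val g = gcd(n.abs, d.abs)
    val numer: Int = n / g
    val denom: Int = d / g

    // "/" is an ordinary method name in Scala, so x / y means x./(y).
    def /(that: Rational): Rational =
      new Rational(numer * that.denom, denom * that.numer)

    // toString is what makes the REPL show "1/2" for new Rational(1, 2).
    override def toString: String = s"$numer/$denom"

    private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
  }

  def main(args: Array[String]): Unit = {
    val x = new Rational(1, 2)
    val y = new Rational(2, 3)
    println(x)     // 1/2
    println(x / y) // 3/4
  }
}
```

Dividing by a Rational whose numerator is zero trips the require check, since the result would have a zero denominator.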

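3. Ending an identifier with an underscore is not recommended. For example, val name_: Int = 1 does not compile, because the compiler reads name_: as the identifier.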

4. Rational:

scala> implicit def intToRational(x: Int) = new Rational(x)
scala> val r = new Rational(2, 3)
r: Rational = 2/3
scala> 2 * r
res0: Rational = 4/3
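The session above relies on a Rational class that the notes do not show. A self-contained sketch, assuming a Rational with an auxiliary one-argument constructor and a "*" method (both hypothetical here), shows why 2 * r compiles once the implicit conversion is in scope:

```scala
import scala.language.implicitConversions

object ImplicitRationalDemo {
  // Hypothetical Rational with just enough to multiply and print.
  class Rational(val numer: Int, val denom: Int) {
    def this(n: Int) = this(n, 1) // allows new Rational(2)
    def *(that: Rational): Rational =
      new Rational(numer * that.numer, denom * that.denom)
    override def toString: String = s"$numer/$denom"
  }

  // Int has no *(Rational) method, so the compiler rewrites 2 * r
  // to intToRational(2) * r when this conversion is in scope.
  implicit def intToRational(x: Int): Rational = new Rational(x)

  def main(args: Array[String]): Unit = {
    val r = new Rational(2, 3)
    println(2 * r) // 4/3
  }
}
```

The conversion has to be visible at the call site, for example through import ImplicitRationalDemo._ when used from another object.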


5. Converting the form on the left into the form on the right:

(x: Int) => x + 1   or   increase = (x: Int) => { println(...)

...

x + 1 }
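As a runnable sketch of the two forms, assuming increase is declared as a var so it can be reassigned (the printed message is made up for illustration):

```scala
object FunctionLiteralDemo {
  // Single-expression function literal stored in a variable.
  var increase: Int => Int = (x: Int) => x + 1

  def main(args: Array[String]): Unit = {
    println(increase(10)) // 11

    // The same function with a multi-statement body: wrap it in braces;
    // the value of the last expression is the result.
    increase = (x: Int) => {
      println("computing " + x + " + 1") // hypothetical message
      x + 1
    }
    println(increase(10)) // prints the message, then 11
  }
}
```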

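6. Every collection class has a foreach method. It takes a function as its argument and calls that function on each element, for example to print every element of the collection.

* foreach comes from the trait Iterable, which List, Set, and Map all share; Array is a plain JVM array and gets foreach through an implicit wrapper instead.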
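A short sketch of foreach on each of these collection types; the sample values and the total helper are made up for illustration:

```scala
object ForeachDemo {
  // foreach is used for side effects; here it accumulates into a local var.
  def total(xs: Iterable[Int]): Int = {
    var sum = 0
    xs.foreach(x => sum += x)
    sum
  }

  def main(args: Array[String]): Unit = {
    val list = List(1, 2, 3)
    val set = Set("a", "b")
    val array = Array(1.5, 2.5)
    val map = Map("one" -> 1, "two" -> 2)

    list.foreach(x => println(x)) // explicit function literal
    set.foreach(println)          // method passed as the function
    array.foreach(println)
    // A Map's elements are (key, value) pairs.
    map.foreach { case (k, v) => println(k + " -> " + v) }

    println(total(list)) // 6
  }
}
```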

7. The "_" placeholder.

scala> def sum(a: Int, b: Int, c: Int) = a + b + c
scala> val d = sum _
d: (Int, Int, Int) => Int = <function>
scala> d(10, 20, 30)
res0: Int = 60
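The same idea as a compilable sketch in Scala 2 syntax, assuming sum simply adds its three arguments. The second value, e, is an extra illustration of a placeholder standing for a single argument:

```scala
object PlaceholderDemo {
  def sum(a: Int, b: Int, c: Int): Int = a + b + c

  // "sum _" leaves all three arguments open, turning the method into a function value.
  val d: (Int, Int, Int) => Int = sum _

  // A placeholder can also stand for just one argument.
  val e: Int => Int = sum(1, _: Int, 3)

  def main(args: Array[String]): Unit = {
    println(d(10, 20, 30)) // 60
    println(e(2))          // 6
  }
}
```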


8. Computing the median:

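package akria

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkContext, SparkConf}
// Needed on Spark 1.2 for reduceByKey; unnecessary from Spark 1.3 on
import org.apache.spark.SparkContext.rddToPairRDDFunctions
import scala.util.control.Breaks.{break, breakable}

/**
 * Created by sendoh on 2015/4/11.
 */
object Median {
  def main(args: Array[String]): Unit = {
    // Reduce log noise
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    // Check the arguments (only dependency_jars is used below; the input path is hard-coded)
    if (args.length != 3) {
      println("Usage: java -jar code.jar dependency_jars file_location save_location")
      System.exit(1)
    }
    val jars = args(0).split(',').toSeq
    val conf = new SparkConf().setAppName("Median").setSparkHome("/usr/local/spark-1.2.0-bin-hadoop2.4").setJars(jars)
    val sc = new SparkContext(conf)
    // Read the input; this assumes one non-negative integer per line
    val data = sc.textFile("hdfs://localhost:9000/datatnt/textwordc.txt").map(_.trim.toInt)
    // Logically split the data into regions of width 1000 (10 regions for values below 10000)
    // and count how many values fall into each region
    val mappeddata = data.map(num => (num / 1000, num))
    val count = mappeddata.map(kv => (kv._1, 1)).reduceByKey((a, b) => a + b).collect().sortBy(_._1)
    // Walk the regions from lowest to highest, accumulating their counts, to find
    // the region that contains the median and the median's offset inside it
    val sum_count = count.map(_._2).sum
    // 1-based rank of the median (the lower middle value when the count is even)
    val mid = (sum_count + 1) / 2
    var temp = 0 // number of values in the regions before the current one
    var index = 0
    breakable {
      for ((region, n) <- count) {
        if (temp + n >= mid) {
          index = region
          break
        }
        temp = temp + n
      }
    }
    // 1-based offset of the median inside its region
    val offset = mid - temp
    // The offset-th smallest value of that region is the median
    val result = mappeddata.filter(kv => kv._1 == index).map(_._2).takeOrdered(offset)
    println("Median is " + result(offset - 1))
    sc.stop()
  }

}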
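The bucket-counting idea can be tried without a cluster. A minimal sketch on plain Scala collections, assuming non-negative integers and taking the lower middle value when the count is even; bucketMedian and its width parameter are hypothetical names, not part of the Spark job.

```scala
object LocalMedianDemo {
  // Lower median of non-negative Ints, located via per-bucket counts
  // (bucket = value / width) instead of sorting the whole data set.
  def bucketMedian(data: Seq[Int], width: Int = 1000): Int = {
    require(data.nonEmpty, "data must not be empty")
    // Count how many values fall into each bucket, ordered by bucket number.
    val counts = data.groupBy(_ / width)
      .map { case (b, vs) => (b, vs.size) }
      .toSeq
      .sortBy(_._1)
    // 1-based rank of the median in the fully sorted data.
    val mid = (data.size + 1) / 2
    // cumulative(i) = number of values in the buckets before counts(i).
    val cumulative = counts.scanLeft(0)((acc, kv) => acc + kv._2)
    // The median's bucket is the first one whose running total reaches mid.
    val i = counts.indices.find(j => cumulative(j + 1) >= mid).get
    val bucket = counts(i)._1
    val offset = mid - cumulative(i) // 1-based rank inside that bucket
    // Only the one bucket that holds the median needs sorting.
    val inBucket = data.filter(_ / width == bucket).sorted
    inBucket(offset - 1)
  }

  def main(args: Array[String]): Unit = {
    println(bucketMedian(Seq(5, 1500, 2500, 10, 9999))) // 1500
  }
}
```

Only the bucket that holds the median is sorted, which is the point of the approach: on a cluster, only that slice of the data has to be ordered and brought to the driver.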