Lesson 71: Spark SQL Window Functions Explained with a Hands-On Example
2016-05-23 00:00
Abstract: Spark study notes
Introduction from the official Databricks blog:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html

Using Window Functions
Spark SQL supports three kinds of window functions: ranking functions, analytic functions, and aggregate functions. The available ranking functions and analytic functions are summarized in the table below. For aggregate functions, users can use any existing aggregate function as a window function.

| | SQL | DataFrame API |
| --- | --- | --- |
| Ranking functions | rank | rank |
| | dense_rank | denseRank |
| | percent_rank | percentRank |
| | ntile | ntile |
| | row_number | rowNumber |
| Analytic functions | cume_dist | cumeDist |
| | first_value | firstValue |
| | last_value | lastValue |
| | lag | lag |
| | lead | lead |
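The three ranking functions differ only in how they treat ties. As a minimal plain-Scala sketch (no Spark required, names are my own) of their semantics on an already-descending-sorted score list:

```scala
object RankingDemo {
  // row_number: unique sequential position; ties are broken arbitrarily
  def rowNumber(sorted: List[Int]): List[Int] =
    sorted.indices.map(_ + 1).toList

  // rank: tied values share a rank, and the next rank skips ahead
  def rank(sorted: List[Int]): List[Int] =
    sorted.map(s => sorted.indexOf(s) + 1)

  // dense_rank: tied values share a rank, with no gaps afterwards
  def denseRank(sorted: List[Int]): List[Int] =
    sorted.map(s => sorted.distinct.indexOf(s) + 1)

  def main(args: Array[String]): Unit = {
    val scores = List(90, 85, 85, 70) // assumed sorted descending
    println(rowNumber(scores)) // List(1, 2, 3, 4)
    println(rank(scores))      // List(1, 2, 2, 4)
    println(denseRank(scores)) // List(1, 2, 2, 3)
  }
}
```

Note how the two 85s share rank 2 under both `rank` and `dense_rank`, but `rank` then jumps to 4 while `dense_rank` continues at 3.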
The related code is as follows:
```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by Bindy on 16-5-13.
 *
 * @author DT大数据梦工厂-学员
 */
object s71_SparkSQLWindowFunctionOPS {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("SparkSQLWindowFunction")
      .setMaster("spark://cloud001:7077")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    val hiveContext = new HiveContext(sc)

    // Drop the target table if it already exists, then create the table
    // the data will be loaded into.
    hiveContext.sql("use default")
    hiveContext.sql("DROP TABLE IF EXISTS scores") // drop any table with the same name
    hiveContext.sql("CREATE TABLE IF NOT EXISTS scores(name STRING, score INT) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY '\\n'") // create the custom table
    hiveContext.sql("LOAD DATA LOCAL INPATH '/home/hduser/IMF_Study/testData/topNGroup.txt' " +
      "INTO TABLE scores")

    /**
     * Use a subquery to extract the target data, applying the window
     * function row_number inside it to rank rows within each group:
     *   partition by: the key the window function partitions by;
     *   order by: the sort order within each partition.
     */
    val result = hiveContext.sql("select name,score " +
      "from (" +
      "select " +
      "name," +
      "score," +
      "row_number() over (partition by name order by score desc) rank " +
      "from scores" +
      ") sub_scores " +
      "where rank <= 4")
    result.show()

    // Save the result back into the Hive database.
    hiveContext.sql("DROP TABLE IF EXISTS sortResultScores") // drop any table with the same name
    result.write.saveAsTable("sortResultScores")
  }
}
```
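To see what the `row_number()` query computes without a Spark cluster, here is a plain-Scala sketch of the same top-N-per-group logic on in-memory data (the object name, sample data, and helper are my own illustration, not part of the original program):

```scala
object TopNPerGroup {
  // For each name, keep only the n highest scores: the collection-level
  // analogue of row_number() over (partition by name order by score desc)
  // filtered to rank <= n.
  def topN(rows: Seq[(String, Int)], n: Int): Seq[(String, Int)] =
    rows
      .groupBy(_._1)                  // partition by name
      .toSeq
      .sortBy(_._1)                   // deterministic group order for display
      .flatMap { case (_, group) =>
        group.sortBy(-_._2).take(n)   // order by score desc, rank <= n
      }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      ("Spark", 100), ("Spark", 95), ("Spark", 90),
      ("Spark", 85), ("Spark", 80),
      ("Hadoop", 99), ("Hadoop", 98)
    )
    topN(data, 4).foreach(println) // Spark keeps 4 of 5 rows; Hadoop keeps both
  }
}
```

The fifth `Spark` row (score 80) is dropped because its row number within the `Spark` partition is 5, which fails the `rank <= 4` filter.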