Spark-Avro Learning 5: Specifying the name and namespace with AvroReadSpecifyName when writing Avro files
2016-05-02 11:29
For more Spark learning examples, see: https://github.com/xubo245/SparkLearning
1. Specifying the record name and namespace when writing Avro files
2. Code:
/**
 * @author xubo
 * @time 20160502
 * ref: https://github.com/databricks/spark-avro
 */
package org.apache.spark.avro.learning

import java.text.SimpleDateFormat
import java.util.Date

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

// this import adds the .avro method to DataFrameReader/DataFrameWriter
import com.databricks.spark.avro._

/**
 * Specify the record name and namespace when writing Avro files.
 */
object AvroReadSpecifyName {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("AvroReadSpecifyName").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val df = sqlContext.read.avro("file/data/avro/input/episodes.avro")
    df.show

    // record name and namespace to embed in the written Avro schema
    val name = "AvroTest"
    val namespace = "com.databricks.spark.avro"
    val parameters = Map("recordName" -> name, "recordNamespace" -> namespace)

    // timestamp suffix keeps each run's output directory unique
    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    df.write.options(parameters).avro("file/data/avro/output/episodes/AvroReadSpecifyName" + iString)

    // read the result back, first via the full format name ...
    val dfread = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("file/data/avro/output/episodes/AvroReadSpecifyName" + iString)
    dfread.show

    // ... then via the .avro shorthand
    val dfread2 = sqlContext.read.avro("file/data/avro/output/episodes/AvroReadSpecifyName" + iString)
    dfread2.show
  }
}
3. Results:
+--------------------+----------------+------+
| title| air_date|doctor|
+--------------------+----------------+------+
| The Eleventh Hour| 3 April 2010| 11|
| The Doctor's Wife| 14 May 2011| 11|
| Horror of Fang Rock|3 September 1977| 4|
| An Unearthly Child|23 November 1963| 1|
|The Mysterious Pl...|6 September 1986| 6|
| Rose| 26 March 2005| 9|
|The Power of the ...| 5 November 1966| 2|
| Castrolava| 4 January 1982| 5|
+--------------------+----------------+------+
+--------------------+----------------+------+
| title| air_date|doctor|
+--------------------+----------------+------+
| The Eleventh Hour| 3 April 2010| 11|
| The Doctor's Wife| 14 May 2011| 11|
| Horror of Fang Rock|3 September 1977| 4|
| An Unearthly Child|23 November 1963| 1|
|The Mysterious Pl...|6 September 1986| 6|
| Rose| 26 March 2005| 9|
|The Power of the ...| 5 November 1966| 2|
| Castrolava| 4 January 1982| 5|
+--------------------+----------------+------+
+--------------------+----------------+------+
| title| air_date|doctor|
+--------------------+----------------+------+
| The Eleventh Hour| 3 April 2010| 11|
| The Doctor's Wife| 14 May 2011| 11|
| Horror of Fang Rock|3 September 1977| 4|
| An Unearthly Child|23 November 1963| 1|
|The Mysterious Pl...|6 September 1986| 6|
| Rose| 26 March 2005| 9|
|The Power of the ...| 5 November 1966| 2|
| Castrolava| 4 January 1982| 5|
+--------------------+----------------+------+
4. File contents:
Objavro.codecsnappyavro.schema�{"type":"record","name":"AvroTest","namespace":"com.databricks.spark.avro","fields":[{"name":"title","type":["string","null"]},{"name":"air_date","type":["string","null"]},{"name":"doctor","type":["int","null"]}]}
The key point is that the file's embedded schema now carries the specified
"name":"AvroTest","namespace":"com.databricks.spark.avro"