Spark SQL Working with JSON

2017-12-27 16:08

Working with JSON

- JSON data is most easily read in as line-delimited JSON
- The schema is inferred upon load
- If you want to flatten your JSON data, use the explode method
- Access nested objects with dot syntax
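The contents of test.json are not shown in the original post; the following is a reconstruction from the DataFrame output below, with one JSON object per line (line-delimited):

```shell
# Hypothetical contents of test.json, reconstructed from the
# query output shown below (one JSON object per line).
cat > /tmp/test.json <<'EOF'
{"name":"zhangsan","nums":[1,2,3,4,5]}
{"name":"lisi","nums":[6,7,8,9,10]}
EOF
cat /tmp/test.json
```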
scala> val json = spark.read.format("json").load("file:///home/hadoop/data/test.json")
json: org.apache.spark.sql.DataFrame = [name: string, nums: array<bigint>]

scala> json.show
+--------+----------------+
|    name|            nums|
+--------+----------------+
|zhangsan| [1, 2, 3, 4, 5]|
|    lisi|[6, 7, 8, 9, 10]|
+--------+----------------+
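The inferred schema can be inspected with printSchema. A sketch, continuing the same spark-shell session (this call is not in the original post):

```scala
// Inspect the schema Spark inferred from the JSON file;
// the nums field is read as an array of longs.
scala> json.printSchema
root
 |-- name: string (nullable = true)
 |-- nums: array (nullable = true)
 |    |-- element: long (containsNull = true)
```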
scala> json.createOrReplaceTempView("json_table")

scala> spark.sql("select * from json_table").show
+--------+----------------+
|    name|            nums|
+--------+----------------+
|zhangsan| [1, 2, 3, 4, 5]|
|    lisi|[6, 7, 8, 9, 10]|
+--------+----------------+
scala> spark.sql("select name,nums[1] from json_table").show
+--------+-------+
|    name|nums[1]|
+--------+-------+
|zhangsan|      2|
|    lisi|      7|
+--------+-------+
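The same element access works without a temp view through the DataFrame API; a sketch in the same spark-shell session (selectExpr takes SQL expression strings and is not used in the original post):

```scala
// Same query via the DataFrame API. Arrays are 0-indexed,
// so nums[1] is the second element.
scala> json.selectExpr("name", "nums[1]").show
+--------+-------+
|    name|nums[1]|
+--------+-------+
|zhangsan|      2|
|    lisi|      7|
+--------+-------+
```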
# If you want to flatten your JSON data, use the explode method
scala> spark.sql("select name,explode(nums) from json_table").show
+--------+---+
|    name|col|
+--------+---+
|zhangsan|  1|
|zhangsan|  2|
|zhangsan|  3|
|zhangsan|  4|
|zhangsan|  5|
|    lisi|  6|
|    lisi|  7|
|    lisi|  8|
|    lisi|  9|
|    lisi| 10|
+--------+---+
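explode is also available as a function for the DataFrame API. A sketch in the same spark-shell session (not in the original post), producing the same result as the SQL query above:

```scala
// Flatten the array with the explode function: each array
// element becomes its own row (default output column name: col).
scala> import org.apache.spark.sql.functions.explode
scala> json.select(json("name"), explode(json("nums"))).show
```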
test2.json data:
cat test2.json
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
scala> val json = spark.read.format("json").load("file:///home/hadoop/data/test2.json")
json: org.apache.spark.sql.DataFrame = [address: struct<city: string, state: string>, name: string]

scala> json.show
+-----------------+-------+
|          address|   name|
+-----------------+-------+
|  [Columbus,Ohio]|    Yin|
|[null,California]|Michael|
+-----------------+-------+
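Here the nested address object is inferred as a struct. A sketch, continuing the session (this call is not in the original post):

```scala
// The nested JSON object becomes a struct with its own fields.
scala> json.printSchema
root
 |-- address: struct (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- state: string (nullable = true)
 |-- name: string (nullable = true)
```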
scala> json.createOrReplaceTempView("json_table")

scala> spark.sql("select * from json_table").show
+-----------------+-------+
|          address|   name|
+-----------------+-------+
|  [Columbus,Ohio]|    Yin|
|[null,California]|Michael|
+-----------------+-------+
# Access nested objects with dot syntax

scala> spark.sql("select name,address.city,address.state from json_table").show
+-------+--------+----------+
|   name|    city|     state|
+-------+--------+----------+
|    Yin|Columbus|      Ohio|
|Michael|    null|California|
+-------+--------+----------+
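Dot syntax also works directly in a DataFrame select. A sketch in the same session (not in the original post); the result columns are named after the leaf fields:

```scala
// Access nested struct fields with dot syntax via the DataFrame API.
scala> json.select("name", "address.city", "address.state").show
+-------+--------+----------+
|   name|    city|     state|
+-------+--------+----------+
|    Yin|Columbus|      Ohio|
|Michael|    null|California|
+-------+--------+----------+
```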