Spark SQL Working with JSON
2017-12-27 16:08
Working with JSON
- JSON data is most easily read in as line-delimited JSON
- The schema is inferred upon load
- If you want to flatten your JSON data, use the explode method
- Access nested objects with dot syntax
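To make the "line-delimited" point concrete, here is a small Python sketch (illustrative only, not the Spark reader itself) that writes and re-reads the same two records in JSON Lines form — one complete JSON object per line:

```python
import json
import tempfile

# Line-delimited (JSON Lines) layout: one complete JSON object per line.
# This is the shape Spark's JSON reader expects by default.
records = [
    {"name": "zhangsan", "nums": [1, 2, 3, 4, 5]},
    {"name": "lisi", "nums": [6, 7, 8, 9, 10]},
]

with tempfile.NamedTemporaryFile("w+", suffix=".json", delete=False) as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
    f.seek(0)
    # Each line is a self-contained document and parses independently.
    parsed = [json.loads(line) for line in f]

print(parsed[0]["name"])  # zhangsan
```

Because every line parses on its own, Spark can split such a file across tasks without needing to see the whole document.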
```
scala> val json = spark.read.format("json").load("file:///home/hadoop/data/test.json")
json: org.apache.spark.sql.DataFrame = [name: string, nums: array<bigint>]

scala> json.show
+--------+----------------+
|    name|            nums|
+--------+----------------+
|zhangsan| [1, 2, 3, 4, 5]|
|    lisi|[6, 7, 8, 9, 10]|
+--------+----------------+
```
```
scala> json.createOrReplaceTempView("json_table")

scala> spark.sql("select * from json_table").show
+--------+----------------+
|    name|            nums|
+--------+----------------+
|zhangsan| [1, 2, 3, 4, 5]|
|    lisi|[6, 7, 8, 9, 10]|
+--------+----------------+
```
```
scala> spark.sql("select name,nums[1] from json_table").show
+--------+-------+
|    name|nums[1]|
+--------+-------+
|zhangsan|      2|
|    lisi|      7|
+--------+-------+
```
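Note that the bracket index is zero-based, as the output above shows: nums[1] selects the second element of each array. A quick Python check of the same lookup:

```python
rows = [
    {"name": "zhangsan", "nums": [1, 2, 3, 4, 5]},
    {"name": "lisi", "nums": [6, 7, 8, 9, 10]},
]

# nums[1] with zero-based indexing picks the second array element,
# matching the 2 and 7 in the spark-shell output above.
picked = [(row["name"], row["nums"][1]) for row in rows]
print(picked)  # [('zhangsan', 2), ('lisi', 7)]
```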
# If you want to flatten your JSON data, use the explode method
```
scala> spark.sql("select name,explode(nums) from json_table").show
+--------+---+
|    name|col|
+--------+---+
|zhangsan|  1|
|zhangsan|  2|
|zhangsan|  3|
|zhangsan|  4|
|zhangsan|  5|
|    lisi|  6|
|    lisi|  7|
|    lisi|  8|
|    lisi|  9|
|    lisi| 10|
+--------+---+
```
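explode emits one output row per array element, repeating the other columns alongside it — which is why two input rows become ten output rows above. A minimal Python sketch of that semantics (not Spark's implementation):

```python
rows = [
    {"name": "zhangsan", "nums": [1, 2, 3, 4, 5]},
    {"name": "lisi", "nums": [6, 7, 8, 9, 10]},
]

# explode(nums): emit one (name, col) row per element of the nums array,
# duplicating the scalar column for every element.
exploded = [
    {"name": row["name"], "col": n}
    for row in rows
    for n in row["nums"]
]

print(len(exploded))  # 10
```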
The contents of test2.json:

```
cat test2.json
{"name":"Yin", "address":{"city":"Columbus","state":"Ohio"}}
{"name":"Michael", "address":{"city":null, "state":"California"}}
```
```
scala> val json = spark.read.format("json").load("file:///home/hadoop/data/test2.json")
json: org.apache.spark.sql.DataFrame = [address: struct<city: string, state: string>, name: string]

scala> json.show
+-----------------+-------+
|          address|   name|
+-----------------+-------+
|  [Columbus,Ohio]|    Yin|
|[null,California]|Michael|
+-----------------+-------+
```
```
scala> json.createOrReplaceTempView("json_table")

scala> spark.sql("select * from json_table").show
+-----------------+-------+
|          address|   name|
+-----------------+-------+
|  [Columbus,Ohio]|    Yin|
|[null,California]|Michael|
+-----------------+-------+
```
# Access nested objects with dot syntax

```
scala> spark.sql("select name,address.city,address.state from json_table").show
+-------+--------+----------+
|   name|    city|     state|
+-------+--------+----------+
|    Yin|Columbus|      Ohio|
|Michael|    null|California|
+-------+--------+----------+
```
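The dot syntax simply walks one level into the struct column, and a missing value (like Michael's city) comes back as null rather than failing. A rough Python analogue of the address.city / address.state lookup:

```python
rows = [
    {"name": "Yin", "address": {"city": "Columbus", "state": "Ohio"}},
    {"name": "Michael", "address": {"city": None, "state": "California"}},
]

# address.city / address.state: follow the nested object one key at a time.
# A null city stays None in the output instead of raising an error.
result = [
    (row["name"], row["address"]["city"], row["address"]["state"])
    for row in rows
]

print(result[1])  # ('Michael', None, 'California')
```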