
Does a Spark SQL DataFrame store duplicate rows?

2017-02-24 11:04
Spark version: 2.1.0

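Result: yes, a DataFrame stores duplicate rows.

Experiment

The experiment uses the JavaSparkSQLExample.java sample that ships with Spark.

Code path: $SPARK_HOME/examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java

The relevant parts of that sample are reproduced below: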

// $example on:create_df$
Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");

// (some code omitted)

// $example on:run_sql$
// Register the DataFrame as a SQL temporary view
df.createOrReplaceTempView("people");

Dataset<Row> sqlDF = spark.sql("SELECT * FROM people");
sqlDF.show();


Test data 1: the original people.json contains:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}


Output:

17/02/24 10:56:00 INFO scheduler.DAGScheduler: Job 10 finished: show at JavaSparkSQLExample.java:171, took 0.028799 s
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+


Test data 2: people.json edited to contain repeated records:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
{"name":"Justin", "age":19}
{"name":"Justin", "age":19}
{"name":"Michael"}
{"name":"Michael"}


Output:

17/02/24 11:00:58 INFO scheduler.DAGScheduler: Job 10 finished: show at JavaSparkSQLExample.java:171, took 0.023679 s
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|null|Michael|
|  30|   Andy|
|  19| Justin|
|  19| Justin|
|  19| Justin|
|null|Michael|
|null|Michael|
+----+-------+


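As the experiment above shows, a DataFrame does store duplicate rows: identical records in the input are kept as separate rows rather than being merged.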
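
The behaviour matches SQL's bag (multiset) semantics: rows are only collapsed when de-duplication is requested explicitly, which in Spark is done with Dataset.distinct() or Dataset.dropDuplicates(). The plain-Java sketch below models that distinction without needing Spark. The class DuplicateRowsSketch, the Person type, and the helper methods are hypothetical names made up for illustration, and the data is the seven records listed under test data 2. Note that the model keeps rows in first-seen order for readability, whereas Spark makes no ordering guarantee after distinct().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class DuplicateRowsSketch {

    /** Hypothetical stand-in for one record of people.json; age may be null. */
    static final class Person {
        final String name;
        final Integer age;

        Person(String name, Integer age) {
            this.name = name;
            this.age = age;
        }

        // Two rows are "the same" when every column value matches.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Person)) {
                return false;
            }
            Person p = (Person) o;
            return Objects.equals(name, p.name) && Objects.equals(age, p.age);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    /** The seven records listed under test data 2, in file order. */
    public static List<Person> rows() {
        return Arrays.asList(
                new Person("Michael", null),
                new Person("Andy", 30),
                new Person("Justin", 19),
                new Person("Justin", 19),
                new Person("Justin", 19),
                new Person("Michael", null),
                new Person("Michael", null));
    }

    /** Explicit de-duplication: keeps the first occurrence of each distinct row. */
    public static List<Person> distinct(List<Person> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<Person> all = rows();
        List<Person> unique = distinct(all);
        System.out.println("rows kept as loaded: " + all.size());
        System.out.println("rows after explicit de-duplication: " + unique.size());
    }
}
```

In a real Spark job the equivalent check would be to compare df.count() with df.distinct().count(); a difference between the two means the DataFrame contains duplicate rows.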
Tags: spark spark-sql