Spark SQL Example
2016-02-27 10:44
This example demonstrates how to use sqlContext.sql to create and load a table and to select rows from the table into a DataFrame. The next steps use the DataFrame API to filter the rows for salaries greater than 150,000 and show the resulting DataFrame.
At the command line, copy the Hue sample_07 data to HDFS:
$ hdfs dfs -put HUE_HOME/apps/beeswax/data/sample_07.csv /user/hdfs
where HUE_HOME defaults to /opt/cloudera/parcels/CDH/lib/hue (parcel installation) or /usr/lib/hue (package installation).
Start spark-shell:
$ spark-shell
Create a Hive table:
scala> sqlContext.sql("CREATE TABLE sample_07 (code string,description string,total_emp int,salary int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TextFile")
Load data from HDFS into the table:
scala> sqlContext.sql("LOAD DATA INPATH '/user/hdfs/sample_07.csv' OVERWRITE INTO TABLE sample_07")
Create a DataFrame containing the contents of the sample_07 table:
scala> val df = sqlContext.sql("SELECT * from sample_07")
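Before filtering, it can help to confirm that the columns were loaded with the expected names and types. The following is a minimal sketch, assuming it runs in the same spark-shell session, so that the df value defined above is still in scope:

```scala
// Assumes the same spark-shell session as the steps above:
// df is the DataFrame created from "SELECT * from sample_07".

// Print the column names and types that Spark derived from the Hive table.
df.printSchema()

// Preview two columns of the first five rows to check that the
// tab-delimited data was parsed into separate fields.
df.select("description", "salary").show(5)
```

If the description and salary values look shifted or null, the field delimiter in the CREATE TABLE statement probably does not match the data file.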
Show all rows with salary greater than 150,000:
scala> df.filter(df("salary") > 150000).show()
The output should be:
+-------+--------------------+---------+------+
|   code|         description|total_emp|salary|
+-------+--------------------+---------+------+
|11-1011|    Chief executives|   299160|151370|
|29-1022|Oral and maxillof...|     5040|178440|
|29-1023|       Orthodontists|     5350|185340|
|29-1024|     Prosthodontists|      380|169360|
|29-1061|   Anesthesiologists|    31030|192780|
|29-1062|Family and genera...|   113250|153640|
|29-1063| Internists, general|    46260|167270|
|29-1064|Obstetricians and...|    21340|183600|
|29-1067|            Surgeons|    50260|191410|
|29-1069|Physicians and su...|   237400|155150|
+-------+--------------------+---------+------+
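The same filter can also be written in the SQL text itself, or passed to filter as a SQL expression string. The following is a minimal sketch, assuming a spark-shell session in which sqlContext is predefined and the sample_07 table has been created and loaded as above; the value names highPaidSql and allRows are illustrative only:

```scala
// Assumes spark-shell, where sqlContext is predefined, and that the
// sample_07 table exists and is loaded as in the steps above.

// Variant 1: put the predicate in the SQL query.
val highPaidSql = sqlContext.sql("SELECT * FROM sample_07 WHERE salary > 150000")
highPaidSql.show()

// Variant 2: select everything, then filter with a SQL expression string
// instead of the column expression df("salary") > 150000.
val allRows = sqlContext.sql("SELECT * FROM sample_07")
allRows.filter("salary > 150000").show()
```

Both variants should return the same rows as the column-expression filter shown earlier; which form to use is mostly a matter of readability.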