An example of running Hive SQL from spark-shell
2018-02-17 18:52
[spark@master ~]$ spark-shell --master yarn-client --jars /app/soft/hive/lib/mysql-connector-java-5.1.44-bin.jar

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@432a6a69

scala> val res = sqlContext.sql("select * from lb")
res: org.apache.spark.sql.DataFrame = [cookieid: string, createtime: string ... 1 more field]

scala> res.show()
+--------+----------+---+
|cookieid|createtime| pv|
+--------+----------+---+
| cookie1|2015-11-11|  1|
| cookie1|2015-11-12|  4|
| cookie1|2015-11-13|  5|
| cookie1|2015-11-14|  4|
| cookie2|2015-11-11|  7|
| cookie2|2015-11-12|  3|
| cookie2|2015-11-13|  8|
| cookie2|2015-11-14|  2|
+--------+----------+---+
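Note that the deprecation warning above is expected on Spark 2.x: `SQLContext` has been superseded by `SparkSession`. A minimal sketch of the equivalent query using the newer API, assuming the same `lb` table and a Hive metastore reachable from the shell (in spark-shell the `spark` session already exists, so the builder call is only needed in a standalone program):

```scala
// Sketch only: assumes hive-site.xml is on the classpath and the
// MySQL connector jar is supplied, so the session can see the metastore.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-sql-example")
  .enableHiveSupport()   // enables Hive metastore access for spark.sql(...)
  .getOrCreate()

// Same query as the SQLContext version above
val res = spark.sql("select * from lb")
res.show()
```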
Creating a table
scala> val path = "hdfs://master:9000/data/Romeo_and_Juliet.txt"
path: String = hdfs://master:9000/data/Romeo_and_Juliet.txt

scala> val df2 = spark.sparkContext.textFile(path).flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).toDF("word","count")
df2: org.apache.spark.sql.DataFrame = [word: string, count: int]

scala> df2.write.mode("overwrite").saveAsTable("badou.test_a")
18/01/28 08:15:10 WARN metastore.HiveMetaStore: Location: hdfs://master:9000/user/hive/warehouse/badou.db/test_a specified for non-external table:test_a

Then query the new table from the Hive CLI:

hive> use badou;
hive> show tables;
hive> select * from test_a order by count desc limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1516801273097_0045, Tracking URL = http://master:8088/proxy/application_1516801273097_0045/
Kill Command = /app/soft/hadoop/bin/hadoop job -kill job_1516801273097_0045
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-28 09:08:22,144 Stage-1 map = 0%, reduce = 0%
2018-01-28 09:08:29,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.37 sec
2018-01-28 09:08:37,987 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.18 sec
MapReduce Total cumulative CPU time: 3 seconds 180 msec
Ended Job = job_1516801273097_0045
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1  Cumulative CPU: 3.18 sec  HDFS Read: 54970 HDFS Write: 69 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 180 msec
OK
	4132
the	614
I	531
and	462
to	449
a	392
of	364
my	313
is	290
in	282
Time taken: 28.159 seconds, Fetched: 10 row(s)

(The top row's word appears to be the empty string: splitting on a single space keeps the empty tokens produced by leading and repeated spaces, and they were counted 4132 times.)
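The flatMap / map / reduceByKey pipeline above can be mirrored with plain Scala collections, which makes the logic easy to check without a cluster. A minimal sketch using a hypothetical sample string in place of the HDFS file:

```scala
// Hypothetical sample text standing in for the HDFS file
val text = "the quick fox and the lazy dog and the fox"

// flatMap(_.split(" ")) -> map((_, 1)) -> reduceByKey(_ + _),
// expressed with collection operations: groupBy plays the role of
// the shuffle, and the per-group sum plays the role of reduceByKey
val counts: Map[String, Int] =
  text.split(" ")
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

// Equivalent of "select * from test_a order by count desc limit 10"
val top = counts.toSeq.sortBy(-_._2).take(2)
println(top)
```

The Spark version distributes exactly this computation: `reduceByKey` combines the per-word 1s on each partition before shuffling, which is why it is preferred over `groupByKey` followed by a sum.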