A Quick SparkSQL Test
2015-11-15 17:25
A local-mode test on OS X with IntelliJ IDEA 15.
Environment:
JDK
Install Scala 2.10.6 (note: do not use 2.11.x here; it is incompatible with the prebuilt Spark 1.5.x binaries)
Import library: spark-assembly-1.5.0-hadoop2.6.0.jar
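If you would rather manage the dependency with sbt than import the assembly jar by hand, a minimal build.sbt could look like the following (this is an assumption about an alternative setup, not part of the original IDEA-based configuration):

```scala
// build.sbt -- hypothetical sbt equivalent of importing spark-assembly-1.5.0-hadoop2.6.0.jar
scalaVersion := "2.10.6"

// %% appends the Scala binary version, resolving to spark-sql_2.10
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.5.0"
```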
Usage Example
Write a simple Scala program that loads the user data from a text file, builds a DataFrame from that dataset, and then runs DataFrame operations to execute a selection query.
The text file customers.txt (read from /Users/urey/data/input2.txt in the code below) contains:
Tom,12
Mike,13
Tony,34
Lili,8
David,21
Nike,18
Bush,29
Candy,42
The Scala code:
import org.apache.spark._

object Hello {
  // A case class describing one user record
  case class Person(name: String, age: Int)

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("SparkSQL Demo")
    val sc = new SparkContext(conf)
    // Create a SQLContext from the existing SparkContext
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // Bring in the implicits that convert an RDD to a DataFrame
    import sqlContext.implicits._
    // Build a DataFrame of Person objects from the text file
    val people = sc.textFile("/Users/urey/data/input2.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
      .toDF()
    // Register the DataFrame as a temporary table
    people.registerTempTable("people")
    // Run a SQL query against it
    val teenagers = sqlContext.sql("SELECT name, age FROM people WHERE age >= 13 AND age <= 19")
    // Print the results, accessing each row's columns by position
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
    sc.stop()
  }
}
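The per-line parsing and the age filter can be checked in plain Scala, without a Spark runtime (a minimal sketch; ParseDemo and its parse helper are illustrative names, not part of the original program):

```scala
object ParseDemo {
  case class Person(name: String, age: Int)

  // Parse one "name,age" line the same way the Spark job's map step does.
  def parse(line: String): Person = {
    val fields = line.split(",")
    Person(fields(0), fields(1).trim.toInt)
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("Tom,12", "Mike,13", "Tony,34", "Lili,8",
                    "David,21", "Nike,18", "Bush,29", "Candy,42")
    // Equivalent of: SELECT name FROM people WHERE age >= 13 AND age <= 19
    val teenagers = lines.map(parse).filter(p => p.age >= 13 && p.age <= 19)
    teenagers.foreach(p => println("Name: " + p.name))
    // prints: Name: Mike, Name: Nike
  }
}
```

This reproduces the result seen in the run log below: only Mike (13) and Nike (18) fall in the 13-19 range.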
Edit Configurations (add this VM option):
-Dspark.master=local
Run Result (abridged to the key log lines; the full classpath is omitted):
/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java -Dspark.master=local ... com.intellij.rt.execution.application.AppMain Demo
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/11/15 17:08:51 INFO SparkContext: Running Spark version 1.5.0
15/11/15 17:08:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/15 17:08:53 INFO SparkUI: Started SparkUI at http://10.12.24.154:4040
15/11/15 17:08:57 INFO HadoopRDD: Input split: file:/Users/urey/data/input2.txt:0+63
15/11/15 17:08:57 INFO DAGScheduler: Job 0 finished: collect at Demo.scala:29, took 0.276326 s
Name: Mike
Name: Nike
15/11/15 17:08:58 INFO SparkContext: Successfully stopped SparkContext
15/11/15 17:08:58 INFO ShutdownHookManager: Shutdown hook called
Process finished with exit code 0
SparkSQL vs. Hive on Spark
Differences:
Spark SQL is the official Spark project, driven by Databricks. Databricks originally promoted Shark, which was later replaced by Spark SQL.
Hive on Spark belongs to the Apache Hive product line and evolved from Hive on MapReduce: it compiles HiveQL into Spark jobs instead of MapReduce jobs, using Spark's faster execution to shorten HiveQL response times. Hive on Spark has shipped as part of Hive since the Hive 1.1 release; the effort was initiated by Cloudera and is backed by IBM, Intel, and MapR (but not by Databricks).
Similarities:
Both products act as the "translation layer" sitting on top, turning a SQL statement into a distributed, executable Spark job.