Using Apache Spark and MySQL for Data Analysis
2017-05-08 16:39
Reading data from a MySQL table with spark-shell
Step 1: Run spark-shell to enter the interactive shell, passing the MySQL JDBC driver via --jars:

bigdata@ubuntu1:~/run/spark/bin$ ./spark-shell --master spark://ubuntu1:7077 --jars /home/bigdata/run/spark/mysql-connector-java-5.1.30-bin.jar
The output is as follows:

bigdata@ubuntu1:~/run/spark/bin$ ./spark-shell --master spark://ubuntu1:7077 --jars /home/bigdata/run/spark/mysql-connector-java-5.1.30-bin.jar
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/08 01:40:28 WARN spark.SparkConf: The configuration key 'spark.history.updateInterval' has been deprecated as of Spark 1.3 and may be removed in the future. Please use the new key 'spark.history.fs.update.interval' instead.
17/05/08 01:40:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/08 01:41:01 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/bigdata/run/spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/bigdata/run/spark-2.1.0-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
17/05/08 01:41:01 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/bigdata/run/spark-2.1.0-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/bigdata/run/spark/jars/datanucleus-rdbms-3.2.9.jar."
17/05/08 01:41:01 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/bigdata/run/spark-2.1.0-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/bigdata/run/spark/jars/datanucleus-core-3.2.10.jar."
17/05/08 01:41:10 WARN metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.3.19.171:4040
Spark context available as 'sc' (master = spark://ubuntu1:7077, app id = app-20170508014050-0004).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
Step 2: Create a sqlContext variable
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6164b3a2
Step 3: Load data from MySQL
scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://127.0.0.1:3306/mydatabase").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "mytable").option("user", "myname").option("password", "mypassword").load()
dataframe_mysql: org.apache.spark.sql.DataFrame = [id: string, grouptype: int ... 16 more fields]
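Note that SQLContext is deprecated in Spark 2.x; spark-shell already exposes a SparkSession named spark, so the same JDBC read can be written against it directly. A minimal sketch, reusing the connection details from the example (mydatabase, mytable, and the credentials are placeholders, not a real deployment):

```scala
// Same JDBC read as above, but through the Spark 2.x SparkSession (`spark`)
// that spark-shell creates automatically - no SQLContext needed.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://127.0.0.1:3306/mydatabase")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "mytable")
  .option("user", "myname")
  .option("password", "mypassword")
  .load()
```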
Step 4: Display the data in the DataFrame
scala> dataframe_mysql.show
+---+---------+-------+---------+------+------+---+--------------------+---+-----------+-----+----+--------------------+--------------------+-----+------+----------+---+
| id|grouptype|groupid|loginname|  name|   pwd|sex|            birthday|tel|mobilephone|email|isOk|       lastLoginTime|             addtime|intro|credit|experience|img|
+---+---------+-------+---------+------+------+---+--------------------+---+-----------+-----+----+--------------------+--------------------+-----+------+----------+---+
|  1|        1|      1|    admin| admin| admin|  1|2016-05-05 14:51:...|  1|          1|    1|   1|2016-05-10 14:52:...|2016-05-08 14:52:...|    1|     1|         1|  1|
|  2|        2|      2|   wanghb|wanghb|wanghb|  2|2016-05-10 14:56:...|  2|          2|    2|   2|2016-05-11 14:57:...|2016-05-10 14:57:...|    2|     2|        22|  2|
+---+---------+-------+---------+------+------+---+--------------------+---+-----------+-----+----+--------------------+--------------------+-----+------+----------+---+
Step 5: Register the DataFrame as a temporary table so it can be queried later
scala> dataframe_mysql.registerTempTable("tmp_tablename")
warning: there was one deprecation warning; re-run with -deprecation for details
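The deprecation warning above is because registerTempTable was deprecated in Spark 2.0; the non-deprecated equivalent registers the DataFrame as a temporary view:

```scala
// Spark 2.x replacement for the deprecated registerTempTable:
// registers the DataFrame as a session-scoped temporary view.
dataframe_mysql.createOrReplaceTempView("tmp_tablename")
```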
Step 6: Now the data can be queried from the temporary table "tmp_tablename"
scala> dataframe_mysql.sqlContext.sql("select * from tmp_tablename").collect.foreach(println)
[1,1,1,admin,admin,admin,1,2016-05-05 14:51:58.0,1,1,1,1,2016-05-10 14:52:07.0,2016-05-08 14:52:12.0,1,1,1,1]
[2,2,2,wanghb,wanghb,wanghb,2,2016-05-10 14:56:58.0,2,2,2,2,2016-05-11 14:57:05.0,2016-05-10 14:57:08.0,2,2,22,2]
Writing data to MySQL with Spark
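The article ends before covering the write path. A minimal sketch of it, assuming the same connection settings as the read example and a hypothetical target table mytable_copy, would use the DataFrame's JDBC writer:

```scala
import java.util.Properties

// JDBC connection properties; user/password are the same placeholders
// used in the read example above.
val props = new Properties()
props.setProperty("user", "myname")
props.setProperty("password", "mypassword")
props.setProperty("driver", "com.mysql.jdbc.Driver")

// Write the DataFrame back to MySQL. SaveMode.Append inserts rows into an
// existing table; SaveMode.Overwrite would drop and recreate the table.
dataframe_mysql.write
  .mode(org.apache.spark.sql.SaveMode.Append)
  .jdbc("jdbc:mysql://127.0.0.1:3306/mydatabase", "mytable_copy", props)
```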