
Interaction Between Spark and MySQL

2016-09-27 15:26

Background

After Spark finishes computing over the target data, results in RDD form are usually stored in HDFS, Hive, or HBase; non-RDD data, on the other hand, may end up in a traditional RDBMS such as MySQL.

However, all kinds of problems tend to come up during the write, and there are plenty of threads about them on Stack Overflow:

Error writing spark dataframe to mysql table

JDBC batch insert performance

Some other related posts:

- Using Apache Spark and MySQL for Data Analysis

- Writing DataFrame data into a Hive partitioned table with Spark 1.3.0

- Four ways for Spark to read from a database (MySQL), explained in detail

- Complete JDBC database-connection code and steps in Java development

- Spark pitfalls: databases (HBase + MySQL)

RDD

Spark SQL reads and writes MySQL data over a JDBC connection.
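As a sketch, reading and writing MySQL through the DataFrame JDBC API looks roughly like this (Spark 1.x-era API; the host, port, database, table, and credentials reuse the placeholder values from this article, and an existing SparkContext `sc` is assumed):

```scala
import java.util.Properties
import org.apache.spark.sql.SQLContext

// `sc` is an assumed, already-created SparkContext
val sqlContext = new SQLContext(sc)
val url = "jdbc:mysql://10.10.10.10:8888/tempDB"
val props = new Properties()
props.setProperty("user", "admin")
props.setProperty("password", "admin")
props.setProperty("driver", "com.mysql.jdbc.Driver")

// read a table into a DataFrame
val df = sqlContext.read.jdbc(url, "tempTableName", props)

// write the DataFrame back, appending to the existing table
df.write.mode("append").jdbc(url, "tempTableName", props)
```

With this API the driver-registration and connection management are handled by Spark, so the manual JDBC path below is only needed for non-DataFrame data.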

Non-RDD

import java.sql.{Connection, DriverManager, PreparedStatement}
/*
tableName = "tempTableName"
columns = [key : String, value : Int]
DBIP = 10.10.10.10
DBPort = 8888
DB = tempDB
*/
def connection2Mysql() = {
  var conn: Connection = null
  var ps: PreparedStatement = null
  val userName = "admin"
  val passwd = "admin"
  val key = "Tom"
  val value = 1024
  val sql = "INSERT INTO tempTableName(key,value) values (?,?)"
  try {
    Class.forName("com.mysql.jdbc.Driver").newInstance
    conn = DriverManager.getConnection("jdbc:mysql://10.10.10.10:8888/tempDB", userName, passwd)
    ps = conn.prepareStatement(sql)
    ps.setString(1, key)  // key is a String, so setString rather than setDate
    ps.setInt(2, value)   // value is an Int, so setInt rather than setLong
    ps.executeUpdate()
  } catch {
    case e: Exception => println("----> Exception!\t:\t" + e + " <-----")
  } finally {
    if (ps != null) ps.close()
    if (conn != null) conn.close()
  }
}
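When writing RDD records to MySQL this way, note that Connection and PreparedStatement are not serializable, so they cannot be created on the driver and shipped to executors; the usual pattern is to open one connection per partition and batch the inserts. A sketch, assuming an RDD `rdd` of (String, Int) pairs and the same placeholder table and credentials as above:

```scala
import java.sql.DriverManager

// `rdd` is an assumed RDD[(String, Int)]; one connection per partition
rdd.foreachPartition { part =>
  val conn = DriverManager.getConnection(
    "jdbc:mysql://10.10.10.10:8888/tempDB", "admin", "admin")
  val ps = conn.prepareStatement(
    "INSERT INTO tempTableName(key,value) values (?,?)")
  try {
    part.foreach { case (k, v) =>
      ps.setString(1, k)
      ps.setInt(2, v)
      ps.addBatch()        // batch inserts to cut network round trips
    }
    ps.executeBatch()
  } finally {
    ps.close()
    conn.close()
  }
}
```

Batching per partition also addresses the JDBC batch-insert performance issue linked above.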


A problem you may hit here: when you run a local client to check the functionality, the method works fine, but once the job is submitted to YARN via spark-submit, it fails with

java.sql.SQLException: No suitable driver found for jdbc:mysql://10.10.10.10:8888/tempDB


or

Error:java.lang.ClassNotFoundException: com.mysql.jdbc.Driver


This happens because the corresponding class cannot be found, i.e. the line
Class.forName("com.mysql.jdbc.Driver").newInstance
is what fails. Its purpose is to register the driver; looking at the Driver source, shown below:

/**
* The Java SQL framework allows for multiple database drivers. Each driver should supply a class that implements the Driver interface
*
* The DriverManager will try to load as many drivers as it can find and then for any given connection request, it will ask each driver in turn to try to
* connect to the target URL.
*
* It is strongly recommended that each Driver class should be small and standalone so that the Driver class can be loaded and queried without bringing in vast
* quantities of supporting code.
*
* When a Driver class is loaded, it should create an instance of itself and register it with the DriverManager. This means that a user can load and register a
* driver by doing Class.forName("foo.bah.Driver")
*/
public class Driver extends NonRegisteringDriver implements java.sql.Driver {
    // Register ourselves with the DriverManager
    static {
        try {
            java.sql.DriverManager.registerDriver(new Driver());
        } catch (SQLException E) {
            throw new RuntimeException("Can't register driver!");
        }
    }

    /**
     * Construct a new driver and register it with DriverManager
     * @throws SQLException if a database error occurs.
     */
    public Driver() throws SQLException {
        // Required for Class.forName().newInstance()
    }
}


There are several possible causes for this problem.

1. Adding a reference to the MySQL jar via the --jars parameter alone does not help. You need to set the driver's classpath by adding the --driver-class-path parameter.

$ bin/spark-submit --master local[2] --driver-class-path lib/mysql-connector-java-5.1.38.jar temp.jar


The reason is that the two options distribute jars to different nodes (link):

--driver-class-path: jars the driver depends on; separate multiple jars with a colon (:)

--jars: jars needed by both the driver and the executors; separate multiple jars with a comma (,)
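For instance, a submit command that hands the connector jar to both the driver and the executors could look like this (a sketch reusing the jar names above, not a verified command line):

```
$ bin/spark-submit --master local[2] \
    --driver-class-path lib/mysql-connector-java-5.1.38.jar \
    --jars lib/mysql-connector-java-5.1.38.jar \
    temp.jar
```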

2. The corresponding jar was not bundled when the application was packaged with its dependencies.

When relying on a dependency, the corresponding jar must be packed into the final application jar so that the class can be found and registered successfully.

For example, after adding the dependency in build.sbt

// https://mvnrepository.com/artifact/mysql/mysql-connector-java
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.38"


you still need to add the corresponding jar under Artifacts (the IDE's packaging configuration).
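Alternatively, a fat jar that already contains the connector can be produced with the sbt-assembly plugin; a minimal sketch, where the plugin version is an assumption:

```scala
// project/plugins.sbt -- plugin version is an assumption, use a current one
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
```

Running `sbt assembly` then produces a single jar with the MySQL connector bundled, which can be passed to spark-submit directly.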

Tags: mysql spark