
Handling a MySQL read/write problem in Spark cluster mode

Date: 2015-03-13 16:27

Symptom: the job runs fine when submitted in local mode, but switching to cluster mode produces a NullPointerException. This submission works:

./bin/spark-submit --master spark://jt-host-kvm-17:7077 --class parkMysql.ParkInCountMysql --executor-memory 300m /httx/work/work.jar local --driver-class-path /httx/work/mysql-connector-java-5.1.21.jar

while the following one fails:

./bin/spark-submit --master spark://jt-host-kvm-17:7077 --class parkMysql.ParkInCountMysql --executor-memory 300m /httx/work/work.jar spark://jt-host-kvm-17:7077 --driver-class-path /httx/work/mysql-connector-java-5.1.21.jar
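The log below points at a JdbcRDD built at ParkInCountMysql.scala:29 and collected at line 32. The original source is not shown in this post, but a JdbcRDD read of this shape typically looks like the sketch below (table, column, host, and credential names are all hypothetical; only the structure matters). The key detail is that the connection-factory closure is serialized and invoked on the executors, so the MySQL driver class must be visible on the executor classpath, not just the driver's:

```scala
import java.sql.{Connection, DriverManager, ResultSet}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object ParkInCountSketch {
  def main(args: Array[String]): Unit = {
    // args(0) is the master URL, e.g. "local" or "spark://jt-host-kvm-17:7077"
    val sc = new SparkContext(
      new SparkConf().setMaster(args(0)).setAppName("ParkInCountMysql"))

    // This function runs on each EXECUTOR, not on the driver. If
    // com.mysql.jdbc.Driver is absent from the executor classpath,
    // connection creation fails and JdbcRDD blows up with an NPE
    // while preparing the statement (JdbcRDD.scala:74 in the trace).
    def createConnection(): Connection = {
      Class.forName("com.mysql.jdbc.Driver")
      DriverManager.getConnection("jdbc:mysql://db-host:3306/park", "user", "password")
    }

    val rows = new JdbcRDD(
      sc,
      createConnection _,
      // JdbcRDD requires exactly two '?' placeholders for the partition bounds
      "SELECT plate, in_time FROM park_in WHERE id >= ? AND id <= ?",
      1L,        // lowerBound
      1000000L,  // upperBound
      1,         // numPartitions (matches the single output partition in the log)
      (r: ResultSet) => (r.getString(1), r.getString(2)))

    // Count records per plate and bring the result back to the driver
    val counts = rows.map { case (plate, _) => (plate, 1) }
                     .reduceByKey(_ + _)
                     .collectAsMap()
    counts.foreach(println)
    sc.stop()
  }
}
```

In local mode driver and "executor" share one JVM, which is why the first command above succeeds even though the jar is only on the driver classpath.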
Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/03/13 09:19:19 INFO spark.SecurityManager: Changing view acls to: hduser,
15/03/13 09:19:19 INFO spark.SecurityManager: Changing modify acls to: hduser,
15/03/13 09:19:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hduser, ); users with modify permissions: Set(hduser, )
15/03/13 09:19:19 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/03/13 09:19:19 INFO Remoting: Starting remoting
15/03/13 09:19:19 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@jt-host-kvm-17:41594]
15/03/13 09:19:19 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@jt-host-kvm-17:41594]
15/03/13 09:19:19 INFO util.Utils: Successfully started service 'sparkDriver' on port 41594.
15/03/13 09:19:19 INFO spark.SparkEnv: Registering MapOutputTracker
15/03/13 09:19:19 INFO spark.SparkEnv: Registering BlockManagerMaster
15/03/13 09:19:19 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150313091919-279b
15/03/13 09:19:19 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 50750.
15/03/13 09:19:19 INFO network.ConnectionManager: Bound socket to port 50750 with id = ConnectionManagerId(jt-host-kvm-17,50750)
15/03/13 09:19:19 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/03/13 09:19:19 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/03/13 09:19:19 INFO storage.BlockManagerMasterActor: Registering block manager jt-host-kvm-17:50750 with 265.4 MB RAM
15/03/13 09:19:19 INFO storage.BlockManagerMaster: Registered BlockManager
15/03/13 09:19:19 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-d81926b3-e51a-4c54-b2bb-da139a9a413d
15/03/13 09:19:19 INFO spark.HttpServer: Starting HTTP Server
15/03/13 09:19:19 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/03/13 09:19:19 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:43888
15/03/13 09:19:19 INFO util.Utils: Successfully started service 'HTTP file server' on port 43888.
15/03/13 09:19:20 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/03/13 09:19:20 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/13 09:19:20 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/03/13 09:19:20 INFO ui.SparkUI: Started SparkUI at http://jt-host-kvm-17:4040
15/03/13 09:19:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/13 09:19:20 INFO spark.SparkContext: Added JAR file:/httx/work/work.jar at http://10.7.12.117:43888/jars/work.jar with timestamp 1426209560636
15/03/13 09:19:20 INFO client.AppClient$ClientActor: Connecting to master spark://jt-host-kvm-17:7077...
15/03/13 09:19:20 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/13 09:19:20 INFO spark.SparkContext: Starting job: collectAsMap at ParkInCountMysql.scala:32
15/03/13 09:19:20 INFO scheduler.DAGScheduler: Registering RDD 2 (map at ParkInCountMysql.scala:29)
15/03/13 09:19:20 INFO scheduler.DAGScheduler: Got job 0 (collectAsMap at ParkInCountMysql.scala:32) with 1 output partitions (allowLocal=false)
15/03/13 09:19:20 INFO scheduler.DAGScheduler: Final stage: Stage 0 (collectAsMap at ParkInCountMysql.scala:32)
15/03/13 09:19:20 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
15/03/13 09:19:20 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
15/03/13 09:19:20 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[2] at map at ParkInCountMysql.scala:29), which has no missing parents
15/03/13 09:19:20 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150313091920-0008
15/03/13 09:19:20 INFO client.AppClient$ClientActor: Executor added: app-20150313091920-0008/0 on worker-20150305092118-jt-host-kvm-18-59151 (jt-host-kvm-18:59151) with 16 cores
15/03/13 09:19:20 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150313091920-0008/0 on hostPort jt-host-kvm-18:59151 with 16 cores, 300.0 MB RAM
15/03/13 09:19:20 INFO client.AppClient$ClientActor: Executor added: app-20150313091920-0008/1 on worker-20150305092118-jt-host-kvm-17-50115 (jt-host-kvm-17:50115) with 16 cores
15/03/13 09:19:20 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150313091920-0008/1 on hostPort jt-host-kvm-17:50115 with 16 cores, 300.0 MB RAM
15/03/13 09:19:20 INFO client.AppClient$ClientActor: Executor added: app-20150313091920-0008/2 on worker-20150305092118-jt-host-kvm-19-48800 (jt-host-kvm-19:48800) with 16 cores
15/03/13 09:19:20 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150313091920-0008/2 on hostPort jt-host-kvm-19:48800 with 16 cores, 300.0 MB RAM
15/03/13 09:19:20 INFO client.AppClient$ClientActor: Executor updated: app-20150313091920-0008/1 is now RUNNING
15/03/13 09:19:20 INFO client.AppClient$ClientActor: Executor updated: app-20150313091920-0008/0 is now RUNNING
15/03/13 09:19:20 INFO client.AppClient$ClientActor: Executor updated: app-20150313091920-0008/2 is now RUNNING
15/03/13 09:19:20 INFO storage.MemoryStore: ensureFreeSpace(2832) called with curMem=0, maxMem=278302556
15/03/13 09:19:20 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.8 KB, free 265.4 MB)
15/03/13 09:19:20 INFO storage.MemoryStore: ensureFreeSpace(1651) called with curMem=2832, maxMem=278302556
15/03/13 09:19:20 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1651.0 B, free 265.4 MB)
15/03/13 09:19:20 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on jt-host-kvm-17:50750 (size: 1651.0 B, free: 265.4 MB)
15/03/13 09:19:20 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/13 09:19:20 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[2] at map at ParkInCountMysql.scala:29)
15/03/13 09:19:20 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/03/13 09:19:23 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@jt-host-kvm-19:41576/user/Executor#1766662750] with ID 2
15/03/13 09:19:23 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, jt-host-kvm-19, PROCESS_LOCAL, 998 bytes)
15/03/13 09:19:23 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@jt-host-kvm-17:45787/user/Executor#-210267352] with ID 1
15/03/13 09:19:23 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@jt-host-kvm-18:38188/user/Executor#-1134749150] with ID 0
15/03/13 09:19:23 INFO storage.BlockManagerMasterActor: Registering block manager jt-host-kvm-19:39020 with 155.3 MB RAM
15/03/13 09:19:23 INFO storage.BlockManagerMasterActor: Registering block manager jt-host-kvm-18:33649 with 155.3 MB RAM
15/03/13 09:19:23 INFO storage.BlockManagerMasterActor: Registering block manager jt-host-kvm-17:57066 with 155.3 MB RAM
15/03/13 09:19:23 INFO network.ConnectionManager: Accepted connection from [jt-host-kvm-19/10.7.12.119:42519]
15/03/13 09:19:23 INFO network.SendingConnection: Initiating connection to [jt-host-kvm-19/10.7.12.119:39020]
15/03/13 09:19:23 INFO network.SendingConnection: Connected to [jt-host-kvm-19/10.7.12.119:39020], 1 messages pending
15/03/13 09:19:24 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on jt-host-kvm-19:39020 (size: 1651.0 B, free: 155.2 MB)
15/03/13 09:19:24 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, jt-host-kvm-19): java.lang.NullPointerException:

org.apache.spark.rdd.JdbcRDD$$anon$1.<init>(JdbcRDD.scala:74)
org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:70)
org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:50)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
15/03/13 09:19:24 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 1, jt-host-kvm-17, PROCESS_LOCAL, 998 bytes)
15/03/13 09:19:24 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on jt-host-kvm-17:57066 (size: 1651.0 B, free: 155.2 MB)
15/03/13 09:19:24 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 1) on executor jt-host-kvm-17: java.lang.NullPointerException (null) [duplicate 1]
15/03/13 09:19:24 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.0 (TID 2, jt-host-kvm-19, PROCESS_LOCAL, 998 bytes)
15/03/13 09:19:24 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 2) on executor jt-host-kvm-19: java.lang.NullPointerException (null) [duplicate 2]
15/03/13 09:19:24 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 1.0 (TID 3, jt-host-kvm-17, PROCESS_LOCAL, 998 bytes)
15/03/13 09:19:24 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 1.0 (TID 3) on executor jt-host-kvm-17: java.lang.NullPointerException (null) [duplicate 3]
15/03/13 09:19:24 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
15/03/13 09:19:24 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/03/13 09:19:24 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
15/03/13 09:19:24 INFO scheduler.DAGScheduler: Failed to run collectAsMap at ParkInCountMysql.scala:32
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 3, jt-host-kvm-17): java.lang.NullPointerException:

org.apache.spark.rdd.JdbcRDD$$anon$1.<init>(JdbcRDD.scala:74)
org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:70)
org.apache.spark.rdd.JdbcRDD.compute(JdbcRDD.scala:50)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Fix: pass the MySQL JDBC jar with the --jars option instead, so Spark ships it to the executors:

./bin/spark-submit --master spark://jt-host-kvm-17:7077 --class parkMysql.ParkInCountMysql --executor-memory 300m --jars /httx/work/mysql-connector-java-5.1.21.jar /httx/work/work.jar spark://jt-host-kvm-17:7077
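Why this works: --driver-class-path only amends the driver JVM's classpath, while the connection closure inside JdbcRDD executes on the executors, which never see that jar in cluster mode; --jars uploads the jar to the application's HTTP file server and adds it to every executor's classpath. An alternative worth knowing (though the --jars fix above is simpler) is to put the jar on the executor classpath permanently via spark-defaults.conf; this sketch assumes the jar has been copied to the same local path on every worker node:

```
# conf/spark-defaults.conf on each node -- assumes the jar exists at this
# exact path on every worker, since extraClassPath does not ship files
spark.executor.extraClassPath  /httx/work/mysql-connector-java-5.1.21.jar
```

With this setting in place, the application jar no longer needs to bundle or ship the driver at submit time.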