
Spark Issue 10: Runtime failure caused by insufficient disk space on a node

2017-03-06 10:47
More code is available at: https://github.com/xubo245/SparkLearning

Alluxio learning in the Spark ecosystem. Versions: alluxio (tachyon) 0.7.1, spark-1.5.2, hadoop-2.6.0

1. Problem description

1.1 Summary

After writing a script that submits many applications in sequence, an error was thrown once a dozen or so of them had run:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 1.0 failed 4 times, most recent failure: Lost task 8.3 in stage 1.0 (TID 25, Mcnode4): org.apache.spark.SparkException: File ./DSA.jar exists and does not match contents of http://Master:41701/jars/DSA.jar


Checking the history server showed that the node had run out of disk space: http://master:18080/history/app-20170209152626-0632/stages/stage/?id=1&attempt=0

The real cause only shows up after drilling down to the task-level records; the error reported in section 1.2 does not point to it directly.

java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at org.spark-project.guava.io.ByteStreams.copy(ByteStreams.java:211)
at org.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:204)
at org.spark-project.guava.io.Files.copy(Files.java:436)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:514)
at org.apache.spark.util.Utils$.copyFile(Utils.scala:485)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:362)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)


Checking HDFS and Mcnode4 confirmed that the disk was indeed nearly full: only 2.47 MB remained available, while DSA.jar is 2.5 MB. The main cause was that the application records under the worker's work directory had gradually accumulated.
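A quick way to confirm this on the affected node is to check the free space of each filesystem and the size of the application records under the standalone worker's work directory. A minimal shell sketch, assuming the work directory is ~/cloud/spark-1.5.2/work as in the mv commands of section 2.2; adjust the host and paths to your own deployment:

# On the failing node (here Mcnode4): show free space per filesystem
df -h

# Size of each application's record under the worker's work directory,
# largest last (path is an assumption taken from section 2.2)
du -sh ~/cloud/spark-1.5.2/work/app-* | sort -h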

1.2 Error log

hadoop@Master:~/disk2/xubo/project/alignment/SparkSW/SparkSW20161114/alluxio-1.3.0$ ./cloudSWatmtimequerystandalone.sh > cloudSWatmtimequerystandalonetime201702072344.txt
[Stage 1:> (0 + 16) / 128]Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 1.0 failed 4 times, most recent failure: Lost task 8.3 in stage 1.0 (TID 25, Mcnode4): org.apache.spark.SparkException: File ./DSA.jar exists and does not match contents of http://Master:41701/jars/DSA.jar
at org.apache.spark.util.Utils$.copyFile(Utils.scala:464)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:362)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:397)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1007)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:989)
at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1370)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1357)
at org.apache.spark.rdd.RDD$$anonfun$top$1.apply(RDD.scala:1338)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.top(RDD.scala:1337)
at org.dsa.core.DSW.align(DSW.scala:39)
at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:33)
at org.dsa.core.SequenceAlignment$$anonfun$run$1.apply(SequenceAlignment.scala:32)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.dsa.core.SequenceAlignment.run(SequenceAlignment.scala:32)
at org.dsa.core.DSW$.main(DSW.scala:138)
at org.dsa.time.CloudSWATMQueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$3$$anonfun$apply$mcVI$sp$4.apply$mcVI$sp(CloudSWATMQueryTime.scala:93)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.dsa.time.CloudSWATMQueryTime$$anonfun$main$1$$anonfun$apply$mcVI$sp$3.apply$mcVI$sp(CloudSWATMQueryTime.scala:92)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.dsa.time.CloudSWATMQueryTime$$anonfun$main$1.apply$mcVI$sp(CloudSWATMQueryTime.scala:85)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.dsa.time.CloudSWATMQueryTime$.main(CloudSWATMQueryTime.scala:13)
at org.dsa.time.CloudSWATMQueryTime.main(CloudSWATMQueryTime.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: File ./DSA.jar exists and does not match contents of http://Master:41701/jars/DSA.jar
at org.apache.spark.util.Utils$.copyFile(Utils.scala:464)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:362)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:397)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


2. Solution

2.1 Add disk space

As the heading says: add more disk capacity to the node.

2.2 Delete or move files

Move the application records under the work directory to disk2:

cd ~/disk2/backup

mv time20161102/spark-1.5.2/ .

mv time20161212/spark/work/* spark-1.5.2/work/

mv ~/cloud/spark-1.5.2/work/app-201* spark-1.5.2/work/
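Besides moving the old records away by hand, the standalone worker can also be told to clean up finished applications' directories automatically, which keeps the work directory from filling up again when a script submits many applications. A minimal sketch, assuming a standalone deployment and the spark.worker.cleanup.* properties documented for Spark 1.5.2; the interval and TTL values below are only examples:

# conf/spark-env.sh on every worker node (restart the workers for it to take effect)
# cleanup.interval: how often (seconds) the worker checks for old application dirs
# cleanup.appDataTtl: how long (seconds) a finished application's dir is kept
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=604800"

Note that this only removes directories of applications that have already finished; running applications are not touched.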

3. Run record

After the files were moved, the jobs ran normally again.

References

【1】http://spark.apache.org/docs/1.5.2/programming-guide.html
【2】https://github.com/xubo245/SparkLearning