Hive Tez task failures
2015-09-23 16:21
Recently, Tez jobs on our cluster have been failing frequently. The error output is shown below.
Error log
Map 1: 555(+41)/596 Reducer 2: 0(+0,-2)/1
15/09/23 14:50:35 INFO SessionState: Map 1: 555(+41)/596 Reducer 2: 0(+0,-2)/1
Map 1: 555(+41)/596 Reducer 2: 0(+1,-2)/1
15/09/23 14:50:37 INFO SessionState: Map 1: 555(+41)/596 Reducer 2: 0(+1,-2)/1
Map 1: 555(+41)/596 Reducer 2: 0(+1,-3)/1
15/09/23 14:50:38 INFO SessionState: Map 1: 555(+41)/596 Reducer 2: 0(+1,-3)/1
Map 1: 555(+41)/596 Reducer 2: 0(+1,-3)/1
15/09/23 14:50:41 INFO SessionState: Map 1: 555(+41)/596 Reducer 2: 0(+1,-3)/1
Map 1: 555(+0)/596 Reducer 2: 0(+0,-4)/1
15/09/23 14:50:44 INFO SessionState: Map 1: 555(+0)/596 Reducer 2: 0(+0,-4)/1
Status: Failed
15/09/23 14:50:45 ERROR SessionState: Status: Failed
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1442391298043_123239_1_01, diagnostics=[Task failed, taskId=task_1442391298043_123239_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1442391298043_123239_01_008650 finished with diagnostics set to [Container preempted internally]], TaskAttempt 1 failed, info=[Container container_1442391298043_123239_01_008771 finished with diagnostics set to [Container preempted internally]], TaskAttempt 2 failed, info=[Container container_1442391298043_123239_01_009010 finished with diagnostics set to [Container preempted internally]], TaskAttempt 3 failed, info=[Container container_1442391298043_123239_01_009723 finished with diagnostics set to [Container preempted internally]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1442391298043_123239_1_01 [Reducer 2] killed/failed due to:null]
15/09/23 14:50:45 ERROR SessionState: Vertex failed, vertexName=Reducer 2, vertexId=vertex_1442391298043_123239_1_01, diagnostics=[Task failed, taskId=task_1442391298043_123239_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1442391298043_123239_01_008650 finished with diagnostics set to [Container preempted internally]], TaskAttempt 1 failed, info=[Container container_1442391298043_123239_01_008771 finished with diagnostics set to [Container preempted internally]], TaskAttempt 2 failed, info=[Container container_1442391298043_123239_01_009010 finished with diagnostics set to [Container preempted internally]], TaskAttempt 3 failed, info=[Container container_1442391298043_123239_01_009723 finished with diagnostics set to [Container preempted internally]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1442391298043_123239_1_01 [Reducer 2] killed/failed due to:null]
Vertex killed, vertexName=Map 1, vertexId=vertex_1442391298043_123239_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1442391298043_123239_1_00 [Map 1] killed/failed due to:null]
15/09/23 14:50:45 ERROR SessionState: Vertex killed, vertexName=Map 1, vertexId=vertex_1442391298043_123239_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as other vertex failed. failedTasks:0, Vertex vertex_1442391298043_123239_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
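Analysis:
The task task_1442391298043_123239_1_01_000000 failed four times (attempts 0 through 3), each time because its container was preempted by a higher-priority task ("Container preempted internally"). The maximum number of failed attempts per task defaults to 4, so the fourth failure failed the task, and with it the Reducer 2 vertex and the whole DAG. This problem shows up easily when the cluster is running many jobs.
Solution:
Raise the defaults:
tez.am.task.max.failed.attempts=10
tez.am.max.app.attempts=5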
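To check whether another failed job matches the pattern in the analysis above, the attempt entries can be pulled out of the diagnostics string and counted. The following is only a minimal sketch: the helper names `summarize_attempts` and `exhausted_by_preemption` are made up for illustration, the regular expression assumes the diagnostics look exactly like the log in this post, and the sample string contains only the four attempt entries copied from that log.

```python
import re

# Attempt entries copied from the "Vertex failed" diagnostics in the log above.
DIAGNOSTICS = (
    "TaskAttempt 0 failed, info=[Container container_1442391298043_123239_01_008650 "
    "finished with diagnostics set to [Container preempted internally]], "
    "TaskAttempt 1 failed, info=[Container container_1442391298043_123239_01_008771 "
    "finished with diagnostics set to [Container preempted internally]], "
    "TaskAttempt 2 failed, info=[Container container_1442391298043_123239_01_009010 "
    "finished with diagnostics set to [Container preempted internally]], "
    "TaskAttempt 3 failed, info=[Container container_1442391298043_123239_01_009723 "
    "finished with diagnostics set to [Container preempted internally]]"
)

# Assumed layout of one entry: attempt number, container id, reason text.
ATTEMPT_RE = re.compile(
    r"TaskAttempt (\d+) failed, info=\[Container (container_\w+) "
    r"finished with diagnostics set to \[([^\]]*)\]\]"
)


def summarize_attempts(diagnostics):
    """Return a list of (attempt_number, container_id, reason) tuples."""
    return [(int(num), cid, reason)
            for num, cid, reason in ATTEMPT_RE.findall(diagnostics)]


def exhausted_by_preemption(diagnostics, max_failed_attempts=4):
    """True if the attempt limit was reached and every failure was a preemption.

    max_failed_attempts mirrors tez.am.task.max.failed.attempts (default 4).
    """
    attempts = summarize_attempts(diagnostics)
    if len(attempts) < max_failed_attempts:
        return False
    return all("preempted" in reason.lower() for _, _, reason in attempts)


if __name__ == "__main__":
    for num, cid, reason in summarize_attempts(DIAGNOSTICS):
        print(num, cid, reason)
    print("limit reached by preemption:", exhausted_by_preemption(DIAGNOSTICS))
```

With the limit raised to 10 as suggested above, the same four preemptions would no longer exhaust the task's attempts, which is what `exhausted_by_preemption(DIAGNOSTICS, max_failed_attempts=10)` models.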