
Hive error: failure decompressing gz files

2016-03-02
Gzip-compressed files were uploaded to HDFS; when Hive reads them, the job fails with:

Task with the most failures (4):
-----
Task ID:
  task_1456816082333_1354_m_000339
URL:
  http://xxxx:8088/taskdetails.jsp?jobid=job_1456816082333_1354&tipid=task_1456816082333_1354_m_000339
-----
Diagnostic Messages for this Task:
Error: java.io.IOException: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.io.EOFException: Unexpected end of input stream
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
    ... 13 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 358  Cumulative CPU: 5067.56 sec  HDFS Read: 2749408714  HDFS Write: 2747850422  FAIL
Total MapReduce CPU Time Spent: 0 days 1 hours 24 minutes 27 seconds 560 msec

This error means that one of the compressed files is corrupt. The fix is to delete the corresponding file from HDFS and re-upload it.
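The "Unexpected end of input stream" in the stack trace is what a truncated gzip stream looks like to Hadoop's decompressor: the data ends before the end-of-stream marker. A minimal local sketch of the same failure mode, and an integrity check you could run on a suspect file after pulling it down with `hadoop fs -get` (the file names here are just for the demo):

```python
import gzip
import os
import tempfile

def is_gzip_intact(path):
    """Return True if the gzip stream at `path` decompresses to the end."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):  # read in 1 MiB chunks, discard
                pass
        return True
    except (EOFError, OSError):
        # EOFError: stream truncated before the end-of-stream marker
        # (the same condition Hadoop reports as "Unexpected end of
        # input stream"); OSError: bad gzip header or CRC mismatch.
        return False

# Build a valid file and a truncated copy to show both outcomes.
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.gz")
with gzip.open(good, "wb") as f:
    f.write(b"some log lines\n" * 1000)

bad = os.path.join(tmp, "bad.gz")
with open(good, "rb") as src, open(bad, "wb") as dst:
    dst.write(src.read()[:-10])  # drop the trailing bytes (CRC32 + size)

print(is_gzip_intact(good))  # True
print(is_gzip_intact(bad))   # False
```

The equivalent one-liner at the shell is `gzip -t file.gz`, which exits non-zero for a corrupt archive.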

But a Hive table or partition can hold tens of thousands of files, so you cannot tell at a glance which archive is broken. You have to track it down; here is how:

While the Hive job is running, open the map-task detail page on port 8088 (i.e. RUNNING MAP attempts in job_1456816082333_1354) and check which node the failing map attempt ran on, or click "logs" right there to inspect the detailed log. Alternatively, go to that node's Hadoop logs/userlogs directory and find the directory matching the job ID (application_1456816082333_1354); the log inside names the corrupt file. Then delete that file from HDFS.
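Instead of hunting through task logs, you can also scan every archive directly. In production you would stream each file with `hadoop fs -cat path | gzip -t` so nothing touches local disk; the sketch below shows the same streaming check in pure Python using `zlib` (wbits=31 selects the gzip container), fed by an in-memory demo stream rather than a real HDFS pipe. It assumes single-member gzip files, which is the usual case for uploaded logs:

```python
import gzip
import zlib

def gzip_stream_intact(chunks):
    """Consume an iterable of byte chunks and report whether the gzip
    stream ends cleanly (decompressor reaches its end-of-stream marker)."""
    d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip wrapper
    try:
        for chunk in chunks:
            d.decompress(chunk)  # output discarded; we only want errors
    except zlib.error:
        return False  # corrupted deflate data or bad CRC
    # A truncated file raises no error at all -- the decompressor simply
    # never sees the end-of-stream marker, so .eof stays False.
    return d.eof

def chunked(data, size=1024):
    """Yield `data` in fixed-size pieces, like a pipe would deliver it."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

data = gzip.compress(b"x" * 10000)
print(gzip_stream_intact(chunked(data)))        # True
print(gzip_stream_intact(chunked(data[:-8])))   # False: trailer cut off
```

Checking by decompressor state rather than by exception matters here: truncation, the exact failure in the stack trace above, produces no `zlib.error`, only a stream that never finishes.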