A problem encountered while tuning Hive: hive.auto.convert.join

2013-11-21 17:29
Hive has an optimization for joins called map join.

This optimization needs to be used with some care, though. First, the configuration parameters that enable it:

set hive.optimize.correlation=true;
set hive.auto.convert.join=true;
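
As a rough sketch of what this setting affects: with hive.auto.convert.join enabled, Hive checks whether one side of a join is small enough to hold in memory and, if so, runs the join in the map phase instead of shuffling both tables to reducers. The table and column names below (orders, dim_region, region_id and so on) are hypothetical and only for illustration. hive.mapjoin.smalltable.filesize is the size threshold for the small table, shown here with its documented default.

```sql
-- Let Hive convert eligible joins to map joins automatically.
set hive.auto.convert.join=true;

-- Size threshold (in bytes) below which a table counts as "small".
-- 25000000 is the documented default; adjust for your cluster.
set hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical tables: orders is large, dim_region is small.
-- If dim_region is under the threshold, this join is a candidate
-- for conversion to a map join.
SELECT o.order_id, r.region_name
FROM orders o
JOIN dim_region r
  ON o.region_id = r.region_id;
```

Running EXPLAIN on the query is one way to check whether the plan actually contains a map join.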

When I ran a fairly large join with these settings, the following error appeared:

at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryString.init(LazyBinaryString.java:48)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:216)
at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:197)
at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:61)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.evaluate(ExprNodeColumnEvaluator.java:98)
at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:234)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:90)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:652)
... 9 more
After searching around online, it looks like this is a known bug:

https://issues.apache.org/jira/i#browse/HIVE-4502

After setting hive.auto.convert.join to false and rerunning the job, the problem no longer occurred.
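
A minimal sketch of that workaround, scoped to the one failing job rather than changed globally. The tables here (big_table_a, big_table_b) are hypothetical stand-ins for the join that failed, and switching the setting back afterwards is my own suggestion, not something the bug report requires.

```sql
-- Disable automatic map-join conversion for this session,
-- so the join runs as an ordinary reduce-side join.
set hive.auto.convert.join=false;

-- Hypothetical stand-in for the large join that hit the exception.
SELECT a.id, a.val, b.val
FROM big_table_a a
JOIN big_table_b b
  ON a.id = b.id;

-- Optionally turn it back on for later queries in the same session.
set hive.auto.convert.join=true;
```

The trade-off is that the reduce-side join is usually slower when one table is small, so it may be worth disabling the conversion only for queries that actually trigger the error.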

There is also an article worth reading:

http://www.gemini5201314.net/hadoop/hadoop-%E4%B8%AD%E7%9A%84%E4%B8%A4%E8%A1%A8join.html

Bugs in Hive 0.11 are also worth watching out for.