
Hive on Spark Configuration Summary

2016-06-04 14:13
Environment

Maven 3.3.3
JDK 7u79
Scala 2.10.6
Hive 2.0.1
Spark 1.5.0 (built from source)
Hadoop 2.6.4

The Hive version and the Spark version must match. Check spark.version in the pom.xml of the Hive source tree to determine which Spark version to use (a quick way to do this is sketched below).

Note that you must have a version of Spark which does not include the Hive jars, meaning one which was not built with the Hive profile.

Note: the pre-built Spark 2.x downloads on the Spark website all bundle Hive, so to use Hive on Spark you must download the Spark source and build it yourself. Recommended pairings: hive-1.2.1 on spark-1.3.1, or hive-2.0.1 on spark-1.5.2.
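As a quick check, the expected Spark version can be read straight out of the Hive source tree. A minimal sketch, assuming you are in the root of the Hive source checkout, where the top-level pom.xml defines the spark.version property:

# Print the Spark version this Hive release was developed against
grep -m1 '<spark.version>' pom.xml
# e.g. for Hive 2.0.x this prints something like: <spark.version>1.5.0</spark.version>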

Building Spark

By default Spark is built with Scala 2.10.4:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.6 -DskipTests clean package
./make-distribution.sh --name xm-spark --tgz -Phadoop-2.6 -Pyarn

To build with Scala 2.11.x instead:
./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package
./make-distribution.sh --name xm-spark --tgz -Phadoop-2.6 -Pyarn

The tarball is generated in the spark directory.

hive-site.xml configuration

<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>

spark-defaults.conf configuration

The settings recommended on the Hive website:
hive.vectorized.execution.enabled=true
hive.cbo.enable=true
hive.optimize.reducededuplication.min.reducer=4
hive.optimize.reducededuplication=true
hive.orc.splits.include.file.footer=false
hive.merge.mapfiles=true
hive.merge.sparkfiles=false
hive.merge.smallfiles.avgsize=16000000
hive.merge.size.per.task=256000000
hive.merge.orcfile.stripe.level=true
hive.auto.convert.join=true
hive.auto.convert.join.noconditionaltask=true
hive.auto.convert.join.noconditionaltask.size=894435328
hive.optimize.bucketmapjoin.sortedmerge=false
hive.map.aggr.hash.percentmemory=0.5
hive.map.aggr=true
hive.optimize.sort.dynamic.partition=false
hive.stats.autogather=true
hive.stats.fetch.column.stats=true
hive.vectorized.execution.reduce.enabled=false
hive.vectorized.groupby.checkinterval=4096
hive.vectorized.groupby.flush.percent=0.1
hive.compute.query.using.stats=true
hive.limit.pushdown.memory.usage=0.4
hive.optimize.index.filter=true
hive.exec.reducers.bytes.per.reducer=67108864
hive.smbjoin.cache.rows=10000
hive.exec.orc.default.stripe.size=67108864
hive.fetch.task.conversion=more
hive.fetch.task.conversion.threshold=1073741824
hive.fetch.task.aggr=false
mapreduce.input.fileinputformat.list-status.num-threads=5
spark.kryo.referenceTracking=false
spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
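Once the engine is switched to Spark, a quick smoke test from the Hive CLI confirms that the Spark client can actually be created. A minimal sketch (the table name src is a placeholder for any existing table):

hive> set hive.execution.engine=spark;
hive> select count(*) from src;

If everything is wired correctly, the console shows Spark stage progress instead of MapReduce job output; a failure at this point usually corresponds to the "Failed to create spark client" error discussed below.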

Troubleshooting

1. Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
  a. Spark was built with -Phive or -Phive-thriftserver (see the check sketched after this list)
  b. Mismatched Hive and Spark build versions

2. Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
  a. Mismatched Hive and Spark versions
  b. A broken Scala environment causing the client launch to fail (install Scala properly, then restart YARN)

3. Errors caused by environment configuration
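For problem 1, you can verify whether a Spark assembly was accidentally built with the Hive profile by listing the classes inside the assembly jar. A minimal sketch, assuming a jar path of lib/spark-assembly-1.5.0-hadoop2.6.0.jar inside the unpacked distribution (adjust to your build output):

# Hive classes inside the assembly mean Spark was built with -Phive
jar tf lib/spark-assembly-1.5.0-hadoop2.6.0.jar | grep 'org/apache/hadoop/hive' \
  && echo "Hive classes found: rebuild Spark without -Phive" \
  || echo "OK: no Hive classes bundled"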
