Hive on Spark配置总结
2016-06-04 14:13
Environment

- Maven 3.3.3
- JDK 7u79
- Scala 2.10.6
- Hive 2.0.1
- Spark 1.5.0 (built from source)
- Hadoop 2.6.4

The Hive and Spark versions must match, so check the `spark.version` property in the Hive source's pom.xml to determine which Spark release to use.

Note that you must have a version of Spark which does not include the Hive jars. Meaning one which was not built with the Hive profile.

Note: the pre-built Spark 2.x packages on the Spark website all bundle Hive, so to use Hive on Spark you must download the Spark source and build it yourself. Recommended pairings: Hive 1.2.1 on Spark 1.3.1, or Hive 2.0.1 on Spark 1.5.2.
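The pom.xml check can be sketched as a one-liner. The snippet below fabricates a minimal pom.xml fragment so it runs standalone; against the real Hive source tree you would run the same `sed` over the top-level pom.xml:

```shell
# Stand-in pom.xml fragment -- in the real Hive source this property pins
# the Spark version the release was built against.
cat > pom.xml <<'EOF'
<properties>
  <spark.version>1.5.0</spark.version>
</properties>
EOF

# Pull out the version string between the <spark.version> tags.
sed -n 's/.*<spark.version>\(.*\)<\/spark.version>.*/\1/p' pom.xml
```

The printed version (here `1.5.0`) is the Spark release you should build against.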
Building Spark
By default, Spark is built with Scala 2.10.4:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.6 -DskipTests clean package
./make-distribution.sh --name xm-spark --tgz -Phadoop-2.6 -Pyarn

To build with Scala 2.11.x instead, first switch the Scala version:

./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package
./make-distribution.sh --name xm-spark --tgz -Phadoop-2.6 -Pyarn

The resulting tarball is generated in the Spark source directory.

hive-site.xml configuration

<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>

spark-defaults.conf configuration

Settings recommended by the Hive website:
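The post does not list concrete spark-defaults.conf values for the Spark side. Below is an illustrative sketch for a YARN deployment with Spark 1.5.x; the memory, core, and log-directory figures are placeholders to tune per cluster, not recommendations from the original:

```
spark.master                     yarn-cluster
spark.executor.memory            4g
spark.executor.cores             2
spark.driver.memory              2g
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
```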
hive.vectorized.execution.enabled=true
hive.cbo.enable=true
hive.optimize.reducededuplication.min.reducer=4
hive.optimize.reducededuplication=true
hive.orc.splits.include.file.footer=false
hive.merge.mapfiles=true
hive.merge.sparkfiles=false
hive.merge.smallfiles.avgsize=16000000
hive.merge.size.per.task=256000000
hive.merge.orcfile.stripe.level=true
hive.auto.convert.join=true
hive.auto.convert.join.noconditionaltask=true
hive.auto.convert.join.noconditionaltask.size=894435328
hive.optimize.bucketmapjoin.sortedmerge=false
hive.map.aggr.hash.percentmemory=0.5
hive.map.aggr=true
hive.optimize.sort.dynamic.partition=false
hive.stats.autogather=true
hive.stats.fetch.column.stats=true
hive.vectorized.execution.reduce.enabled=false
hive.vectorized.groupby.checkinterval=4096
hive.vectorized.groupby.flush.percent=0.1
hive.compute.query.using.stats=true
hive.limit.pushdown.memory.usage=0.4
hive.optimize.index.filter=true
hive.exec.reducers.bytes.per.reducer=67108864
hive.smbjoin.cache.rows=10000
hive.exec.orc.default.stripe.size=67108864
hive.fetch.task.conversion=more
hive.fetch.task.conversion.threshold=1073741824
hive.fetch.task.aggr=false
mapreduce.input.fileinputformat.list-status.num-threads=5
spark.kryo.referenceTracking=false
spark.kryo.classesToRegister=org.apache.hadoop.hive.ql.io.HiveKey,org.apache.hadoop.io.BytesWritable,org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch
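To make any of the hive.* settings above permanent, each key=value pair maps to a `<property>` element in hive-site.xml; one illustrative entry (any key from the list can be substituted):

```
<property>
  <name>hive.vectorized.execution.enabled</name>
  <value>true</value>
</property>
```

Alternatively, they can be set per session in the Hive CLI with `set key=value;`.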
Troubleshooting
1. Caused by: java.lang.NoClassDefFoundError: org/apache/hive/spark/client/Job
   a. Spark was built with -Phive or -Phive-thriftserver
   b. The Hive and Spark build versions do not match
2. Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
   a. The Hive and Spark versions do not match
   b. A broken Scala environment kept the client from starting (install Scala properly, then restart YARN)
3. Errors caused by environment misconfiguration
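For problem 1a, a quick check is to list the Spark assembly jar and grep for Hive class entries. The snippet below uses a stand-in text listing so it runs anywhere; against a real build you would pipe `unzip -l lib/spark-assembly-*.jar` (path is a placeholder) into the same grep:

```shell
# Stand-in for the output of `unzip -l spark-assembly.jar`.
cat > listing.txt <<'EOF'
org/apache/spark/SparkContext.class
org/apache/spark/rdd/RDD.class
EOF

# A Hive-on-Spark assembly must contain no org/apache/hive entries.
if grep -q 'org/apache/hive/' listing.txt; then
  echo "rebuild Spark without -Phive"
else
  echo "OK: no Hive classes"
fi
```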