
Setting up a Spark SQL Thrift Server, and the pitfalls we hit

2016-07-23 21:02

How to configure

Configure Hadoop and YARN, and set HADOOP_CONF_DIR so Spark can find their config files.

Copy hive-site.xml to $SPARK_HOME/conf.

Configure the path to the MySQL JDBC driver in spark-env.sh.
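A minimal sketch of those spark-env.sh lines; the Hadoop config path is an assumption for illustration, and the connector version is the one from our install:

# Point Spark at the Hadoop/YARN client configs (path is an example)
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Make the MySQL JDBC driver (used by the Hive metastore) visible to Spark
export SPARK_CLASSPATH=$SPARK_HOME/lib/mysql-connector-java-5.1.38.jar:${SPARK_CLASSPATH}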

How to start

./start-thriftserver.sh \
  --name olap.thriftserver \
  --master yarn-client \
  --queue bi \
  --conf spark.driver.memory=3G \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s \
  --jars /usr/local/spark-1.6.1-bin-hadoop2.6/lib/*

Note that dynamic allocation requires the external shuffle service, which is why spark.shuffle.service.enabled=true is set here; the corresponding shuffle service must also be configured on the YARN NodeManagers.

How to use

From a shell, start beeline and connect:

!connect jdbc:hive2://storage8.test.lan:10000

When prompted, enter the javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword values from hive-site.xml.
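A sample session, where the credentials are whatever your hive-site.xml holds (shown here as placeholders):

$ $SPARK_HOME/bin/beeline
beeline> !connect jdbc:hive2://storage8.test.lan:10000
Enter username for jdbc:hive2://storage8.test.lan:10000: hive
Enter password for jdbc:hive2://storage8.test.lan:10000: ******
0: jdbc:hive2://storage8.test.lan:10000> show databases;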


The Hive-on-HBase external table problem

Hardly any write-up mentions one particular jar, and that cost us badly. Most of them list the following:

hbase-client-0.98.18-hadoop2.jar

hbase-common-0.98.18-hadoop2.jar

hbase-server-0.98.18-hadoop2.jar

hbase-protocol-0.98.18-hadoop2.jar

protobuf-java-2.5.0.jar

guava-12.0.1.jar

hive-hbase-handler-1.2.1.jar

But HBase, maddeningly, only threw a ClassNotFoundException after 35 retries, a full 7 minutes later. Based on that error we added one more jar, the metrics library (com.yammer.metrics.metrics-core-2.2.0.jar), and everything ran normally. Here is our final spark-env.sh:



export SPARK_CLASSPATH=$SPARK_HOME/lib/mysql-connector-java-5.1.38.jar:\
$SPARK_HOME/lib/hbase-server-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/hbase-common-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/hbase-client-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/hbase-protocol-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/htrace-core-2.04.jar:\
$SPARK_HOME/lib/protobuf-java-2.5.0.jar:\
$SPARK_HOME/lib/guava-12.0.1.jar:\
$SPARK_HOME/lib/hive-hbase-handler-1.2.1-xima.jar:\
$SPARK_HOME/lib/com.yammer.metrics.metrics-core-2.2.0.jar:${SPARK_CLASSPATH}
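A quick smoke test for the classpath is to query an HBase-backed external table from beeline; the table name below is hypothetical. Before the metrics jar was added, this is exactly the kind of statement that hung through the 35 retries:

0: jdbc:hive2://storage8.test.lan:10000> select count(*) from my_hbase_external_table;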


Another Hive startup problem

We found that Hive fell back to the embedded Derby database on every startup. The common advice online is to set hive.metastore.schema.verification to false in hive-site.xml, which disables the schema check.
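For reference, that workaround would look like this in hive-site.xml (we chose not to apply it):

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>

We did not want to touch the configuration, though. Stepping through the source, it appears a class was never being initialized, so for now we simply make sure that class is loaded in the current ClassLoader before starting the server: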

package com.ximalaya.spark.thriftserver

import org.apache.hive.service.server.HiveServer2

/**
 * @author todd.chen at 7/23/16 8:49 PM.
 *         email : todd.chen@ximalaya.com
 */
object XimaThriftServer {

  def main(args: Array[String]): Unit = {
    // Force-load the Hive metadata class so the metastore is initialized
    // properly instead of silently falling back to Derby.
    Class.forName("org.apache.hadoop.hive.ql.metadata.Hive")
    HiveServer2.main(args)
  }
}


Then tweak start-thriftserver.sh:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Shell script for starting the Spark SQL Thrift server

# Enter posix mode for bash
set -o posix

if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi

# NOTE: This exact class name is matched downstream by SparkSubmit.
# Any changes need to be reflected there.
# Swapped in our wrapper class; it must match the Scala object above.
CLASS="com.ximalaya.spark.thriftserver.XimaThriftServer"

function usage {
  echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
  pattern="usage"
  pattern+="\|Spark assembly has been built with Hive"
  pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
  pattern+="\|Spark Command: "
  pattern+="\|======="
  pattern+="\|--help"

  "${SPARK_HOME}"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
  echo
  echo "Thrift server options:"
  "${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

export SUBMIT_USAGE_FUNCTION=usage

exec "${SPARK_HOME}"/sbin/spark-daemon.sh submit $CLASS 1 "$@"
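For completeness, the wrapper class has to be on the server's classpath before this script will work. How you package it depends on your build; assuming an sbt project with Scala 2.10 (what Spark 1.6 ships with), it might look like:

# Build the wrapper jar and drop it into Spark's lib directory,
# which the start command above already picks up via --jars lib/*
sbt package
cp target/scala-2.10/*.jar /usr/local/spark-1.6.1-bin-hadoop2.6/lib/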


And that's it; everything works normally now.
Tags: spark, hadoop