
Setting up a Spark SQL Thrift Server, and the pitfalls we hit

2016-07-23 21:02

How to configure

Configure Hadoop and YARN, and set HADOOP_CONF_DIR so Spark can find their config files.

Copy hive-site.xml to $SPARK_HOME/conf.

Configure the path to the MySQL JDBC driver in spark-env.sh.
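A minimal sketch of those spark-env.sh lines; the Hadoop config path is an assumption for illustration, and the connector version is the one from our install:

# Point Spark at the Hadoop/YARN client configs (path is an example)
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Make the MySQL JDBC driver (used by the Hive metastore) visible to Spark
export SPARK_CLASSPATH=$SPARK_HOME/lib/mysql-connector-java-5.1.38.jar:${SPARK_CLASSPATH}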

How to start

./start-thriftserver.sh \
  --name olap.thriftserver \
  --master yarn-client \
  --queue bi \
  --conf spark.driver.memory=3G \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s \
  --jars /usr/local/spark-1.6.1-bin-hadoop2.6/lib/*

Note that dynamic allocation requires the external shuffle service, which is why spark.shuffle.service.enabled=true is set here; the corresponding shuffle service must also be configured on the YARN NodeManagers.

How to use

From a shell, start beeline and connect:

!connect jdbc:hive2://storage8.test.lan:10000

When prompted, enter the javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword values from hive-site.xml.
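A sample session, where the credentials are whatever your hive-site.xml holds (shown here as placeholders):

$ $SPARK_HOME/bin/beeline
beeline> !connect jdbc:hive2://storage8.test.lan:10000
Enter username for jdbc:hive2://storage8.test.lan:10000: hive
Enter password for jdbc:hive2://storage8.test.lan:10000: ******
0: jdbc:hive2://storage8.test.lan:10000> show databases;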


The Hive-on-HBase external table problem

Hardly any write-up mentions one particular jar, and that cost us badly. Most of them list the following:

hbase-client-0.98.18-hadoop2.jar

hbase-common-0.98.18-hadoop2.jar

hbase-server-0.98.18-hadoop2.jar

hbase-protocol-0.98.18-hadoop2.jar

protobuf-java-2.5.0.jar

guava-12.0.1.jar

hive-hbase-handler-1.2.1.jar

But HBase, maddeningly, only threw a ClassNotFoundException after 35 retries, a full 7 minutes later. Based on that error we added one more jar, the metrics library (com.yammer.metrics.metrics-core-2.2.0.jar), and everything ran normally. Here is our final spark-env.sh:



export SPARK_CLASSPATH=$SPARK_HOME/lib/mysql-connector-java-5.1.38.jar:\
$SPARK_HOME/lib/hbase-server-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/hbase-common-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/hbase-client-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/hbase-protocol-0.98.18-hadoop2.jar:\
$SPARK_HOME/lib/htrace-core-2.04.jar:\
$SPARK_HOME/lib/protobuf-java-2.5.0.jar:\
$SPARK_HOME/lib/guava-12.0.1.jar:\
$SPARK_HOME/lib/hive-hbase-handler-1.2.1-xima.jar:\
$SPARK_HOME/lib/com.yammer.metrics.metrics-core-2.2.0.jar:${SPARK_CLASSPATH}
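A quick smoke test for the classpath is to query an HBase-backed external table from beeline; the table name below is hypothetical. Before the metrics jar was added, this is exactly the kind of statement that hung through the 35 retries:

0: jdbc:hive2://storage8.test.lan:10000> select count(*) from my_hbase_external_table;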


Another Hive startup problem

We found that Hive fell back to the embedded Derby database on every startup. The common advice online is to set hive.metastore.schema.verification to false in hive-site.xml, which disables the schema check.
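For reference, that workaround would look like this in hive-site.xml (we chose not to apply it):

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>

We did not want to touch the configuration, though. Stepping through the source, it appears a class was never being initialized, so for now we simply make sure that class is loaded in the current ClassLoader before starting the server: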

package com.ximalaya.spark.thriftserver

import org.apache.hive.service.server.HiveServer2

/**
 * @author todd.chen at 7/23/16 8:49 PM.
 *         email : todd.chen@ximalaya.com
 */
object XimaThriftServer {

  def main(args: Array[String]): Unit = {
    // Force-load the Hive metadata class so the metastore is initialized
    // properly instead of silently falling back to Derby.
    Class.forName("org.apache.hadoop.hive.ql.metadata.Hive")
    HiveServer2.main(args)
  }
}


Then tweak start-thriftserver.sh:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Shell script for starting the Spark SQL Thrift server

# Enter posix mode for bash
set -o posix

if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi

# NOTE: This exact class name is matched downstream by SparkSubmit.
# Any changes need to be reflected there.
# Swapped in our wrapper class; it must match the Scala object above.
CLASS="com.ximalaya.spark.thriftserver.XimaThriftServer"

function usage {
  echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
  pattern="usage"
  pattern+="\|Spark assembly has been built with Hive"
  pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
  pattern+="\|Spark Command: "
  pattern+="\|======="
  pattern+="\|--help"

  "${SPARK_HOME}"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
  echo
  echo "Thrift server options:"
  "${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

export SUBMIT_USAGE_FUNCTION=usage

exec "${SPARK_HOME}"/sbin/spark-daemon.sh submit $CLASS 1 "$@"
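For completeness, the wrapper class has to be on the server's classpath before this script will work. How you package it depends on your build; assuming an sbt project with Scala 2.10 (what Spark 1.6 ships with), it might look like:

# Build the wrapper jar and drop it into Spark's lib directory,
# which the start command above already picks up via --jars lib/*
sbt package
cp target/scala-2.10/*.jar /usr/local/spark-1.6.1-bin-hadoop2.6/lib/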


And that's it; everything works normally now.
Tags: spark, hadoop