Hive metastore: the three deployment modes and how to set them up, plus HiveServer2, HWI, and beeline setup

2017-04-17 14:43

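Before we start

The setups below are best tried on a three-node cluster, for example master, slave1, and slave2, with Hive installed on both master and slave1: master acts as the server and slave1 as the client.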

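The three Hive metastore modes and how to set them up

The Hive metastore (metadata store) can be deployed in three ways:

a) Embedded Derby

b) Local

c) Remote

1. Embedded Derby

This is the simplest mode. It only needs the following in hive-site.xml: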

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:derby:;databaseName=metastore_db;create=true</value>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>org.apache.derby.jdbc.EmbeddedDriver</value>

</property>

<property>

<name>hive.metastore.local</name>

<value>true</value>

</property>

<property>

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive/warehouse</value>

</property>

</configuration>

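Note: with Derby, running hive creates a derby.log file and a metastore_db directory in the current working directory. The drawback of this mode is that only one Hive client at a time can use the database from a given directory; a second client fails with the following error: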

hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start database 'metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database 'metastore_db', see the next exception for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

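2. Local MySQL (single node; also called Hive single-user mode)

This mode needs a MySQL server running locally, configured as follows. (Both MySQL-based modes below require copying the MySQL JDBC driver jar into $HIVE_HOME/lib.)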

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive_remote/warehouse</value>

</property>

<property>

<name>hive.metastore.local</name>

<value>true</value>

</property>

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://localhost/hive_remote?createDatabaseIfNotExist=true</value>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>hive</value>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>password</value>

</property>

</configuration>

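3. Remote MySQL (3 or 5 nodes, configured on the master and slaves; also called Hive multi-user mode)

1. Remote, combined

This mode needs a MySQL server running on a remote host, and the metastore service must be started on the Hive server.

Here a MySQL test server is used, with IP 192.168.1.214, a newly created hive_remote database, and the latin1 character set. The addresses and database name in the listing below do not match that description, so substitute the values for your own environment.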

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive/warehouse</value>

</property>

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://192.168.57.6:3306/hive?createDatabaseIfNotExist=true</value>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>hive</value>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>password</value>

</property>

<property>

<name>hive.metastore.local</name>

<value>false</value>

</property>

<property>

<name>hive.metastore.uris</name>

<value>thrift://192.168.1.188:9083</value>

</property>

</configuration>

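Note: here the Hive server and client are on the same machine. They can also be split apart.

2. Remote, split

Split the hive-site.xml configuration into the following two parts.

1) Server-side configuration file (for example on master)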

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive/warehouse</value>

</property>

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://192.168.57.6:3306/hive?createDatabaseIfNotExist=true</value>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>root</value>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>123456</value>

</property>

</configuration>

2) Client-side configuration file (for example on slave1)

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive/warehouse</value>

</property>

<property>

<name>hive.metastore.local</name>

<value>false</value>

</property>

<property>

<name>hive.metastore.uris</name>

<value>thrift://192.168.57.5:9083</value>

</property>

</configuration>

Start the Hive metastore service on the server:

hive --service metastore

On the client, just run the hive command:

:~$ hive

Hive history file=/tmp/root/hive_job_log_root_201301301416_955801255.txt

hive> show tables;

OK

test_hive

Time taken: 0.736 seconds

hive>
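The client configuration above depends on hive.metastore.uris being a well-formed thrift://host:port value (Hive also accepts a comma-separated list of them). As a rough pre-flight check before starting a client, the value can be validated with plain Java. This is only a sketch: the class name MetastoreUriSketch and the method parse are made up for illustration and are not part of Hive.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class MetastoreUriSketch {
    /** Splits a comma-separated hive.metastore.uris value and validates each entry. */
    public static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        for (String part : value.split(",")) {
            URI uri = URI.create(part.trim());
            // Each entry must use the thrift scheme and carry an explicit host and port.
            if (!"thrift".equals(uri.getScheme()) || uri.getHost() == null || uri.getPort() == -1) {
                throw new IllegalArgumentException("Expected thrift://host:port but got: " + part);
            }
            uris.add(uri);
        }
        return uris;
    }

    public static void main(String[] args) {
        // The address is the one used in the client configuration of this article.
        for (URI uri : parse("thrift://192.168.57.5:9083")) {
            System.out.println(uri.getHost() + " " + uri.getPort());
        }
    }
}
```

The check only covers syntax. It does not tell you whether the metastore service is actually listening on that port.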

　　

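HiveServer2, HWI, and beeline in detail

Background: bin/hiveserver2 is the Thrift server, and bin/beeline is the client CLI.

A look at Hive's architecture makes the roles clear:

1. The CLI (command line interface) is the command-line entry point.

2. Thrift is a software framework developed at Facebook for building scalable, cross-language services. Hive integrates a Thrift server so that programs written in different languages can call Hive's interfaces.

3. Hive also offers access through a web page. This is the HWI component (Hive Web Interface); the HWI service must be started before use.

4. The metastore holds Hive's metadata, including table names, columns and partitions with their properties, table properties (such as whether a table is external), and the directory where each table's data lives. MySQL or Derby is typically used as the backing database.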

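So far everything has been done through the CLI or hive -e, which only supports running HiveQL queries and updates and is rather clumsy and limited. Fortunately Hive also provides a thin-client option: through HiveServer or HiveServer2, clients can operate on Hive data without starting the CLI, and both allow remote clients written in a variety of programming languages to submit requests to Hive and fetch the results.

Both HiveServer and HiveServer2 are built on Thrift, but only HiveServer is sometimes called the Thrift server; HiveServer2 is not.

Why is HiveServer2 needed when HiveServer already exists? HiveServer cannot handle concurrent requests from more than one client. This limitation comes from the Thrift interface that HiveServer uses and cannot be fixed by changing HiveServer's code. HiveServer was therefore rewritten as HiveServer2 in Hive 0.11.0, which solved the problem. HiveServer2 supports multi-client concurrency and authentication, and gives better support to open API clients such as JDBC and ODBC.

Since HiveServer2 is the more capable of the two, it gets most of the attention here, with only a brief look at HiveServer. Running hive --service help prints the output below. It shows that hive <parameters> --service serviceName <service parameters> starts a specific service, such as cli, hiveserver, or hiveserver2.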

[~]$ hive --service help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Parameters parsed:
  --auxpath : Auxillary jars
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help: ./hive --debug --help

Run hive --service hiveserver --help to see the help for hiveserver:

[~]$ hive --service hiveserver --help
Starting Hive Thrift Server
usage: hiveserver
 -h,--help                      Print help information
    --hiveconf <property=value> Use value for given property
    --maxWorkerThreads <arg>    maximum number of worker threads, default:2147483647
    --minWorkerThreads <arg>    minimum number of worker threads, default:100
 -p <port>                      Hive Server port number, default:10000
 -v,--verbose                   Verbose mode

Starting the hiveserver service confirms that it listens on port 10000 by default, with a minimum of 100 and a maximum of 2147483647 worker threads.

[~]$ hive --service hiveserver -v
Starting Hive Thrift Server
14/08/01 11:07:09 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Starting hive server on port 10000 with 100 min worker threads and 2147483647 max worker threads

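Now for the more capable HiveServer2. It can be configured in hive-site.xml through these parameters:

hive.server2.thrift.min.worker.threads – minimum number of worker threads, default 5.

hive.server2.thrift.max.worker.threads – maximum number of worker threads, default 500.

hive.server2.thrift.port – TCP listen port, default 10000.

hive.server2.thrift.bind.host – host that the TCP interface binds to, default localhost.

The environment variables HIVE_SERVER2_THRIFT_BIND_HOST and HIVE_SERVER2_THRIFT_PORT can also be set to override the host and port configured in hive-site.xml.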

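Starting with Hive 0.13.0, HiveServer2 can carry its messages over HTTP, which is especially useful when a proxy sits between client and server. The HTTP-related parameters are:

hive.server2.transport.mode – default binary (TCP); the alternative is http.

hive.server2.thrift.http.port – HTTP listen port, default 10001.

hive.server2.thrift.http.path – service endpoint name, default cliservice.

hive.server2.thrift.http.min.worker.threads – minimum number of worker threads in the server pool, default 5.

hive.server2.thrift.http.max.worker.threads – maximum number of worker threads in the server pool, default 500.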
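The transport mode also changes the URL a JDBC client has to use. The sketch below builds the connection URL for both modes. It is an illustration, not code from Hive: HiveUrlSketch and buildUrl are hypothetical names and localhost is a placeholder host. The transportMode and httpPath URL parameters are the ones the HiveServer2 JDBC driver documents for Hive 0.14 and later; Hive 0.13 used a different spelling, so check the documentation for your version.

```java
public class HiveUrlSketch {
    /** Builds a HiveServer2 JDBC URL; httpPath is only used when httpMode is true. */
    public static String buildUrl(String host, int port, String db, boolean httpMode, String httpPath) {
        String base = "jdbc:hive2://" + host + ":" + port + "/" + db;
        if (!httpMode) {
            return base; // binary (TCP) transport needs no extra URL parameters
        }
        // HTTP transport: name the mode and the endpoint configured in hive.server2.thrift.http.path
        return base + ";transportMode=http;httpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // Defaults from the parameter lists: port 10000 for binary, port 10001 and path cliservice for HTTP.
        System.out.println(buildUrl("localhost", 10000, "default", false, null));
        System.out.println(buildUrl("localhost", 10001, "default", true, "cliservice"));
    }
}
```

Keeping the URL construction in one place makes it harder for the client to drift from the server's hive.server2.transport.mode setting.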

There are two ways to start HiveServer2:

One is the form already shown above, hive --service hiveserver2.

The other is shorter: just hiveserver2.

Use hive --service hiveserver2 -H or hive --service hiveserver2 --help to see the help:

Starting HiveServer2
Unrecognized option: -h
usage: hiveserver2
 -H,--help                      Print help information
    --hiveconf <property=value> Use value for given property

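By default, HiveServer2 runs each query as the user who submitted it (hive.server2.enable.doAs is true). If hive.server2.enable.doAs is set to false, queries run as the user that owns the hiveserver2 process. To prevent memory leaks in non-secure mode, file system caching can be disabled by setting the following parameters to true:

fs.hdfs.impl.disable.cache – disables the HDFS file system cache, default false.

fs.file.impl.disable.cache – disables the local file system cache, default false.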

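HiveServer2

With HiveServer2, clients can operate on Hive data without starting the CLI.

Step 1: Configure HiveServer2, which amounts to configuring Hive's JDBC interface.

Edit hive-site.xml. Most settings already have workable defaults; if something does not work, the HiveServer2 parameters listed earlier are the ones to look at.

Step 2: Start HiveServer2. The default port is 10000.

From the Hive installation directory, run bin/hive --service hiveserver2

or run bin/hiveserver2

or, to run it in the background, bin/hive --service hiveserver2 &

A different port can also be given on the command line:

bin/hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10001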

　　

　　

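Hive and JDBC example (very important; this is how it is done in production)

Before developing a Hive program with JDBC, Hive's remote service interface must be running. Start it as follows.

Step 1: From the Hive installation directory, run

bin/hive --service hiveserver2 &    # Hive 0.11.0 and later provide the hiveserver2 service

Hive 1.2.1 is used here, so the service is HiveServer2. The Java code below connects to HiveServer2 over JDBC.

Step 2: Prepare the test data

The file djt.txt in the local directory /home/hadoop/ has the following content (the two fields on each line are separated by a tab):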

　　 1 dajiangtai

　　 2 hadoop

　　 3 Hive

　　 4 hbase

　　 5 spark
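Because the tab characters are easy to lose when copying the listing above, one option is to generate the file instead of typing it. The following is a minimal sketch under stated assumptions: the class name MakeTestData is invented for this example, and the output path defaults to djt.txt in the current directory unless a path is passed as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MakeTestData {
    /** Returns the five sample rows, each as "<key>TAB<value>". */
    public static List<String> rows() {
        String[] values = {"dajiangtai", "hadoop", "Hive", "hbase", "spark"};
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            rows.add((i + 1) + "\t" + values[i]); // key and value separated by a real tab
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Output path is an assumption: first argument, or djt.txt in the current directory.
        Path target = Paths.get(args.length > 0 ? args[0] : "djt.txt");
        Files.write(target, rows(), StandardCharsets.UTF_8);
        System.out.println("Wrote " + rows().size() + " rows to " + target);
    }
}
```

Running it with /home/hadoop/djt.txt as the argument on the node that hosts HiveServer2 puts the file where the load statement in the example program expects it.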

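If you are developing in Eclipse or MyEclipse, the Hive JDBC driver jar and its dependencies need to be on the project's build path.

Step 3: Write the program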

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Hive {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver"; // HiveServer2 JDBC driver class
    private static String url = "jdbc:hive2://djt11:10000/default"; // HiveServer2 connection URL
    private static String user = "spark"; // a user with permission to operate on HDFS
    private static String password = "spark"; // in non-secure mode the query runs as the given user and the password is ignored
    private static String sql = "";
    private static ResultSet res;

    public static void main(String[] args) {
        try {
            Class.forName(driverName); // load the HiveServer2 JDBC driver
            Connection conn = DriverManager.getConnection(url, user, password); // connect to the database named in the URL
            Statement stmt = conn.createStatement();

            // name of the table to create
            String tableName = "testHiveDriverTable";

            /** Step 1: drop the table if it already exists **/
            sql = "drop table if exists " + tableName;
            stmt.execute(sql);

            /** Step 2: create the table **/
            sql = "create table " + tableName + " (key int, value string) row format delimited fields terminated by '\t' STORED AS TEXTFILE";
            stmt.execute(sql);

            // run "show tables"
            sql = "show tables '" + tableName + "'";
            res = stmt.executeQuery(sql);
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            // run "describe table"
            sql = "describe " + tableName;
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // run "load data into table"
            String filepath = "/home/hadoop/djt.txt"; // local path on the node where the Hive service runs
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            stmt.execute(sql);

            // run "select * query"
            sql = "select * from " + tableName;
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getInt(1) + "\t" + res.getString(2));
            }

            // run a "regular Hive query"; this one is turned into a MapReduce job
            sql = "select count(*) from " + tableName;
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1));
            }

            // release the statement and the connection
            stmt.close();
            conn.close();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (SQLException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}

Output (right-click --> Run as --> Run on Hadoop)

Result of "show tables":

testhivedrivertable

Result of "describe table":

key int
value string

Result of "select * query":

1 dajiangtai
2 hadoop
3 Hive
4 hbase
5 spark

Result of "regular Hive query":

5

HWI setup

HWI is short for Hive Web Interface, a web-based alternative to the hive CLI.

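About the Hive Web Interface (HWI): Hive ships with a web GUI, but the lib directory only contains hive-hwi-1.2.1.jar; the war file has to be built by hand.

How to build hive-hwi-*.*.*.war

hive-1.2.1 is used as the example here.

Download the source

Download from: http://www.apache.org/dyn/closer.cgi/hive/

This gives apache-hive-1.2.1-src.tar.gz

Package it

Unpack the source:

tar -zxvf apache-hive-1.2.1-src.tar.gz

Change into the hwi directory inside the unpacked tree:

cd apache-hive-1.2.1-src/hwi/

Build the war file (the f flag is needed so that jar writes to the named file):

jar cvfM hive-hwi-1.2.1.war -C web .

Copy the resulting war file into Hive's lib directory and restart the hwi service.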

Fixing the error

If the page shows the following error, copy tools.jar from the JDK's lib directory into Hive's lib directory and restart the hwi service:

Problem accessing /hwi/. Reason:
Unable to find a javac compiler;
com.sun.tools.javac.Main is not on the classpath.
Perhaps JAVA_HOME does not point to the JDK.
It is currently set to "/usr/java/jdk1.7.0_79/jre"

cp /usr/java/jdk1.7.0_79/lib/tools.jar /home/Big.Data/Hive/apache-hive-1.2.1-bin/lib/.

sh bin/hive --service hwi

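Another way to build hive-hwi-*.*.*.war

Download the Hive source, then package the files under hwi/web into a war with jar cvf hive-hwi-1.2.1.war ./* and put the war into Hive's lib directory.

A war file is just a zip archive, so zip works as well:

cd hwi/web

zip -r hive-hwi-1.2.1.zip ./*    # package as a .zip file; -r includes the subdirectories

Rename the .zip to .war:

mv hive-hwi-1.2.1.zip hive-hwi-1.2.1.war

cp hive-hwi-1.2.1.war /opt/sxt/soft/apache-hive-1.2.1-bin/lib/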

<property>

<name>hive.hwi.listen.host</name>

<value>0.0.0.0</value>

<description>This is the host address the Hive Web Interface will listen on</description>

</property>

<property>

<name>hive.hwi.listen.port</name>

<value>9999</value>

<description>This is the port the Hive Web Interface will listen on</description>

</property>

<property>

<name>hive.hwi.war.file</name>

<value>${env:HWI_WAR_FILE}</value>

<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>

</property>

The above is what ships with hive-1.2.1; it needs to be changed as shown below.

In conf/hive-site.xml, set hive.hwi.war.file:

<property>

<name>hive.hwi.listen.host</name>

<value>0.0.0.0</value>

<description>This is the host address the Hive Web Interface will listen on</description>

</property>

<property>

<name>hive.hwi.listen.port</name>

<value>9999</value>

<description>This is the port the Hive Web Interface will listen on</description>

</property>

<property>

<name>hive.hwi.war.file</name>

<value>lib/hive-hwi-1.2.1.war</value>

<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>

</property>

Start the service

$ sh bin/hive --service hwi

----------------------------------------------------------------------------------------------------

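If the distribution has no HWI war file, download the source for the matching version, build the war yourself, and copy it into lib.

The path lib/hive-hwi-1.2.1.war is relative to the Hive installation directory.

The listen port defaults to 9999 in the configuration and can be changed in the Hive configuration file. Once configuration is done,

from the Hive installation directory, run bin/hive --service hwi

and then open http://masterIP:9999/hwi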

　

　

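Examples of what the Hive web interface can do

For example, browsing database and table information, running Hive queries, and so on.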

beeline setup

Step 1:

From the Hive installation directory, run bin/beeline to enter beeline, then execute the following:

!connect jdbc:hive2://localhost:10000 root org.apache.hive.jdbc.HiveDriver
