Hive installation and configuration, plus Hive JDBC (hadoop-0.20.2 + hive-0.7.0)
2016-02-22 16:35
Reposted from: http://blog.163.com/huang_zhong_yuan/blog/static/174975283201181371146365/
Hive installation
When installing Hive I referred to http://yp.oss.org.cn/software/show_resource.php?resource_id=270 and http://www.cnblogs.com/flying5/archive/2011/05/23/2078399.html. The rough steps are:
1) Install Hadoop
2) Install the JDK
3) Download Hive
Plenty of guides for these steps exist online, so let's get to the main topic.
Hive configuration
1) Configure environment variables
Append the following to ~/.bashrc (vim ~/.bashrc):
export HADOOP_HOME=/home/arthur/hadoop-0.20.2
export HIVE_HOME=/home/arthur/hadoop-0.20.2/hive-0.7.0-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_LIB=$HIVE_HOME/lib
export CLASSPATH=$CLASSPATH:$HIVE_LIB
export PATH=$HIVE_HOME/bin/:$PATH
This step can also be done in $HIVE_HOME/bin/hive-config.sh:
export HADOOP_HOME=/home/arthur/hadoop-0.20.2
export HIVE_HOME=/home/arthur/hadoop-0.20.2/hive-0.7.0-bin
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24
2) In $HIVE_HOME/conf, create a hive-site.xml that configures where Hive stores its metadata (I use MySQL):
Note: the configuration below expects a MySQL database named hive, with username hive and password hive. So before proceeding, log in to MySQL as root, create the hive database, and grant it to the hive user.
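The metastore preparation described above could be done with SQL along these lines. This is a hedged sketch: the `GRANT ... IDENTIFIED BY` form matches MySQL versions of this post's era (it was removed in MySQL 8), and the database/user/password names must match the hive-site.xml values below.

```shell
# Hypothetical metastore-preparation SQL; run it as the MySQL root user.
# The names hive/hive/hive must match hive-site.xml.
SQL="CREATE DATABASE IF NOT EXISTS hive;
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;"
# With a running MySQL server you would pipe it in:
#   echo "$SQL" | mysql -u root -p
echo "$SQL"
```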
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
</configuration>
Note: if Hive fails with "ERROR exec.DDLTask:
FAILED: Error in metadata: java.lang.IllegalArgumentException: URI: does not have a scheme", $HIVE_HOME is probably misconfigured, so Hive cannot find the metadata configuration file or the metadata database.
At this point, if we run $HIVE_HOME/bin/hive and then SHOW TABLES, we get "java.sql.SQLNonTransientConnectionException: java.net.ConnectException : Error connecting to server localhost on port 3306 with message Connection refused". That is because MySQL is not installed or its service is not running.
Testing Hive
Note: Hadoop must be started before testing Hive, because Hive depends on Hadoop.
The tests below are copied from the two pages linked above; all of them ran successfully for me.
[hadoop@gp1 hive]$ bin/hive
hive> CREATE TABLE pokes (foo INT, bar STRING);
OK
Time taken: 0.251 seconds
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
OK
Time taken: 0.106 seconds
hive> SHOW TABLES;
OK
invites pokes
Time taken: 0.107 seconds
hive> DESCRIBE invites;
OK
foo int
bar string
ds string
Time taken: 0.151 seconds
hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
OK
Time taken: 0.117 seconds
hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
OK
Time taken: 0.152 seconds
hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
Copying data from file:/home/hadoop/hadoop-0.19.1/contrib/hive/examples/files/kv1.txt
Loading data to table pokes
OK
Time taken: 0.288 seconds
hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
Copying data from file:/home/hadoop/hadoop-0.19.1/contrib/hive/examples/files/kv2.txt
Loading data to table invites partition {ds=2008-08-15}
OK
Time taken: 0.524 seconds
hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');
Copying data from file:/home/hadoop/hadoop-0.19.1/contrib/hive/examples/files/kv3.txt
Loading data to table invites partition {ds=2008-08-08}
OK
Time taken: 0.406 seconds
hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a;
Total MapReduce jobs = 1
Starting Job = job_200902261245_0002, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0002
Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0002
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
Ended Job = job_200902261245_0002
Moving data to: /tmp/hdfs_out
OK
Time taken: 18.551 seconds
hive> select count(1) from pokes;
Total MapReduce jobs = 2
Number of reducers = 1
In order to change numer of reducers use:
set mapred.reduce.tasks = <number>
Starting Job = job_200902261245_0003, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0003
Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0003
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
map = 100%, reduce =17%
map = 100%, reduce =100%
Ended Job = job_200902261245_0003
Starting Job = job_200902261245_0004, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0004
Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0004
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
map = 100%, reduce =100%
Ended Job = job_200902261245_0004
OK
500
Time taken: 57.285 seconds
hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a;
Total MapReduce jobs = 1
Starting Job = job_200902261245_0005, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0005
Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0005
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
Ended Job = job_200902261245_0005
Moving data to: /tmp/hdfs_out
OK
Time taken: 18.349 seconds
hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a;
Total MapReduce jobs = 2
Number of reducers = 1
In order to change numer of reducers use:
set mapred.reduce.tasks = <number>
Starting Job = job_200902261245_0006, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0006
Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0006
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
map = 100%, reduce =17%
map = 100%, reduce =100%
Ended Job = job_200902261245_0006
Starting Job = job_200902261245_0007, Tracking URL = http://gp1:50030/jobdetails.jsp?jobid=job_200902261245_0007
Kill Command = /home/hadoop/hadoop-0.19.1/bin/hadoop job -Dmapred.job.tracker=gp1:9001 -kill job_200902261245_0007
map = 0%, reduce =0%
map = 50%, reduce =0%
map = 100%, reduce =0%
map = 100%, reduce =17%
map = 100%, reduce =100%
Ended Job = job_200902261245_0007
Moving data to: /tmp/reg_5
OK
Hive JDBC configuration
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJDBC {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        String dropSQL = "drop table table_3";
        String createSQL = "create table table_3(key int, value string, a int) row format delimited fields terminated by '\t' stored as textfile";
        String insertSQL = "load data local inpath '/home/huangzhongyuan/hadoop-0.20.2/hive-0.7.0-bin/test/test3.txt' overwrite into table table_3";
        String querySQL = "select a.* from table_3 a";
        // Connection con = DriverManager.getConnection("jdbc:hive://localhost:3306/hive", "hive", "hive"); // Note: this does NOT work -- 3306 is MySQL's port, not the Hive server's
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", ""); // Hive server's default port 10000, default database, empty username and password
        Statement stmt = con.createStatement();
        stmt.executeQuery(dropSQL);
        stmt.executeQuery(createSQL);
        stmt.executeQuery(insertSQL);
        ResultSet res = stmt.executeQuery(querySQL);
        while (res.next()) {
            System.out.println(res.getString(1) + " " + res.getString(2) + " " + res.getString(3));
        }
    }
}
Note: to compile and run this code, every jar under Hive's lib directory must be on the classpath, along with hadoop-0.20.2-core.jar.
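Assembling that classpath by hand is tedious; a sketch of doing it in the shell follows. The directory paths are the ones assumed earlier in this post and may differ on your machine, so the compile/run lines are left commented out.

```shell
# Hypothetical paths matching the layout used earlier in this post.
HIVE_LIB=/home/arthur/hadoop-0.20.2/hive-0.7.0-bin/lib
HADOOP_CORE=/home/arthur/hadoop-0.20.2/hadoop-0.20.2-core.jar
# Join every jar under $HIVE_LIB with ':' and append the Hadoop core jar.
CP=$(echo "$HIVE_LIB"/*.jar | tr ' ' ':'):"$HADOOP_CORE"
echo "$CP"
# Compile and run (uncomment on a machine where these jars actually exist):
#   javac -cp "$CP" HiveJDBC.java
#   java  -cp "$CP:." HiveJDBC
```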
Even with the classpath right, the program still fails with a "connection refused" error. Being new to Hive, I searched for a long time before realizing I had made a very silly mistake: I forgot to start the Hive server. Start it with: bin/hive --service hiveserver >/dev/null 2>/dev/null &. Here >/dev/null discards the server's stdout, 2>/dev/null discards its stderr, and & runs it in the background. The same command can also be used to restart it.
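A minimal demonstration of what those three shell pieces do, using a harmless subshell in place of hiveserver:

```shell
# >/dev/null discards stdout; 2>/dev/null discards stderr; & backgrounds the job.
( echo "normal output"; echo "error output" 1>&2 ) >/dev/null 2>/dev/null &
wait $!                      # wait for the background job to finish
echo "background job exit status: $?"   # prints: background job exit status: 0
```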
If the Hive server fails to start, it may be because both Hadoop and Hive ship libthrift.jar and libfb303.jar and the versions conflict; deleting the duplicate copies fixes it.
A few Hive table commands
drop table table_3
create table table_3(key int, value string, a int) row format delimited fields terminated by '\t' stored as textfile
// use the tab character as the field separator
load data local inpath '/home/huangzhongyuan/hadoop-0.20.2/hive-0.7.0-bin/test/test3.txt' overwrite into table table_3 // load a local file into a Hive table
If LOCAL is omitted, the statement loads a file from HDFS into the Hive table instead.
A lot of data already lives on the Hadoop platform. When migrating such data into Hive's directory, note that Hive's default field delimiter is \u0001 (Ctrl+A); for a smooth migration, specify the data's actual delimiter when creating the table, with syntax like:
create table test(uid string, name string) row format delimited fields terminated by '\t';
This is how the field delimiter is specified.
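As a quick sanity check of the '\t' delimiter, you can build a small tab-separated file like the test3.txt used earlier (the path and contents here are made up for illustration) and confirm the columns split on tabs:

```shell
# Create a hypothetical 3-column tab-separated file matching table_3's schema.
printf '1\tone\t10\n2\ttwo\t20\n' > /tmp/test3.txt
# The second tab-delimited field is the string column:
cut -f2 /tmp/test3.txt    # prints: one / two (one per line)
```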
Then copy the directories over with hadoop fs -cp or hadoop distcp.
Because copying data is expensive and slow, when a copy is not needed you can simply move the data instead: hadoop fs -mv src dest.
An even simpler approach is to create an external table directly over the existing data, with syntax like:
create external table test(uid string, name string) row format delimited fields terminated by '\t' location '<hdfs path>';
This avoids the time cost of moving data altogether and improves efficiency.
Writing the results of a Hive SELECT to the local filesystem:
insert overwrite local directory '/tmp/reg_3'
select a.* from event a;
Writing SELECT results to HDFS:
insert overwrite directory '/tmp/hdfs_out' select a.* from invites a where a.ds='<data>';