Hive Study Notes

2015-08-16 22:15

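Environment:

Hadoop cluster version: hadoop-1.2.1

Hive version: hive-0.10.0

Hive only needs to be installed on one node of the cluster.

I. Installing and Configuring Hive

1. Upload the Hive archive (hive-0.10.0-bin.tar.gz) to one node of the Hadoop cluster and extract it: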

tar -zxvf hive-0.10.0-bin.tar.gz -C /home/suh/

2. Edit the Hive environment file hive-env.sh and add the following line to point at the Hadoop installation (in testing, Hive also seemed to work without it):

export HADOOP_HOME=/home/suh/hadoop-1.2.1

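3. Configure Hive to store its metastore in a MySQL database

Copy the default configuration template to hive-site.xml, then add the settings below:

cp hive-default.xml.template hive-site.xml

Edit hive-site.xml (delete all of the existing <property></property> pairs)

and add the following: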

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://boss:3306/hive_test?createDatabaseIfNotExist=true</value>

<description>JDBC connect string for a JDBC metastore</description>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

<description>Driver class name for a JDBC metastore</description>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>root</value>

<description>username to use against metastore database</description>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>123456</value>

<description>password to use against metastore database</description>

</property>
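
For reference, the four properties above have to sit inside the file's <configuration> root element. The following is a minimal sketch of how the whole file could be generated from a shell script; write_hive_site is a hypothetical helper name, not something shipped with Hive, and the values are the ones used in these notes.

```shell
# Sketch: write a minimal hive-site.xml that holds only the metastore settings.
# write_hive_site is a hypothetical helper, not part of Hive.
# Usage: write_hive_site FILE JDBC_URL USER PASSWORD
write_hive_site() {
  # Values are written verbatim, so they must not contain XML special
  # characters such as & or < (escape those by hand if needed).
  cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>$2</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>$3</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>$4</value>
  </property>
</configuration>
EOF
}

# Example call (run from the Hive conf directory):
#   write_hive_site hive-site.xml \
#     'jdbc:mysql://boss:3306/hive_test?createDatabaseIfNotExist=true' root 123456
```

Generating the file this way keeps the host, user, and password in one place, which helps when the same settings have to match the MySQL account.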

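4. Once Hive is configured as above, copy the MySQL JDBC driver JAR into the $HIVE_HOME/lib directory.

If Hive is denied access to MySQL, grant the privileges in MySQL (run this on the machine where MySQL is installed):

mysql -uroot -p

# Run the statements below. *.* means all tables in all databases; % means any IP address or host may connect.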

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;

FLUSH PRIVILEGES;

Note: set the character set of the MySQL database to latin1; otherwise errors start to appear as soon as you run show tables.
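
One way to apply the latin1 setting is MySQL's ALTER DATABASE statement. The sketch below assumes the metastore database is the hive_test database named in the JDBC URL; latin1_sql is a hypothetical helper that only prints the statement, so it can be inspected before being piped to the mysql client.

```shell
# Sketch: print the MySQL statement that switches a database's default
# character set to latin1. latin1_sql is a hypothetical helper name.
# Usage: latin1_sql DATABASE_NAME
latin1_sql() {
  printf 'ALTER DATABASE %s CHARACTER SET latin1;\n' "$1"
}

# Example (prompts for the MySQL root password):
#   latin1_sql hive_test | mysql -uroot -p
```

ALTER DATABASE changes the default for tables created afterwards, so it is likely simplest to run it before Hive creates its metastore tables.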

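II. Using Hive

Go to the $HIVE_HOME/bin directory and run ./hive to enter the Hive shell. From there on, the commands are similar to MySQL's.

1. Create a table (a managed, i.e. internal, table by default)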

create table trade_detail(id bigint, account string, income double, expenses double, time string)

row format delimited fields terminated by '\t';

Create a partitioned table

create table trade(tradedate string,tradetime string,securityid string,bidpx1 double,bidsize1 string,offerpx1 double,offersize1 string)

partitioned by (trade_date string)

row format delimited fields terminated by ',';

Create an external table

create external table td_ext(id bigint, account string, income double, expenses double, time string)

row format delimited fields terminated by '\t'

location '/td_ext';

2. Three ways to export data from Hive

(1) Export to the local file system:

insert overwrite local directory '/home/suh/hive/trade_01' select * from trade where tradedate='20130726';

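PS: This HQL statement runs as a MapReduce job. Once it finishes, the output appears under /home/suh/hive/trade_01 on the local file system.

The file is written by the job's tasks (here it is named 000000_0). Columns in the data are separated by ^A (Ctrl-A, ASCII code 1, written \001 in octal).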
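
Because ^A is awkward to work with in most tools, the exported file can be converted to tab-separated text afterwards. A minimal sketch using the standard tr utility follows; ctrl_a_to_tab is a hypothetical helper name, and it assumes the data itself contains no tab characters.

```shell
# Sketch: convert Hive's default ^A (octal 001) column separator to tabs.
# Reads from standard input and writes to standard output.
ctrl_a_to_tab() {
  tr '\001' '\t'
}

# Example:
#   ctrl_a_to_tab < /home/suh/hive/trade_01/000000_0 > /home/suh/hive/trade_01.tsv
```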

(2) Export to HDFS:

insert overwrite directory '/user/trade02' select * from trade where tradedate='20130725';

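PS: The exported data is saved under /user/trade02 on HDFS (here the file is named 000000_0). Columns in the data are separated by ^A (Ctrl-A, ASCII code 1, written \001 in octal).

The only difference from the local export statement is the missing local keyword, which changes where the data is stored.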

(3) Export into another Hive table:

insert into table trade_test partition(trade_date='20130724') select tradedate,tradetime,securityid,bidpx1,bidsize1,offerpx1,offersize1 from trade where tradedate='20130724';

select tradedate,tradetime,securityid,bidpx1,offerpx1 from trade_test where tradedate='20130724';

PS: This requires that trade_test already exists and is partitioned by trade_date.

(4) Further notes on exporting:

Hive 0.11.0 introduced a new feature: when writing query results to a file, you can specify the column delimiter. Earlier versions did not allow this.

For example:

insert overwrite local directory '/home/suh/hive/trade_01' row format delimited fields terminated by '\t' select * from trade;

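You can also export data with Hive's -e and -f options. -e is followed directly by a SQL statement in double quotes; -f is followed by a file whose content is SQL. For example:

Run: ./hive -e "select * from trade" >> /home/suh/hive/trade001.txt

or

Run: ./hive -f /home/suh/hive/SQL.sql >> /home/suh/hive/trade002.txt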
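
Note that >> appends, so running the same export twice leaves duplicate rows in the file. The sketch below wraps the -e form so that the output file is overwritten on each run, and adds the CLI's -S (silent) option to keep progress messages out of the way. export_query and the HIVE_BIN variable are hypothetical names introduced only for this example.

```shell
# Sketch: export a query result to a file, overwriting it on every run.
# HIVE_BIN is a hypothetical variable naming the hive launcher to use.
HIVE_BIN="${HIVE_BIN:-./hive}"

# Usage: export_query "HiveQL text" OUTPUT_FILE
export_query() {
  # -S runs the CLI in silent mode; > truncates the file instead of appending.
  "$HIVE_BIN" -S -e "$1" > "$2"
}

# Example (run from $HIVE_HOME/bin):
#   export_query "select * from trade" /home/suh/hive/trade001.txt
```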

III. A Practical Business Example

(1) Create the trade data table and a temporary staging table:

create table trade(tradedate string,tradetime string,securityid string,bidpx1 double,bidsize1 string,offerpx1 double,offersize1 string)
partitioned by(trade_date string) row format delimited fields terminated by ',';

create table trade_tmp(tradedate string,tradetime string,securityid string,bidpx1 double,bidsize1 string,offerpx1 double,offersize1 string) row format delimited fields terminated by ',';

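(2) Import the trade data file total.csv into Hive, using the date as the partition key of the partitioned table:

Because total.csv contains records for several dates, first load it into the staging table trade_tmp, then insert from the staging table into the partitioned trade table.

Load into the staging table trade_tmp: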

load data local inpath '/home/suh/hive/total.csv' overwrite into table trade_tmp;

Insert from the staging table into the partitioned trade table:

insert into table trade partition(trade_date='20130724') select tradedate,tradetime,securityid,bidpx1,bidsize1,offerpx1,offersize1 from trade_tmp where tradedate='20130724';

insert into table trade partition(trade_date='20130725') select tradedate,tradetime,securityid,bidpx1,bidsize1,offerpx1,offersize1 from trade_tmp where tradedate='20130725';

insert into table trade partition(trade_date='20130726') select tradedate,tradetime,securityid,bidpx1,bidsize1,offerpx1,offersize1 from trade_tmp where tradedate='20130726';
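
The three statements differ only in the date, so for a longer list of dates they could be generated rather than typed by hand. A minimal sketch follows; partition_insert_sql and all_partition_inserts are hypothetical helper names, and the column list is copied from the statements above.

```shell
# Sketch: generate one "insert into ... partition" statement per trade date.
# Usage: partition_insert_sql DATE   (for example 20130724)
partition_insert_sql() {
  printf "insert into table trade partition(trade_date='%s') select tradedate,tradetime,securityid,bidpx1,bidsize1,offerpx1,offersize1 from trade_tmp where tradedate='%s';\n" "$1" "$1"
}

# Usage: all_partition_inserts DATE [DATE ...]
all_partition_inserts() {
  for d in "$@"; do
    partition_insert_sql "$d"
  done
}

# Example (run from $HIVE_HOME/bin):
#   ./hive -e "$(all_partition_inserts 20130724 20130725 20130726)"
```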

(3) Group by securityid and compute each product's daily highest and lowest prices:

select tradedate,securityid,max(bidpx1),min(bidpx1),max(offerpx1),min(offerpx1)
from trade group by tradedate , securityid;

(4) Group by securityid and, with one minute as the smallest unit, compute the per-minute average price of 204001 for any one day:

select tradedate,securityid,avg(bidpx1),avg(offerpx1) from trade where securityid='204001' group by tradedate,securityid;
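
To get one average per minute, the minute has to be part of the grouping key. The sketch below builds such a query under an assumption these notes do not confirm: that tradetime is a string beginning with HH:MM, so that substr(tradetime,1,5) yields the minute. per_minute_avg_sql is a hypothetical helper name.

```shell
# Sketch: build a per-minute average query for one security on one day.
# Assumes tradetime starts with HH:MM; adjust the substr() call otherwise.
# Usage: per_minute_avg_sql SECURITYID TRADE_DATE
per_minute_avg_sql() {
  printf "select tradedate,securityid,substr(tradetime,1,5) as trade_minute,avg(bidpx1),avg(offerpx1) from trade where securityid='%s' and trade_date='%s' group by tradedate,securityid,substr(tradetime,1,5);\n" "$1" "$2"
}

# Example (run from $HIVE_HOME/bin):
#   ./hive -e "$(per_minute_avg_sql 204001 20130724)"
```

Filtering on the trade_date partition column restricts the scan to a single day's partition.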
