
Basic Hive Operations (Key Points)

2016-11-06 15:47

Basic Hive table operations

1. Create a managed table

create table [if not exists] db01.student(

id int,

name string,

age int,

...

)

row format delimited fields terminated by '\t';

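2. Load data

load data [local] inpath 'filepath' [overwrite] into table tableName;

With local, the path is on the local file system and the file is copied into the table; without it, the path is on HDFS and the file is moved into the table directory. With overwrite, existing data is replaced; without it, the new data is appended.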

3. Create an external table

create external table [if not exists] db01.student(

id int,

name string,

age int,

...

)

row format delimited fields terminated by '\t'

location 'hdfspath';

Note: location must be followed by an HDFS directory, not a file name.
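A minimal sketch of the template above with a concrete directory. The table name student_ext and the path /user/hive/external/student are assumptions chosen for illustration; they do not appear elsewhere in this article.

```sql
-- Hypothetical external table; the location is a directory, not a file.
create external table if not exists db01.student_ext(
  id int,
  name string,
  age int
)
row format delimited fields terminated by '\t'
location '/user/hive/external/student';

-- Every file found under the directory is read as table data.
select * from db01.student_ext limit 10;

-- Dropping an external table removes only the metadata;
-- the files under the location directory stay on HDFS.
drop table db01.student_ext;
```

This is the main practical difference from a managed table, where drop table also deletes the data.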

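4. Partitioned tables

   select * from tableName where id=?;

   Advantages of a partitioned table:

   1. Queries run faster, because a filter on the partition column reads only the matching directories

   2. Files are organized in a more manageable structure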

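Log files

   20161025

   20161026

   1. Load each day's log file into a Hive table

   2. Compute useful statistics from the logs every day (pv, uv, ip)

   3. Partition the log data by date

   4. Use the partition to restrict the date range that the statistics cover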
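The four steps above can be sketched as follows. The table name access_log, its columns, the partition column dt, and the local file path are all assumptions made for illustration.

```sql
-- Hypothetical log table, partitioned by day.
create table if not exists db01.access_log(
  ip string,
  user_id string,
  url string
)
partitioned by (dt string)
row format delimited fields terminated by '\t';

-- Load one day's file into that day's partition (hypothetical path).
load data local inpath '/opt/data/logs/20161025.log'
overwrite into table db01.access_log
partition (dt='20161025');

-- The filter on dt limits the scan to the dt=20161025 directory.
select count(*)                as pv,
       count(distinct user_id) as uv,
       count(distinct ip)      as ip_cnt
from db01.access_log
where dt = '20161025';
```

Widening the filter (for example dt >= '20161025' and dt <= '20161026') extends the statistics to several days without touching the other partitions.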

Requirement: partition the student table by province

Single-level partitioned table:

create table if not exists db01.student_par(

id int,

name string,

age int

)

partitioned by (province string)

row format delimited fields terminated by '\t';

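Creating the table only declares the partition column; it does not define any partition values.

The partition values are specified when data is loaded.

Syntax: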

LOAD DATA [LOCAL] INPATH 'filepath'

[OVERWRITE] INTO TABLE tablename

[PARTITION (partcol1=val1, partcol2=val2 ...)]

load data local inpath '/opt/data/stu.txt'

overwrite into table db01.student_par

partition (province='jiangsu');

load data local inpath '/opt/data/stu2.txt'

overwrite into table db01.student_par

partition (province='zhejiang');

List the partitions:

   show partitions tableName;

Query one partition:

   select * from student_par where province='jiangsu';

Create a two-level partitioned table

create table student_par2(

id int,

name string,

age int

)

partitioned by (province string, city string)

row format delimited fields terminated by '\t';

Load data into the two-level partitioned table

load data local inpath '/opt/data/stu.txt'

overwrite into table student_par2

partition (province='jiangsu',city='xuzhou');

load data local inpath '/opt/data/stu2.txt'

overwrite into table student_par2

partition (province='shandong',city='jinan');

load data local inpath '/opt/data/stu3.txt'

overwrite into table student_par2

partition (province='America',city='Los');

load data local inpath '/opt/data/stu4.txt'

overwrite into table student_par2

partition (province='America',city='NewYork');

Query the two-level partitioned table

   select * from student_par2

   where province='America' and city='NewYork';

==========================================

Create a table and upload the data directly

create table if not exists db01.student2(

id int,

name string,

age int

)

row format delimited fields terminated by '\t';

hdfs dfs -put stu.txt /user/hive/warehouse/db01.db/student2

After the table is created, putting a file directly into the table directory also loads the data.

Create a partitioned table:

create table student_par3(

id int,

name string,

age int

)

partitioned by (province string, city string)

row format delimited fields terminated by '\t';

Create the matching directories:

hdfs dfs -mkdir -p /user/hive/warehouse/db01.db/student_par3/province=America/city=Los;

hdfs dfs -mkdir -p /user/hive/warehouse/db01.db/student_par3/province=America/city=NewYork

Upload the data:

hdfs dfs -put stu3.txt /user/hive/warehouse/db01.db/student_par3/province=America/city=Los;

hdfs dfs -put stu4.txt /user/hive/warehouse/db01.db/student_par3/province=America/city=NewYork

Register the partitions (add partition); until this is done, Hive does not see the uploaded files:

alter table student_par3 add partition (province='America',city='Los');

alter table student_par3 add partition (province='America',city='NewYork');
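When many partition directories have been created by hand, registering them one by one becomes tedious. A possible alternative, assuming the directories follow the province=.../city=... naming used above and that the Hive version in use supports it, is msck repair table:

```sql
-- Scans the table directory and adds to the metastore any
-- partition directories that are not registered yet.
msck repair table student_par3;

-- Check which partitions Hive now knows about.
show partitions student_par3;
```

The explicit alter table ... add partition form remains the safer choice when only specific partitions should become visible.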

==========================================================================

External partitioned table

create external table student_ext_par(

id int,

name string,

age int

)

partitioned by (province string, city string)

row format delimited fields terminated by '\t';

Upload the data to HDFS

hdfs dfs -mkdir -p /nicole/input/student_ext_par/province=America/city=Los;

hdfs dfs -mkdir -p /nicole/input/student_ext_par/province=America/city=NewYork

hdfs dfs -put stu3.txt /nicole/input/student_ext_par/province=America/city=Los;

hdfs dfs -put stu4.txt /nicole/input/student_ext_par/province=America/city=NewYork

Attach the data to partitions

alter table student_ext_par add partition (province='America',city='Los') location '/nicole/input/student_ext_par/province=America/city=Los';

alter table student_ext_par add partition (province='America',city='NewYork') location '/nicole/input/student_ext_par/province=America/city=NewYork';

alter table student_ext_par add partition (province='America',city='Miami') location '/nicole/input/student_ext_par/test1';

alter table student_ext_par add partition (province='America',city='Las') location '/nicole/input/student_ext_par/test2';

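For an external table, a partition's values need not match the HDFS path it points to,

but creating HDFS paths that mirror the partitions is still recommended because it is easier to manage.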
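To see which directory a partition actually points to, the partition can be described. This is a small sketch using the Miami partition registered above, whose location (test1) does not follow the partition naming:

```sql
-- The "Location:" field in the output shows the HDFS directory
-- bound to this partition.
describe formatted student_ext_par
partition (province='America', city='Miami');
```

Checking the location this way is useful precisely when paths and partition values do not mirror each other.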

Command to drop a partition: drop partition

alter table student_ext_par drop partition (province='America',city='Las');

======================================================
Ways to create and load a Hive table
Method 1:

create table s1 (

id int,

name string

)

row format delimited fields terminated by '\t';

Load data

load data [local] inpath 'path' [overwrite] into table s1;

Method 2: copy the structure of an existing table (no data is copied)

create table student_like like student;

Load data

load data [local] inpath 'path' [overwrite] into table student_like;

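Method 3:
create table student_as as select * from student;
This creates the table and loads the data in one step.

It is often used to create a temporary table.

Example: create a temporary table from emp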

create table emp_as as select empno as no,empname as name, empjob as job from emp;

Method 4:
The table must exist before insert can write data into it

create table emp_insert(

no int,

name string,

job string

)

row format delimited fields terminated by '\t';

insert into table tableName           -- append

insert overwrite table tableName   -- overwrite

insert into table emp_insert select empno,empname,empjob from emp;

========================================================

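Ways to import data into a Hive table

Method 1: from the local file system into Hive

   load data local inpath 'path/file' [overwrite] into table table_name;

Method 2: from HDFS into Hive

   load data inpath 'path/file' into table table_name;

Method 3: create the table with "as" and load the data in the same statement

   create table db_01.emp_as as select * from emp;

Method 4: load with the insert command

   insert into table table_name select * from emp;

   insert overwrite table table_name select * from emp;

Method 5: point the table at existing data with location when creating it

   create table table_name(...)

   partitioned by (...)

   row format ...

   location "";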

Ways to export data from Hive:

Method 1: export to the local file system

   insert overwrite local directory 'localpath' select_statement;

Example:

insert overwrite local directory "/opt/data/hive" select * from emp;

insert overwrite local directory '/opt/data/hive/aaa' row format delimited fields terminated by '\t' select * from emp;
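Notes:

1. The result is written as files inside the target directory, so change into that directory to see them

2. overwrite clears the directory: every file already in it is replaced by the output file 000000_0

3. Without a row format clause, fields are separated by Hive's default delimiter (\001, Ctrl-A)

4. The directory may be quoted with either single or double quotes

Method 2: export to HDFS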

insert overwrite directory "/nicole/input/emp/temp" select * from emp;

Known limitation:

   In version 0.13.1, exporting directly to HDFS does not support specifying a field delimiter
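One possible workaround, offered as a sketch rather than an established recipe, is to build the delimited line in the query itself with concat_ws, so that each output row is a single string column. The choice of columns is an assumption for illustration.

```sql
-- Each row is emitted as one tab-separated string, so the default
-- field delimiter never appears in the output files.
insert overwrite directory '/nicole/input/emp/temp'
select concat_ws('\t',
                 cast(empno as string),
                 empname,
                 empjob,
                 cast(empsalary as string))
from emp;
```

Non-string columns have to be cast to string first, and how NULL values show up in the output should be checked before relying on this.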

Method 3: redirect the output of hive -e to a local file

$ bin/hive -e "select * from db01.emp" > /opt/data/hive/emp.txt

Method 4:

   Export with sqoop





Exercises:

drop table if exists emp;

create table emp(

empno int,

empname string,

empjob string,

managerno int,

empdate string,

empsalary double,

empreward double,

deptno int

)

row format delimited fields terminated by '\t';

load data local inpath '/opt/data/emp.txt' overwrite into table emp;

Tables: emp and dept

1. Find the highest salary in each department

   select max(empsalary),deptno from emp group by deptno;

2. Find the highest salary and the department name for each department

select

max(e.empsalary) salary,e.deptno,d.deptname

from emp as e

join

dept as d

on

e.deptno=d.deptno

group by

e.deptno,d.deptname;

3. Show the department name, the department's highest salary, and the city where the department is located

select

max(e.empsalary) salary,e.deptno,d.deptname,d.deptcity

from emp as e

join

dept as d

on

e.deptno=d.deptno

group by

e.deptno,d.deptname,d.deptcity;

4. Show the department name and highest salary, keeping only departments whose highest salary is at least 3000

select

max(e.empsalary) salary,e.deptno,d.deptname,d.deptcity

from emp as e

join

dept as d

on

e.deptno=d.deptno

group by

e.deptno,d.deptname,d.deptcity

having salary >= 3000;

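5. Among employees whose bonus (empreward) is less than 10000, find the highest base salary per department,

and show the department name for departments where that salary is at least 3000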

select

max(e.empsalary) salary,e.deptno,d.deptname,d.deptcity

from emp as e

join

dept as d

on e.deptno=d.deptno

where e.empreward < 10000

group by

e.deptno,d.deptname,d.deptcity

having salary >= 3000;

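join

   select * from emp join dept;       -- Cartesian product (m*n rows)



inner join   (keeps only rows that match on both sides)

left join   (keeps every row of the left table)

   select * from emp left join dept on emp.deptno=dept.deptno;

right join   (keeps every row of the right table)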

   select * from emp right join dept on emp.deptno=dept.deptno;
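The inner join is listed above without a query. A minimal sketch on the same emp and dept tables follows; the full outer join is added for comparison and is not covered in the original list.

```sql
-- inner join: only employees whose deptno exists in dept,
-- and only departments that have at least one employee
select e.empname, d.deptname
from emp e
inner join dept d on e.deptno = d.deptno;

-- full outer join: all rows from both tables, with NULLs
-- on whichever side has no match
select e.empname, d.deptname
from emp e
full outer join dept d on e.deptno = d.deptno;
```

Always giving an on condition avoids the accidental Cartesian product shown at the start of this section.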

==========================================================================
explain

   Shows the execution plan of a query

Syntax: explain select_statement

   explain select * from emp right join dept on emp.deptno=dept.deptno;

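Commonly used functions:

hive (db01)> show functions;             -- list all functions

hive (db01)> desc function max;         -- show a function's description

hive (db01)> desc function extended sum; -- show a function's detailed description

concat: string concatenation function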

hive (db01)> select concat(empname,empjob) from emp;

hive (db01)> select concat(empname,"_",empjob) from emp;

substr: substring function

hive (db01)> select substr(empdate,1,4) from emp;

   1: start at the first character (positions are 1-based)

   4: take a substring of length 4

Date and time functions

day

month

year

hour

   hive (db01)> select hour("2010-10-10 10:11:12");

minute

second

hive (db01)> select year(empdate),month(empdate),day(empdate),hour(empdate) from emp;

_c0     _c1     _c2     _c3

1980    12      17      NULL

1981    2       20      NULL

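A field that is present in the value is extracted; a missing one returns NULL (empdate has no time part here, so hour is NULL).

From the extended description of day (desc function extended day):

Synonyms: dayofmonth

date is a string in the format of 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'.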

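unix_timestamp function

   Converts a date-time to the number of seconds elapsed since 1970-01-01 00:00:00 UTC (the string is interpreted in the session's time zone; the result below corresponds to UTC+8)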

hive (db01)> select unix_timestamp("2016-10-26 15:23:30");

   1477466610

from_unixtime function

   Converts a Unix time to a date-time string (format: 2016-10-26 15:23:30)

hive (db01)> select from_unixtime(1477466610);

   2016-10-26 15:23:30

cast function

   A millisecond value taken from a log: 1477466610456 ms; dividing by 1000 and casting gives whole seconds

   cast(1477466610456/1000 as int)
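The cast can be combined with from_unixtime to turn the logged millisecond value into a readable date. The sketch below casts to bigint instead of int, which is my choice to match the argument type of from_unixtime; the format pattern in the last query is only an example.

```sql
-- Milliseconds -> whole seconds (the division yields a double,
-- the cast drops the fractional part): 1477466610
select cast(1477466610456/1000 as bigint);

-- Seconds -> date-time string in the session's time zone
select from_unixtime(cast(1477466610456/1000 as bigint));

-- Same, keeping only the date part
select from_unixtime(cast(1477466610456/1000 as bigint), 'yyyy-MM-dd');
```

The printed date-time depends on the time zone of the machine running the query.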



case when

Syntax:

case

when condition then value

when condition then value

...

else value

end

end

Example:

select empname,

case

when empreward>=10000 then "rich"

when empreward<10000 and empreward >=5000 then "just so so"

else "poor"

end

from emp;

hiveserver2: a server built on the Thrift framework

Edit the hive-site.xml configuration file



<property>

<name>hive.server2.thrift.port</name>

<value>10000</value>

</property>



<property>

<name>hive.server2.thrift.bind.host</name>

<value>nicole02.com.cn</value>

</property>



<property>

<name>hive.server2.long.polling.timeout</name>

<value>5000</value>

</property>



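Start hiveserver2

$ bin/hive --service hiveserver2   # start the server

$ bin/beeline       # start the client

beeline> help       # list the available commands

   > !connect jdbc:hive2://192.168.234.150:10000

                   # connect to the database

Enter username for jdbc:hive2://192.168.234.150:10000: nicole

Enter password for jdbc:hive2://192.168.234.150:10000: *****

   # Note: enter the username and password of the Linux account

   # Connecting without them also works, but the session has no permissions



   # After connecting, the prompt looks like this:

0: jdbc:hive2://192.168.234.150:10000>

   # Commands are entered the same way as in the hive CLI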



2.2.2 Drop Partitions

ALTER TABLE table_name DROP partition_spec, partition_spec,...

ALTER TABLE c02_clickstat_fatdt1 DROP PARTITION (dt='20101202');

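2.2.3 Rename Table

ALTER TABLE table_name RENAME TO new_table_name

This command renames a table. As described here, the data location and the partition names do not change; in other words, the old table name is not "freed", and changes to the old table alter the new table's data. This reflects early Hive releases; since Hive 0.6, renaming a managed table also moves its HDFS location.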

2.2.4 Change Column

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

This command changes a column's name, data type, comment, position, or any combination of these.

Eg:
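A minimal sketch, assuming the db01.student table created at the start of this article; the new column name stu_age and the comment text are made up for illustration.

```sql
-- Rename the column age to stu_age, keep its type, and add a comment.
alter table db01.student
change column age stu_age int comment 'age of the student';

-- Confirm the new column definition.
desc db01.student;
```

The statement changes only the table metadata; the data files are not rewritten, so a type change must stay compatible with the data already stored.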

2.2.5 Add/Replace Columns

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

ADD appends new columns after all existing columns (but before the partition columns); REPLACE replaces all of the table's columns.

Eg:

hive> desc xi;

OK

id      int

cont    string

dw_ins_date     string

Time taken: 0.061 seconds

hive> create table xibak like xi;

OK

Time taken: 0.157 seconds

hive> alter table xibak replace columns (ins_date string);

OK

Time taken: 0.109 seconds

hive> desc xibak;

OK

ins_date        string
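The session above shows only REPLACE. For comparison, a sketch of ADD on the same xibak table follows; the column comment is an assumption.

```sql
-- Append a column after the existing ones.
alter table xibak add columns (cont string comment 'content');

-- xibak now lists ins_date followed by cont.
desc xibak;
```

Like REPLACE, ADD modifies only the metadata; rows written before the change return NULL for the new column.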

2.3 Create View

CREATE VIEW [IF NOT EXISTS] view_name [ (column_name [COMMENT column_comment], ...) ]

[COMMENT view_comment]

[TBLPROPERTIES (property_name = property_value, ...)]

AS SELECT ...
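A short sketch of the syntax applied to the emp table from the exercises, assuming it lives in db01. The view name emp_high_salary and the salary threshold are made up for illustration.

```sql
-- A view stores only the query definition, not the data.
create view if not exists db01.emp_high_salary
comment 'employees earning at least 3000'
as
select empno, empname, empsalary, deptno
from db01.emp
where empsalary >= 3000;

-- Query it like a table.
select * from db01.emp_high_salary;

-- Remove the view; the underlying emp table is untouched.
drop view if exists db01.emp_high_salary;
```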
