
Hive Syntax Experiments and Conclusions

2014-09-17 17:45

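Important: use the overwrite keyword with great care, especially together with a directory. Double-check the path, because the files already in the target directory are overwritten outright.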

hive> insert into area_t values('1','1','1',now(),'1','1',2,2);

NoViableAltException(26@[])

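Conclusion: this form is not supported in the Hive version used here (0.13). INSERT ... VALUES was only added in Hive 0.14.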

hive> insert into table area_t select areacode,areaname,'1',gxrq,parentcode,bz,flags,flags1 from area limit 15;

Conclusion: insert into appends to the table.

hive> insert overwrite table area_t select areacode,areaname,'1',gxrq,parentcode,bz,flags,flags1 from area limit 15;

Conclusion: insert overwrite replaces the table's existing data.

hive> insert overwrite directory '/user/lifeng' select * from area;

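Conclusion: into cannot be used when writing to a directory (only overwrite), and the directory path must be enclosed in quotes.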

hive> from area

> insert into table area_t select areacode,areaname,'1',gxrq,parentcode,bz,flags,flags1 limit 10;

Conclusion: this is the basic FROM-first form, where the source table is named before the insert clause.

hive> from area

> insert into table area_t select areacode,areaname,'1',gxrq,parentcode,bz,flags,flags1 limit 10

> insert into table area_t select areacode,areaname,'1',gxrq,parentcode,bz,flags,flags1 order by areacode desc limit 15;

FAILED: SemanticException [Error 10087]: The same output cannot be present multiple times: area_t

Conclusion: in a multi-insert statement, the same table cannot be the target more than once.
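
As a sketch of a multi-insert that avoids this error, each insert clause can write to a different target. This assumes a second table area_t_cp with the same schema as area_t already exists (one is created later in this post with create table ... like); it is used here only for illustration.

```sql
-- Multi-insert: scan area once and write to two different tables.
-- area_t_cp is assumed to exist with the same columns as area_t.
from area
insert into table area_t
  select areacode, areaname, '1', gxrq, parentcode, bz, flags, flags1 limit 10
insert into table area_t_cp
  select areacode, areaname, '1', gxrq, parentcode, bz, flags, flags1 order by areacode desc limit 15;
```

The point of the FROM-first form is that the source table is read a single time for all insert clauses.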

By default, no query prints column headers (field names) with its results.
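
If headers are wanted, the Hive CLI has a session-level setting for them. A minimal sketch, assuming the standard hive command-line client, where hive.cli.print.header defaults to false:

```sql
-- Turn on column headers for the current CLI session only.
set hive.cli.print.header=true;

-- Later queries in this session print the column names as the first line.
select parentcode from area limit 20;
```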

hive> select [all] parentcode from area limit 20;

Conclusion: returns all records, duplicates included. The all keyword is optional.

hive> select all parentcode from area order by parentcode limit 20;

Conclusion: the rows are sorted first and then the first 20 are taken. order by is a global sort, so it runs with a single reduce task.

hive> select all parentcode from area sort by parentcode limit 20;

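Conclusion: sort by launched two jobs here and took longer. It only sorts locally, within each reducer, so the output is not globally ordered when more than one reducer runs.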
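
sort by is usually paired with distribute by, which controls which reducer each row is sent to. The following is a sketch only; the reducer count of 2 is an arbitrary value chosen for illustration.

```sql
-- Ask for two reducers in this session (mapred.reduce.tasks is the classic MapReduce property name).
set mapred.reduce.tasks=2;

-- Rows with the same parentcode go to the same reducer.
-- Each reducer sorts its own rows; there is no ordering across reducers.
select parentcode from area distribute by parentcode sort by parentcode;
```

When a total order over the whole result is required, order by remains the form to use, at the cost of a single reducer.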

hive> select distinct parentcode from area order by parentcode limit 20;

Conclusion: the values are de-duplicated and sorted, and then the first 20 are returned.

hive> select yxaccno from area a,area_t b where a.areacode = b.yxaccno;

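Conclusion: no rows are returned. The explicit inner join further down is empty as well, so the likely cause is key values that do not match exactly (for example padding spaces in areacode, see the trim() note later), not the comma-join syntax itself.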

hive> select b.yxaccno from area a right join area_t b on a.areacode = b.yxaccno;

Conclusion: lists every row of area_t (the right-hand table).

hive> select a.areacode,b.yxaccno from area_t b left join area a on a.areacode = b.yxaccno;

Conclusion: lists every row of area_t (here the left-hand table).

hive> select a.areacode,b.yxaccno from area_t b inner join area a on a.areacode = b.yxaccno;

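Conclusion: no rows are returned. This suggests that the join keys do not match exactly, not that inner join is unsupported.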

hive> select a.areacode,b.yxaccno from area_t b full join area a on a.areacode = b.yxaccno;

Conclusion: lists all rows from both area and area_t.

hive> select a.areacode,b.yxaccno from area_t b join area a on a.areacode = b.yxaccno;

Conclusion: no rows are returned. A plain join is an inner join, so the result is the same as with inner join.
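
One way to check whether padding in the key columns explains the empty joins is to look at the raw values and then join on trimmed keys. This is a diagnostic sketch under the assumption that the keys differ only by surrounding whitespace; the experiments above do not confirm that.

```sql
-- Print each key in brackets with its length to expose leading or trailing spaces.
select concat('[', areacode, ']'), length(areacode) from area limit 5;
select concat('[', yxaccno, ']'), length(yxaccno) from area_t limit 5;

-- Join on trimmed keys. If this returns rows, padding was the problem.
select a.areacode, b.yxaccno
from area_t b join area a on trim(a.areacode) = trim(b.yxaccno);
```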

hive> select parentcode,count(1),sum(sons) from area group by parentcode;

Conclusion: produces aggregate statistics (row count and sum) for each parentcode.

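hive> show functions;

Conclusion: lists all built-in functions.

hive> describe function substr;

Conclusion: shows how a specific built-in function is used.

hive> show databases;

Conclusion: lists all databases.

hive> use dw_testing;

Conclusion: switches to the dw_testing database.

hive> show tables;

Conclusion: lists all tables in the current database.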

hive> show tables '*t';

Conclusion: lists the tables whose names end in 't'. Here '_' does not match an arbitrary single character; it only stands for itself.
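
The pattern given to show tables is not a SQL LIKE pattern. According to the Hive language manual, only '*' (any run of characters) and '|' (a choice between alternatives) act as wildcards. A brief sketch, with patterns chosen purely for illustration:

```sql
-- Tables whose names start with "area".
show tables 'area*';

-- Tables whose names start with "area" or with "mid".
show tables 'area*|mid*';
```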

hive> desc area_t;

Conclusion: shows the table structure.

hive> alter table area_t add columns(create_time date comment '创建时间');

Conclusion: adds a column together with a comment.

hive> alter table area_t rename to area_new;

Conclusion: renames the table.

hive> select areacode from area where gxrq > '0' limit 2;

hive> select areacode from area where gxrq is not null limit 2;

hive> select areacode from area where gxrq = '2008/9/23 14:10:09' limit 2;

Conclusion: time values cannot be compared in the two ways shown above.
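
A possible workaround is to convert the value to a number before comparing it. The sketch below assumes that gxrq in area is stored as a string in the form 2008/9/23 14:10:09, as the literal above suggests; the format pattern and the cut-off date are illustrative assumptions, not something verified against this table.

```sql
-- unix_timestamp(string, pattern) parses the text into seconds since the epoch,
-- so the comparison is numeric instead of a string comparison.
select areacode from area
where unix_timestamp(trim(gxrq), 'yyyy/M/d H:mm:ss') >= unix_timestamp('2008-09-23 00:00:00')
limit 2;
```

Rows whose gxrq cannot be parsed with the pattern simply drop out of the result.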

hive> select areacode from area where areacode = '7777580' limit 2;

Conclusion: this also returns nothing. The value has to be trimmed before a match is found: select areacode from area where trim(areacode) = '7777580' limit 2;

hive> insert into table area_t select '1','1','1','2014-09-17','2','2',3,3,'2014-09-16' from area limit 1;

hive> select * from area_t;

Conclusion: date values can be inserted as strings; they are cast to date automatically.
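
The implicit conversion can also be written out explicitly, which makes it easier to see what happens to a string that is not in yyyy-MM-dd form. A small sketch; the second literal is a made-up example and is expected to come back as null instead of raising an error.

```sql
-- First column: a well-formed date string.
-- Second column: slashes instead of dashes, which the date type does not accept.
select cast('2014-09-17' as date), cast('2014/09/17' as date) from area limit 1;
```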

hive> alter table area_t replace columns(create_time date);

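Conclusion:

1. Removes every column from the table except create_time. Each column must be given as name plus type, otherwise an error is raised.

2. The file contents in HDFS are not deleted; only the metadata is removed.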

hive> dfs -cat /hive/warehouse/dw_testing.db/area_t/*;

Conclusion: prints the contents of the table's data files.

hive> alter table area_t change create_time update_time date;

Conclusion: renames the column.

hive> alter table area_t add columns(name varchar(30),age int);

Conclusion: adds several columns at once.

hive> alter table area_t change id cid int first;

Conclusion: renames id to cid and moves it to the first column.

hive> alter table area_t change username name varchar(30) after cid;

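Conclusion: renames username to name and places it right after the cid column. The column's name and type must be restated in the change clause in order to move it.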

hive> DESCRIBE EXTENDED area;

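Conclusion: for an external table this shows the column names plus the metadata; for a managed (internal) table only the column names were shown.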

hive> desc area_t;

Conclusion: for managed and external tables alike, this shows the table structure (column name, type, length, comment).

hive> drop table area_t;

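Conclusion: dropping a managed (internal) table deletes both the metadata and the data files; dropping an external table deletes only the metadata and leaves the data files in place.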

create table AREA_T
(
  yxaccno VARCHAR(20),
  yxaccname VARCHAR(100),
  dbckm CHAR(1),
  gxrq DATE,
  yhid VARCHAR(10),
  bz VARCHAR(200),
  areatb int,
  levels int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' STORED AS TEXTFILE;

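Conclusion: always specify the field delimiter when creating a table over delimited text files; otherwise Hive falls back to its default field delimiter, Ctrl-A ('\001').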

hive> load data local inpath '/usr/local/wonhigh/hivearea_t_file.txt' into table area_t;

Conclusion: appends the data from the local file to the managed table.

hive> load data local inpath '/usr/local/wonhigh/hivearea_t_file.txt' overwrite into table area_t;

Conclusion: replaces all existing data in the managed table with the data from the local file.

hive> create table area_t_cp like area_t;

Conclusion: copies the structure of a managed table (schema only, no data).

hive> create table area_cp like area;

Conclusion: copies the structure of an external table.

hive> select colthno,count(*) from mid where trim(colthno) in('BYW34U09DJ1BM4','BYW35G03DU1BM4','BYW35N32DP1BL4','BYW35N32DU1BL4') group by colthno having trim(colthno) = 'BYW34U09DJ1BM4';

Conclusion: version 0.13 supports both in and having.

hive> select * from mid b where b.colthno in(select a.colthno from mid_cp a where a.price > 0 limit 10) limit 20;

Conclusion: subqueries are supported inside in.

hive> select b.* from mid b where exists (select 1 from mid_cp a where a.colthno = b.colthno) limit 20;

Conclusion: exists is supported.

hive> select a.colthno,a.price from (select colthno,sum(round(price,2)) price from mid where trim(colthno) in('BYW34U09DJ1BM4','BYW35G03DU1BM4','BYW35N32DP1BL4','BYW35N32DU1BL4') group by colthno having colthno <> 'BYW34U09DJ1BM4') a order by a.colthno;

Conclusion: nested queries (a subquery in the from clause) are supported.

hive> drop database dc_retail_mdm cascade;

Conclusion: when dc_retail_mdm still contains tables, cascade is required to drop it.

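Summary:

1. Always specify the data delimiter when creating a table.

2. Flat files must not contain a header row (the line of column names).

3. Save flat files as UTF-8 to avoid garbled characters.

4. For dates: date holds only a date, and in a flat file it must be formatted as yyyy-MM-dd. timestamp holds a date plus a time and accepts two formats: YYYY-MM-DD HH:MM:SS or YYYY-MM-DD HH:MM:SS.fffffffff

5. When using overwrite, double-check the path.

6. Avoid '\t' as the flat-file delimiter where possible, since field values may themselves contain whitespace.

7. When loading data, a string longer than the column's declared length is truncated automatically, keeping the characters from the first position onward. A value whose type does not match and cannot be converted is set to null.

8. By default, no Hive query prints column headers (field names) with its results.

9. After columns are removed from a table, the files on HDFS are not actually changed.

10. Dropping a managed (internal) table also deletes its files on HDFS.

11. Dropping an external table leaves its files on HDFS untouched.

12. To bind an external table to a file, first create the storage directory on HDFS, then upload the file, and finally create the table pointing at that directory.

13. In string operations, always watch for spaces in the values; otherwise the expected rows will not be found.

14. To be continued (partitions).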
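
The three steps in item 12 can be sketched from inside the Hive CLI as follows. The directory, the local file name, the table name area_ext and its two columns are all hypothetical and only show the order of operations; the parent directory /user/lifeng is assumed to exist already.

```sql
-- 1. Create the storage directory on HDFS (hypothetical path).
dfs -mkdir /user/lifeng/area_ext;

-- 2. Upload the flat file into that directory (hypothetical local file).
dfs -put /usr/local/wonhigh/area_file.txt /user/lifeng/area_ext/;

-- 3. Create the external table on top of the directory.
create external table area_ext
(
  areacode varchar(20),
  areaname varchar(100)
) row format delimited fields terminated by ',' stored as textfile
location '/user/lifeng/area_ext';
```

Because the table is external, a later drop table area_ext would remove only the metadata and leave the uploaded file in place, as noted in item 11.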

Reference documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual
