Basic Hive Commands
2016-11-15 12:24
This post briefly introduces some common HQL statements used in the Hive CLI, such as creating tables, dropping tables, and loading data. Before running any of these statements you need to start the Hive CLI, which of course requires that the Hadoop/Hive components are already installed in your environment.
1 Create a simple table –CREATE TABLE
2 List tables –SHOW TABLES {regexp(tablename)}
3 List databases –SHOW DATABASES
4 View table structure –DESCRIBE/DESC tablename
5 Alter a table –ALTER TABLE
6 Drop a table –DROP TABLE
7 Load data –LOAD DATA {LOCAL} INPATH … {OVERWRITE} INTO TABLE tablename
(Note: LOCAL loads from the local filesystem; omit it to load from HDFS. OVERWRITE replaces the table's existing data; omit it to append.)
8 Query data (compiled to MapReduce) –SELECT
9 Create a table with a custom field delimiter –FIELDS TERMINATED BY
10 Create a partitioned table –PARTITIONED BY
11 Add a partition –ADD PARTITION
12 Load data into a specific partition –LOAD DATA … PARTITION …
13 Create an external table –CREATE EXTERNAL TABLE
14 View a table's DDL –SHOW CREATE TABLE
First, start the Hive CLI as shown below (again, this assumes the Hadoop/Hive components are already installed in your environment):
[centos@cent-2 ~]$ hive
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.optimize.mapjoin.mapreduce does not exist
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
16/11/15 11:32:11 WARN conf.HiveConf: HiveConf of name hive.auto.convert.sortmerge.join.noconditionaltask does not exist
Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
1 Create a simple table –CREATE TABLE
hive> create table test1(columna int, columnb string);
OK
Time taken: 0.349 seconds
2 List tables –SHOW TABLES {regexp(tablename)}
hive> show tables;
OK
test1
ttest1
Time taken: 0.023 seconds, Fetched: 2 row(s)
hive> show tables 'tt*';
OK
ttest1
Time taken: 0.026 seconds, Fetched: 1 row(s)
3 List databases –SHOW DATABASES
hive> show databases;
OK
default
Time taken: 0.015 seconds, Fetched: 1 row(s)
4 View table structure –DESCRIBE/DESC tablename
hive> describe test1;
OK
columna                 int
columnb                 string
Time taken: 1.497 seconds, Fetched: 2 row(s)
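Beyond the plain column list, DESCRIBE accepts modifiers that print the table's full metadata. A sketch (run against the test1 table created above):

```sql
-- DESCRIBE FORMATTED prints additional metadata in a readable layout:
-- table type, owner, HDFS location, SerDe, and table statistics.
DESCRIBE FORMATTED test1;

-- DESCRIBE EXTENDED prints the same information as a single
-- Thrift-style struct, which is more compact but harder to read.
DESCRIBE EXTENDED test1;
```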
5 Alter a table –ALTER TABLE
hive> alter table test1 rename to test2;
OK
Time taken: 0.177 seconds
hive> show tables;
OK
test2
Time taken: 0.045 seconds, Fetched: 1 row(s)
hive> alter table test2 add columns(columnc string);
OK
Time taken: 0.129 seconds
hive> desc test2;
OK
columna                 int
columnb                 string
columnc                 string
Time taken: 0.097 seconds, Fetched: 3 row(s)
6 Drop a table –DROP TABLE
hive> drop table test2;
OK
Time taken: 0.351 seconds
hive> show tables;
OK
Time taken: 0.023 seconds
7 Load data –LOAD DATA {LOCAL} INPATH … {OVERWRITE} INTO TABLE tablename
(Note: LOCAL loads from the local filesystem; omit it to load from HDFS. OVERWRITE replaces the table's existing data; omit it to append.)
[hdfs@cent-2 ~]$ hadoop fs -cat /user/hive/test.txt
1,'AAA'
2,'BBB'
3,'CCC'
hive> load data inpath '/user/hive/test.txt' overwrite into table test1;
Loading data to table default.test1
Table default.test1 stats: [numFiles=1, numRows=0, totalSize=24, rawDataSize=0]
OK
Time taken: 1.74 seconds
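The LOCAL and OVERWRITE keywords combine as follows (a sketch; the paths are the illustrative ones from this post):

```sql
-- From HDFS, replacing existing data. Note that an HDFS load *moves*
-- the file into the table's warehouse directory, so it disappears
-- from its source path.
LOAD DATA INPATH '/user/hive/test.txt' OVERWRITE INTO TABLE test1;

-- From HDFS, appending to existing data.
LOAD DATA INPATH '/user/hive/test.txt' INTO TABLE test1;

-- From the local filesystem (the file is copied, not moved).
LOAD DATA LOCAL INPATH '/home/hdfs/test.txt' INTO TABLE test1;
```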
8 Query data (compiled to MapReduce) –SELECT
hive> select count(*) from test1;
Query ID = hdfs_20161115120101_3de3af75-ca9c-4cec-892b-559f5af6f313
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1479180357223_0001, Tracking URL = http://cent-2.novalocal:8088/proxy/application_1479180357223_0001/
Kill Command = /usr/hdp/2.2.0.0-2041/hadoop/bin/hadoop job -kill job_1479180357223_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-15 12:01:35,574 Stage-1 map = 0%, reduce = 0%
2016-11-15 12:01:44,187 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.08 sec
2016-11-15 12:01:50,553 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.77 sec
MapReduce Total cumulative CPU time: 3 seconds 770 msec
Ended Job = job_1479180357223_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 3.77 sec  HDFS Read: 241 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 770 msec
OK
3
Time taken: 34.68 seconds, Fetched: 1 row(s)
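Not every SELECT launches a job: an aggregate like COUNT(*) is compiled into MapReduce as shown above, but a plain projection can often be served by a simple fetch task (this depends on the hive.fetch.task.conversion setting). A sketch against the same table:

```sql
-- Typically answered by a local fetch task, with no MapReduce job
-- and therefore none of the job-tracking output shown above.
SELECT * FROM test1 LIMIT 10;

-- EXPLAIN shows the execution plan Hive would use, including any
-- MapReduce stages, without actually running the query.
EXPLAIN SELECT COUNT(*) FROM test1;
```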
9 Create a table with a custom field delimiter –FIELDS TERMINATED BY
hive> create table test2(columna int, columnb string)
    > row format delimited fields terminated by ',';
OK
Time taken: 0.115 seconds
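The delimiter matters for the earlier load example: test1 was created without a ROW FORMAT clause, so it uses Hive's default field delimiter \001 (Ctrl-A), and each comma-separated line of test.txt lands entirely in columna (where it fails the int conversion and reads back as NULL), even though count(*) still returns 3. A sketch, assuming the same test.txt, showing that the comma-delimited test2 parses it into two columns:

```sql
-- test2 declares ',' as its field delimiter, so '1,'AAA'' splits
-- into columna=1 and columnb=''AAA'' as intended.
LOAD DATA INPATH '/user/hive/test.txt' INTO TABLE test2;
SELECT * FROM test2;
```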
10 Create a partitioned table –PARTITIONED BY
hive> create table test3(columna int, columnb string)
    > partitioned by (dt string);
OK
Time taken: 0.216 seconds
hive> describe test3;
OK
columna                 int
columnb                 string
dt                      string

# Partition Information
# col_name              data_type               comment

dt                      string
Time taken: 0.097 seconds, Fetched: 8 row(s)
11 Add a partition –ADD PARTITION
hive> alter table test4 add partition(dt='201601');
OK
Time taken: 0.194 seconds
hive> alter table test4 add partition(dt='201602');
OK
Time taken: 0.09 seconds
hive> alter table test4 add partition(dt='201603');
OK
Time taken: 0.084 seconds
hive> show partitions test4;
OK
dt=201601
dt=201602
dt=201603
Time taken: 0.096 seconds, Fetched: 3 row(s)
12 Load data into a specific partition –LOAD DATA … PARTITION …
hive> load data local inpath '/home/hdfs/test.txt' into table test4 partition(dt='201603');
Loading data to table default.test4 partition (dt=201603)
Partition default.test4{dt=201603} stats: [numFiles=1, totalSize=40]
OK
Time taken: 0.879 seconds
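Each partition is stored as its own subdirectory under the table's location (e.g. .../test4/dt=201603/). The partition column then behaves like an ordinary column in queries, and filtering on it lets Hive read only the matching directories (partition pruning). A sketch:

```sql
-- Only the dt=201603 directory is scanned; the other partitions'
-- files are never read.
SELECT * FROM test4 WHERE dt = '201603';

-- The partition column can also be selected like any other column.
SELECT dt, COUNT(*) FROM test4 GROUP BY dt;
```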
13 Create an external table –CREATE EXTERNAL TABLE
hive> create external table ext_table(columna int, columnb string, columnc string)
    > row format delimited
    > fields terminated by ','
    > location '/user/hive';
OK
Time taken: 0.103 seconds
hive> select * from ext_table;
OK
1       HHH     201601
2       JJJ     201602
3       KKK     201603
NULL    NULL    NULL
Time taken: 0.055 seconds, Fetched: 4 row(s)
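An external table only layers a schema over files that already exist at the given location; Hive does not take ownership of them. (The NULL row in the output above suggests the /user/hive directory also contains a file whose lines do not parse into the three declared columns.) The key difference from a managed table shows up on DROP, sketched here:

```sql
-- For an EXTERNAL table, DROP removes only the metastore entry;
-- the files under '/user/hive' are left untouched. Dropping a
-- managed (non-external) table would delete its data directory too.
DROP TABLE ext_table;

-- The data can be re-exposed simply by recreating the table
-- definition over the same location.
```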
14 View a table's DDL –SHOW CREATE TABLE
hive> show create table default.eboxdata;
OK
CREATE EXTERNAL TABLE `default.eboxdata`(
  `ctime` string,
  `mac` string,
  `addr` int,
  `title` string,
  `o_c` smallint,
  `enable_net_ctrl` smallint,
  `alarm` int,
  `model` string,
  `specification` string,
  `version` string,
  `a_a` float,
  `a_ld` float,
  `a_t` float,
  `a_v` float,
  `a_w` float,
  `power` float,
  `mxdw` float,
  `mxgg` float,
  `mxgl` float,
  `mxgw` float,
  `mxgy` float,
  `mxld` float,
  `mxqy` float,
  `control` smallint,
  `visibility` smallint)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://n11.trafodion.local:8020/bulkload/EBOXDATA'
TBLPROPERTIES (
  'transient_lastDdlTime'='1479781899')
Time taken: 0.158 seconds, Fetched: 36 row(s)