
HIVE-DDL

2015-11-21 23:04

Hive QL

You can use

set mapred.job.tracker

to specify the cluster on which Hive runs its jobs.

For example,

hive> set mapred.job.tracker=local;

makes Hive run its jobs locally.
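As a minimal sketch of how this looks in a Hive CLI session: the statements below first print the current value of the property and then switch the session to local execution. This assumes a classic MapReduce (JobTracker) setup; page_view is the example table created later in this article.

```sql
-- Print the current value of the property.
set mapred.job.tracker;

-- Run the jobs of this session locally instead of on the cluster.
set mapred.job.tracker=local;

-- Any query that needs a job now runs in local mode.
select count(*) from page_view;
```

The setting only applies to the current session; a new session starts again with the value from the configuration files.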

Setting the log level:

bin/hive -hiveconf hive.root.logger=INFO,console

starts Hive with the given log level and sends the log output to the console.

Hive configuration file:

hive.metastore.warehouse.dir

specifies the directory on HDFS where Hive stores its table data.
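To check where a table actually lives, something like the following can be used from the Hive CLI. This is only a sketch; page_view is the example table created later in this article.

```sql
-- Print the warehouse directory configured for this session.
set hive.metastore.warehouse.dir;

-- Show detailed table metadata; the Location field holds the table's HDFS directory.
describe formatted page_view;
```

For a table created without a location clause, the reported location is normally a subdirectory of the warehouse directory.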

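Creating a Hive table:

create [external] table [if not exists] table_name
[(col_name data_type [comment col_comment], ...)]
[comment table_comment]
# Partition the table, so that its data is stored in a separate location for each partition; this makes lookups easier.
[partitioned by (col_name data_type [comment col_comment], col_name data_type [comment col_comment], ...)]
# Bucketing: rows are assigned to buckets by the listed columns; sorted by can additionally order the rows within each bucket.
[clustered by (col_name, col_name, ...) [sorted by (col_name, ...)] into num_buckets buckets]
[row format row_format]
# row_format:
delimited [fields terminated by char] : fields are separated by the given character (delimited fields terminated by ',')
[collection items terminated by char] : the elements of a collection are separated by the given character (delimited collection items terminated by '.')
[map keys terminated by char] : each map key is separated from its value by the given character (delimited map keys terminated by ':')
[stored as file_format] : the format in which the data is stored
# file_format:
sequencefile : stored as a Hadoop SequenceFile, a binary key/value format that can be compressed
textfile : stored as plain text
inputformat input_format_classname outputformat output_format_classname : a custom storage format
[location hdfs_path]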

# data_type: primitive_type, array_type, map_type
primitive_type: tinyint|smallint|int|bigint|boolean|float|double|string
array_type: array<data_type>
map_type: map<primitive_type,data_type>
e.g.
create table page_view (uid int comment 'description of the column',view_time int,
page_url string,referrer_url string) comment 'description of the table'
partitioned by (language string,country string)
clustered by (uid) sorted by (uid) into 10 buckets
row format delimited fields terminated by ' ' collection items terminated by ','
map keys terminated by ':'
stored as textfile;
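The page_view example does not use the external, location, array, or map parts of the syntax. The sketch below shows how they might fit together; the table name user_profile, its columns, and the HDFS path are hypothetical and chosen only for illustration.

```sql
-- Hypothetical external table with one array column and one map column.
create external table if not exists user_profile (
  uid   int comment 'user id',
  tags  array<string>,
  attrs map<string,string>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':'
stored as textfile
location '/data/example/user_profile';

-- A matching line in the data file would look like this (<TAB> is a tab character):
--   1001<TAB>sports,music<TAB>city:beijing,lang:zh

-- Read one array element and one map value.
select uid, tags[0], attrs['city'] from user_profile;
```

Because the table is external, dropping it removes only the metadata; the files under the given location stay in place.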

Altering a table:

alter table

1). Rename a table:

alter table table_name RENAME TO new_table_name
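A small sketch of a rename and its reversal; demo_log and demo_log_old are hypothetical table names used only for illustration.

```sql
-- Hypothetical table used only for this example.
create table if not exists demo_log (id int, msg string);

-- Rename the table, list it under its new name, then rename it back.
alter table demo_log rename to demo_log_old;
show tables like 'demo_log*';
alter table demo_log_old rename to demo_log;
```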

2). Change a column's name, type, comment, or position:

alter table table_name change [column] old_column_name new_column_name column_type [comment col_comment] [first|after col_name]

alter table page_view change column uid user_id bigint comment 'alter uid' first

first places the column in the first position; after col_name places it after the named column.
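The following sketch covers both a rename and a reposition with the change clause. The table demo_log and its columns are hypothetical. The moved column has the same type as the column whose position it takes, since some Hive configurations reject reorderings that change the type found at a given position.

```sql
-- Hypothetical table used only for this example.
create table if not exists demo_log (id int, msg string, src string);

-- Rename msg to message and attach a comment; type and position stay the same.
alter table demo_log change column msg message string comment 'log message';

-- Move src so that it directly follows id; name and type stay the same.
alter table demo_log change column src src string after id;

-- Column order is now: id, src, message.
describe demo_log;
```

These statements change only the table metadata; the data files are not rewritten, so existing data has to match the new layout already.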

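3). Add columns:

alter table table_name add|replace columns (col_name data_type [comment col_comment], ...)

add columns

: adds the new columns after the existing columns and before the partition columns.

replace columns

: removes all of the existing columns and adds the new set of columns.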
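A sketch of both variants on a hypothetical partitioned table named demo_events; the column names are assumptions chosen for illustration.

```sql
-- Hypothetical table used only for this example.
create table if not exists demo_events (id int, msg string)
partitioned by (dt string);

-- Append a column: the order becomes id, msg, level, then the partition column dt.
alter table demo_events add columns (level string comment 'log level');

-- Replace the whole column list: the table now has only id and message (plus dt).
alter table demo_events replace columns (id int, message string);

describe demo_events;
```

Both statements change only the metadata. After replace columns, the data files still contain the old fields, which are then read according to the new column list.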

4). Add SerDe properties:

alter table table_name set serde serde_class_name [with serdeproperties serde_properties]

alter table table_name set serdeproperties serde_properties

# serde_properties: (property_name = property_value, ...)
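As a sketch of both forms: the statements below change the field delimiter of a hypothetical delimited text table named demo_csv. They assume the table uses Hive's default LazySimpleSerDe, whose field delimiter is controlled by the field.delim property.

```sql
-- Hypothetical comma-delimited table used only for this example.
create table if not exists demo_csv (id int, name string)
row format delimited fields terminated by ',';

-- Form 2: change a property of the current SerDe (here: switch to tab-delimited).
alter table demo_csv set serdeproperties ('field.delim' = '\t');

-- Form 1: set the SerDe class explicitly and pass its properties in one statement.
alter table demo_csv set serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties ('field.delim' = '|');

-- The SerDe class and its parameters appear in the storage section of the output.
describe formatted demo_csv;
```

Property names and values are written as quoted strings.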

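5). Partition operations:

Hive partition commands: creating, adding, and dropping partitions.

The partition columns are declared in the create statement, but a partition can only be used after it has actually been added, which creates its partition directory.

Adding partitions:

alter table table_name ADD partition_spec [LOCATION 'location1']

partition_spec [LOCATION 'location2'] ...

partition_spec:

PARTITION (partition_col = partition_col_value, partition_col = partition_col_value, ...)

If location is not specified, the partition is placed in the default location, under the directory configured in the configuration file by:

hive.metastore.warehouse.dir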
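A sketch of adding two partitions in one statement, one in the default location and one with an explicit location. The table demo_view, the partition values, and the HDFS path are hypothetical.

```sql
-- Hypothetical table used only for this example.
create table if not exists demo_view (uid int, page_url string)
partitioned by (language string, country string);

-- The first partition goes to the default location below the table directory
-- (.../demo_view/language=en/country=US); the second gets an explicit path.
alter table demo_view add if not exists
  partition (language = 'en', country = 'US')
  partition (language = 'zh', country = 'CN') location '/data/example/demo_view/zh_CN';

show partitions demo_view;
```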

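Dropping partitions:

alter table table_name drop partition (partition_col = partition_col_value, ...)

After the drop, the corresponding directory is deleted as well (for an external table, only the metadata is removed and the data stays in place).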
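A sketch of dropping a single partition; demo_view and the partition values are hypothetical.

```sql
-- Hypothetical table and partition used only for this example.
create table if not exists demo_view (uid int, page_url string)
partitioned by (language string, country string);
alter table demo_view add if not exists partition (language = 'zh', country = 'CN');

-- Drop the partition; if exists avoids an error when it is already gone.
alter table demo_view drop if exists partition (language = 'zh', country = 'CN');

show partitions demo_view;
```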

6). Changing storage properties: the storage format can be changed; if the table has partitions, use the partition clause.

ALTER TABLE table_name [PARTITION (col_name = col_value)] SET FILEFORMAT TEXTFILE|SEQUENCEFILE
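A sketch of changing the format for one partition and for the table itself; demo_view and the partition values are hypothetical.

```sql
-- Hypothetical table and partition used only for this example.
create table if not exists demo_view (uid int, page_url string)
partitioned by (language string, country string);
alter table demo_view add if not exists partition (language = 'en', country = 'US');

-- Change the format recorded for one existing partition.
alter table demo_view partition (language = 'en', country = 'US') set fileformat sequencefile;

-- Change the format recorded for the table, which applies to partitions added later.
alter table demo_view set fileformat sequencefile;

describe formatted demo_view partition (language = 'en', country = 'US');
```

Only the metadata changes; files that already exist are not converted, so they have to be rewritten in the new format separately.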

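7).

alter table ... archive|unarchive partition

packs the files of a partition into a Hadoop Archive (HAR), or unpacks them again.

This reduces the number of files in the file system and relieves pressure on the NameNode, but it does not reduce the storage space used. It can only be applied to individual partitions.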
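A sketch of archiving and restoring one partition. It assumes that archiving has to be switched on through hive.archive.enabled and that the Hadoop archive tooling is available on the cluster; demo_view and the partition values are hypothetical.

```sql
-- Hypothetical table and partition used only for this example.
create table if not exists demo_view (uid int, page_url string)
partitioned by (language string, country string);
alter table demo_view add if not exists partition (language = 'en', country = 'US');

-- Archiving is assumed to be disabled by default.
set hive.archive.enabled=true;

-- Pack the partition's files into a HAR file.
alter table demo_view archive partition (language = 'en', country = 'US');

-- Unpack the partition again into regular files.
alter table demo_view unarchive partition (language = 'en', country = 'US');
```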

8).

alter table table_name partition (col_name = col_value, ...) enable|disable no_drop|offline

no_drop: the partition cannot be dropped | offline: the data in the partition cannot be queried. disable does the opposite of enable.
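A sketch of protecting a partition and then lifting the protection; demo_view and the partition values are hypothetical. These clauses belong to older Hive releases and may not be available in newer ones.

```sql
-- Hypothetical table and partition used only for this example.
create table if not exists demo_view (uid int, page_url string)
partitioned by (language string, country string);
alter table demo_view add if not exists partition (language = 'en', country = 'US');

-- Protect the partition against being dropped.
alter table demo_view partition (language = 'en', country = 'US') enable no_drop;

-- Block queries against the partition's data.
alter table demo_view partition (language = 'en', country = 'US') enable offline;

-- Lift both restrictions again.
alter table demo_view partition (language = 'en', country = 'US') disable offline;
alter table demo_view partition (language = 'en', country = 'US') disable no_drop;
```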
