Hive Study Notes 3: DDL Statements

2014-11-18 13:57
Databases in Hive

Hive offers no support for row-level inserts, updates, and deletes, and it doesn't support transactions. This applies to the Hive versions these notes cover; Hive 0.14 later added INSERT ... VALUES, UPDATE, and DELETE for tables configured as transactional. Hive adds extensions to provide better performance in the context of Hadoop and to integrate with custom extensions and even external programs.

Create a database

hive> CREATE DATABASE financials;

Create a database only if it does not already exist

hive> CREATE DATABASE IF NOT EXISTS financials;

Show the existing databases

hive> SHOW DATABASES;
default
financials
hive> CREATE DATABASE human_resources;
hive> SHOW DATABASES;
default
financials
human_resources

Show only the databases whose names match a pattern

hive> SHOW DATABASES LIKE 'h.*';
human_resources
hive> ...

Create a database at a specified storage location

hive> CREATE DATABASE financials
> LOCATION '/my/preferred/directory';

Add a comment when creating a database

hive> CREATE DATABASE financials
> COMMENT 'Holds all financial tables';
hive> DESCRIBE DATABASE financials;
financials Holds all financial tables
hdfs://master-server/user/hive/warehouse/financials.db
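
The DESCRIBE DATABASE output above shows the convention for a database created without LOCATION. Hive uses a directory named after the database, with a .db suffix, under its warehouse directory. Below is a minimal Python sketch of that naming rule. It assumes the warehouse directory is /user/hive/warehouse (the hive.metastore.warehouse.dir setting) and leaves out the file system prefix (hdfs://master-server here). Both functions are hypothetical helpers, not part of Hive.

```python
import posixpath

# Assumption: hive.metastore.warehouse.dir is left at its usual default.
WAREHOUSE_DIR = "/user/hive/warehouse"


def default_database_location(db: str, warehouse: str = WAREHOUSE_DIR) -> str:
    """Directory Hive would use for a database created without LOCATION."""
    if db.lower() == "default":
        # The default database has no .db directory; its tables sit in the warehouse root.
        return warehouse
    # Hive stores database names in lower case.
    return posixpath.join(warehouse, db.lower() + ".db")


def default_table_location(db: str, table: str, warehouse: str = WAREHOUSE_DIR) -> str:
    """Directory for a managed table created without LOCATION."""
    return posixpath.join(default_database_location(db, warehouse), table.lower())


print(default_database_location("financials"))     # /user/hive/warehouse/financials.db
print(default_table_location("mydb", "employees"))  # /user/hive/warehouse/mydb.db/employees
```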

Add custom properties when creating a database

hive> CREATE DATABASE financials
> WITH DBPROPERTIES ('creator' = 'Mark Moneybags', 'date' = '2012-01-02');
hive> DESCRIBE DATABASE financials;
financials hdfs://master-server/user/hive/warehouse/financials.db
hive> DESCRIBE DATABASE EXTENDED financials;
financials hdfs://master-server/user/hive/warehouse/financials.db
{date=2012-01-02, creator=Mark Moneybags}

Switch to a database

hive> USE financials;

Display the current database in the CLI prompt

hive> set hive.cli.print.current.db=true;
hive (financials)> USE default;
hive (default)> set hive.cli.print.current.db=false;
hive> ...

Drop a database

hive> DROP DATABASE IF EXISTS financials;

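By default, Hive refuses to drop a database that still contains tables, so you must drop the tables first. Adding CASCADE drops the tables along with the database: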

hive> DROP DATABASE IF EXISTS financials CASCADE;

Alter Database

hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba');

There is no way to delete or “unset” a DBPROPERTY.

Creating Tables

CREATE TABLE IF NOT EXISTS mydb.employees (
name STRING COMMENT 'Employee name',
salary FLOAT COMMENT 'Employee salary',
subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
deductions MAP<STRING, FLOAT>
COMMENT 'Keys are deductions names, values are percentages',
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
COMMENT 'Home address')
COMMENT 'Description of the table'
LOCATION '/user/hive/warehouse/mydb.db/employees'
TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...);

Create a table by copying an existing table's schema

CREATE TABLE IF NOT EXISTS mydb.employees2
LIKE mydb.employees;

Show the tables in a database

hive> USE mydb;
hive> SHOW TABLES;
employees
table1
table2

hive> USE default;
hive> SHOW TABLES IN mydb;
employees

Show the table names that match a pattern

hive> USE mydb;
hive> SHOW TABLES 'empl.*';
employees

Show extended table information

hive> DESCRIBE EXTENDED mydb.employees;
name string Employee name
salary float Employee salary
subordinates array<string> Names of subordinates
deductions map<string,float> Keys are deductions names, values are percentages
address struct<street:string,city:string,state:string,zip:int> Home address
Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me,
...
location:hdfs://master-server/user/hive/warehouse/mydb.db/employees,
parameters:{creator=me, created_at='2012-01-02 10:00:00',
last_modified_user=me, last_modified_time=1337544510,
comment:Description of the table, ...}, ...)

Show information for a single column

hive> DESCRIBE mydb.employees.salary;
salary float Employee salary

External Tables

Dropping an external table does not delete its data.

CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';

Create an external table by copying an existing table's schema

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3
LIKE mydb.employees
LOCATION '/path/to/data';

Partitioned, Managed Tables

CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);

Hive creates a subdirectory under the table's directory for each partition, mirroring the partitioning structure. For example:

...
.../employees/country=CA/state=AB
.../employees/country=CA/state=BC
...
.../employees/country=US/state=AL
.../employees/country=US/state=AK
...
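
Below is a minimal Python sketch of how a partition's values turn into that directory path. It assumes the employees table sits in the default database, so its directory is /user/hive/warehouse/employees. The partition_path helper is hypothetical. It also skips the escaping Hive applies to special characters such as '/' and '=' inside partition values.

```python
def partition_path(table_dir: str, partition_keys: list, values: dict) -> str:
    """Return the partition subdirectory, one key=value level per partition column."""
    missing = [key for key in partition_keys if key not in values]
    if missing:
        raise ValueError(f"missing values for partition columns: {missing}")
    # Levels follow the PARTITIONED BY order, not the order of the values dict.
    levels = [f"{key}={values[key]}" for key in partition_keys]
    return "/".join([table_dir.rstrip("/")] + levels)


print(partition_path("/user/hive/warehouse/employees",
                     ["country", "state"],
                     {"state": "AB", "country": "CA"}))
# /user/hive/warehouse/employees/country=CA/state=AB
```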

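Recommended safety measure

Put Hive into "strict" mode. In strict mode, Hive rejects any query on a partitioned table whose WHERE clause does not filter on a partition column: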

hive> set hive.mapred.mode=strict;
hive> SELECT e.name, e.salary FROM employees e LIMIT 100;
FAILED: Error in semantic analysis: No partition predicate found for
Alias "e" Table "employees"
hive> set hive.mapred.mode=nonstrict;
hive> SELECT e.name, e.salary FROM employees e LIMIT 100;

Show the existing partitions

hive> SHOW PARTITIONS employees;
...
country=CA/state=AB
country=CA/state=BC
...
country=US/state=AL
country=US/state=AK

Show only the partitions matching specific partition key values

hive> SHOW PARTITIONS employees PARTITION(country='US');
country=US/state=AL
country=US/state=AK
...
hive> SHOW PARTITIONS employees PARTITION(country='US', state='AK');
country=US/state=AK
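
A partial partition spec works as a filter: Hive lists only the partitions whose keys match every value given. The Python model below reproduces that filtering over partition names like the ones printed above. It is an illustration with hypothetical helper names, not how the metastore evaluates the request. It also keeps the input order rather than reproducing Hive's output order.

```python
def parse_partition(name: str) -> dict:
    """Turn 'country=US/state=AK' into {'country': 'US', 'state': 'AK'}."""
    return dict(level.split("=", 1) for level in name.split("/"))


def show_partitions(partitions: list, spec: dict = None) -> list:
    """Keep the partitions whose values match every key given in spec."""
    spec = spec or {}
    return [name for name in partitions
            if all(parse_partition(name).get(key) == value
                   for key, value in spec.items())]


existing = ["country=CA/state=AB", "country=CA/state=BC",
            "country=US/state=AL", "country=US/state=AK"]
print(show_partitions(existing, {"country": "US"}))
# ['country=US/state=AL', 'country=US/state=AK']
print(show_partitions(existing, {"country": "US", "state": "AK"}))
# ['country=US/state=AK']
```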

Show the partition keys with DESCRIBE

hive> DESCRIBE EXTENDED employees;
name string,
salary float,
...
address struct<...>,
country string,
state string
Detailed Table Information...
partitionKeys:[FieldSchema(name:country, type:string, comment:null),
FieldSchema(name:state, type:string, comment:null)],
...

Load data from a local file into a partition

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

External Partitioned Tables

1. First, create the external table:

CREATE EXTERNAL TABLE IF NOT EXISTS log_messages (
hms INT,
severity STRING,
server STRING,
process_id INT,
message STRING)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

2. Add a partition to the external table at a specified location:

ALTER TABLE log_messages ADD PARTITION(year = 2012, month = 1, day = 2)
LOCATION 'hdfs://master-server/data/log_messages/2012/01/02';

3. To move a partition's data to S3:

•Copy the data for the partition being moved to S3. For example, you can use the hadoop distcp command:

hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02

•Alter the table to point the partition to the S3 location:

ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
SET LOCATION 's3n://ourbucket/logs/2011/12/02';

•Remove the HDFS copy of the partition using the hadoop fs -rmr command:

hadoop fs -rmr /data/log_messages/2011/12/02
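
The order of these steps matters. The copy must finish before the partition is repointed, and the HDFS data should be deleted only once the table no longer refers to it. Below is a Python sketch that assembles the commands in that order for one daily partition. It assumes the YYYY/MM/DD directory layout used above, and partition_move_commands is a hypothetical helper. On Hadoop 2 and later, hadoop fs -rmr is deprecated in favor of hadoop fs -rm -r.

```python
def partition_move_commands(table: str, year: int, month: int, day: int,
                            hdfs_root: str, s3_root: str) -> list:
    """Return, in execution order, the commands that relocate one daily partition."""
    relative = f"{year:04d}/{month:02d}/{day:02d}"
    source = hdfs_root.rstrip("/") + "/" + relative
    target = s3_root.rstrip("/") + "/" + relative
    return [
        # 1. Copy the files; queries still read the HDFS copy meanwhile.
        f"hadoop distcp {source} {target}",
        # 2. Point the partition's metadata at the new copy.
        f"ALTER TABLE {table} PARTITION(year = {year}, month = {month}, day = {day}) "
        f"SET LOCATION '{target}';",
        # 3. Only now remove the old HDFS directory.
        f"hadoop fs -rmr {source}",
    ]


for command in partition_move_commands("log_messages", 2011, 12, 2,
                                       "/data/log_messages", "s3n://ourbucket/logs"):
    print(command)
```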

Show a table's partition information

hive> SHOW PARTITIONS log_messages;
...
year=2011/month=12/day=31
year=2012/month=1/day=1
year=2012/month=1/day=2

hive> DESCRIBE EXTENDED log_messages;
...
message string,
year int,
month int,
day int
Detailed Table Information...
partitionKeys:[FieldSchema(name:year, type:int, comment:null),
FieldSchema(name:month, type:int, comment:null),
FieldSchema(name:day, type:int, comment:null)],
...

hive> DESCRIBE EXTENDED log_messages PARTITION (year=2011, month=12, day=2);
...
location:s3n://ourbucket/logs/2011/12/02,
...

Customizing Table Storage Formats

CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
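
The delimiters in this statement are Hive's defaults for text files: Ctrl-A (\001), Ctrl-B (\002), Ctrl-C (\003) and newline. The ROW FORMAT clause here only makes them explicit. Below is a Python sketch of how one line of such a file maps onto the employees columns, using a made-up sample row. It is not Hive's SerDe, and it ignores details such as the \N null marker and escaping.

```python
FIELD_SEP = "\x01"       # FIELDS TERMINATED BY '\001' (Ctrl-A)
COLLECTION_SEP = "\x02"  # COLLECTION ITEMS TERMINATED BY '\002' (Ctrl-B)
MAP_KEY_SEP = "\x03"     # MAP KEYS TERMINATED BY '\003' (Ctrl-C)


def parse_employee(line: str) -> dict:
    """Split one line of the text file into the employees table's columns."""
    name, salary, subordinates, deductions, address = line.rstrip("\n").split(FIELD_SEP)

    # ARRAY elements, MAP entries and STRUCT fields all use the collection separator;
    # within a MAP entry, the key and value are split by the map-key separator.
    deduction_map = {}
    entries = deductions.split(COLLECTION_SEP) if deductions else []
    for entry in entries:
        key, value = entry.split(MAP_KEY_SEP, 1)
        deduction_map[key] = float(value)

    street, city, state, zip_code = address.split(COLLECTION_SEP)
    return {
        "name": name,
        "salary": float(salary),
        # In this sketch an empty field becomes an empty list or map.
        "subordinates": subordinates.split(COLLECTION_SEP) if subordinates else [],
        "deductions": deduction_map,
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


# Made-up sample row using the default delimiters.
row = ("John Doe\x01100000.0\x01Mary Smith\x02Todd Jones\x01"
       "Federal Taxes\x030.2\x02State Taxes\x030.05\x02Insurance\x030.1\x01"
       "1 Michigan Ave.\x02Chicago\x02IL\x0260600\n")
print(parse_employee(row)["deductions"])
# {'Federal Taxes': 0.2, 'State Taxes': 0.05, 'Insurance': 0.1}
```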

Dropping Tables

DROP TABLE IF EXISTS employees;

For managed tables, both the metadata and the data are deleted. For external tables, the metadata is deleted but the data is not.

Alter Table

ALTER TABLE modifies table metadata only. The data for the table is untouched. It's up to you to ensure that any modifications are consistent with the actual data.

Renaming a Table

ALTER TABLE log_messages RENAME TO logmsgs;

Adding, Modifying, and Dropping a Table Partition

ALTER TABLE log_messages ADD IF NOT EXISTS
PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01'
PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02'
PARTITION (year = 2011, month = 1, day = 3) LOCATION '/logs/2011/01/03';

ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
SET LOCATION 's3n://ourbucket/logs/2011/12/02';

ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);
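
Writing out one PARTITION clause per day, as in the ADD IF NOT EXISTS statement above, gets tedious for long date ranges. Below is a Python sketch that generates the same statement form. It assumes the /logs/YYYY/MM/DD layout from that example. add_partitions_sql is a hypothetical helper and does no quoting or validation of the table name or root path.

```python
from datetime import date, timedelta


def add_partitions_sql(table: str, first_day: date, num_days: int, root: str) -> str:
    """Build one ALTER TABLE ... ADD IF NOT EXISTS statement covering num_days days."""
    if num_days < 1:
        raise ValueError("num_days must be at least 1")
    clauses = []
    for offset in range(num_days):
        current = first_day + timedelta(days=offset)
        # Directory layout assumed: <root>/YYYY/MM/DD, as in the example above.
        location = f"{root.rstrip('/')}/{current:%Y/%m/%d}"
        clauses.append(f"PARTITION (year = {current.year}, month = {current.month}, "
                       f"day = {current.day}) LOCATION '{location}'")
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses) + ";"


print(add_partitions_sql("log_messages", date(2011, 1, 1), 3, "/logs"))
```

For these arguments the output matches the three-partition statement shown earlier in this section. Because timedelta handles month and year boundaries, a range starting on 2011-12-31 continues into year = 2012, month = 1.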

Changing Columns

ALTER TABLE log_messages
CHANGE COLUMN hms hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the timestamp'
AFTER severity;

Adding Columns

ALTER TABLE log_messages ADD COLUMNS (
app_name STRING COMMENT 'Application name',
session_id BIGINT COMMENT 'The current session id');

Deleting or Replacing Columns

ALTER TABLE log_messages REPLACE COLUMNS (
hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp',
severity STRING COMMENT 'The message severity',
message STRING COMMENT 'The rest of the message');

This statement effectively renames the original hms column and removes the server and process_id columns from the original schema definition. As for all ALTER statements, only the table metadata is changed.
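
Because only the metadata changes, Hive reads the existing data files under the new column list. For a delimited text table such as log_messages, Hive matches fields to columns by position. Trailing fields beyond the schema are ignored, and missing ones read as NULL. The Python sketch below uses a hypothetical read_text_row helper and a made-up log line to show the consequence. After REPLACE COLUMNS, the message column reads what was stored in the server field. Other storage formats may map columns differently.

```python
def read_text_row(line: str, columns: list, sep: str = "\t") -> dict:
    """Map a delimited line onto column names purely by position."""
    fields = line.rstrip("\n").split(sep)
    # Extra trailing fields are ignored; missing trailing fields come back as None (NULL).
    return {column: (fields[i] if i < len(fields) else None)
            for i, column in enumerate(columns)}


original_schema = ["hms", "severity", "server", "process_id", "message"]
replaced_schema = ["hours_mins_secs", "severity", "message"]

# Hypothetical line written under the original five-column schema.
line = "103000\tERROR\tweb01\t4242\tdisk full\n"
print(read_text_row(line, original_schema)["message"])  # disk full
print(read_text_row(line, replaced_schema)["message"])  # web01
```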

Alter Table Properties

ALTER TABLE log_messages SET TBLPROPERTIES (
'notes' = 'The process id is no longer captured; this column is always NULL');

Alter Storage Properties

ALTER TABLE log_messages
PARTITION(year = 2012, month = 1, day = 1)
SET FILEFORMAT SEQUENCEFILE;

You can specify a new SerDe along with SerDe properties or change the properties for the existing SerDe. The following example specifies that a table will use a Java class named com.example.JSONSerDe to process a file of JSON-encoded records:

ALTER TABLE table_using_JSON_storage
SET SERDE 'com.example.JSONSerDe'
WITH SERDEPROPERTIES (
'prop1' = 'value1');