
Learning Hadoop: Hive (3.1): statements for creating databases and tables (DDL)

2016-11-09 10:55

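Hive provides a SQL-like language for querying data stored on HDFS. Its statements are translated into MapReduce jobs, so simple MapReduce programs can be written as queries instead of code.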

After starting the command line with ./hive, you can enter commands that look just like ordinary SQL statements.

I. Creating a database:

1. Statement to create a database:

hive> create database student;

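This creates a student database on HDFS under hdfs:///user/hive/warehouse/ (Hive's default warehouse directory, configured by hive.metastore.warehouse.dir; the path is created together with the first database). Each database is a directory named after the database plus a .db suffix (student.db), which marks it as a database.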
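The naming rule above can be sketched in a few lines of Python. This is only an illustration, assuming the default warehouse root /user/hive/warehouse and a database created without a LOCATION clause; the helper name database_location is hypothetical, not a Hive API.

```python
import posixpath

# Assumed default; the real root is configured by hive.metastore.warehouse.dir.
WAREHOUSE_DIR = "/user/hive/warehouse"


def database_location(db_name: str, warehouse: str = WAREHOUSE_DIR) -> str:
    """Default HDFS directory of a database created without LOCATION."""
    name = db_name.lower()      # Hive stores database names in lower case
    if name == "default":
        return warehouse        # the built-in default database uses the root itself
    return posixpath.join(warehouse, name + ".db")


print(database_location("student"))  # /user/hive/warehouse/student.db
```

The special case for default reflects that tables in the built-in database sit directly under the warehouse root, while every other database gets its own .db directory.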

II. Creating tables (Hive has several kinds of tables; each is explained below):

Example 1: basic table creation syntax

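Consider the following student file, students.txt (stored on the local file system or on HDFS). Each line holds student ID, name, gender (男 = male, 女 = female), age and department: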

95001,李勇,男,20,CS
95002,刘晨,女,19,IS
95003,王敏,女,22,MA
95004,张立,男,19,IS
95005,刘刚,男,18,MA
95006,孙庆,男,23,CS
95007,易思玲,女,19,MA
95008,李娜,女,18,CS
95009,梦圆圆,女,18,MA
95010,孔小涛,男,19,CS
95011,包小柏,男,18,MA
95012,孙花,女,20,CS
95013,冯伟,男,21,CS
95014,王小丽,女,19,CS
95015,王君,男,18,MA
95016,钱国,男,21,MA
95017,王风娟,女,18,IS
95018,王一,女,19,IS
95019,邢小丽,女,19,IS
95020,赵钱,男,21,IS
95021,周二,男,17,MA
95022,郑明,男,20,MA

1. Create a Hive table (data warehouse table) that matches this format:

create table stu(
id int,
name string,
gender string,
age int,
master string
)
row format delimited
fields terminated by ','
stored as textfile;
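ROW FORMAT DELIMITED with FIELDS TERMINATED BY ',' means Hive splits each text line on the comma when the table is read and casts each field to the declared column type. The sketch below mimics that for the stu schema; parse_line and to_type are hypothetical helpers, and the NULL rules shown (unparsable or missing values become NULL, extra fields are ignored) are a simplification of Hive's default text handling, not its actual code.

```python
from typing import List, Optional

# Column names and types copied from the stu table definition above.
SCHEMA = [("id", int), ("name", str), ("gender", str), ("age", int), ("master", str)]


def to_type(raw: Optional[str], typ: type) -> object:
    """Cast one raw field; a missing or unparsable value becomes NULL (None)."""
    if raw is None:
        return None
    if typ is int:
        try:
            return int(raw)
        except ValueError:
            return None
    return raw


def parse_line(line: str, delimiter: str = ",") -> List[object]:
    """Split a text line on the field delimiter and apply the schema."""
    fields = line.rstrip("\n").split(delimiter)
    # Pad missing trailing fields with NULL; zip() drops any extra fields.
    padded = fields + [None] * (len(SCHEMA) - len(fields))
    return [to_type(raw, typ) for raw, (_, typ) in zip(padded, SCHEMA)]


print(parse_line("95001,李勇,男,20,CS"))  # [95001, '李勇', '男', 20, 'CS']
```

Because the schema is only applied at read time, a badly formatted line does not make the load fail; it simply shows up with NULL columns in query results.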

2. Load data into the table. The general syntax is:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)];
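LOAD DATA does not parse or convert anything; it only places files in the table's directory. With LOCAL the file is copied from the local file system, without LOCAL it is moved from its HDFS path, and OVERWRITE empties the table directory first instead of appending. The sketch below models just those file operations on a local directory; load_data is a hypothetical helper, and real Hive's handling of same-named files is not modelled (a same-named file is simply replaced here).

```python
import os
import shutil
import tempfile


def load_data(filepath: str, table_dir: str, local: bool = False, overwrite: bool = False) -> str:
    """Toy LOAD DATA: file operations only, no parsing or validation."""
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # OVERWRITE: remove everything already in the table directory.
        for name in os.listdir(table_dir):
            path = os.path.join(table_dir, name)
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
    target = os.path.join(table_dir, os.path.basename(filepath))
    if local:
        shutil.copy(filepath, target)  # LOCAL: the source file stays in place
    else:
        shutil.move(filepath, target)  # no LOCAL: the file is moved into the table
    return target


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "students.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("95001,李勇,男,20,CS\n")
    table = os.path.join(work, "warehouse", "stu")
    load_data(src, table, local=True)
    print(os.path.exists(src))   # True
    print(os.listdir(table))     # ['students.txt']
```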

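3. Once the data is loaded, the table can be queried just like a table in MySQL or another database.

The difference is that Hive translates each SQL statement into a MapReduce job and runs it; as with any MapReduce job, its progress can be followed in the web UI at node1:8088.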
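To make "translated into a MapReduce job" concrete, here is a conceptual Python sketch of how an aggregation such as select master, count(*) from stu group by master (a hypothetical query, not from the original) splits into map, shuffle and reduce steps. It only illustrates the idea; the plan Hive actually generates differs in detail (for example, it can aggregate partially on the map side).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (department, 1) for every row of stu."""
    for line in lines:
        fields = line.split(",")
        yield fields[4], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the framework does between the phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: add up the counts for each department."""
    return {key: sum(values) for key, values in sorted(groups.items())}


sample = ["95001,李勇,男,20,CS", "95002,刘晨,女,19,IS", "95006,孙庆,男,23,CS"]
print(reduce_phase(shuffle(map_phase(sample))))  # {'CS': 2, 'IS': 1}
```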

###################################################################

Example 2: creating an external table

The data is the same as in Example 1.

1. Create an external Hive table that matches this format:

create external table stu_external(
id int,
name string,
gender string,
age int,
master string
)
row format delimited
fields terminated by ','
stored as textfile
location '/user/hive_external_table';

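2. Load the data, as in Example 1.

3. Query the table as usual.

Difference between internal (managed) and external tables:

When an internal table is dropped, both its data on HDFS and its metadata in the metastore (MySQL in this setup) are deleted.

When an external table is dropped, only its metadata in the metastore is deleted; the data files on HDFS (here under /user/hive_external_table) are left in place.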
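The rule above fits in one small function. This sketch models the metastore as a Python dict and HDFS as a local temporary directory; TableMeta and drop_table are hypothetical names used only for illustration.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass
from typing import Dict


@dataclass
class TableMeta:
    location: str   # directory holding the table's data files
    external: bool  # True for CREATE EXTERNAL TABLE


def drop_table(metastore: Dict[str, TableMeta], name: str) -> None:
    """Toy DROP TABLE: the metadata entry is always removed,
    the data directory only for managed (internal) tables."""
    meta = metastore.pop(name)
    if not meta.external:
        shutil.rmtree(meta.location, ignore_errors=True)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    managed_dir = os.path.join(root, "stu")
    external_dir = os.path.join(root, "hive_external_table")
    os.makedirs(managed_dir)
    os.makedirs(external_dir)
    metastore = {
        "stu": TableMeta(managed_dir, external=False),
        "stu_external": TableMeta(external_dir, external=True),
    }
    drop_table(metastore, "stu")
    drop_table(metastore, "stu_external")
    print(os.path.exists(managed_dir), os.path.exists(external_dir))  # False True
    print(metastore)  # {}
```

This is why external tables suit data that other tools also read: dropping the Hive table cannot destroy the files.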

######################################################################

Example 3: partitioned tables

There are two files holding the student lists of class 1 and class 2:

students1.txt
1,jimmy,20
2,tim,22
3,jerry,19

students2.txt
1,tom,23
2,angela,19
3,cat,20

1. Create the partitioned table:

create table stu_partition(
id int,
name string,
age int
)
partitioned by(classId int)
row format delimited
fields terminated by ','
stored as textfile;

2. Load each file into its own partition of the Hive table:

load data local inpath 'students1.txt' into table stu_partition partition(classId=1);
load data local inpath 'students2.txt' into table stu_partition partition(classId=2);

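3. Hive can now treat the two original files as a single table. For example, select * from stu_partition; returns all rows from both files.

4. On HDFS, a non-partitioned table is stored in a directory named after the table. A partitioned table additionally has one subdirectory per partition inside the table directory (for example classid=1 and classid=2). Every Hive table lives in its own directory.

5. The partition column becomes a pseudo-column: its value is recorded in the partition directory name rather than in the data files, yet it can be used in queries like any ordinary column (for example, where classId = 1).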
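Points 4 and 5 can be illustrated together. The sketch below assumes stu_partition lives in the default database under /user/hive/warehouse and keeps the two loaded files in an in-memory dict; partition_dir and scan are hypothetical helpers. It shows the classid=N directory layout, how the partition value is appended to each row as the pseudo-column, and how a filter on classId means the other partition is never read (partition pruning).

```python
import posixpath
from typing import Dict, List, Optional, Tuple

# Assumed table directory (stu_partition in the default database).
TABLE_DIR = "/user/hive/warehouse/stu_partition"

# In-memory stand-in for the file loaded into each partition (key = classId).
PARTITIONS: Dict[int, List[str]] = {
    1: ["1,jimmy,20", "2,tim,22", "3,jerry,19"],
    2: ["1,tom,23", "2,angela,19", "3,cat,20"],
}


def partition_dir(class_id: int) -> str:
    """Each partition is a <column>=<value> subdirectory of the table directory."""
    return posixpath.join(TABLE_DIR, f"classid={class_id}")


def scan(class_id: Optional[int] = None) -> List[Tuple[int, str, int, int]]:
    """Return (id, name, age, classid) rows.

    The files hold only id, name and age; classid comes from the partition
    the file belongs to. With a filter, other partitions are skipped entirely."""
    rows = []
    for cid in sorted(PARTITIONS):
        if class_id is not None and cid != class_id:
            continue  # pruned: this partition's files are never opened
        for line in PARTITIONS[cid]:
            id_, name, age = line.split(",")
            rows.append((int(id_), name, int(age), cid))
    return rows


print(partition_dir(1))  # /user/hive/warehouse/stu_partition/classid=1
print(len(scan()))       # 6
print(scan(2)[0])        # (1, 'tom', 23, 2)
```

Note that id alone is no longer unique (both classes have a student 1); the pseudo-column classid is what tells the rows apart.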

####################################################################

Example 4: bucketed tables

Given the following data (list.txt):

1,jimmy
2,henry
3,tom
4,jerry
5,angela
6,lucy
7,lili
8,lilei
9,hanmeimei
10,timmy
11,jenef
12,alice
13,anna
14,donna
15,ella
16,fiona
17,grace
18,hebe
19,jean
20,joy
21,kelly
22,lydia
23,mary

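1. Create the bucketed table

#First enable bucketing enforcement, so that the insert uses one reducer per bucket and writes one file per bucket (needed on Hive 1.x; from Hive 2.0 on, this setting was removed and bucketing is always enforced)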

set hive.enforce.bucketing = true;

#Create the tables

#First create an ordinary table and load the data into it

create table stu_list(
id int,
name string
)
row format delimited
fields terminated by ','
stored as textfile;

load data local inpath '.../list.txt' into table stu_list;

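#Then create the bucketed table and fill it with the result of a query on the first table (LOAD DATA only copies files and does not split them into buckets, so the rows must go through INSERT ... SELECT)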

create table stu_buckets(
id int,
name string
)
clustered by(id) sorted by(id) into 3 buckets
row format delimited
fields terminated by ','
stored as textfile;

2. Insert the data

insert overwrite table stu_buckets select * from stu_list;

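3. Each row returned by the query is placed in a bucket according to the hash of its id modulo the number of buckets (3).

On HDFS, three files then appear under the stu_buckets table directory:

000000_0
000001_0
000002_0

These three files are the buckets.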
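The bucket assignment can be sketched as follows, assuming the legacy bucketing scheme used by Hive 1.x/2.x, where an int column hashes to its own value and the bucket is (hash & Integer.MAX_VALUE) % num_buckets. Newer Hive releases use a different (Murmur3-based) hash for new tables, so the actual file contents may differ there. bucket_for and assign_buckets are hypothetical helpers; the file names follow the 00000N_0 pattern shown above.

```python
from typing import Dict, List, Tuple

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def bucket_for(key: int, num_buckets: int) -> int:
    """Legacy scheme: (hash & Integer.MAX_VALUE) % num_buckets; an int hashes to itself."""
    return (key & INT_MAX) % num_buckets


def assign_buckets(rows: List[Tuple[int, str]], num_buckets: int) -> Dict[str, List[Tuple[int, str]]]:
    """Group rows into bucket files and sort each file by id (SORTED BY (id))."""
    files: Dict[str, List[Tuple[int, str]]] = {f"{b:06d}_0": [] for b in range(num_buckets)}
    for row in rows:
        files[f"{bucket_for(row[0], num_buckets):06d}_0"].append(row)
    for bucket_rows in files.values():
        bucket_rows.sort(key=lambda r: r[0])
    return files


rows = [(1, "jimmy"), (2, "henry"), (3, "tom"), (4, "jerry"), (5, "angela"), (6, "lucy")]
for name, bucket_rows in assign_buckets(rows, 3).items():
    print(name, [r[0] for r in bucket_rows])
# 000000_0 [3, 6]
# 000001_0 [1, 4]
# 000002_0 [2, 5]
```

Under this assumption, rows with the same id always land in the same file, which is what allows Hive to sample a single bucket or join two tables bucketed on the same column bucket by bucket.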
