
Hive Data Model

2018-01-07 23:58


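Hive tables come in five types: internal (managed) tables, external tables, partitioned tables, bucketed tables, and views. By default, Hive separates fields with the \001 (Ctrl-A) character.
* Data exported from MySQL (or Oracle) is usually comma-separated, so to import it you need to set the delimiter by appending the following to the CREATE TABLE statement: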
row format delimited fields terminated by ',';
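Why the delimiter matters can be shown with a small simulation. This is a simplified sketch, not Hive code: `parse_row` is a hypothetical helper that splits a line on the delimiter, maps fields to columns, and turns anything that cannot be parsed as an int (or is missing) into `None`, standing in for NULL. The sample line is the first row of the `/students` data used later in this post.

```python
# Hypothetical, simplified simulation of splitting a delimited text line into typed columns.
# Assumption: None stands in for NULL; this mimics the idea, not Hive's actual parser.

def parse_row(line, columns, delimiter="\x01"):
    """Split one line on `delimiter` and convert each field to its column type."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, typ) in enumerate(columns):
        raw = fields[i] if i < len(fields) else None  # missing fields become NULL
        if raw is None:
            row[name] = None
        elif typ == "int":
            try:
                row[name] = int(raw)
            except ValueError:
                row[name] = None  # e.g. "1,Tom,23" is not a valid int
        else:
            row[name] = raw
    return row

COLUMNS = [("sid", "int"), ("sname", "string"), ("age", "int")]

print(parse_row("1,Tom,23", COLUMNS))       # {'sid': None, 'sname': None, 'age': None}
print(parse_row("1,Tom,23", COLUMNS, ","))  # {'sid': 1, 'sname': 'Tom', 'age': 23}
```

With the default \001 delimiter the whole line is treated as one field, so every column ends up NULL; declaring `','` lets each field land in its own column.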

Internal (managed) table: the equivalent of a table in MySQL. Its data is stored in Hive's own warehouse directory, /user/hive/warehouse by default.

Example:

create table emp
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int
);
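As a rough illustration of where a managed table's files end up, the sketch below builds the default directory path. It assumes `hive.metastore.warehouse.dir` is left at its default of /user/hive/warehouse; `managed_table_location` and the database name `scott` are hypothetical and only used for illustration.

```python
# Hypothetical helper: default HDFS directory of a managed (internal) table.
# Assumption: hive.metastore.warehouse.dir keeps its default value.
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"

def managed_table_location(table, database="default", warehouse=WAREHOUSE_DIR):
    """Return the directory Hive would use for a managed table."""
    if database == "default":
        # Tables in the default database sit directly under the warehouse directory.
        return posixpath.join(warehouse, table)
    # Tables in other databases live under a "<database>.db" subdirectory.
    return posixpath.join(warehouse, f"{database}.db", table)

print(managed_table_location("emp"))           # /user/hive/warehouse/emp
print(managed_table_location("emp", "scott"))  # /user/hive/warehouse/scott.db/emp
```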

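Loading data into a table (from the local filesystem or from HDFS) is done with a LOAD or an INSERT statement.
A LOAD from HDFS works like Ctrl+X: the file is moved into the table's directory rather than copied. A LOAD with LOCAL copies the local file.

load data inpath '/scott/emp.csv' into table emp; -- load from HDFS
load data local inpath '/root/temp/***' into table emp; -- load a local file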
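The move-versus-copy behaviour of LOAD can be sketched with ordinary directories. This is a hypothetical simulation: temporary local directories stand in for HDFS and the table directory, and `load_data` and the file names are made up for illustration.

```python
# Hypothetical simulation of LOAD DATA semantics with local directories.
# Assumption: temp directories stand in for HDFS and for the table's warehouse directory.
import os
import shutil
import tempfile

def load_data(src_path, table_dir, local):
    """LOCAL INPATH copies the file; INPATH (HDFS) moves it, like Ctrl+X."""
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_path))
    if local:
        shutil.copy(src_path, dest)  # the source file stays where it was
    else:
        shutil.move(src_path, dest)  # the source file disappears from its old path
    return dest

def demo():
    results = {}
    with tempfile.TemporaryDirectory() as root:
        table_dir = os.path.join(root, "warehouse", "emp")
        for name, local in [("emp_local.csv", True), ("emp_hdfs.csv", False)]:
            src = os.path.join(root, name)
            with open(src, "w") as f:
                f.write("1,Tom,23\n")
            dest = load_data(src, table_dir, local)
            # (source still exists?, file present in table directory?)
            results[name] = (os.path.exists(src), os.path.exists(dest))
    return results

print(demo())  # {'emp_local.csv': (True, True), 'emp_hdfs.csv': (False, True)}
```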

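Create a table and specify the delimiter (emp above uses Hive's default delimiter, so a comma-separated file such as emp.csv does not split into its columns):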
create table emp1
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int
)row format delimited fields terminated by ',';

Create a department table to hold the department data:
create table dept
(deptno int,
dname string,
loc string
)row format delimited fields terminated by ',';

load data inpath '/scott/dept.csv' into table dept;
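External table: unlike an internal table, its data is not stored in Hive's warehouse directory. Hive keeps only the table's metadata, which points to where the data lives.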

Example:
(*) Test data:
[root@bigdata11 ~]# hdfs dfs -cat /students/student01.txt
1,Tom,23
2,Mary,24
[root@bigdata11 ~]# hdfs dfs -cat /students/student02.txt
3,Mike,26

(*) The definition specifies (1) the table schema and (2) the path the table points to:
create external table students_ext
(sid int,sname string,age int)
row format delimited fields terminated by ','
location '/students';
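An external table is essentially a schema laid over every file in its location. The hypothetical sketch below uses a temporary local directory in place of HDFS /students and reuses the two files from the listing above; `read_external_table` is an illustrative name, not a Hive API.

```python
# Hypothetical sketch: an external table reads every file under its LOCATION with one schema.
# Assumption: a temp directory stands in for HDFS /students.
import os
import tempfile

def read_external_table(location, columns, delimiter=","):
    rows = []
    for name in sorted(os.listdir(location)):  # every file in the directory belongs to the table
        with open(os.path.join(location, name)) as f:
            for line in f:
                fields = line.rstrip("\n").split(delimiter)
                rows.append({col: typ(val) for (col, typ), val in zip(columns, fields)})
    return rows

COLUMNS = [("sid", int), ("sname", str), ("age", int)]

def demo():
    with tempfile.TemporaryDirectory() as students:
        with open(os.path.join(students, "student01.txt"), "w") as f:
            f.write("1,Tom,23\n2,Mary,24\n")
        with open(os.path.join(students, "student02.txt"), "w") as f:
            f.write("3,Mike,26\n")
        return read_external_table(students, COLUMNS)

for row in demo():
    print(row)
```

Because the table is just a view over the directory, dropping a new file into /students would make its rows show up in the next query without any LOAD.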

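Partitioned table: stores data separately according to the values of a chosen column. This improves query efficiency because a query that filters on that column only reads the matching partitions. Each partition is a directory.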

Example:

(*) Partition the table by the employee's department number:
create table emp_part
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int
)partitioned by (deptno int)
row format delimited fields terminated by ',';

Insert data into the partitioned table, specifying the partition:
insert into table emp_part partition(deptno=10) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=10;
insert into table emp_part partition(deptno=20) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=20;
insert into table emp_part partition(deptno=30) select empno,ename,job,mgr,hiredate,sal,comm from emp1 where deptno=30;
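The partition-to-directory mapping and the resulting pruning can be sketched in memory. In this hypothetical simulation a dict stands in for HDFS; only the `deptno=<value>` directory naming follows Hive, and the sample rows are made up for illustration.

```python
# Hypothetical sketch of partition directories and partition pruning.
# Assumption: dict keys stand in for HDFS directories; rows are illustrative only.
from collections import defaultdict

def build_partitions(rows, part_col):
    """Group rows into '<col>=<value>' directories; the partition column is not stored in the files."""
    parts = defaultdict(list)
    for row in rows:
        key = f"{part_col}={row[part_col]}"
        parts[key].append({k: v for k, v in row.items() if k != part_col})
    return dict(parts)

def scan(parts, part_col, value):
    """A filter on the partition column reads only the one matching directory."""
    return parts.get(f"{part_col}={value}", [])

# Hypothetical sample rows, not from the original data set.
emp = [
    {"empno": 1, "deptno": 10},
    {"empno": 2, "deptno": 20},
    {"empno": 3, "deptno": 20},
    {"empno": 4, "deptno": 30},
]

parts = build_partitions(emp, "deptno")
print(sorted(parts))              # ['deptno=10', 'deptno=20', 'deptno=30']
print(scan(parts, "deptno", 20))  # [{'empno': 2}, {'empno': 3}]
```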

Bucketed table: essentially another form of partitioning, similar to hash partitioning. Each bucket is a file.
Example:

Create a bucketed table, bucketed by the employee's job:
create table emp_bucket
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int
)clustered by (job) into 4 buckets
row format delimited fields terminated by ',';

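Using a bucketed table requires turning on a switch (this applies to Hive 1.x; Hive 2.0 removed the property and always enforces bucketing):
set hive.enforce.bucketing=true;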

Insert data using a subquery:
insert into emp_bucket select * from emp1;
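Hash bucketing assigns each row to bucket `hash(job) mod 4`, so all rows with the same job land in the same file. The sketch below uses a Java-style 31-multiplier string hash purely for illustration; Hive's actual hash function depends on its version and is not reproduced here. All names and sample rows are hypothetical.

```python
# Hypothetical sketch of hash bucketing: bucket = hash(job) mod NUM_BUCKETS.
# Assumption: a Java-style 31-multiplier hash is used for illustration only;
# it is not guaranteed to match the hash your Hive version uses.

NUM_BUCKETS = 4

def string_hash(s):
    """31-based rolling hash over UTF-8 bytes, wrapped to a signed 32-bit int."""
    h = 0
    for b in s.encode("utf-8"):
        h = (h * 31 + b) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def bucket_for(value, num_buckets=NUM_BUCKETS):
    # Clear the sign bit so the result is non-negative, then take the modulus.
    return (string_hash(value) & 0x7FFFFFFF) % num_buckets

def bucketize(rows, col, num_buckets=NUM_BUCKETS):
    """Return one list per bucket; in Hive each bucket is written as a separate file."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[col], num_buckets)].append(row)
    return buckets

# Hypothetical sample rows.
rows = [
    {"empno": 1, "job": "CLERK"},
    {"empno": 2, "job": "MANAGER"},
    {"empno": 3, "job": "CLERK"},
]
buckets = bucketize(rows, "job")
print([len(b) for b in buckets])
```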

View: a view is a virtual table. It stores no data and is used to simplify complex queries.

Example:

Query the department name and the employee name:
create view myview as select dept.dname,emp1.ename from emp1,dept where emp1.deptno=dept.deptno;
select * from myview;
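Because a view stores a query rather than data, every read re-runs that query against the current tables. The hypothetical sketch below models `myview` as a function over in-memory lists standing in for emp1 and dept; the rows are made up for illustration.

```python
# Hypothetical sketch: a view is a stored query, re-evaluated each time it is read.
# Assumption: lists of dicts stand in for emp1 and dept; rows are illustrative only.

def myview(emp1, dept):
    """Same logic as: select dept.dname, emp1.ename from emp1, dept where emp1.deptno = dept.deptno"""
    return [
        {"dname": d["dname"], "ename": e["ename"]}
        for e in emp1
        for d in dept
        if e["deptno"] == d["deptno"]
    ]

dept = [{"deptno": 10, "dname": "D10"}, {"deptno": 20, "dname": "D20"}]
emp1 = [{"ename": "A", "deptno": 10}]
print(myview(emp1, dept))  # [{'dname': 'D10', 'ename': 'A'}]

emp1.append({"ename": "B", "deptno": 20})  # new data in the base table...
print(myview(emp1, dept))  # ...shows up in the view: [{'dname': 'D10', 'ename': 'A'}, {'dname': 'D20', 'ename': 'B'}]
```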
