Using Hive's Struct, Array, and Map Data Types

2017-11-09 16:08

Using the Struct data type
Create the table:

drop table if exists xxxxx_struct_test;
create table xxxxx_struct_test(id INT, info struct<name:STRING, age:INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':';

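Notes:
'FIELDS TERMINATED BY': the delimiter between the fields (columns) of a row.

'COLLECTION ITEMS TERMINATED BY': the delimiter between the items inside a single field (struct members, array elements, or map entries).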

Prepare and load the data file:

[hadoop@emr-worker-10 fileDir]$ cat struct_file.txt
1,zhou:30
2,yan:30
3,chen:20
4,li:80
hive> LOAD DATA LOCAL INPATH '/home/hadoop/nisj/hiveDataType/fileDir/struct_file.txt' INTO TABLE xxxxx_struct_test;
Loading data to table default.xxxxx_struct_test
OK
Time taken: 0.567 seconds

Query (a struct member is accessed with dot notation):

select info.age from xxxxx_struct_test;
select * from xxxxx_struct_test;
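
To make the role of the two delimiters concrete, here is a minimal Python sketch that splits the sample rows the same way the table definition describes. It is only a model of the behavior, not how Hive parses files internally; the names `Info` and `parse_struct_row` are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple, Tuple


class Info(NamedTuple):
    # Mirrors struct<name:STRING, age:INT> from the table definition.
    name: str
    age: int


def parse_struct_row(line: str) -> Tuple[int, Info]:
    """Split one text row using the delimiters declared for the table."""
    # FIELDS TERMINATED BY ',' separates the id column from the info column.
    id_text, info_text = line.split(",", 1)
    # COLLECTION ITEMS TERMINATED BY ':' separates the struct members.
    name, age_text = info_text.split(":")
    return int(id_text), Info(name, int(age_text))


sample_lines = ["1,zhou:30", "2,yan:30", "3,chen:20", "4,li:80"]
rows = [parse_struct_row(line) for line in sample_lines]

# Rough equivalent of: select info.age from xxxxx_struct_test;
ages = [info.age for _, info in rows]
print(ages)  # [30, 30, 20, 80]
```

The struct behaves like a small record with named members, which is why the query reads `info.age` rather than using an index.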

Using the Array data type
Create the table:

drop table if exists xxxxx_array_test;
create table xxxxx_array_test(name string, student_id_list array<INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':';

Prepare and load the data file:

[hadoop@emr-worker-10 fileDir]$ cat array_file.txt
034,1:2:3:4
035,5:6
036,7:8:9:10
hive> LOAD DATA LOCAL INPATH '/home/hadoop/nisj/hiveDataType/fileDir/array_file.txt' INTO TABLE xxxxx_array_test;
Loading data to table default.xxxxx_array_test
OK
Time taken: 0.241 seconds

Query (array indexes are zero-based, so [3] selects the fourth element):

select student_id_list[3] from xxxxx_array_test;
select * from xxxxx_array_test;
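
The following Python sketch models how the sample rows become arrays and what `student_id_list[3]` selects. It assumes, as far as I know Hive's behavior, that an index past the end of the array yields NULL rather than an error; `parse_array_row` and `element_at` are hypothetical helper names.

```python
from typing import List, Optional, Tuple


def parse_array_row(line: str) -> Tuple[str, List[int]]:
    """Split one text row into the name column and the array<INT> column."""
    # FIELDS TERMINATED BY ',' separates the two columns.
    name, ids_text = line.split(",", 1)
    # COLLECTION ITEMS TERMINATED BY ':' separates the array elements.
    return name, [int(item) for item in ids_text.split(":")]


def element_at(values: List[int], index: int) -> Optional[int]:
    """Model of values[index]: zero-based, None (NULL) when out of range."""
    return values[index] if 0 <= index < len(values) else None


sample_lines = ["034,1:2:3:4", "035,5:6", "036,7:8:9:10"]
array_rows = [parse_array_row(line) for line in sample_lines]

# Rough equivalent of: select student_id_list[3] from xxxxx_array_test;
fourth = [element_at(ids, 3) for _, ids in array_rows]
print(fourth)  # [4, None, 10]
```

Note that the name column is declared as string, so a value such as 034 keeps its leading zero.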

Query with explode and lateral view:

select student_id,count(*) from xxxxx_array_test lateral view explode(student_id_list) student_id_list as student_id group by student_id;
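
As a rough mental model of the query above: explode turns each array element into its own row, lateral view joins those rows back to the row they came from, and group by then counts per element. The Python sketch below reproduces that flow on the sample data; it is an illustration of the semantics only, not of how Hive executes the query.

```python
from collections import Counter

# The table contents after loading array_file.txt.
array_rows = [
    ("034", [1, 2, 3, 4]),
    ("035", [5, 6]),
    ("036", [7, 8, 9, 10]),
]

# lateral view explode: one output row per (source row, array element).
# A row with an empty array would produce no output rows at all here.
exploded = [(name, student_id) for name, ids in array_rows for student_id in ids]

# group by student_id with count(*).
counts = Counter(student_id for _, student_id in exploded)

for student_id in sorted(counts):
    print(student_id, counts[student_id])
```

With this sample data every student id appears exactly once, so each count is 1.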

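The collect_set function: it aggregates the values of a column within each group, removes duplicates, and returns the result as an Array-typed field.
Create the table and load the data: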

drop table if exists xxxxx_tabletest;
CREATE TABLE xxxxx_tabletest(
id string,
name string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'line.delim'='\n',
'serialization.format'=',');
insert into xxxxx_tabletest(id,name)
values
('1','A'),
('1','C'),
('1','B'),
('2','B'),
('2','C'),
('2','D'),
('3','B'),
('3','C'),
('3','D');

Query:

select id,collect_set(name) from xxxxx_tabletest group by id;
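
The grouping and de-duplication that collect_set performs can be sketched in Python as follows. The final `("3", "D")` row is a hypothetical duplicate that is not in the insert statement above; it is added only so the de-duplication is visible. The result is sorted purely for readable output, since the element order of the array Hive returns should not be relied upon.

```python
from collections import defaultdict
from typing import Dict, List, Set

table_rows = [
    ("1", "A"), ("1", "C"), ("1", "B"),
    ("2", "B"), ("2", "C"), ("2", "D"),
    ("3", "B"), ("3", "C"), ("3", "D"),
    ("3", "D"),  # hypothetical duplicate, not part of the original data
]

# group by id, collecting each group's names into a set (drops duplicates).
groups: Dict[str, Set[str]] = defaultdict(set)
for row_id, name in table_rows:
    groups[row_id].add(name)

# Sorted only to make the printed output deterministic.
collected: Dict[str, List[str]] = {row_id: sorted(names) for row_id, names in groups.items()}
print(collected)  # {'1': ['A', 'B', 'C'], '2': ['B', 'C', 'D'], '3': ['B', 'C', 'D']}
```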

Using the Map data type
Create the table:

drop table if exists xxxxx_map_test;
create table xxxxx_map_test(id string, perf map<string, int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';

Notes:
'MAP KEYS TERMINATED BY': the delimiter between a key and its value inside a map entry.

Prepare and load the data file:

[hadoop@emr-worker-10 fileDir]$ cat map_file.txt
1       job:80,team:60,person:70
2       job:60,team:80
3       job:90,team:70,person:100
hive> LOAD DATA LOCAL INPATH '/home/hadoop/nisj/hiveDataType/fileDir/map_file.txt' INTO TABLE xxxxx_map_test;
Loading data to table default.xxxxx_map_test
OK
Time taken: 0.224 seconds

Query (a map value is looked up by its key):

select perf['person'] from xxxxx_map_test;
select perf['person'] from xxxxx_map_test where perf['person'] is not null;
select * from xxxxx_map_test;
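
The map table uses three delimiters at once, which the Python sketch below applies in order. It assumes the gap between the id and the map in the data file is a single tab character, matching the '\t' in the table definition; `parse_map_row` is a hypothetical helper name, and the sketch is a model of the result rather than of Hive's parser.

```python
from typing import Dict, Tuple


def parse_map_row(line: str) -> Tuple[str, Dict[str, int]]:
    """Split one text row into the id column and the map<string,int> column."""
    # FIELDS TERMINATED BY '\t' separates the two columns.
    row_id, perf_text = line.split("\t", 1)
    perf: Dict[str, int] = {}
    # COLLECTION ITEMS TERMINATED BY ',' separates the map entries.
    for entry in perf_text.split(","):
        # MAP KEYS TERMINATED BY ':' separates each key from its value.
        key, value = entry.split(":", 1)
        perf[key] = int(value)
    return row_id, perf


sample_lines = [
    "1\tjob:80,team:60,person:70",
    "2\tjob:60,team:80",
    "3\tjob:90,team:70,person:100",
]
map_rows = [parse_map_row(line) for line in sample_lines]

# Rough equivalent of: select perf['person'] from xxxxx_map_test;
person = [perf.get("person") for _, perf in map_rows]
print(person)  # [70, None, 100]

# Rough equivalent of adding: where perf['person'] is not null
person_not_null = [value for value in person if value is not None]
print(person_not_null)  # [70, 100]
```

Row 2 has no person key, so the lookup yields NULL for it, which is why the second query filters on `is not null`.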

Query with explode and lateral view:

select explode(perf) as (item_name,item_value) from xxxxx_map_test;
select id,item_name,item_value from xxxxx_map_test lateral view explode(perf) perf as item_name,item_value;
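
Applied to a map, explode yields two columns per entry (the key and the value), and lateral view keeps the id of the source row next to them. A minimal Python model of the second query, using the sample data as it looks once loaded:

```python
# The table contents after loading map_file.txt.
map_rows = [
    ("1", {"job": 80, "team": 60, "person": 70}),
    ("2", {"job": 60, "team": 80}),
    ("3", {"job": 90, "team": 70, "person": 100}),
]

# lateral view explode(perf): one output row per (source row, map entry),
# with the entry split into item_name and item_value.
exploded = [
    (row_id, item_name, item_value)
    for row_id, perf in map_rows
    for item_name, item_value in perf.items()
]

for row in exploded:
    print(row)
```

The first query, which calls explode without lateral view, produces the same key and value columns but cannot select id alongside them.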
