
Hive Data Types: Using Structs, Array, and Map

2017-11-09 16:08
Using the Structs data type
Create the table:
drop table if exists xxxxx_struct_test;
create table xxxxx_struct_test(id INT, info struct<name:STRING, age:INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':';
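Notes:
'FIELDS TERMINATED BY': the delimiter between fields (columns)

'COLLECTION ITEMS TERMINATED BY': the delimiter between the items inside a single complex-type field (struct members, array elements, or map entries)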
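The following small Python sketch makes the two delimiters concrete. It is plain Python, not Hive. It splits a line such as `1,zhou:30` the way this table's ROW FORMAT describes: first on the field delimiter `,`, then on the collection-items delimiter `:` inside the struct. It then reads the `age` member, which is roughly what `select info.age` does. This is a simplified model. It ignores escaping, the default `\N` NULL marker, and malformed lines. The names `Info` and `parse_struct_row` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

FIELD_DELIM = ","       # FIELDS TERMINATED BY ','
COLLECTION_DELIM = ":"  # COLLECTION ITEMS TERMINATED BY ':' (separates struct members)


class Info(NamedTuple):
    """Plain-Python stand-in for struct<name:STRING, age:INT>."""
    name: str
    age: int


def parse_struct_row(line: str) -> Tuple[int, Info]:
    """Split one text line into (id, info) using the table's delimiters.

    Simplified model: no escaping, no NULL markers, well-formed input only.
    """
    id_text, info_text = line.split(FIELD_DELIM)
    name, age_text = info_text.split(COLLECTION_DELIM)
    return int(id_text), Info(name=name, age=int(age_text))


lines = ["1,zhou:30", "2,yan:30", "3,chen:20", "4,li:80"]
rows: List[Tuple[int, Info]] = [parse_struct_row(line) for line in lines]

# Counterpart of `select info.age from xxxxx_struct_test;`: dot access to one member
ages = [info.age for _, info in rows]
print(ages)  # [30, 30, 20, 80]
```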

Prepare and load the data file:
[hadoop@emr-worker-10 fileDir]$ cat struct_file.txt
1,zhou:30
2,yan:30
3,chen:20
4,li:80
hive> LOAD DATA LOCAL INPATH '/home/hadoop/nisj/hiveDataType/fileDir/struct_file.txt' INTO TABLE xxxxx_struct_test;
Loading data to table default.xxxxx_struct_test
OK
Time taken: 0.567 seconds

Query:
select info.age from xxxxx_struct_test;
select * from xxxxx_struct_test;

Using the Array data type
Create the table:
drop table if exists xxxxx_array_test;
create table xxxxx_array_test(name string, student_id_list array<INT>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':';

Prepare and load the data file:
[hadoop@emr-worker-10 fileDir]$ cat array_file.txt
034,1:2:3:4
035,5:6
036,7:8:9:10
hive> LOAD DATA LOCAL INPATH '/home/hadoop/nisj/hiveDataType/fileDir/array_file.txt' INTO TABLE xxxxx_array_test;
Loading data to table default.xxxxx_array_test
OK
Time taken: 0.241 seconds

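Query:
-- array indexes start at 0, so [3] is the fourth element; rows whose array is shorter return NULL
select student_id_list[3] from xxxxx_array_test;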
select * from xxxxx_array_test;
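Array elements are addressed by a 0-based index. An index past the end of an array yields NULL instead of an error. The hypothetical helpers below model this in Python for the three sample rows, with None standing in for NULL. This illustrates the semantics only. It is not Hive code, and it skips escaping and NULL handling. Because `name` is declared as a string, a value like `034` keeps its leading zero.

```python
from typing import List, Optional, Tuple

FIELD_DELIM = ","       # FIELDS TERMINATED BY ','
COLLECTION_DELIM = ":"  # COLLECTION ITEMS TERMINATED BY ':' (separates array elements)


def parse_array_row(line: str) -> Tuple[str, List[int]]:
    """Split 'name,id1:id2:...' into (name, [ids]); simplified, no escaping."""
    name, ids_text = line.split(FIELD_DELIM)
    return name, [int(x) for x in ids_text.split(COLLECTION_DELIM)]


def hive_index(arr: List[int], i: int) -> Optional[int]:
    """Mimic arr[i] in Hive: 0-based, None (NULL) when i is out of range."""
    return arr[i] if 0 <= i < len(arr) else None


lines = ["034,1:2:3:4", "035,5:6", "036,7:8:9:10"]
rows = [parse_array_row(line) for line in lines]

# Counterpart of `select student_id_list[3] from xxxxx_array_test;`
fourth = [hive_index(ids, 3) for _, ids in rows]
print(fourth)  # [4, None, 10]
```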

Query with explode and lateral view:
select student_id, count(*)
from xxxxx_array_test
lateral view explode(student_id_list) ids as student_id
group by student_id;
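The lateral view query can be read as two steps. First, explode turns each array into one row per element. Second, LATERAL VIEW joins those rows back to the row they came from, and GROUP BY then counts them. Below is a minimal Python model of those steps, using the three rows from array_file.txt. Each id occurs only once in that data, so every count is 1. This is a sketch of the semantics, not Hive's implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Contents of xxxxx_array_test after loading array_file.txt
table: List[Tuple[str, List[int]]] = [
    ("034", [1, 2, 3, 4]),
    ("035", [5, 6]),
    ("036", [7, 8, 9, 10]),
]

# LATERAL VIEW explode(student_id_list): each source row is paired with one
# output row per array element. A row with an empty array yields no rows here,
# matching plain LATERAL VIEW (Hive's LATERAL VIEW OUTER would keep it with NULL).
exploded: List[Tuple[str, int]] = [
    (name, student_id) for name, ids in table for student_id in ids
]

# GROUP BY student_id with count(*)
counts: Dict[int, int] = dict(Counter(student_id for _, student_id in exploded))
print(counts)
```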

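The collect_set function: aggregates the values of a column within each group, removes duplicates, and returns the result as an Array-typed column (Hive does not guarantee the order of the elements).
Create the table and load the data: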
drop table if exists xxxxx_tabletest;
CREATE TABLE xxxxx_tabletest(
id string,
name string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'line.delim'='\n',
'serialization.format'=',');
insert into xxxxx_tabletest(id,name)
values
('1','A'),
('1','C'),
('1','B'),
('2','B'),
('2','C'),
('2','D'),
('3','B'),
('3','C'),
('3','D');

Query:

select id,collect_set(name) from xxxxx_tabletest group by id;
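Below is a minimal Python model of `collect_set` combined with GROUP BY. The table above has no repeated name within any id. A hypothetical duplicate row ("1", "A") is therefore added purely to make the de-duplication visible; it is not part of the INSERT. The function name mirrors Hive's, but the code only sketches the behavior. It keeps first-seen order, which Hive does not promise.

```python
from typing import Dict, List, Tuple

# Rows inserted into xxxxx_tabletest, plus one hypothetical duplicate ("1", "A")
# added only to show that collect_set drops repeated values.
rows: List[Tuple[str, str]] = [
    ("1", "A"), ("1", "C"), ("1", "B"),
    ("2", "B"), ("2", "C"), ("2", "D"),
    ("3", "B"), ("3", "C"), ("3", "D"),
    ("1", "A"),
]


def collect_set(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """GROUP BY key, gathering distinct values into a list per key.

    This sketch keeps first-seen order; Hive does not guarantee element order.
    """
    groups: Dict[str, List[str]] = {}
    for key, value in pairs:
        values = groups.setdefault(key, [])
        if value not in values:  # skip values already collected for this key
            values.append(value)
    return groups


result = collect_set(rows)
print(result)  # {'1': ['A', 'C', 'B'], '2': ['B', 'C', 'D'], '3': ['B', 'C', 'D']}
```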

Using the Map data type
Create the table:
drop table if exists xxxxx_map_test;
create table xxxxx_map_test(id string, perf map<string, int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';

Notes:
'MAP KEYS TERMINATED BY': the delimiter between a key and its value
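With three delimiters, a map column is parsed at three levels. The line is split into fields on the tab. The `perf` field is split into entries on `,`. Each entry is split into key and value on `:`. Below is a simplified Python sketch of that nesting. `parse_map_row` is a hypothetical helper, and the sketch does no escaping or NULL handling.

```python
from typing import Dict, Tuple

FIELD_DELIM = "\t"      # FIELDS TERMINATED BY '\t'
COLLECTION_DELIM = ","  # COLLECTION ITEMS TERMINATED BY ',' (separates map entries)
MAP_KEY_DELIM = ":"     # MAP KEYS TERMINATED BY ':' (separates key from value)


def parse_map_row(line: str) -> Tuple[str, Dict[str, int]]:
    """Split 'id<TAB>k1:v1,k2:v2,...' into (id, {k: v}); simplified, no escaping."""
    id_text, perf_text = line.split(FIELD_DELIM)
    perf: Dict[str, int] = {}
    for entry in perf_text.split(COLLECTION_DELIM):
        key, value = entry.split(MAP_KEY_DELIM)
        perf[key] = int(value)
    return id_text, perf


print(parse_map_row("2\tjob:60,team:80"))  # ('2', {'job': 60, 'team': 80})
```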

Prepare and load the data file:
[hadoop@emr-worker-10 fileDir]$ cat map_file.txt
1       job:80,team:60,person:70
2       job:60,team:80
3       job:90,team:70,person:100
hive> LOAD DATA LOCAL INPATH '/home/hadoop/nisj/hiveDataType/fileDir/map_file.txt' INTO TABLE xxxxx_map_test;
Loading data to table default.xxxxx_map_test
OK
Time taken: 0.224 seconds

Query:
select perf['person'] from xxxxx_map_test;
select perf['person'] from xxxxx_map_test where perf['person'] is not null;
select * from xxxxx_map_test;
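Looking up a key that a row's map does not contain returns NULL. That is why the second query adds `is not null`. Below, the same two queries are modeled in Python over the three loaded rows, with `dict.get` returning None for a missing key. This is an illustration only, not Hive code.

```python
from typing import Dict, List, Optional

# The perf column of xxxxx_map_test after loading map_file.txt
perf_column: List[Dict[str, int]] = [
    {"job": 80, "team": 60, "person": 70},
    {"job": 60, "team": 80},
    {"job": 90, "team": 70, "person": 100},
]

# perf['person']: a missing key yields NULL (None here) rather than an error
person: List[Optional[int]] = [perf.get("person") for perf in perf_column]

# ... where perf['person'] is not null
person_not_null = [v for v in person if v is not None]

print(person)           # [70, None, 100]
print(person_not_null)  # [70, 100]
```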

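Query with explode and lateral view:
-- explode on a map produces one row per entry, with two columns (key and value)
select explode(perf) as (item_name,item_value) from xxxxx_map_test;
-- a UDTF such as explode cannot be combined with other columns in the SELECT list, so LATERAL VIEW is used to keep id
select id, item_name, item_value from xxxxx_map_test lateral view explode(perf) p as item_name, item_value;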
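The lateral view query over a map produces one output row per (key, value) entry, and each row carries the `id` of its source row. The three loaded rows therefore expand to eight. Below is a small Python model of that expansion. It is plain Python, not Hive. Its output order follows dict insertion order, and no particular row order should be assumed from Hive.

```python
from typing import Dict, List, Tuple

# Contents of xxxxx_map_test after loading map_file.txt
table: List[Tuple[str, Dict[str, int]]] = [
    ("1", {"job": 80, "team": 60, "person": 70}),
    ("2", {"job": 60, "team": 80}),
    ("3", {"job": 90, "team": 70, "person": 100}),
]

# explode(perf) AS (item_name, item_value): one output row per map entry.
# LATERAL VIEW joins each of those rows back to its source row, so id is kept.
rows: List[Tuple[str, str, int]] = [
    (row_id, item_name, item_value)
    for row_id, perf in table
    for item_name, item_value in perf.items()
]

for r in rows:
    print(r)
```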