您的位置:首页 > 其它

文章标题

2017-09-16 21:32 239 查看

Hive

基于Hadoop的一个数据仓库工具,构建于hadoop的hdfs和mapred之上,用于管理和查询结构化/非结构化数据的数据仓库。可以将结构化的数据文件映射为一张数据库表,并提供类SQL查询功能。


1.建表

create table city(

province_code INT,

province_name string,

city_code INT,

city_name string

)

row FORMAT delimited

fields terminated by ‘,’

lines terminated by ‘\n’;

2.查看表信息

show create table city;

3.查看表内容

select * from city limit 10;

4.最大10个

select * from city order by city_code desc limit 10;

5.省份里市最多的排序。

select province_name,count(city_name) as sum from city group by province_name order by sum limit 10;

6.去重

select count(distinct province_name) from city;

select count(*) from (select province_name from city group by province_name) a;

7.只有一个市的省份

select province_name,sum from (

select province_name,count(*) as sum

from city group by province_name) a where sum=1;

select province_name,count(*) as sum

from city group by province_name having sum=1;

8.内表与外表

外表

create external table
city_ex


(
province_code
int,

province_name
string,

city_code
int,

city_name
string

)

row format delimited fields terminated by ‘,’

lines terminated by ‘\n’

location’/user/hdfs/lyy/city’;

内表:

CREATE TABLE
user
(

uid
INT,

city_code
INT,

model
string,

access
string

)

row FORMAT delimited

fields terminated by ‘,’

lines terminated by ‘\n’;

将文件导入:

load data local inpath ‘/home/bigdata/hive/user.txt’ into table user;

求比例(除以总数):

select sum(if(access=’2G’,1,0))/count(1) from user;

按条件查询:

select

case

when uid % 10 in (0, 1, 2, 3) then ‘0-3’

when uid % 10 in (4, 5, 6, 7) then ‘4-7’

else ‘8-9’

end as interval,

count(*) as cnt

from user

group by

case

when uid % 10 in (0, 1, 2, 3) then ‘0-3’

when uid % 10 in (4, 5, 6, 7) then ‘4-7’

else ‘8-9’

end;

集合:

去重:select collect_set(access) from user;

不去重:select collect_list(access) from user;

表的关联方式:

select user.uid, user.city_code, city.city_name

from

(select * from user where uid <= 100) user

full join

(select * from city where province_code <= 30) city

on (user.city_code = city.city_code)

limit 20;

分组topN;

select access,city_code,uid

from

(

select uid,access,city_code,

row_number()over (partition by access order by city_code desc)as

row_num

from user

)a

where row_num=1;

select p_date,

sum(cnt) over(order by p_date asc rows between unbounded preceding and current row)

from(

select p_date,count(*)as cnt

from user_daily

where p_date between ‘2017-09-01’ and ‘2017-09-30’

group by p_date

)a;

unbounded preceding
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: