分组求topn
2018-01-05 02:23
113 查看
遇见一个难题:分组求组内topn,由于mysql没有row_number()查阅了许些资料,mark。
需求:取出每个班级的前两名同学(允许并列)
创建表
但在hive中有row_number()函数,举个例子:
需求:按照时间维度,比如,统计一天内各小时产生最多pvs的来源topN
row_number()从1开始,为每一条分组记录返回一个数字
row_number() over (order by col desc) 是先把col列降序,再为降序以后的每条col记录返回一个序号。
而此函数计算的值就表示每组内部排序后的顺序编号(组内连续的唯一的)
需求:取出每个班级的前两名同学(允许并列)
创建表
create table tmp( id int not null auto_increment, sname varchar(10), cname varchar(10), score int, PRIMARY key(id) )engine=InnoDB CHARSET=UTF8;插入数据
insert into tmp values(null,'AAAA','C1',67); insert into tmp values(null,'BBBB','C1',55); insert into tmp values(null,'CCCC','C1',67); insert into tmp values(null,'DDDD','C1',65); insert into tmp values(null,'EEEE','C1',95); insert into tmp values(null,'FFFF','C2',57); insert into tmp values(null,'GGGG','C2',87); insert into tmp values(null,'HHHH','C2',74); insert into tmp values(null,'IIII','C2',52); insert into tmp values(null,'JJJJ','C2',81); insert into tmp values(null,'KKKK','C2',67); insert into tmp values(null,'LLLL','C2',66); insert into tmp values(null,'MMMM','C2',63); insert into tmp values(null,'NNNN','C3',99); insert into tmp values(null,'OOOO','C3',50); insert into tmp values(null,'PPPP','C3',59); insert into tmp values(null,'QQQQ','C3',66); insert into tmp values(null,'RRRR','C3',76); insert into tmp values(null,'SSSS','C3',50); insert into tmp values(null,'TTTT','C3',50); insert into tmp values(null,'UUUU','C3',64); insert into tmp values(null,'VVVV','C3',74);方法一:
select a.id,a.sname,a.cname,a.score from tmp a left join tmp b on a.cname=b.cname and a.score<b.score group by a.id,a.sname,a.cname,a.score having count(b.id)<2 order by a.cname,a.score desc方法二:
select * from tmp a where 2>(select count(*) from tmp where cname=a.cname and score>a.score) order by a.cname,a.score desc如果是求top1
select * from tmp a where not exists (select 1 from tmp where cname=a.cname and score>a.score); select a.* from tmp a inner join (select cname, max(score) as score from tmp group by cname) b on a.cname=b.cname and a.score=b.score select * from (select * from tmp order by score desc) t group by cname
但在hive中有row_number()函数,举个例子:
需求:按照时间维度,比如,统计一天内各小时产生最多pvs的来源topN
insert into table zs.dw_pvs_refhost_topn_h partition(datestr='2016-03-18') select t.hour,t.od,t.ref_host,t.ref_host_cnts from (select ref_host,ref_host_cnts,concat(month,day,hour) as hour, row_number() over (partition by concat(month,day,hour) order by ref_host_cnts desc) as od from zs.dw_ref_host_visit_cnts_h) t where od<=3;看一下内层表实现:对每个小时内的来访host次数倒序排序标号
select ref_host,ref_host_cnts,concat(month,hour,day), row_number() over (partition by concat(month,hour,day) order by ref_host_cnts desc) as od from dw_ref_host_visit_cnts_h结果
+-------------------------+----------------+----------+-----+--+ | ref_host | ref_host_cnts | c2 | od | +-------------------------+----------------+----------+-----+--+ | blog.fens.me | 111 | 09 0019 | 1 | | www.fens.me | 13 | 09 0019 | 2 | | h2w.iask.cn | 6 | 09 0019 | 3 | | angularjs.cn | 3 | 09 0019 | 4 | | www.google.com.hk | 3 | 09 0019 | 5 | | www.leonarding.com | 1 | 09 0019 | 6 | | cnodejs.org | 1 | 09 0019 | 7 | | www.itpub.net | 1 | 09 0019 | 8 | | blog.fens.me | 89 | 09 0119 | 1 | | cos.name | 3 | 09 0119 | 2 | | www.google.com.tw | 2 | 09 0119 | 3 | | www.angularjs.cn | 2 | 09 0119 | 4 | | mp.weixin.qq.com | 1 | 09 0119 | 5 | | h2w.iask.cn | 1 | 09 0119 | 6 | | cnodejs.org | 1 | 09 0119 | 7 | | weibo.com | 1 | 09 0119 | 8 | | www.google.com.hk | 1 | 09 0119 | 9 | | blog.fens.me | 311 | 09 0219 | 1 | | www.google.com | 3 | 09 0219 | 2 | | www.google.com.hk | 3 | 09 0219 | 3 |
row_number()从1开始,为每一条分组记录返回一个数字
row_number() over (order by col desc) 是先把col列降序,再为降序以后的每条col记录返回一个序号。
col row_num 999 1 888 2 777 3 666 4row_number() over (partition by col1 order by col2) 表示根据col1分组,在分组内部根据col22排序,
而此函数计算的值就表示每组内部排序后的顺序编号(组内连续的唯一的)
相关文章推荐
- Hive TopN+分组TopN
- sparksql分组后topN(JAVA)
- mysql 分组topN
- spark中实现分组取topN
- sparksql分组后topN(JAVA)
- sparksql分组后topN(JAVA)
- Spark:求出分组内的TopN
- mysql分组取topn
- sparksql分组后topN(JAVA)
- sparksql分组后topN(JAVA)
- Spark Scala 分组排序取TopN
- sparksql分组后topN(JAVA)
- sparksql分组后topN(JAVA)
- Spark Java 分组排序取TopN
- sparksql分组后topN(JAVA)
- hive 分组topN
- Spark核心编程-分组取topN
- Spark Scala TopN分组排序
- sparksql分组后topN(JAVA)
- 使用dataframe解决spark TopN问题:分组、排序、取TopN和join相关问题