您的位置:首页 > 其它

YDB on Spark 性能测试

2016-01-01 19:56 302 查看

YDBonSpark性能测试关于ydbonspark的介绍,请阅读该文:http://ycloud.net.cn/newsitem/277227270本文测试的目的是用来比对使用原生Spark与YDBonSpark上的性能差异。如果您感兴趣想要亲自测试,请访问http://ycloud.net.cn获取延云YDB,自行测试。
统计项小范围扫描:命中18万    ydb_day='20151202'andydb_province='辽宁'中等范围扫描:命中1800万    (ydb_province='辽宁')全表扫描:总数据量2亿
分类统计项原生Spark查询耗时YdbonSpark查询耗时比较相差倍数原生Spark查询耗时YdbonSpark查询耗时比较相差倍数原生Spark查询耗时YdbonSpark查询耗时比较相差倍数
count(*)count(*)1320.43930014411441560.062600
Top  n排序低维值列ydb_age排序2763922885576001154
高纬值列phonenum排序2853952764695942523
max,min统计高纬值列phonenum的max,min统计1500.6892171442721861611
group  by低维值列ydb_sex的单列group  by1641.838891441.83872159722
低维值列ydb_province的单列groupby1461.22119150275144818
高维值列usernick的单列groupby1441.0341391621.0345.41442822
高维值列phonenum的单列groupby1502.043731447223847221.8
低维值列ydb_sex,ydb_province的多列groupby与排序1501.31151381.346180266.9
高维值与低维值列usernick,ydb_zhiye的多列groupby与排序177288168722.33367202.14
count(distinct)低维值列ydb_sex的count(distinct)1680.4283921561156132433
低维值列ydb_province的count(distinct15011501381138144272
高维值列usernick的count(distinct)150350162295.51562761.76
高维值列phonenum的count(distinct)15627816821.755169961.93

一、数据以及硬件条件

1.数据条数:2亿

数据生成程序源码放在了ydb发布包的data_example目录下,文件名称为:MrMakeShuData.java           

2.硬件条件

机器为阿里云的虚拟机,非物理机器,购买的机器配置如下图磁盘如下    

3.sparkSql启动

            ./spark-sql--masterspark://10.44.20.175:7077--executor-memory512m2>spark.log 

二、数据表结构

YDB数据表结构: createtableydb_example_shu(       phonenumlong,       usernickstring,       ydb_sexstring,       ydb_provincestring,       ydb_gradestring,       ydb_agestring,       ydb_bloodstring,       ydb_zhiyestring,       ydb_earnstring,       ydb_preferstring,       ydb_consumestring,       ydb_daystring,       contenttextcjk) spark的表结构CREATEexternal tableydb_example_shu_spark(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring,contentstring)rowformatdelimitedfieldsterminatedby','location'/data/ydb/shu';          

三、原生spark与sparkonydb占用存储空间对比【22vs10单位g】

类型耗时备注
原生spark22g
Sparkonydb  
   

四、原生spark与sparkonydb性能对比-小范围扫描(
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
,命中18万

1.count(*)结果【132vs0.439快300倍】

类型耗时备注
原生spark132秒selectcount(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
limit10
Sparkonydb439毫秒selectsum(cnt)fromspark_on_ydb_small__countlimit10
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__count(cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES  ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectcount(*)from ydb_example_shuwhereydbpartion='20151231' and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' limit0,10"); 

2.Topn排序

低维值列ydb_age排序【276vs3快92倍】

类型时间备注
原生spark276秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
orderbyydb_agedesclimit10;
Sparkonydb3秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_small__orderby_ydb_age orderbyydb_age desclimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__orderby_ydb_age(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyydb_age desclimit0,10");
    

高纬值列phonenum排序【285vs3快95倍】

类型时间备注
原生spark285秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
orderbyphonenumdesclimit10;
Sparkonydb3秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_small__orderby_phonenum orderbyphonenumdesclimit10; 
ydb与spark的映射配置 CREATEexternal TABLEspark_on_ydb_small__orderby_phonenum(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231'and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyphonenumdesclimit0,10");
   

3.高纬值列phonenum的max,min统计【150vs0.689快217倍】

按照ydb_age排序(维值低)
类型时间备注
原生spark150秒selectmax(phonenum),min(phonenum)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
limit10;
Sparkonydb0.689秒selectmax(phonenum1),min(phonenum2)fromspark_on_ydb_small__statlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__stat(phonenum1bigint,phonenum2bigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectmax(phonenum),min(phonenum)fromydb_example_shu whereydbpartion='20151231'and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000'limit0,10");
    

4.分类汇总统计groupby

低维值列ydb_sex的单列groupby【164vs1.838快89倍】

类型时间备注
原生spark164秒selectydb_sex,count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
groupbyydb_sexlimit10;
Sparkonydb1.838秒selectydb_sex,sum(cnt)fromspark_on_ydb_small__groupby_ydb_age groupbyydb_sexlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_age(ydb_sexstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' and 
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10");
 

低维值列ydb_province的单列groupby【146vs1.22快119倍】

类型时间备注
原生spark146秒selectydb_province,count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
groupbyydb_provincelimit10;
Sparkonydb1.22秒selectydb_province,sum(cnt)fromspark_on_ydb_small__groupby_ydb_ppp groupbyydb_provincelimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_ppp(ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10");
    

高维值列usernick的单列groupby【144vs1.034快139倍】

类型时间备注
原生spark144秒selectusernick,count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
groupbyusernicklimit10;
Sparkonydb1.034秒selectusernick,sum(cnt)fromspark_on_ydb_small__groupby_usernick groupbyusernicklimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_usernick(usernickstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10");
      

高维值列phonenum的单列groupby【150vs2.043快73倍】

类型时间备注
原生spark150秒selectphonenum,count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
groupbyphonenumlimit10;
Sparkonydb2.043秒selectphonenum,sum(cnt)fromspark_on_ydb_small__groupby_phonenum groupbyphonenumlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_phonenum(phonenumbigint,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10");
      

低维值列ydb_sex,ydb_province的多列groupby与排序【150vs1.3快115倍】

类型时间备注
原生spark150秒selectydb_sex,ydb_province,count(*)ascntfromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
groupbyydb_sex,ydb_provinceorderbycntdesclimit10;
Sparkonydb1.3秒selectydb_sex,ydb_province,sum(cnt)ascntsumfromspark_on_ydb_small__groupby_ydb_age_sort_m groupbyydb_sex,ydb_provinceorderbycntsumlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_age_sort_m(ydb_sexstring,ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,ydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex,ydb_province limit0,10");
 

高维值与低维值列usernick,ydb_zhiye的多列groupby与排序【177vs2快88倍】

类型时间备注
原生spark177秒selectusernick,ydb_zhiye,count(*)ascntfromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
groupbyusernick,ydb_zhiyeorderbycntdesclimit10;
Sparkonydb2秒selectusernick,ydb_zhiye,sum(cnt)ascntsumafromspark_on_ydb_small__groupby_zhiye_sort_m groupbyusernick,ydb_zhiyeorderbycntsumadesclimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_zhiye_sort_m(usernickstring,ydb_zhiyestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,ydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernick,ydb_zhiye limit0,10");
  

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【168vs0.428快392倍】

类型时间备注
原生spark168秒selectcount(distinctydb_sex),count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
limit10;
Sparkonydb0.428秒selectcount(distinctydb_sex),sum(cnt)fromspark_on_ydb_small__groupby_ydb_age_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_age_dd(ydb_sexstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10");
   

低维值列ydb_province的count(distinct)【150vs1快150倍】

类型时间备注
原生spark150秒selectcount(distinctydb_province),count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
limit10;
Sparkonydb1秒selectcount(distinctydb_province),sum(cnt)fromspark_on_ydb_small__groupby_ydb_ppp_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_ppp_dd(ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10");
       

高维值列usernick的count(distinct)【150vs3快50倍】

类型时间备注
原生spark150秒selectcount(distinctusernick),count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
limit10;
Sparkonydb3sselectcount(distinctusernick),sum(cnt)fromspark_on_ydb_small__groupby_usernick_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_usernick_dd(usernickstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10");
            

高维值列phonenum的count(distinct)【156vs2快78倍】

类型时间备注
原生spark156秒selectcount(distinctphonenum),count(*)fromydb_example_shu_sparkwhere
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
limit10;
Sparkonydb2秒selectcount(distinctphonenum),sum(cnt)fromspark_on_ydb_small__groupby_phonenum_dd limit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_small__groupby_phonenum_dd(phonenumbigint,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_day
='
20151202
'
and
ydb_province
='
辽宁
'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10");
        

五、原生spark与sparkonydb性能对比-中等范围扫描(
ydb_province='辽宁'
命中1800万

1.count(*)结果【144vs1快144倍】

类型耗时备注
原生spark144秒selectcount(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
limit10
Sparkonydb1秒selectsum(cnt)fromspark_on_ydb_middle__countlimit10
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__count(cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES  ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectcount(*)from ydb_example_shuwhereydbpartion='20151231' and
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' limit0,10"); 
     

2.Topn排序

低维值列ydb_age排序【288vs5快57倍】

类型时间备注
原生spark288秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhere
ydb_province='辽宁'
orderbyydb_agedesclimit10;
Sparkonydb5秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_middle__orderby_ydb_age orderbyydb_age desclimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__orderby_ydb_age(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' and
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyydb_age desclimit0,10");
    

高纬值列phonenum排序【276vs4快69倍】

类型时间备注
原生spark276秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhere
ydb_province='辽宁'
orderbyphonenumdesclimit10;
Sparkonydb4秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_middle__orderby_phonenum orderbyphonenumdesclimit10; 
ydb与spark的映射配置 CREATEexternal TABLEspark_on_ydb_middle__orderby_phonenum(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231'and
ydb_province='辽宁'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyphonenumdesclimit0,10");
   

3.高纬值列phonenum的max,min统计【144vs2快72倍】

类型时间备注
原生spark144秒selectmax(phonenum),min(phonenum)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
limit10;
Sparkonydb2秒selectmax(phonenum1),min(phonenum2)fromspark_on_ydb_middle__statlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__stat(phonenum1bigint,phonenum2bigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectmax(phonenum),min(phonenum)fromydb_example_shu whereydbpartion='20151231'and
ydb_province='辽宁'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000'limit0,10");
    

4.分类汇总统计groupby

低维值列ydb_sex的单列groupby【144vs1.838快72倍】

类型时间备注
原生spark144秒selectydb_sex,count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
groupbyydb_sexlimit10;
Sparkonydb2秒selectydb_sex,sum(cnt)fromspark_on_ydb_middle__groupby_ydb_age groupbyydb_sexlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_age(ydb_sexstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' and 
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10");
        

低维值列ydb_province的单列groupby【150vs2快75倍】

类型时间备注
原生spark150秒selectydb_province,count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
groupbyydb_provincelimit10;
Sparkonydb2秒selectydb_province,sum(cnt)fromspark_on_ydb_middle__groupby_ydb_ppp groupbyydb_provincelimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_ppp(ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_province='辽宁'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10");
           

高维值列usernick的单列groupby【162vs1.034快5.4倍】

类型时间备注
原生spark162秒selectusernick,count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
groupbyusernicklimit10;
Sparkonydb30秒selectusernick,sum(cnt)fromspark_on_ydb_middle__groupby_usernick groupbyusernicklimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_usernick(usernickstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10");
           

高维值列phonenum的单列groupby【144vs72快2倍】

类型时间备注
原生spark144秒selectphonenum,count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
groupbyphonenumlimit10;
Sparkonydb72秒selectphonenum,sum(cnt)fromspark_on_ydb_middle__groupby_phonenum groupbyphonenumlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_phonenum(phonenumbigint,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_province='辽宁'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10");
           

低维值列ydb_sex,ydb_province的多列groupby与排序【138vs1.3快46倍】

类型时间备注
原生spark138秒selectydb_sex,ydb_province,count(*)ascntfromydb_example_shu_sparkwhere
ydb_province='辽宁'
groupbyydb_sex,ydb_provinceorderbycntdesclimit10;
Sparkonydb3秒selectydb_sex,ydb_province,sum(cnt)ascntsumfromspark_on_ydb_middle__groupby_ydb_age_sort_m groupbyydb_sex,ydb_provinceorderbycntsumlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_age_sort_m(ydb_sexstring,ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,ydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex,ydb_province limit0,10");
       

高维值与低维值列usernick,ydb_zhiye的多列groupby与排序【168vs72快2.3倍】

类型时间备注
原生spark168秒selectusernick,ydb_zhiye,count(*)ascntfromydb_example_shu_sparkwhere
ydb_province='辽宁'
groupbyusernick,ydb_zhiyeorderbycntdesclimit10;
Sparkonydb72秒selectusernick,ydb_zhiye,sum(cnt)ascntsumafromspark_on_ydb_middle__groupby_zhiye_sort_m groupbyusernick,ydb_zhiyeorderbycntsumadesclimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_zhiye_sort_m(usernickstring,ydb_zhiyestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,ydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernick,ydb_zhiye limit0,10");
     

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【156vs1快156倍】

类型时间备注
原生spark156秒selectcount(distinctydb_sex),count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
limit10;
Sparkonydb1秒selectcount(distinctydb_sex),sum(cnt)fromspark_on_ydb_middle__groupby_ydb_age_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_age_dd(ydb_sexstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10");
        

低维值列ydb_zhiye的count(distinct)【138vs1快138倍】

类型时间备注
原生spark138秒selectcount(distinctydb_zhiye),count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
limit10;
Sparkonydb1秒selectcount(distinctydb_zhiye),sum(cnt)fromspark_on_ydb_middle__groupby_ydb_ppp_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_ppp_dd(ydb_zhiyestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_province='辽宁'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_zhiye limit0,10");
           

高维值列usernick的count(distinct)【162vs29快5.5倍】

类型时间备注
原生spark162秒selectcount(distinctusernick),count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
limit10;
Sparkonydb29秒selectcount(distinctusernick),sum(cnt)fromspark_on_ydb_middle__groupby_usernick_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_usernick_dd(usernickstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231'and
ydb_province='辽宁'
 andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10");
           

高维值列phonenum的count(distinct)【168vs2快1.75倍】

类型时间备注
原生spark168秒selectcount(distinctphonenum),count(*)fromydb_example_shu_sparkwhere
ydb_province='辽宁'
limit10;
Sparkonydb96秒selectcount(distinctphonenum),sum(cnt)fromspark_on_ydb_middle__groupby_phonenum_dd limit10; 
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_middle__groupby_phonenum_dd(phonenumbigint,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' and
ydb_province='辽宁'
andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10");
         

六、原生spark与sparkonydb性能对比-全表扫描

1.count(*)结果【156vs0.06快2600倍】

类型耗时备注
原生spark156秒selectcount(*)fromydb_example_shu_sparklimit10
Sparkonydb60毫秒selectsum(cnt)fromspark_on_ydb_countlimit10
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_count(cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES  ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectcount(*)from ydb_example_shuwhereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' limit0,10"); 
    

2.Topn排序

低维值列ydb_age排序【600vs11快54倍】

类型时间备注
原生spark600秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkorderbyydb_agedesclimit10;
Sparkonydb11秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_orderby_ydb_age orderbyydb_age desclimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_orderby_ydb_age(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyydb_age desclimit0,10");
    

高纬值列phonenum排序【594vs25快23倍】

类型时间备注
原生spark594秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkorderbyphonenumdesclimit10;
Sparkonydb25秒selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_orderby_phonenum orderbyphonenumdesclimit10; 
ydb与spark的映射配置 CREATEexternal TABLEspark_on_ydb_orderby_phonenum(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyphonenumdesclimit0,10");
   

3.高纬值列phonenum的max,min统计【186vs16快11倍】

按照ydb_age排序(维值低)
类型时间备注
原生spark186秒selectmax(phonenum),min(phonenum)fromydb_example_shu_sparklimit10;
Sparkonydb16秒selectmax(phonenum1),min(phonenum2)fromspark_on_ydb_statlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_stat(phonenum1bigint,phonenum2bigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectmax(phonenum),min(phonenum)fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000'limit0,10");
       

4.分类汇总统计groupby

低维值列ydb_sex的单列groupby【159vs7快22倍】

类型时间备注
原生spark159秒selectydb_sex,count(*)fromydb_example_shu_sparkgroupbyydb_sexlimit10;
Sparkonydb7秒selectydb_sex,sum(cnt)fromspark_on_ydb_groupby_ydb_age groupbyydb_sexlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_ydb_age(ydb_sexstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10");
  

低维值列ydb_province的单列groupby【144vs8快18倍】

类型时间备注
原生spark144秒selectydb_province,count(*)fromydb_example_shu_sparkgroupbyydb_provincelimit10;
Sparkonydb8秒selectydb_province,sum(cnt)fromspark_on_ydb_groupby_ydb_ppp groupbyydb_provincelimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_ydb_ppp(ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10");
     

高维值列usernick的单列groupby【144vs282慢2倍】

类型时间备注
原生spark144秒selectusernick,count(*)fromydb_example_shu_sparkgroupbyusernicklimit10;
Sparkonydb282sselectusernick,sum(cnt)fromspark_on_ydb_groupby_usernick groupbyusernicklimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_usernick(usernickstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10");
         

高维值列phonenum的单列groupby【384vs722慢1.8倍】

类型时间备注
原生spark384秒selectphonenum,count(*)fromydb_example_shu_sparkgroupbyphonenumlimit10;
Sparkonydb722秒selectphonenum,sum(cnt)fromspark_on_ydb_groupby_phonenum groupbyphonenumlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_phonenum(phonenumbigint,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10");
            

低维值列ydb_sex,ydb_province的多列groupby与排序【180vs26快6.9倍】

类型时间备注
原生spark180秒selectydb_sex,ydb_province,count(*)ascntfromydb_example_shu_sparkgroupbyydb_sex,ydb_provinceorderbycntdesclimit10;
Sparkonydb26秒selectydb_sex,ydb_province,sum(cnt)ascntsumfromspark_on_ydb_groupby_ydb_age_sort_m groupbyydb_sex,ydb_provinceorderbycntsumlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_ydb_age_sort_m(ydb_sexstring,ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,ydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex,ydb_province limit0,10");
          

高维值与低维值列usernick,ydb_zhiye的多列groupby与排序【336vs720慢2.14倍】

类型时间备注
原生spark336秒selectusernick,ydb_zhiye,count(*)ascntfromydb_example_shu_sparkgroupbyusernick,ydb_zhiyeorderbycntdesclimit10;
Sparkonydb720秒selectusernick,ydb_zhiye,sum(cnt)ascntsumfromspark_on_ydb_groupby_zhiye_sort_m groupbyusernick,ydb_zhiyeorderbycntsumlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_zhiye_sort_m(usernickstring,ydb_zhiyestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,ydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernick,ydb_zhiye limit0,10");
        

5.count(distinct)排重统计

低维值列ydb_sex的count(distinct)【132vs4快33倍】

类型时间备注
原生spark132秒selectcount(distinctydb_sex),count(*)fromydb_example_shu_sparklimit10;
Sparkonydb4秒selectcount(distinctydb_sex),sum(cnt)fromspark_on_ydb_groupby_ydb_age_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_ydb_age_dd(ydb_sexstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10");
      

低维值列ydb_province的count(distinct)【144vs2快72倍】

类型时间备注
原生spark144秒selectcount(distinctydb_province),count(*)fromydb_example_shu_sparklimit10;
Sparkonydb2秒selectcount(distinctydb_province),sum(cnt)fromspark_on_ydb_groupby_ydb_ppp_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_ydb_ppp_dd(ydb_provincestring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10");
         

高维值列usernick的count(distinct)【156vs276慢1.76倍】

类型时间备注
原生spark156秒selectcount(distinctusernick),count(*)fromydb_example_shu_sparklimit10;
Sparkonydb276sselectcount(distinctusernick),sum(cnt)fromspark_on_ydb_groupby_usernick_ddlimit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_usernick_dd(usernickstring,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10");
         

高维值列phonenum的count(distinct)【516vs996慢1.93倍】

类型时间备注
原生spark516秒selectcount(distinctphonenum),count(*)fromydb_example_shu_sparklimit10; 
Sparkonydb996秒selectcount(distinctphonenum),sum(cnt)fromspark_on_ydb_groupby_phonenum_dd limit10;
ydb与spark的映射配置CREATEexternal TABLEspark_on_ydb_groupby_phonenum_dd(phonenumbigint,cntbigint)   STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10");
    
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: