YDB on Spark 性能测试
2016-01-01 19:56
169 查看
YDBonSpark性能测试关于ydbonspark的介绍,请阅读该文:http://ycloud.net.cn/newsitem/277227270本文测试的目的是用来比对使用原生Spark与YDBonSpark上的性能差异。如果您感兴趣想要亲自测试,请访问http://ycloud.net.cn获取延云YDB,自行测试。
磁盘如下
四、原生spark与sparkonydb性能对比-小范围扫描(
五、原生spark与sparkonydb性能对比-中等范围扫描(
统计项 | 小范围扫描:命中18万 ydb_day='20151202'andydb_province='辽宁' | 中等范围扫描:命中1800万 (ydb_province='辽宁') | 全表扫描:总数据量2亿 | ||||||||||
分类 | 统计项 | 原生Spark查询耗时 | YdbonSpark查询耗时 | 比较 | 相差倍数 | 原生Spark查询耗时 | YdbonSpark查询耗时 | 比较 | 相差倍数 | 原生Spark查询耗时 | YdbonSpark查询耗时 | 比较 | 相差倍数 |
count(*) | count(*) | 132 | 0.439 | 快 | 300 | 144 | 1 | 快 | 144 | 156 | 0.06 | 快 | 2600 |
Top n排序 | 低维值列ydb_age排序 | 276 | 3 | 快 | 92 | 288 | 5 | 快 | 57 | 600 | 11 | 快 | 54 |
高纬值列phonenum排序 | 285 | 3 | 快 | 95 | 276 | 4 | 快 | 69 | 594 | 25 | 快 | 23 | |
max,min统计 | 高纬值列phonenum的max,min统计 | 150 | 0.689 | 快 | 217 | 144 | 2 | 快 | 72 | 186 | 16 | 快 | 11 |
group by | 低维值列ydb_sex的单列group by | 164 | 1.838 | 快 | 89 | 144 | 1.838 | 快 | 72 | 159 | 7 | 快 | 22 |
低维值列ydb_province的单列groupby | 146 | 1.22 | 快 | 119 | 150 | 2 | 快 | 75 | 144 | 8 | 快 | 18 | |
高维值列usernick的单列groupby | 144 | 1.034 | 快 | 139 | 162 | 1.034 | 快 | 5.4 | 144 | 282 | 慢 | 2 | |
高维值列phonenum的单列groupby | 150 | 2.043 | 快 | 73 | 144 | 72 | 快 | 2 | 384 | 722 | 慢 | 1.8 | |
低维值列ydb_sex,ydb_province的多列groupby与排序 | 150 | 1.3 | 快 | 115 | 138 | 1.3 | 快 | 46 | 180 | 26 | 快 | 6.9 | |
高维值与低维值列usernick,ydb_zhiye的多列groupby与排序 | 177 | 2 | 快 | 88 | 168 | 72 | 快 | 2.3 | 336 | 720 | 慢 | 2.14 | |
count(distinct) | 低维值列ydb_sex的count(distinct) | 168 | 0.428 | 快 | 392 | 156 | 1 | 快 | 156 | 132 | 4 | 快 | 33 |
低维值列ydb_province的count(distinct | 150 | 1 | 快 | 150 | 138 | 1 | 快 | 138 | 144 | 2 | 快 | 72 | |
高维值列usernick的count(distinct) | 150 | 3 | 快 | 50 | 162 | 29 | 快 | 5.5 | 156 | 276 | 慢 | 1.76 | |
高维值列phonenum的count(distinct) | 156 | 2 | 快 | 78 | 168 | 2 | 快 | 1.75 | 516 | 996 | 慢 | 1.93 |
一、数据以及硬件条件
1.数据条数:2亿
数据生成程序源码放在了ydb发布包的data_example目录下,文件名称为:MrMakeShuData.java2.硬件条件
机器为阿里云的虚拟机,非物理机器,购买的机器配置如下图3.sparkSql启动
./spark-sql--masterspark://10.44.20.175:7077--executor-memory512m2>spark.log二、数据表结构
YDB数据表结构: createtableydb_example_shu( phonenumlong, usernickstring, ydb_sexstring, ydb_provincestring, ydb_gradestring, ydb_agestring, ydb_bloodstring, ydb_zhiyestring, ydb_earnstring, ydb_preferstring, ydb_consumestring, ydb_daystring, contenttextcjk) spark的表结构CREATEexternal tableydb_example_shu_spark(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring,contentstring)rowformatdelimitedfieldsterminatedby','location'/data/ydb/shu';三、原生spark与sparkonydb占用存储空间对比【22vs10单位g】
类型 | 耗时 | 备注 |
原生spark | 22g | |
Sparkonydb |
四、原生spark与sparkonydb性能对比-小范围扫描(ydb_day
='
20151202'
and
ydb_province='
辽宁
'
,命中18万
)
1.count(*)结果【132vs0.439快300倍】
类型 | 耗时 | 备注 |
原生spark | 132秒 | selectcount(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'limit10 |
Sparkonydb | 439毫秒 | selectsum(cnt)fromspark_on_ydb_small__countlimit10 |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__count(cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectcount(*)from ydb_example_shuwhereydbpartion='20151231' andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' limit0,10"); |
2.Topn排序
低维值列ydb_age排序【276vs3快92倍】
类型 | 时间 | 备注 |
原生spark | 276秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'orderbyydb_agedesclimit10; |
Sparkonydb | 3秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_small__orderby_ydb_age orderbyydb_age desclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__orderby_ydb_age(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyydb_age desclimit0,10"); |
高纬值列phonenum排序【285vs3快95倍】
类型 | 时间 | 备注 |
原生spark | 285秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'orderbyphonenumdesclimit10; |
Sparkonydb | 3秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_small__orderby_phonenum orderbyphonenumdesclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__orderby_phonenum(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231'andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyphonenumdesclimit0,10"); |
3.高纬值列phonenum的max,min统计【150vs0.689快217倍】
按照ydb_age排序(维值低)类型 | 时间 | 备注 |
原生spark | 150秒 | selectmax(phonenum),min(phonenum)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'limit10; |
Sparkonydb | 0.689秒 | selectmax(phonenum1),min(phonenum2)fromspark_on_ydb_small__statlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__stat(phonenum1bigint,phonenum2bigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectmax(phonenum),min(phonenum)fromydb_example_shu whereydbpartion='20151231'andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000'limit0,10"); |
4.分类汇总统计groupby
低维值列ydb_sex的单列groupby【164vs1.838快89倍】
类型 | 时间 | 备注 |
原生spark | 164秒 | selectydb_sex,count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'groupbyydb_sexlimit10; |
Sparkonydb | 1.838秒 | selectydb_sex,sum(cnt)fromspark_on_ydb_small__groupby_ydb_age groupbyydb_sexlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_age(ydb_sexstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' and ydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10"); |
低维值列ydb_province的单列groupby【146vs1.22快119倍】
类型 | 时间 | 备注 |
原生spark | 146秒 | selectydb_province,count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'groupbyydb_provincelimit10; |
Sparkonydb | 1.22秒 | selectydb_province,sum(cnt)fromspark_on_ydb_small__groupby_ydb_ppp groupbyydb_provincelimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_ppp(ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10"); |
高维值列usernick的单列groupby【144vs1.034快139倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | selectusernick,count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'groupbyusernicklimit10; |
Sparkonydb | 1.034秒 | selectusernick,sum(cnt)fromspark_on_ydb_small__groupby_usernick groupbyusernicklimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_usernick(usernickstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10"); |
高维值列phonenum的单列groupby【150vs2.043快73倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | selectphonenum,count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'groupbyphonenumlimit10; |
Sparkonydb | 2.043秒 | selectphonenum,sum(cnt)fromspark_on_ydb_small__groupby_phonenum groupbyphonenumlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_phonenum(phonenumbigint,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10"); |
低维值列ydb_sex,ydb_province的多列groupby与排序【150vs1.3快115倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | selectydb_sex,ydb_province,count(*)ascntfromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'groupbyydb_sex,ydb_provinceorderbycntdesclimit10; |
Sparkonydb | 1.3秒 | selectydb_sex,ydb_province,sum(cnt)ascntsumfromspark_on_ydb_small__groupby_ydb_age_sort_m groupbyydb_sex,ydb_provinceorderbycntsumlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_age_sort_m(ydb_sexstring,ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,ydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex,ydb_province limit0,10"); |
高维值与低维值列usernick,ydb_zhiye的多列groupby与排序【177vs2快88倍】
类型 | 时间 | 备注 |
原生spark | 177秒 | selectusernick,ydb_zhiye,count(*)ascntfromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'groupbyusernick,ydb_zhiyeorderbycntdesclimit10; |
Sparkonydb | 2秒 | selectusernick,ydb_zhiye,sum(cnt)ascntsumafromspark_on_ydb_small__groupby_zhiye_sort_m groupbyusernick,ydb_zhiyeorderbycntsumadesclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_zhiye_sort_m(usernickstring,ydb_zhiyestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,ydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernick,ydb_zhiye limit0,10"); |
5.count(distinct)排重统计
低维值列ydb_sex的count(distinct)【168vs0.428快392倍】
类型 | 时间 | 备注 |
原生spark | 168秒 | selectcount(distinctydb_sex),count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'limit10; |
Sparkonydb | 0.428秒 | selectcount(distinctydb_sex),sum(cnt)fromspark_on_ydb_small__groupby_ydb_age_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_age_dd(ydb_sexstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10"); |
低维值列ydb_province的count(distinct)【150vs1快150倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | selectcount(distinctydb_province),count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'limit10; |
Sparkonydb | 1秒 | selectcount(distinctydb_province),sum(cnt)fromspark_on_ydb_small__groupby_ydb_ppp_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_ydb_ppp_dd(ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10"); |
高维值列usernick的count(distinct)【150vs3快50倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | selectcount(distinctusernick),count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'limit10; |
Sparkonydb | 3s | selectcount(distinctusernick),sum(cnt)fromspark_on_ydb_small__groupby_usernick_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_usernick_dd(usernickstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10"); |
高维值列phonenum的count(distinct)【156vs2快78倍】
类型 | 时间 | 备注 |
原生spark | 156秒 | selectcount(distinctphonenum),count(*)fromydb_example_shu_sparkwhereydb_day ='20151202 ' andydb_province =' 辽宁 'limit10; |
Sparkonydb | 2秒 | selectcount(distinctphonenum),sum(cnt)fromspark_on_ydb_small__groupby_phonenum_dd limit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_small__groupby_phonenum_dd(phonenumbigint,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_day ='20151202 ' andydb_province =' 辽宁 'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10"); |
五、原生spark与sparkonydb性能对比-中等范围扫描(ydb_province='辽宁'
命中1800万
)
1.count(*)结果【144vs1快144倍】
类型 | 耗时 | 备注 |
原生spark | 144秒 | selectcount(*)fromydb_example_shu_sparkwhereydb_province='辽宁'limit10 |
Sparkonydb | 1秒 | selectsum(cnt)fromspark_on_ydb_middle__countlimit10 |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__count(cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectcount(*)from ydb_example_shuwhereydbpartion='20151231' andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' limit0,10"); |
2.Topn排序
低维值列ydb_age排序【288vs5快57倍】
类型 | 时间 | 备注 |
原生spark | 288秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhereydb_province='辽宁'orderbyydb_agedesclimit10; |
Sparkonydb | 5秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_middle__orderby_ydb_age orderbyydb_age desclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__orderby_ydb_age(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyydb_age desclimit0,10"); |
高纬值列phonenum排序【276vs4快69倍】
类型 | 时间 | 备注 |
原生spark | 276秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkwhereydb_province='辽宁'orderbyphonenumdesclimit10; |
Sparkonydb | 4秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_middle__orderby_phonenum orderbyphonenumdesclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__orderby_phonenum(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231'andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyphonenumdesclimit0,10"); |
3.高纬值列phonenum的max,min统计【144vs2快72倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | selectmax(phonenum),min(phonenum)fromydb_example_shu_sparkwhereydb_province='辽宁'limit10; |
Sparkonydb | 2秒 | selectmax(phonenum1),min(phonenum2)fromspark_on_ydb_middle__statlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__stat(phonenum1bigint,phonenum2bigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectmax(phonenum),min(phonenum)fromydb_example_shu whereydbpartion='20151231'andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000'limit0,10"); |
4.分类汇总统计groupby
低维值列ydb_sex的单列groupby【144vs1.838快72倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | selectydb_sex,count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'groupbyydb_sexlimit10; |
Sparkonydb | 2秒 | selectydb_sex,sum(cnt)fromspark_on_ydb_middle__groupby_ydb_age groupbyydb_sexlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_age(ydb_sexstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' and ydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10"); |
低维值列ydb_province的单列groupby【150vs2快75倍】
类型 | 时间 | 备注 |
原生spark | 150秒 | selectydb_province,count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'groupbyydb_provincelimit10; |
Sparkonydb | 2秒 | selectydb_province,sum(cnt)fromspark_on_ydb_middle__groupby_ydb_ppp groupbyydb_provincelimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_ppp(ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10"); |
高维值列usernick的单列groupby【162vs1.034快5.4倍】
类型 | 时间 | 备注 |
原生spark | 162秒 | selectusernick,count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'groupbyusernicklimit10; |
Sparkonydb | 30秒 | selectusernick,sum(cnt)fromspark_on_ydb_middle__groupby_usernick groupbyusernicklimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_usernick(usernickstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10"); |
高维值列phonenum的单列groupby【144vs72快2倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | selectphonenum,count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'groupbyphonenumlimit10; |
Sparkonydb | 72秒 | selectphonenum,sum(cnt)fromspark_on_ydb_middle__groupby_phonenum groupbyphonenumlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_phonenum(phonenumbigint,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10"); |
低维值列ydb_sex,ydb_province的多列groupby与排序【138vs1.3快46倍】
类型 | 时间 | 备注 |
原生spark | 138秒 | selectydb_sex,ydb_province,count(*)ascntfromydb_example_shu_sparkwhereydb_province='辽宁'groupbyydb_sex,ydb_provinceorderbycntdesclimit10; |
Sparkonydb | 3秒 | selectydb_sex,ydb_province,sum(cnt)ascntsumfromspark_on_ydb_middle__groupby_ydb_age_sort_m groupbyydb_sex,ydb_provinceorderbycntsumlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_age_sort_m(ydb_sexstring,ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,ydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex,ydb_province limit0,10"); |
高维值与低维值列usernick,ydb_zhiye的多列groupby与排序【168vs72快2.3倍】
类型 | 时间 | 备注 |
原生spark | 168秒 | selectusernick,ydb_zhiye,count(*)ascntfromydb_example_shu_sparkwhereydb_province='辽宁'groupbyusernick,ydb_zhiyeorderbycntdesclimit10; |
Sparkonydb | 72秒 | selectusernick,ydb_zhiye,sum(cnt)ascntsumafromspark_on_ydb_middle__groupby_zhiye_sort_m groupbyusernick,ydb_zhiyeorderbycntsumadesclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_zhiye_sort_m(usernickstring,ydb_zhiyestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,ydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernick,ydb_zhiye limit0,10"); |
5.count(distinct)排重统计
低维值列ydb_sex的count(distinct)【156vs1快156倍】
类型 | 时间 | 备注 |
原生spark | 156秒 | selectcount(distinctydb_sex),count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'limit10; |
Sparkonydb | 1秒 | selectcount(distinctydb_sex),sum(cnt)fromspark_on_ydb_middle__groupby_ydb_age_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_age_dd(ydb_sexstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10"); |
低维值列ydb_zhiye的count(distinct)【138vs1快138倍】
类型 | 时间 | 备注 |
原生spark | 138秒 | selectcount(distinctydb_zhiye),count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'limit10; |
Sparkonydb | 1秒 | selectcount(distinctydb_zhiye),sum(cnt)fromspark_on_ydb_middle__groupby_ydb_ppp_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_ydb_ppp_dd(ydb_zhiyestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_zhiye limit0,10"); |
高维值列usernick的count(distinct)【162vs29快5.5倍】
类型 | 时间 | 备注 |
原生spark | 162秒 | selectcount(distinctusernick),count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'limit10; |
Sparkonydb | 29秒 | selectcount(distinctusernick),sum(cnt)fromspark_on_ydb_middle__groupby_usernick_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_usernick_dd(usernickstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231'andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10"); |
高维值列phonenum的count(distinct)【168vs2快1.75倍】
类型 | 时间 | 备注 |
原生spark | 168秒 | selectcount(distinctphonenum),count(*)fromydb_example_shu_sparkwhereydb_province='辽宁'limit10; |
Sparkonydb | 96秒 | selectcount(distinctphonenum),sum(cnt)fromspark_on_ydb_middle__groupby_phonenum_dd limit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_middle__groupby_phonenum_dd(phonenumbigint,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' andydb_province='辽宁'andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10"); |
六、原生spark与sparkonydb性能对比-全表扫描
1.count(*)结果【156vs0.06快2600倍】
类型 | 耗时 | 备注 |
原生spark | 156秒 | selectcount(*)fromydb_example_shu_sparklimit10 |
Sparkonydb | 60毫秒 | selectsum(cnt)fromspark_on_ydb_countlimit10 |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_count(cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectcount(*)from ydb_example_shuwhereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' limit0,10"); |
2.Topn排序
低维值列ydb_age排序【600vs11快54倍】
类型 | 时间 | 备注 |
原生spark | 600秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkorderbyydb_agedesclimit10; |
Sparkonydb | 11秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_orderby_ydb_age orderbyydb_age desclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_orderby_ydb_age(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyydb_age desclimit0,10"); |
高纬值列phonenum排序【594vs25快23倍】
类型 | 时间 | 备注 |
原生spark | 594秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu_sparkorderbyphonenumdesclimit10; |
Sparkonydb | 25秒 | selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromspark_on_ydb_orderby_phonenum orderbyphonenumdesclimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_orderby_phonenum(phonenumbigint,usernickstring,ydb_sexstring,ydb_provincestring,ydb_gradestring,ydb_agestring,ydb_bloodstring,ydb_zhiyestring,ydb_earnstring,ydb_preferstring,ydb_consumestring,ydb_daystring) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,usernick,ydb_sex,ydb_province,ydb_grade,ydb_age,ydb_blood,ydb_zhiye,ydb_earn,ydb_prefer,ydb_consume,ydb_dayfromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' orderbyphonenumdesclimit0,10"); |
3.高纬值列phonenum的max,min统计【186vs16快11倍】
按照ydb_age排序(维值低)类型 | 时间 | 备注 |
原生spark | 186秒 | selectmax(phonenum),min(phonenum)fromydb_example_shu_sparklimit10; |
Sparkonydb | 16秒 | selectmax(phonenum1),min(phonenum2)fromspark_on_ydb_statlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_stat(phonenum1bigint,phonenum2bigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectmax(phonenum),min(phonenum)fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000'limit0,10"); |
4.分类汇总统计groupby
低维值列ydb_sex的单列groupby【159vs7快22倍】
类型 | 时间 | 备注 |
原生spark | 159秒 | selectydb_sex,count(*)fromydb_example_shu_sparkgroupbyydb_sexlimit10; |
Sparkonydb | 7秒 | selectydb_sex,sum(cnt)fromspark_on_ydb_groupby_ydb_age groupbyydb_sexlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_ydb_age(ydb_sexstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10"); |
低维值列ydb_province的单列groupby【144vs8快18倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | selectydb_province,count(*)fromydb_example_shu_sparkgroupbyydb_provincelimit10; |
Sparkonydb | 8秒 | selectydb_province,sum(cnt)fromspark_on_ydb_groupby_ydb_ppp groupbyydb_provincelimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_ydb_ppp(ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10"); |
高维值列usernick的单列groupby【144vs282慢2倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | selectusernick,count(*)fromydb_example_shu_sparkgroupbyusernicklimit10; |
Sparkonydb | 282s | selectusernick,sum(cnt)fromspark_on_ydb_groupby_usernick groupbyusernicklimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_usernick(usernickstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10"); |
高维值列phonenum的单列groupby【384vs722慢1.8倍】
类型 | 时间 | 备注 |
原生spark | 384秒 | selectphonenum,count(*)fromydb_example_shu_sparkgroupbyphonenumlimit10; |
Sparkonydb | 722秒 | selectphonenum,sum(cnt)fromspark_on_ydb_groupby_phonenum groupbyphonenumlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_phonenum(phonenumbigint,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10"); |
低维值列ydb_sex,ydb_province的多列groupby与排序【180vs26快6.9倍】
类型 | 时间 | 备注 |
原生spark | 180秒 | selectydb_sex,ydb_province,count(*)ascntfromydb_example_shu_sparkgroupbyydb_sex,ydb_provinceorderbycntdesclimit10; |
Sparkonydb | 26秒 | selectydb_sex,ydb_province,sum(cnt)ascntsumfromspark_on_ydb_groupby_ydb_age_sort_m groupbyydb_sex,ydb_provinceorderbycntsumlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_ydb_age_sort_m(ydb_sexstring,ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,ydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex,ydb_province limit0,10"); |
高维值与低维值列usernick,ydb_zhiye的多列groupby与排序【336vs720慢2.14倍】
类型 | 时间 | 备注 |
原生spark | 336秒 | selectusernick,ydb_zhiye,count(*)ascntfromydb_example_shu_sparkgroupbyusernick,ydb_zhiyeorderbycntdesclimit10; |
Sparkonydb | 720秒 | selectusernick,ydb_zhiye,sum(cnt)ascntsumfromspark_on_ydb_groupby_zhiye_sort_m groupbyusernick,ydb_zhiyeorderbycntsumlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_zhiye_sort_m(usernickstring,ydb_zhiyestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,ydb_zhiye,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernick,ydb_zhiye limit0,10"); |
5.count(distinct)排重统计
低维值列ydb_sex的count(distinct)【132vs4快33倍】
类型 | 时间 | 备注 |
原生spark | 132秒 | selectcount(distinctydb_sex),count(*)fromydb_example_shu_sparklimit10; |
Sparkonydb | 4秒 | selectcount(distinctydb_sex),sum(cnt)fromspark_on_ydb_groupby_ydb_age_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_ydb_age_dd(ydb_sexstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_sex,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_sex limit0,10"); |
低维值列ydb_province的count(distinct)【144vs2快72倍】
类型 | 时间 | 备注 |
原生spark | 144秒 | selectcount(distinctydb_province),count(*)fromydb_example_shu_sparklimit10; |
Sparkonydb | 2秒 | selectcount(distinctydb_province),sum(cnt)fromspark_on_ydb_groupby_ydb_ppp_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_ydb_ppp_dd(ydb_provincestring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectydb_province,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyydb_province limit0,10"); |
高维值列usernick的count(distinct)【156vs276慢1.76倍】
类型 | 时间 | 备注 |
原生spark | 156秒 | selectcount(distinctusernick),count(*)fromydb_example_shu_sparklimit10; |
Sparkonydb | 276s | selectcount(distinctusernick),sum(cnt)fromspark_on_ydb_groupby_usernick_ddlimit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_usernick_dd(usernickstring,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectusernick,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyusernicklimit0,10"); |
高维值列phonenum的count(distinct)【516vs996慢1.93倍】
类型 | 时间 | 备注 |
原生spark | 516秒 | selectcount(distinctphonenum),count(*)fromydb_example_shu_sparklimit10; |
Sparkonydb | 996秒 | selectcount(distinctphonenum),sum(cnt)fromspark_on_ydb_groupby_phonenum_dd limit10; |
ydb与spark的映射配置 | CREATEexternal TABLEspark_on_ydb_groupby_phonenum_dd(phonenumbigint,cntbigint) STOREDBY'cn.net.ycloud.ydb.handle.YdbStorageHandler' TBLPROPERTIES ("ydb.handler.hostport"="101.200.130.48:8080","ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata", "ydb.handler.sql"="selectphonenum,count(*) fromydb_example_shu whereydbpartion='20151231' andydbkv='export.joinchar:%01'andydbkv='export.max.return.docset.size:100000000'andydbkv='max.return.docset.size:100000000' groupbyphonenum limit0,10"); |
相关文章推荐
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- YDB on Spark 性能测试
- 用CToolBarCtrl类为对话框创建工具栏
- Setting Cisco Switch Automatic boot
- 【慕课笔记】第一章 JAVA初体验 第2节 JAVA开发环境搭建
- C++模板元编程(二)
- 吃货如何理解线性规划的对偶
- 在部署HEXO时出现ERROR Deployer not found : github的问题解决办法
- git 使用学习笔记
- 合并两个排序的链表
- linux笔记 rpm包安装与卸载,chaxun
- 总结 XSS 与 CSRF 两种跨站攻击