PostgreSQL: range queries over ten billion rows, an extreme optimization case for grouped, ordered window lookups

2017-05-16 16:20

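This article is for learning purposes only.

This article takes one case to its limit: for an arbitrary time range, group by ID and fetch the latest record for each ID.

After optimization, response time stays predictable: whatever the data volume, results come back in milliseconds and performance is stable.

That is about 10,000 times faster than before optimization.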

CASE

There is a table with the following structure:

CREATE TABLE target_position (
target_id varchar(80),
time bigint,
content text
);

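It holds about 10 billion rows.
There are about 200,000 distinct target_id values.

The database is PostgreSQL 9.4.

Requirement:
For each target, return the latest row within a given time range, and return the result within 1 second.
The time range is arbitrary.

The current implementation uses a window function:
select target_id,time,content from (select *,row_number() over (partition by target_id order by time desc) rid from target_position where time > <start_time> and time <= <end_time>) as t where rid=1;
It performs very poorly.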

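Why is it slow? The cost is in scanning the time range: the query must read every row in the range, then partition and sort them, in order to pick the latest row per target_id.

So the wider the time range, the more rows are likely to be scanned and the longer the query takes.

Let's go straight to the best plan. The case states that there are about 200,000 target_id values, so in theory, however large the range, at most 200,000 tuples need to be fetched: one per target_id.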

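How? With a function.

First, maintain a separate table that holds the distinct target_id values so they are cheap to enumerate. The application has to cooperate to keep it up to date, but that is not hard; it simply decouples the ID list from the position data.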
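To make the "at most 200,000 tuples" idea concrete, here is a minimal sketch against the original table. It assumes a hypothetical ID table named target_ids, a hypothetical index name, and made-up bigint time bounds. It also uses a LATERAL join (available since PostgreSQL 9.3) instead of the function-based approach developed in the stages that follow, so treat it as an illustration of the principle rather than the method used in the tests.

```sql
-- Hypothetical helper table: one row per distinct target_id,
-- kept up to date by the application.
create table target_ids (target_id varchar(80) primary key);

-- Composite index so that "latest row for one target_id" is a short index probe.
-- The index name is an assumption.
create index idx_target_position_id_time
    on target_position (target_id, time desc);

-- One top-1 probe per target_id: the number of probes depends on the number
-- of IDs, not on how many rows fall inside the time range.
select p.target_id, p.time, p.content
from target_ids i
cross join lateral (
    select tp.target_id, tp.time, tp.content
    from target_position tp
    where tp.target_id = i.target_id
      and tp.time > 1462524629      -- start of range (assumed value)
      and tp.time <= 1462524634     -- end of range (assumed value)
    order by tp.time desc
    limit 1
) p;
```

IDs with no row in the range simply produce no output row, because the lateral subquery returns nothing for them.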

The test sample:

postgres=# create unlogged table t1(id int, crt_time timestamp);
CREATE TABLE
postgres=# create unlogged table t2(id int primary key);
CREATE TABLE
postgres=# insert into t1 select trunc(random()*200000),clock_timestamp() from generate_series(1,100000000);
INSERT 0 100000000
postgres=# create index idx_t1_1 on t1(id,crt_time desc);
CREATE INDEX
postgres=# select * from t1 limit 10;
id   |          crt_time
--------+----------------------------
49092 | 2016-05-06 16:50:29.88595
947 | 2016-05-06 16:50:29.887553
179124 | 2016-05-06 16:50:29.887562
197308 | 2016-05-06 16:50:29.887564
93558 | 2016-05-06 16:50:29.887566
127133 | 2016-05-06 16:50:29.887568
163507 | 2016-05-06 16:50:29.887569
110546 | 2016-05-06 16:50:29.887571
65363 | 2016-05-06 16:50:29.887573
122666 | 2016-05-06 16:50:29.887575
(10 rows)
postgres=# insert into t2 select generate_series(1,200000);
INSERT 0 200000

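Here is the plan and timing for the unoptimized query. The plan itself is already about as good as it gets, but the requested range holds more than 4.5 million rows, so the query takes about 15 seconds.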

postgres=# explain analyze select * from (select *,row_number() over(partition by id order by crt_time desc) rn from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566') t where rn=1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Subquery Scan on t  (cost=0.57..1819615.87 rows=2500 width=20) (actual time=0.083..15301.915 rows=200000 loops=1)
Filter: (t.rn = 1)
Rows Removed by Filter: 4320229
->  WindowAgg  (cost=0.57..1813365.87 rows=500000 width=12) (actual time=0.078..14012.867 rows=4520229 loops=1)
->  Index Only Scan using idx_t1_1 on t1  (cost=0.57..1804615.87 rows=500000 width=12) (actual time=0.066..10603.161 rows=4520229 loops=1)
Index Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:50:34.887566'::timestamp without time zone))
Heap Fetches: 4520229
Planning time: 0.202 ms
Execution time: 15356.066 ms
(9 rows)

Optimization stage 1

Looping over the IDs in an anonymous code block (DO) brings the time down to the order of seconds.

postgres=# do language plpgsql $$
declare
x int;
begin
for x in select id from t2 loop
perform * from t1 where id=x and crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566' order by crt_time desc limit 1;
end loop;
end;
$$;
DO
Time: 2311.081 ms

Wrapping it in a function makes it more general:

postgres=# create or replace function f(start_time timestamp, end_time timestamp) returns setof t1 as $$
declare
x int;
begin
for x in select id from t2 loop
return query select * from t1 where id=x and crt_time between start_time and end_time order by crt_time desc limit 1;
end loop;
return;
end;
$$ language plpgsql strict;
CREATE FUNCTION

postgres=# explain analyze select * from f('2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Function Scan on f  (cost=0.25..10.25 rows=1000 width=12) (actual time=2802.565..2850.445 rows=199999 loops=1)
Planning time: 0.036 ms
Execution time: 2885.924 ms
(3 rows)
Time: 2886.314 ms

postgres=# select * from f('2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566') limit 10;
id |          crt_time
----+----------------------------
1 | 2016-05-06 16:50:32.507124
2 | 2016-05-06 16:50:32.774655
3 | 2016-05-06 16:50:32.48621
4 | 2016-05-06 16:50:32.874258
5 | 2016-05-06 16:50:32.677812
6 | 2016-05-06 16:50:32.091517
7 | 2016-05-06 16:50:32.724287
8 | 2016-05-06 16:50:32.669251
9 | 2016-05-06 16:50:32.815634
10 | 2016-05-06 16:50:32.812239
(10 rows)
Time: 3108.222 ms

Note: every crt_time in this output is earlier than 16:50:32.887566, although the call asks for rows up to 16:50:34.887566. The output was produced by a version of f whose query hard-coded the time literals (ending at 16:50:32.887566) instead of using the start_time and end_time arguments.

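Now widen the time range so that tens of millions of rows fall inside it (about 46 million in the run below).

The original method takes 104 seconds; its time grows with the number of rows in the range.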

postgres=# explain analyze select * from (select *,row_number() over(partition by id order by crt_time desc) rn from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:51:19.887566') t where rn=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on t  (cost=0.57..1819615.87 rows=2500 width=20) (actual time=0.042..103886.966 rows=200000 loops=1)
Filter: (t.rn = 1)
Rows Removed by Filter: 46031611
->  WindowAgg  (cost=0.57..1813365.87 rows=500000 width=12) (actual time=0.037..92722.913 rows=46231611 loops=1)
->  Index Only Scan using idx_t1_1 on t1  (cost=0.57..1804615.87 rows=500000 width=12) (actual time=0.030..62673.221 rows=46231611 loops=1)
Index Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:51:19.887566'::timestamp without time zone))
Heap Fetches: 46231611
Planning time: 0.119 ms
Execution time: 103950.955 ms
(9 rows)
Time: 103951.638 ms

The optimized method takes the same time as before, about 2.9 seconds:

postgres=# explain analyze select * from f('2016-05-06 16:50:29.887566', '2016-05-06 16:51:19.887566');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Function Scan on f  (cost=0.25..10.25 rows=1000 width=12) (actual time=2809.562..2858.468 rows=199999 loops=1)
Planning time: 0.037 ms
Execution time: 2894.181 ms
(3 rows)
Time: 2894.605 ms

Optimization stage 2

Next, abstract the per-ID SQL into a function:

postgres=# create or replace function f1(int, timestamp, timestamp) returns t1 as $$
select * from t1 where id=$1 and crt_time between $2 and $3 order by crt_time desc limit 1;
$$ language sql strict;
CREATE FUNCTION
Time: 0.564 ms

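With the loop moved outside the function (the outer query iterates over t2), execution is more efficient than the FOR loop in PL/pgSQL because less code runs per iteration inside the server, and the time drops to about 2.3 seconds.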

postgres=# explain analyze select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from t2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on t2  (cost=0.00..59560.50 rows=225675 width=4) (actual time=0.206..2213.069 rows=200000 loops=1)
Planning time: 0.121 ms
Execution time: 2261.185 ms
(3 rows)
Time: 2261.740 ms

postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from t2)t;
count
--------
200000
(1 row)
Time: 2359.005 ms

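Because the loop is now in the outer query, you can use a cursor or a LIMIT clause, so the 200,000 rows can be paged. That is a big improvement in user experience.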

postgres=# select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from t2 limit 10;
f1
-----------------------------------
(1,"2016-05-06 16:50:34.818639")
(2,"2016-05-06 16:50:34.874603")
(3,"2016-05-06 16:50:34.741072")
(4,"2016-05-06 16:50:34.727868")
(5,"2016-05-06 16:50:34.507418")
(6,"2016-05-06 16:50:34.715711")
(7,"2016-05-06 16:50:34.817961")
(8,"2016-05-06 16:50:34.786087")
(9,"2016-05-06 16:50:34.76778")
(10,"2016-05-06 16:50:34.836663")
(10 rows)
Time: 0.771 ms
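
The LIMIT example above returns only the first page. A cursor is one way to page through the whole result within a single transaction. The following is a sketch only; the cursor name latest_cur and the page size of 1000 are assumptions.

```sql
begin;

-- Cursor over the per-ID lookups; each FETCH returns the next batch of rows.
declare latest_cur cursor for
    select f1(id, '2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566')
    from t2
    order by id;

fetch 1000 from latest_cur;   -- first page
fetch 1000 from latest_cur;   -- next page

close latest_cur;
commit;
```

A plain cursor lives only as long as its transaction, so the application has to keep the transaction open while the user pages through the result.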

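Optimization stage 3

Returning all rows still takes more than 1 second. Is there room for further optimization?

My goal is not only to optimize, but also to squeeze everything out of the hardware.

If you have enough hardware, this is the point to go parallel: fetching a single row is fast, but looping 200,000 times is slow.

See how long 10,000 iterations take: about 115 milliseconds, which meets the requirement.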

postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 limit 10000) t) t;
count
-------
10000
(1 row)
Time: 115.690 ms

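So to get under 1 second, run 20 queries in parallel, each handling a slice of the IDs, and combine the results into one result set.

The database does not support parallel query yet; PostgreSQL 9.6 will.

For now this can be done in the application layer. But how do the parallel sessions get a consistent view of the data?

This is where a PostgreSQL trick comes in: exported snapshots (pg_export_snapshot). They let sessions share a transaction snapshot so that all of them see the same database state. The same mechanism is already used for parallel backup.

Applications that need a consistent view across sessions can use it too. For example:

First, open session 1 and export a snapshot: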

postgres=# begin transaction isolation level repeatable read;
BEGIN
Time: 0.173 ms
postgres=# select pg_export_snapshot();
pg_export_snapshot
--------------------
0FC9C2A3-1
(1 row)

Open session 2 and import the snapshot:

postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=# SET TRANSACTION SNAPSHOT '0FC9C2A3-1';
SET

Open session 3 and import the snapshot:

postgres=# begin transaction isolation level repeatable read;
BEGIN
postgres=# SET TRANSACTION SNAPSHOT '0FC9C2A3-1';
SET

Then run the following in parallel, one query per session:

postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 order by id limit 70000 offset 0) t) t;
count
-------
70000
(1 row)
Time: 775.071 ms
postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 order by id limit 70000 offset 70000) t) t;
count
-------
70000
(1 row)
Time: 763.747 ms
postgres=# select count(*) from (select f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566') from (select * from t2 order by id limit 70000 offset 140000) t) t;
count
-------
60000
(1 row)

Time: 665.743 ms

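Run in parallel, each slice finishes in under 1 second.

These queries can still be improved at the OFFSET: id is the primary key of t2, so there is no need for OFFSET; filtering on an id range is better.

But scanning t2 is not the bottleneck here, so I have left it as is.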
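
For reference, the range-based version of the three slices could look like the sketch below. It assumes the ids in t2 are the dense sequence 1..200000 loaded earlier with generate_series, so the ranges cover the same rows as the OFFSET slices.

```sql
-- Session 1: ids 1..70000
select count(*) from (
    select f1(id, '2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566')
    from t2 where id between 1 and 70000
) t;

-- Session 2: ids 70001..140000
select count(*) from (
    select f1(id, '2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566')
    from t2 where id between 70001 and 140000
) t;

-- Session 3: ids 140001..200000
select count(*) from (
    select f1(id, '2016-05-06 16:50:29.887566', '2016-05-06 16:50:34.887566')
    from t2 where id between 140001 and 200000
) t;
```

With a range predicate each session can start at its own slice through the primary key instead of reading and discarding the rows skipped by OFFSET.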

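To go further, split t2 into more slices. Getting down to about 10 milliseconds should be no problem, which means a 10,000-fold speedup for ranges covering tens of millions of rows.

By the same principle, performance should be the same with ten billion rows. Try it if you don't believe it.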

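Optimization stage 4

Is that the end? Not yet. The optimizations so far enumerate the IDs from a separate table, so however large or small the requested range is, every ID gets probed. Each probe uses the index, but there is still room for improvement.

The number of IDs probed can be reduced. Suppose the requested range holds 1 million rows but only 100 distinct IDs: in theory 100 probes are enough, yet the previous method still performs 200,000.

The method is simple.

(If the time column is streaming, that is, it increases with insertion order, a PostgreSQL BRIN index can speed this up. If it is not, use a conventional btree index on (crt_time, id) so the lookup can run as an index-only scan.)

The purpose of this index is to find the largest ID within the range quickly.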

postgres=# create index idx_t2_1 on t1 using brin(crt_time);
CREATE INDEX
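
For data that is not streaming, the btree alternative mentioned above might be set up as in the following sketch. The index name is hypothetical, and whether the planner actually chooses an index-only scan depends on statistics and on the state of the visibility map.

```sql
-- Alternative for non-streaming data: a btree that covers both columns.
-- The index name is an assumption.
create index idx_t1_crt_time_id on t1 (crt_time, id);

-- Refresh statistics and the visibility map so that an index-only scan
-- can avoid heap fetches.
vacuum analyze t1;

-- The lookup this index is meant to serve: the largest id inside the range.
select max(id)
from t1
where crt_time between '2016-05-07 09:50:29.887566' and '2016-05-07 16:50:29.987566';
```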

Insert 1 million rows of streaming data containing only 100 distinct IDs.

postgres=# insert into t1 select trunc(random()*100),clock_timestamp() from generate_series(1,1000000);
INSERT 0 1000000
Time: 4065.084 ms
postgres=# select now();
now
------------------------------
2016-05-07 11:32:12.93416+08
(1 row)
Time: 0.346 ms

Create a function that, given an ID, returns the latest row of the next larger ID in the range. It is used inside the recursive query.

create or replace function f2(int,timestamp,timestamp) returns t1 as $$
select * from t1 where id is not null and id>$1 and crt_time between $2 and $3 order by id,crt_time desc limit 1;
$$ language sql strict set enable_sort=off;

Create another function that uses a recursive query to return the latest row of every ID present in the given range.

create or replace function f3(start_time timestamp, end_time timestamp) returns setof t1 as $$
declare
maxid int;
begin
select max(id) into maxid from t1 where crt_time between start_time and end_time;
return query with recursive skip as (
(
select id,crt_time from t1 where crt_time between start_time and end_time order by id,crt_time desc limit 1
)
union all
(
select (f2(s1.id, start_time, end_time)).* from skip s1 where s1.id <> maxid and s1.id is not null
)
) select * from skip;
end;
$$ language plpgsql strict;

postgres=# select * from f3('2016-05-07 09:50:29.887566','2016-05-07 16:50:29.987566');
id |          crt_time
----+----------------------------
0 | 2016-05-07 11:32:00.983203
1 | 2016-05-07 11:32:00.982906
...
97 | 2016-05-07 11:32:00.983281
98 | 2016-05-07 11:32:00.983206
99 | 2016-05-07 11:32:00.983107
(100 rows)
Time: 177.203 ms

Very fast: only 177 milliseconds.
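
The recursion inside f3 follows a pattern often called a loose index scan: it jumps from one distinct id to the next through the (id, crt_time desc) index instead of reading every row. Stripped of the time filter, the same pattern lists the distinct id values of t1. This is a sketch for illustration and was not part of the tests above.

```sql
with recursive skip as (
    -- Anchor: the smallest id in the table.
    (select id from t1 where id is not null order by id limit 1)
    union all
    -- Step: the smallest id greater than the previous one,
    -- or NULL once there is none left.
    select (select t1.id from t1 where t1.id > s.id order by t1.id limit 1)
    from skip s
    where s.id is not null
)
select id from skip where id is not null;
```

Each step is one short index probe, so the cost grows with the number of distinct ids rather than with the number of rows.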

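The stage 3 method (one probe per ID in t2, run here in a single session) takes a roughly constant time, a little over 3 seconds.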

select count(*) from (select * from (select (f1(id,'2016-05-07 09:50:29.887566','2016-05-07 16:50:29.987566')).* from t2) t where t.* is not null) t;
count
-------
100
(1 row)
Time: 3153.508 ms

But stage 4 is no silver bullet either: it does not suit ranges that contain many distinct IDs.

For example:

postgres=# select count(*) from f3('2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566');
count
--------
200000
(1 row)
Time: 13344.261 ms

When the range contains many IDs, the stage 3 method is still the better choice.

postgres=#  select count(*) from (select * from (select (f1(id,'2016-05-06 16:50:29.887566','2016-05-06 16:50:34.887566')).* from t2) t where t.* is not null) t;
count
--------
200000
(1 row)
Time: 3846.156 ms

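Optimization stage 5

How can the number of distinct IDs in the selected range be estimated automatically?

Use the method from one of my earlier articles, with the following estimation function: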

CREATE FUNCTION count_estimate(query text) RETURNS INTEGER AS
$func$
DECLARE
rec   record;
ROWS  INTEGER;
BEGIN
FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
ROWS := SUBSTRING(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
EXIT WHEN ROWS IS NOT NULL;
END LOOP;

RETURN ROWS;
END
$func$ LANGUAGE plpgsql;

postgres=# explain select distinct id from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=672240.13..672329.49 rows=8936 width=4)
Group Key: id
->  Bitmap Heap Scan on t1  (cost=46663.05..660864.26 rows=4550347 width=4)
Recheck Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:50:34.887566'::timestamp without time zone))
->  Bitmap Index Scan on idx_t2_1  (cost=0.00..45525.47 rows=4550347 width=0)
Index Cond: ((crt_time >= '2016-05-06 16:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-06 16:50:34.887566'::timestamp without time zone))
(6 rows)
Time: 0.645 ms

postgres=# explain select distinct id from t1 where crt_time between '2016-05-07 09:50:29.887566' and '2016-05-07 16:50:29.987566';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=23.12..23.13 rows=1 width=4)
Group Key: id
->  Bitmap Heap Scan on t1  (cost=22.00..23.12 rows=1 width=4)
Recheck Cond: ((crt_time >= '2016-05-07 09:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-07 16:50:29.987566'::timestamp without time zone))
->  Bitmap Index Scan on idx_t2_1  (cost=0.00..22.00 rows=1 width=0)
Index Cond: ((crt_time >= '2016-05-07 09:50:29.887566'::timestamp without time zone) AND (crt_time <= '2016-05-07 16:50:29.987566'::timestamp without time zone))
(6 rows)
Time: 0.641 ms

postgres=# select count_estimate($$select distinct id from t1 where crt_time between '2016-05-06 16:50:29.887566' and '2016-05-06 16:50:34.887566'$$);
count_estimate
----------------
8936
(1 row)
Time: 1.139 ms

postgres=# select count_estimate($$select distinct id from t1 where crt_time between '2016-05-07 09:50:29.887566' and '2016-05-07 16:50:29.987566'$$);
count_estimate
----------------
1
(1 row)
Time: 0.706 ms

You know what comes next: choose between the stage 3 and stage 4 methods based on the estimated row count.
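
One possible way to wire that choice together is sketched below. The function name latest_per_id and the threshold of 1000 are assumptions, not values from the tests; the sketch relies on count_estimate, f1 and f3 as defined above. Keep in mind that the planner's estimate can be far from the true count (8936 estimated against 200,000 actual in the first example), so the threshold should be tuned by measurement.

```sql
create or replace function latest_per_id(
    start_time timestamp,
    end_time   timestamp,
    threshold  int default 1000      -- assumed cut-over point
) returns setof t1 as $$
declare
    est int;
begin
    -- Planner's estimate of the number of distinct ids in the range.
    est := count_estimate(format(
        'select distinct id from t1 where crt_time between %L and %L',
        start_time, end_time));

    if est <= threshold then
        -- Few ids expected: skip from id to id (stage 4).
        return query select * from f3(start_time, end_time);
    else
        -- Many ids expected: probe every id listed in t2.
        return query
            select * from (select (f1(id, start_time, end_time)).* from t2) t
            where t.id is not null;
    end if;
end;
$$ language plpgsql;

-- Example call using one of the ranges from the tests above.
select * from latest_per_id('2016-05-07 09:50:29.887566', '2016-05-07 16:50:29.987566');
```

The many-ids branch runs in a single session here; the parallel slicing from stage 3 would still have to be driven from the application.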

As a bonus, see also the optimizations for count(distinct xx) and distinct xx, which are just as extreme.
