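This article is based on PG 13.1.
Parallel query has been supported since PG 9.6. Since PG 11, CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW can also use parallel query.
The conclusion first:
INSERT INTO ... SELECT cannot use a parallel plan; switch to CREATE TABLE AS or SELECT INTO, or export the result and import it again.
First, look at the execution plan of the following query: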
select count(*) from test t1,test1 t2 where t1.id = t2.id;
postgres=# explain analyze select count(*) from test t1,test1 t2 where t1.id = t2.id;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=683.246..715.324 rows=1 loops=1)
   ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=681.474..715.311 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=674.689..675.285 rows=1 loops=3)
               ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=447.799..645.689 rows=333333 loops=3)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.025..74.010 rows=333333 loops=3)
                     ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=260.052..260.053 rows=333333 loops=3)
                           Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                           ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.032..104.804 rows=333333 loops=3)
 Planning Time: 0.420 ms
 Execution Time: 715.447 ms
(13 rows)
The plan shows that two workers were planned and launched.
Now look at INSERT INTO ... SELECT:
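The article does not show how its tables were created. To reproduce the plans, a setup along the following lines might work. This is only a sketch: the column types, the column name `cnt`, and the use of `generate_series` are assumptions inferred from the plan output (one million rows per table, `width=4` on `id`), not the author's actual DDL.

```sql
-- Hypothetical setup; the original article does not include its DDL.
CREATE TABLE test  (id integer);
CREATE TABLE test1 (id integer);
-- Target table for the INSERT experiments; the column name is an assumption.
CREATE TABLE va    (cnt integer);

-- One million rows per table, matching the row counts seen in the plans.
INSERT INTO test  SELECT generate_series(1, 1000000);
INSERT INTO test1 SELECT generate_series(1, 1000000);

-- Refresh planner statistics so the row estimates are realistic.
ANALYZE test;
ANALYZE test1;
```

Exact costs and timings will differ by machine and configuration.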
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------
 Insert on va  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3744.179..3744.187 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*"  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3743.343..3743.352 rows=1 loops=1)
         ->  Aggregate  (cost=73228.00..73228.01 rows=1 width=8) (actual time=3743.247..3743.254 rows=1 loops=1)
               ->  Hash Join  (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1092.295..3511.301 rows=1000000 loops=1)
                     Hash Cond: (t1.id = t2.id)
                     ->  Seq Scan on test t1  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.030..421.537 rows=1000000 loops=1)
                     ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1090.078..1090.081 rows=1000000 loops=1)
                           Buckets: 131072  Batches: 16  Memory Usage: 3227kB
                           ->  Seq Scan on test1 t2  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.021..422.768 rows=1000000 loops=1)
 Planning Time: 0.511 ms
 Execution Time: 3745.633 ms
(11 rows)
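The plan contains no Workers entries, so parallel query was not used.
Even with forced parallel mode turned on, the statement still does not get a parallel plan.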
postgres=# set force_parallel_mode = on;
SET
postgres=# explain analyze insert into va select count(*) from test t1,test1 t2 where t1.id = t2.id;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------
 Insert on va  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3825.042..3825.049 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*"  (cost=73228.00..73228.02 rows=1 width=4) (actual time=3824.976..3824.984 rows=1 loops=1)
         ->  Aggregate  (cost=73228.00..73228.01 rows=1 width=8) (actual time=3824.972..3824.978 rows=1 loops=1)
               ->  Hash Join  (cost=30832.00..70728.00 rows=1000000 width=0) (actual time=1073.587..3599.402 rows=1000000 loops=1)
                     Hash Cond: (t1.id = t2.id)
                     ->  Seq Scan on test t1  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.034..414.965 rows=1000000 loops=1)
                     ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=4) (actual time=1072.441..1072.443 rows=1000000 loops=1)
                           Buckets: 131072  Batches: 16  Memory Usage: 3227kB
                           ->  Seq Scan on test1 t2  (cost=0.00..14425.00 rows=1000000 width=4) (actual time=0.022..400.624 rows=1000000 loops=1)
 Planning Time: 0.577 ms
 Execution Time: 3825.923 ms
(11 rows)
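To rule out configuration as the cause when a plan is unexpectedly serial, the parallel-related settings can be inspected in the same session. The statements below are a minimal sketch for PG 13; the article does not state which values its server used, so no expected output is shown.

```sql
-- Workers a single Gather node may use; 0 disables parallel query.
SHOW max_parallel_workers_per_gather;
-- Upper limit on parallel workers across the whole instance.
SHOW max_parallel_workers;
-- Total background worker slots available to the server.
SHOW max_worker_processes;
-- Testing aid used above (renamed debug_parallel_query in PG 16).
SHOW force_parallel_mode;
```

Since the plain SELECT earlier did launch two workers on the same server, these settings are unlikely to explain the serial INSERT plan; the check simply confirms that.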
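The reason is given in the official documentation, which lists this among the cases where parallel query cannot be used: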
The query writes any data or locks any database rows. If a query contains a data-modifying operation either at the top level or within a CTE, no parallel plans for that query will be generated. As an exception, the commands CREATE TABLE … AS, SELECT INTO, and CREATE MATERIALIZED VIEW which create a new table and populate it can use a parallel plan.
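The documentation excerpt also names CREATE MATERIALIZED VIEW as an exception, which the article does not test. A check along these lines could be used to see whether it gets a parallel plan in a given environment; the view name `vc` is hypothetical and the resulting plan is not shown because it was not part of the original measurements.

```sql
-- EXPLAIN accepts CREATE MATERIALIZED VIEW; with ANALYZE the view is actually created.
-- "vc" is a hypothetical name chosen for this sketch.
EXPLAIN ANALYZE
CREATE MATERIALIZED VIEW vc AS
SELECT count(*) FROM test t1, test1 t2 WHERE t1.id = t2.id;

-- Clean up the object created by the experiment.
DROP MATERIALIZED VIEW vc;
```

A Gather node with Workers Planned / Workers Launched lines in the output would indicate that the populating query ran in parallel.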
There are three workarounds:
1. SELECT INTO
postgres=# explain analyze select count(*) into vaa from test t1,test1 t2 where t1.id = t2.id;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=742.736..774.923 rows=1 loops=1)
   ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=740.223..774.907 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=731.408..731.413 rows=1 loops=3)
               ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=489.880..700.830 rows=333333 loops=3)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.033..87.479 rows=333333 loops=3)
                     ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=266.839..266.840 rows=333333 loops=3)
                           Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                           ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.058..106.874 rows=333333 loops=3)
 Planning Time: 0.319 ms
 Execution Time: 783.300 ms
(13 rows)
2. CREATE TABLE AS
postgres=# explain analyze create table vb as select count(*) from test t1,test1 t2 where t1.id = t2.id;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=34244.16..34244.17 rows=1 width=8) (actual time=540.120..563.733 rows=1 loops=1)
   ->  Gather  (cost=34243.95..34244.16 rows=2 width=8) (actual time=537.982..563.720 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=33243.95..33243.96 rows=1 width=8) (actual time=526.602..527.136 rows=1 loops=3)
               ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=0) (actual time=334.532..502.793 rows=333333 loops=3)
                     Hash Cond: (t1.id = t2.id)
                     ->  Parallel Seq Scan on test t1  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.018..57.819 rows=333333 loops=3)
                     ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667 width=4) (actual time=189.502..189.503 rows=333333 loops=3)
                           Buckets: 131072  Batches: 16  Memory Usage: 3520kB
                           ->  Parallel Seq Scan on test1 t2  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.023..77.786 rows=333333 loops=3)
 Planning Time: 0.189 ms
 Execution Time: 565.448 ms
(13 rows)
3. Alternatively, export the query result and import it again, for example:
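# Export: the plain SELECT can run in parallel; psql writes result.csv on the client side.
psql -h localhost -d postgres -U postgres -c "select count(*) from test t1,test1 t2 where t1.id = t2.id" -o result.csv -A -t -F ","
# Import: \copy reads the file on the client side; a server-side COPY would resolve 'result.csv' relative to the server process.
psql -h localhost -d postgres -U postgres -c "\copy va FROM 'result.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER FALSE, ENCODING 'windows-1252')"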
In some scenarios this round trip is still faster than the non-parallel INSERT INTO ... SELECT.
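When the result ultimately has to land in the existing table va, workarounds 1 and 2 can be combined with a cheap follow-up INSERT: let CREATE TABLE AS run the expensive query in parallel into a staging table, then copy the already-computed rows over. The sketch below assumes a hypothetical staging table name `va_stage` and that va has a single integer-compatible column; it is one possible arrangement, not something measured in the article.

```sql
BEGIN;

-- The expensive join and aggregate run here and are eligible for a parallel plan.
CREATE TABLE va_stage AS
SELECT count(*) AS cnt FROM test t1, test1 t2 WHERE t1.id = t2.id;

-- This INSERT is still serial, but it only moves the precomputed rows.
INSERT INTO va SELECT cnt FROM va_stage;

-- Remove the hypothetical staging table once the rows are in place.
DROP TABLE va_stage;

COMMIT;
```

Whether this beats a plain INSERT INTO ... SELECT depends on how much of the total time is spent in the SELECT part versus writing the rows twice.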
Addendum: PostgreSQL: can SELECT INTO not be used in a dynamic SQL statement?
My database version is PostgreSQL 8.4.7. Below is the stored procedure that fails:
CREATE or Replace FUNCTION func_getnextid(
  tablename varchar(240),
  idname varchar(20) default 'id')
RETURNS integer AS $funcbody$
Declare
  sqlstring varchar(240);
  currentId integer;
Begin
  sqlstring:= 'select max("' || idname || '") into currentId from "' || tablename || '";';
  EXECUTE sqlstring;
  if currentId is NULL or currentId = 0 then
    return 1;
  else
    return currentId + 1;
  end if;
End;
$funcbody$ LANGUAGE plpgsql;
Running it produces the following error:
SQL error:
ERROR: EXECUTE of SELECT ... INTO is not implemented
CONTEXT: PL/pgSQL function "func_getnextbigid" line 6 at EXECUTE statement
Moving INTO out of the SQL string and onto the EXECUTE statement fixes it:
CREATE or Replace FUNCTION func_getnextid(
  tablename varchar(240),
  idname varchar(20) default 'id')
RETURNS integer AS $funcbody$
Declare
  sqlstring varchar(240);
  currentId integer;
Begin
  sqlstring:= 'select max("' || idname || '") from "' || tablename || '";';
  EXECUTE sqlstring into currentId;
  if currentId is NULL or currentId = 0 then
    return 1;
  else
    return currentId + 1;
  end if;
End;
$funcbody$ LANGUAGE plpgsql;
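On newer servers the same function could be written with `format()` and the `%I` placeholder, which quotes identifiers safely instead of concatenating double quotes by hand. This is only a sketch: `format()` requires PostgreSQL 9.1 or later, so it would not run on the 8.4.7 server mentioned above, and the function name `func_getnextid_fmt` is hypothetical.

```sql
-- Hypothetical variant of func_getnextid; needs PostgreSQL 9.1+ for format().
CREATE OR REPLACE FUNCTION func_getnextid_fmt(
  tablename text,
  idname text DEFAULT 'id')
RETURNS integer AS $funcbody$
DECLARE
  currentId integer;
BEGIN
  -- %I quotes each argument as an identifier; INTO belongs to EXECUTE, not the SQL string.
  EXECUTE format('SELECT max(%I) FROM %I', idname, tablename) INTO currentId;
  -- An empty table (NULL) or a maximum of 0 both yield 1, as in the original logic.
  RETURN coalesce(currentId, 0) + 1;
END;
$funcbody$ LANGUAGE plpgsql;
```

It would be called like the original, for example `SELECT func_getnextid_fmt('test');` for a table with an `id` column.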
The above is based on personal experience and is offered as a reference. If anything is wrong or incomplete, corrections are welcome.
Original article: https://blog.csdn.net/pg_hgdb/article/details/112297250