数据库版本为:greenplum-db-6.11.2
有一张表,按日分区,每日数据量十万,在此表上进行多个子查询后将子查询结果逐个连接起来,在达到8个及以上slice的时候,必然在segment上出现类似 "WARNING","01000","interconnect may encountered a network error, please check your network","Failed to send packet (seq 1) to 127.0.0.1:4720 (pid 9487 cid -1) after 100 retries. ",0,,"ic_udpifc.c",5107,的错误。但因为是连接127.0.0.1所以不可能是网络层问题。此时检查postgres进程可以看到类似这样的,结合到查询计划中提示有broadcast的情况,盲猜是部分slice进程提前关闭了端口导致broadcast失败。
gpadmin 238853 15867 0 17:49 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38304) con1796087 seg0 idle in transaction gpadmin 238854 15865 0 17:49 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(25848) con1796087 seg1 idle in transaction gpadmin 238855 15868 0 17:49 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(12888) con1796087 seg2 idle in transaction gpadmin 238856 15866 0 17:49 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48002) con1796087 seg6 idle in transaction gpadmin 239113 15867 0 17:50 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38632) con1796087 seg0 cmd6 slice8 MPPEXEC SELECT gpadmin 239114 15865 0 17:50 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(26176) con1796087 seg1 cmd6 slice8 MPPEXEC SELECT gpadmin 239115 15868 0 17:50 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(13216) con1796087 seg2 cmd6 slice8 MPPEXEC SELECT gpadmin 239116 15866 0 17:50 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48330) con1796087 seg6 cmd6 slice8 MPPEXEC SELECT gpadmin 239121 15867 0 17:50 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38656) con1796087 seg0 cmd6 slice1 MPPEXEC SELECT gpadmin 239122 15865 0 17:50 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(26200) con1796087 seg1 cmd6 slice1 MPPEXEC SELECT gpadmin 239123 15868 0 17:50 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(13240) con1796087 seg2 cmd6 slice1 MPPEXEC SELECT gpadmin 239124 15866 0 17:50 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48354) con1796087 seg6 cmd6 slice1 MPPEXEC SELECT gpadmin 239129 15867 0 17:50 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38680) con1796087 seg0 cmd6 slice2 MPPEXEC SELECT gpadmin 239130 15865 0 17:50 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(26224) con1796087 seg1 cmd6 slice2 MPPEXEC SELECT gpadmin 239131 15868 0 17:50 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(13264) con1796087 seg2 cmd6 slice2 MPPEXEC SELECT gpadmin 239132 15866 0 17:50 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48378) con1796087 seg6 cmd6 slice2 MPPEXEC SELECT gpadmin 239137 15865 0 17:50 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(26248) con1796087 seg1 cmd6 slice3 MPPEXEC SELECT gpadmin 239138 15867 0 17:50 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38704) con1796087 seg0 cmd6 slice3 MPPEXEC SELECT gpadmin 239139 15868 0 17:50 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(13288) con1796087 seg2 cmd6 slice3 MPPEXEC SELECT gpadmin 239140 15866 0 17:50 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48402) con1796087 seg6 cmd6 slice3 MPPEXEC SELECT gpadmin 239145 15865 0 17:50 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(26272) con1796087 seg1 cmd6 slice4 MPPEXEC SELECT gpadmin 239146 15867 0 17:50 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38728) con1796087 seg0 cmd6 slice4 MPPEXEC SELECT gpadmin 239147 15868 0 17:50 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(13312) con1796087 seg2 cmd6 slice4 MPPEXEC SELECT gpadmin 239148 15866 0 17:50 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48426) con1796087 seg6 cmd6 slice4 MPPEXEC SELECT gpadmin 239153 15865 0 17:50 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(26296) con1796087 seg1 cmd6 slice5 MPPEXEC SELECT gpadmin 239154 15867 0 17:50 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38752) con1796087 seg0 cmd6 slice5 MPPEXEC SELECT gpadmin 239155 15868 0 17:50 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(13336) con1796087 seg2 cmd6 slice5 MPPEXEC SELECT gpadmin 239156 15866 0 17:50 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48450) con1796087 seg6 cmd6 slice5 MPPEXEC SELECT gpadmin 239161 15865 0 17:50 ? 0000 postgres: 6000, cygp db_ana 172.17.200.100(26320) con1796087 seg1 idle gpadmin 239162 15867 0 17:50 ? 0000 postgres: 6003, cygp db_ana 172.17.200.100(38776) con1796087 seg0 idle gpadmin 239163 15868 0 17:50 ? 0000 postgres: 6001, cygp db_ana 172.17.200.100(13360) con1796087 seg2 idle gpadmin 239164 15866 0 17:50 ? 0000 postgres: 6002, cygp db_ana 172.17.200.100(48474) con1796087 seg6 idle
生成的查询计划如下
Aggregate (cost=0.00..3879.02 rows=1 width=64) -> Gather Motion 12:1 (slice11; segments: 12) (cost=0.00..3879.02 rows=1 width=64) -> Aggregate (cost=0.00..3879.02 rows=1 width=64) -> Hash Left Join (cost=0.00..3879.02 rows=1 width=64) Hash Cond: (player_enter_7.player_id = player_enter_8.player_id) Join Filter: ((max(player_enter_8.log_time)) > (max(player_enter_7.log_time))) -> Hash Left Join (cost=0.00..3448.02 rows=1 width=72) Hash Cond: (player_enter_6.player_id = player_enter_7.player_id) Join Filter: ((max(player_enter_7.log_time)) > (max(player_enter_6.log_time))) -> Hash Left Join (cost=0.00..3017.02 rows=1 width=64) Hash Cond: (player_enter_5.player_id = player_enter_6.player_id) Join Filter: ((max(player_enter_6.log_time)) > (max(player_enter_5.log_time))) -> Hash Left Join (cost=0.00..2586.01 rows=1 width=56) Hash Cond: (player_enter_4.player_id = player_enter_5.player_id) Join Filter: ((max(player_enter_5.log_time)) > (max(player_enter_4.log_time))) -> Redistribute Motion 1:12 (slice6) (cost=0.00..2155.01 rows=5 width=48) -> Hash Left Join (cost=0.00..2155.01 rows=1 width=48) Hash Cond: (player_enter_3.player_id = player_enter_4.player_id) Join Filter: ((max(player_enter_4.log_time)) > (max(player_enter_3.log_time))) -> Hash Left Join (cost=0.00..1724.01 rows=1 width=40) Hash Cond: (player_enter_2.player_id = player_enter_3.player_id) Join Filter: ((max(player_enter_3.log_time)) > (max(player_enter_2.log_time))) -> Hash Left Join (cost=0.00..1293.00 rows=1 width=32) Hash Cond: (player_enter_1.player_id = player_enter_2.player_id) Join Filter: ((max(player_enter_2.log_time)) > (max(player_enter_1.log_time))) -> Hash Left Join (cost=0.00..862.00 rows=1 width=24) Hash Cond: (player_enter.player_id = player_enter_1.player_id) Join Filter: ((max(player_enter_1.log_time)) > (min(player_enter.log_time))) -> Gather Motion 12:1 (slice1; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 1) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 17 (out of 601) -> Dynamic Seq Scan on player_enter (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2021-01-02'::date) AND (wash_date <= '2021-01-30'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Gather Motion 12:1 (slice2; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_1.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_1.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 2) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 27 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_1 (dynamic scan id: 2) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2021-02-02'::date) AND (wash_date <= '2021-02-28'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Gather Motion 12:1 (slice3; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_2.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_2.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 3) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 29 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_2 (dynamic scan id: 3) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2021-03-02'::date) AND (wash_date <= '2021-03-30'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Gather Motion 12:1 (slice4; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_3.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_3.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 4) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 29 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_3 (dynamic scan id: 4) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2021-04-02'::date) AND (wash_date <= '2021-04-30'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Gather Motion 12:1 (slice5; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_4.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_4.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 5) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 29 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_4 (dynamic scan id: 5) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2021-05-02'::date) AND (wash_date <= '2021-05-30'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Broadcast Motion 12:12 (slice7; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_5.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_5.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 6) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 29 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_5 (dynamic scan id: 6) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2021-06-02'::date) AND (wash_date <= '2021-06-30'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Broadcast Motion 12:12 (slice8; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_6.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_6.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 7) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 29 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_6 (dynamic scan id: 7) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2021-07-02'::date) AND (wash_date <= '2021-07-30'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Broadcast Motion 12:12 (slice9; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_7.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_7.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 8) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 11 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_7 (dynamic scan id: 8) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2020-01-02'::date) AND (wash_date <= '2020-01-30'::date)) -> Hash (cost=431.00..431.00 rows=1 width=16) -> Broadcast Motion 12:12 (slice10; segments: 12) (cost=0.00..431.00 rows=1 width=16) -> Result (cost=0.00..431.00 rows=1 width=16) -> GroupAggregate (cost=0.00..431.00 rows=1 width=16) Group Key: player_enter_8.player_id -> Sort (cost=0.00..431.00 rows=1 width=16) Sort Key: player_enter_8.player_id -> Sequence (cost=0.00..431.00 rows=1 width=28) -> Partition Selector for player_enter (dynamic scan id: 9) (cost=10.00..100.00 rows=9 width=4) Partitions selected: 9 (out of 601) -> Dynamic Seq Scan on player_enter player_enter_8 (dynamic scan id: 9) (cost=0.00..431.00 rows=1 width=28) Filter: (((channel)::text = ANY ('{a,b}'::text[])) AND (wash_date >= '2020-02-02'::date) AND (wash_date <= '2020-02-28'::date)) Optimizer: Pivotal Optimizer (GPORCA)
查询SQL如下:
SELECT COUNT(_step_1.oid) , COUNT(_step_2.oid) , COUNT(_step_3.oid) , COUNT(_step_4.oid) , COUNT(_step_5.oid) , COUNT(_step_6.oid) , COUNT(_step_7.oid) , COUNT(_step_8.oid) FROM ( SELECT tem.oid, MIN(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2021-01-02' AND '2021-01-30' ) AS tem GROUP BY tem.oid ) AS _step_1 LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2021-02-02' AND '2021-02-28' ) AS tem GROUP BY tem.oid ) AS _step_2 ON _step_2.oid = _step_1.oid AND _step_2.log_time > _step_1.log_time LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2021-03-02' AND '2021-03-30' ) AS tem GROUP BY tem.oid ) AS _step_3 ON _step_3.oid = _step_2.oid AND _step_3.log_time > _step_2.log_time LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2021-04-02' AND '2021-04-30' ) AS tem GROUP BY tem.oid ) AS _step_4 ON _step_4.oid = _step_3.oid AND _step_4.log_time > _step_3.log_time LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2021-05-02' AND '2021-05-30' ) AS tem GROUP BY tem.oid ) AS _step_5 ON _step_5.oid = _step_4.oid AND _step_5.log_time > _step_4.log_time LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2021-06-02' AND '2021-06-30' ) AS tem GROUP BY tem.oid ) AS _step_6 ON _step_6.oid = _step_5.oid AND _step_6.log_time > _step_5.log_time LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2021-07-02' AND '2021-07-30' ) AS tem GROUP BY tem.oid ) AS _step_7 ON _step_7.oid = _step_6.oid AND _step_7.log_time > _step_6.log_time LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2020-01-02' AND '2020-01-30' ) AS tem GROUP BY tem.oid ) AS _step_8 ON _step_8.oid = _step_7.oid AND _step_8.log_time > _step_7.log_time LEFT JOIN ( SELECT tem.oid, MAX(tem.log_time) as log_time FROM ( SELECT player_enter.player_id AS oid, player_enter.log_time AS log_time FROM player_enter WHERE player_enter.channel in ('a','b') AND player_enter.wash_date BETWEEN '2020-02-02' AND '2020-02-28' ) AS tem GROUP BY tem.oid ) AS _step_9 ON _step_9.oid = _step_8.oid AND _step_9.log_time > _step_8.log_time