此问题在公司内部已经被两个项目组遇到了,表现为在删除操作的时候,PostgreSQL 进程会直接崩溃,服务器日志大概如下
2020-03-25 02:30:59.813 UTC [1] LOG: server process (PID 89) was terminated by signal 11: Segmentation fault
2020-03-25 02:30:59.813 UTC [1] DETAIL: Failed process was running: DELETE FROM a;
2020-03-25 02:30:59.815 UTC [1] LOG: terminating any other active server processes
2020-03-25 02:30:59.817 UTC [74] WARNING: terminating connection because of crash of another server process
2020-03-25 02:30:59.817 UTC [74] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-03-25 02:30:59.817 UTC [74] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-03-25 02:30:59.833 UTC [1] LOG: all server processes terminated; reinitializing
2020-03-25 02:30:59.861 UTC [91] LOG: database system was interrupted; last known up at 2020-03-25 02:30:49 UTC
2020-03-25 02:31:00.032 UTC [91] LOG: database system was not properly shut down; automatic recovery in progress
2020-03-25 02:31:00.036 UTC [91] LOG: redo starts at 0/1652540
解决方案
这是 PostgreSQL 的一个 bug,尝试运行
VACUUM FULL ANALYZE;
之后一般可以恢复正常。
终极解决方案还是升级 PostgreSQL 到 11.x 最新版本,至少是 11.1,小版本升级没兼容性问题,请勿直接升级到 12.x 版本。
问题复现方法
使用 PostgreSQL 11.0 版本,然后执行下面的 sql
CREATE TABLE a (
id bigint
);
INSERT INTO a (id) VALUES (1);
ALTER TABLE ONLY a ADD CONSTRAINT a_pkey PRIMARY KEY (id);
CREATE TABLE b (
a_id bigint
);
ALTER TABLE ONLY b ADD CONSTRAINT b_a_id_fkey FOREIGN KEY (a_id) REFERENCES a(id);
ALTER TABLE a ADD x BOOLEAN NOT NULL DEFAULT FALSE;
再执行 DELETE FROM a;
的时候就可以复现问题。
root=# DELETE FROM a;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
修复的代码见 https://git.postgresql.org/gitweb/?p=postgresql.git&a=commit&h=7636e5c60f