Command to stop the database:

    pg_ctl stop -D $PGDATA [-m shutdown-mode]
shutdown-mode can be one of the following:
1. smart: waits for all client connections to end, then shuts the database down. If a client never disconnects, the shutdown cannot complete.
Open an idle session:

    [root@localhost ~]# su - postgres
    [postgres@localhost ~]$ psql
    psql (9.4.4)
    Type "help" for help.

    postgres=#

Stop the database in smart mode:

    [postgres@localhost ~]$ pg_ctl stop -D $PGDATA -m smart
    waiting for server to shut down............................................................... failed
    pg_ctl: server does not shut down
    HINT: The "-m fast" option immediately disconnects sessions rather than
    waiting for session-initiated disconnection

Because the idle psql session stayed connected, pg_ctl gave up waiting and reported a failure.
2. fast: shuts down quickly. The server disconnects all clients, rolls back any open transactions, and then shuts down cleanly.

    [postgres@localhost ~]$ pg_ctl stop -D $PGDATA -m fast
    waiting for server to shut down.... done
    server stopped

Check the shutdown log:

    LOG: received fast shutdown request
    LOG: aborting any active transactions
    FATAL: terminating connection due to administrator command
    LOG: shutting down
    LOG: database system is shut down

The session is forcibly terminated, and then the database shuts down.
Start a transaction, then test the shutdown:

    postgres=# create table t(id int primary key, name varchar(9));
    CREATE TABLE
    postgres=# begin;
    BEGIN
    postgres=# insert into t values (1, 'a')
    postgres-# ;
    INSERT 0 1

Without committing, stop the database in fast mode:

    [postgres@localhost ~]$ pg_ctl stop -D $PGDATA -m fast
    waiting for server to shut down.... done
    server stopped

Check the log:

    LOG: received fast shutdown request
    LOG: aborting any active transactions
    LOG: autovacuum launcher shutting down
    FATAL: terminating connection due to administrator command
    LOG: shutting down
    LOG: database system is shut down

Again the session is terminated immediately, whether or not its transaction has been committed.

    postgres=# select * from t;
     id | name
    ----+------
    (0 rows)

After a restart, the uncommitted row cannot be found.
3. immediate: shuts down at once. The server processes are stopped immediately and exit without a clean shutdown, so crash recovery runs at the next startup.
First insert a row outside any explicit transaction, so that it is committed:

    postgres=# insert into t values (1, 'a');
    INSERT 0 1
    postgres=# select * from t;
     id | name
    ----+------
      1 | a
    (1 row)

Stop the database:

    [postgres@localhost ~]$ pg_ctl stop -D $PGDATA -m immediate
    waiting for server to shut down.... done
    server stopped

Check the log:

    LOG: received immediate shutdown request
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and repeat your command.
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and repeat your command.
Start the database:

    [postgres@localhost ~]$ pg_ctl -D /apps/pgsql/pgdata -l 1.log start
    server starting

Check the log:

    LOG: database system was interrupted; last known up at 2017-04-27 18:56:47 PDT
    LOG: database system was not properly shut down; automatic recovery in progress   # unclean shutdown detected, recovery starts automatically
    LOG: redo starts at 0/181F910
    LOG: record with zero length at 0/181FA90
    LOG: redo done at 0/181FA60
    LOG: last completed transaction was at log time 2017-04-27 18:59:13.727213-07
    LOG: MultiXact member wraparound protections are now enabled
    LOG: autovacuum launcher started
    LOG: database system is ready to accept connections

Check the data:

    [postgres@localhost ~]$ psql
    psql (9.4.4)
    Type "help" for help.

    postgres=# select * from t;
     id | name
    ----+------
      1 | a
    (1 row)

The committed row was restored by crash recovery.
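Summary:
Comparing the three shutdown modes:
smart is the safest but the slowest: the database shuts down only after every connection has closed. It is the default mode in PostgreSQL 9.4, the version used in the tests above; from 9.5 onward the default is fast.
fast forcibly terminates sessions regardless of whether their work has been committed, and uncommitted transactions are rolled back. This is the mode to use for system maintenance, when the applications have normally been stopped already or no longer issue transactions.
immediate is the most brutal mode: it stops the server without waiting for data to reach the data files (PostgreSQL relies on WAL for durability) and leaves crash recovery to the next startup. If many changes had not yet been written to the data files before the shutdown, that recovery can take a long time.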
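As a small illustration of the three modes, the sketch below only assembles the `pg_ctl stop` command line for a chosen mode and rejects anything else. The helper name `build_stop_command` and the example data directory are hypothetical; `-D`, `-m` and `-t` are the pg_ctl options for the data directory, the shutdown mode and the wait timeout in seconds. It prints the command rather than running it.

```python
import shlex

# Shutdown modes accepted by `pg_ctl stop -m`, as described above.
SHUTDOWN_MODES = ("smart", "fast", "immediate")


def build_stop_command(pgdata, mode="fast", timeout=None):
    """Return the argv list for `pg_ctl stop` in the given mode (hypothetical helper)."""
    if mode not in SHUTDOWN_MODES:
        raise ValueError(f"unknown shutdown mode: {mode!r}")
    cmd = ["pg_ctl", "stop", "-D", pgdata, "-m", mode]
    if timeout is not None:
        # -t: how many seconds pg_ctl waits before reporting failure.
        cmd += ["-t", str(timeout)]
    return cmd


if __name__ == "__main__":
    # Only print the command; on a host with pg_ctl installed the list
    # could be passed to subprocess.run(...).
    print(shlex.join(build_stop_command("/apps/pgsql/pgdata", "fast", timeout=60)))
```

Passing the mode explicitly avoids depending on the server version's default, which differs between 9.4 and later releases.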
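Addendum: verifying a master shutdown in a PostgreSQL asynchronous streaming replication setup
os: ubuntu 16.04
db: postgresql 9.6.8
The goal is to verify whether data is lost, and how much, when the master is shut down deliberately in an asynchronous streaming replication setup.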
Versions

    # lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 16.04.5 LTS
    Release:        16.04
    Codename:       xenial
    #
    # su - postgres -c "psql -c \"select version();\""
                                                                       version
    ----------------------------------------------------------------------------------------------------------------------------------------------
     PostgreSQL 9.6.8 on x86_64-pc-linux-gnu (Ubuntu 9.6.8-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
    (1 row)
Use pgbench to simulate a heavy write workload

    postgres=# create database pgbenchdb;
    # su - postgres
    $ pgbench -i -s 20 pgbenchdb
    $ pgbench -r -j2 -c4 -T300 pgbenchdb

Shut down the master

    # su - postgres
    $ /usr/lib/postgresql/9.6/bin/pg_ctl stop -m fast -D "/data/pg9.6/main"

Promote the slave

    # su - postgres
    $ /usr/lib/postgresql/9.6/bin/pg_ctl promote -D "/data/pg9.6/main"
Check the xlog location on the old master

    $ /usr/lib/postgresql/9.6/bin/pg_xlogdump 000000010000000000000016
    rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/16000028, prev 0/152C9A10, desc: CHECKPOINT_SHUTDOWN redo 0/16000028; tli 1; prev tli 1; fpw true; xid 0:118746; oid 16432; multi 1; offset 0; oldest xid 543 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 550/118745; oldest running xid 0; shutdown
    pg_xlogdump: FATAL: error in WAL record at 0/16000028: invalid record length at 0/16000098: wanted 24, got 0

The last record is lsn: 0/16000028, prev 0/152C9A10, desc: CHECKPOINT_SHUTDOWN redo 0/16000028;
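To relate an LSN such as 0/16000028 to the segment file passed to pg_xlogdump, the following sketch converts the LSN to a byte position and derives the segment file name. It assumes the default 16 MB WAL segment size; the function names `parse_lsn` and `wal_file_name` are made up for this illustration.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(text):
    """Convert an LSN such as '0/16000028' to an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that contains the given byte position."""
    segno = lsn // segment_size
    # Number of segments per 4 GB "xlog id" (256 with 16 MB segments).
    segments_per_id = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_id,
        segno % segments_per_id,
    )


print(wal_file_name(1, parse_lsn("0/16000028")))  # 000000010000000000000016
```

The result matches the file examined above: timeline 1, segment 0x16.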
Check the .history file on the new master

    $ ls -lt|more
    total 360456
    -rw------- 1 postgres postgres 16777216 Nov 30 10:32 000000020000000000000016
    drwx------ 2 postgres postgres     4096 Nov 30 10:16 archive_status
    -rw------- 1 postgres postgres       42 Nov 30 10:16 00000002.history
    -rw------- 1 postgres postgres 16777216 Nov 30 10:16 000000010000000000000016.partial
    -rw------- 1 postgres postgres 16777216 Nov 30 10:16 000000010000000000000015
    -rw------- 1 postgres postgres 16777216 Nov 30 10:16 000000010000000000000014
    -rw------- 1 postgres postgres 16777216 Nov 30 10:05 000000010000000000000013

    $ cat 00000002.history
    1	0/16000098	no recovery target specified

    $ /usr/lib/postgresql/9.6/bin/pg_xlogdump 000000010000000000000016
    rmgr: XLOG len (rec/tot): 106/ 106, tx: 0, lsn: 0/16000028, prev 0/152C9A10, desc: CHECKPOINT_SHUTDOWN redo 0/16000028; tli 1; prev tli 1; fpw true; xid 0:118746; oid 16432; multi 1; offset 0; oldest xid 543 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 550/118745; oldest running xid 0; shutdown
    rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn: 0/16000098, prev 0/16000028, desc: END_OF_RECOVERY tli 2; prev tli 1; time 2018-11-30 10:16:57.249408 CST
    rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/160000C8, prev 0/16000098, desc: RUNNING_XACTS nextXid 118746 latestCompletedXid 118745 oldestRunningXid 118746
    rmgr: XLOG len (rec/tot): 51/ 312, tx: 0, lsn: 0/16000100, prev 0/160000C8, desc: FPI_FOR_HINT , blkref #0: rel 1664/0/1260 blk 0 FPW
    rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/16000238, prev 0/16000100, desc: RUNNING_XACTS nextXid 118746 latestCompletedXid 118745 oldestRunningXid 118746
    pg_xlogdump: FATAL: error in WAL record at 0/16000238: invalid record length at 0/16000270: wanted 24, got 0
The key records are:

    lsn: 0/16000028, prev 0/152C9A10, desc: CHECKPOINT_SHUTDOWN redo 0/16000028
    lsn: 0/16000098, prev 0/16000028, desc: END_OF_RECOVERY

The END_OF_RECOVERY record sits at lsn 0/16000098, which matches exactly the switch point recorded in the timeline file 00000002.history.
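The match can also be checked arithmetically. The sketch below parses the history line and computes where the 106-byte CHECKPOINT_SHUTDOWN record ends. It assumes WAL records are padded to 8-byte boundaries (typical for 64-bit builds) and that the record does not cross a page boundary; all function names are hypothetical.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/16000028' to an absolute byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(pos):
    """Format a byte position back into the 'X/X' LSN notation."""
    return "%X/%X" % (pos >> 32, pos & 0xFFFFFFFF)


def parse_history_line(line):
    """Split a timeline-history line into (parent TLI, switch LSN, reason)."""
    tli, lsn, reason = line.split(None, 2)
    return int(tli), lsn, reason.strip()


def record_end(start_lsn, total_len, align=8):
    """LSN just past a record, rounded up to the assumed alignment."""
    end = parse_lsn(start_lsn) + total_len
    return format_lsn((end + align - 1) // align * align)


parent_tli, switch_lsn, reason = parse_history_line(
    "1\t0/16000098\tno recovery target specified"
)
# CHECKPOINT_SHUTDOWN starts at 0/16000028 with a total length of 106 bytes.
checkpoint_end = record_end("0/16000028", 106)
print(parent_tli, switch_lsn, checkpoint_end)  # 1 0/16000098 0/16000098
```

Under these assumptions the shutdown checkpoint ends exactly at 0/16000098, so the new timeline begins directly after the old master's last record.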
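So in this asynchronous streaming replication test, a deliberate shutdown of the master sent its last record (CHECKPOINT_SHUTDOWN) to the slave before exiting, and no data was lost.
By comparison, synchronous replication (synchronous_commit = on together with a synchronous standby listed in synchronous_standby_names) ensures that every committed transaction has been flushed to disk in two places.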
Analyzing the pg_log logs
The last few log entries on the old master

    2018-11-30 10:16:40.986 CST,"postgres","pgbenchdb",7559,"[local]",5c009d79.1d87,4,"UPDATE waiting",2018-11-30 10:16:25 CST,,0,LOG,00000,"disconnection: session time: 0:00:15.723 user=postgres database=pgbenchdb host=[local]",,,,,,,,,"pgbench"
    2018-11-30 10:16:40.993 CST,"postgres","pgbenchdb",7558,"[local]",5c009d79.1d86,3,"idle",2018-11-30 10:16:25 CST,4/0,0,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"pgbench"
    2018-11-30 10:16:40.994 CST,"postgres","pgbenchdb",7560,"[local]",5c009d79.1d88,3,"idle",2018-11-30 10:16:25 CST,5/0,0,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"pgbench"
    2018-11-30 10:16:40.994 CST,"postgres","pgbenchdb",7558,"[local]",5c009d79.1d86,4,"idle",2018-11-30 10:16:25 CST,,0,LOG,00000,"disconnection: session time: 0:00:15.729 user=postgres database=pgbenchdb host=[local]",,,,,,,,,"pgbench"
    2018-11-30 10:16:40.994 CST,"postgres","pgbenchdb",7560,"[local]",5c009d79.1d88,4,"idle",2018-11-30 10:16:25 CST,,0,LOG,00000,"disconnection: session time: 0:00:15.725 user=postgres database=pgbenchdb host=[local]",,,,,,,,,"pgbench"
    2018-11-30 10:16:40.999 CST,"postgres","pgbenchdb",7561,"[local]",5c009d79.1d89,3,"idle",2018-11-30 10:16:25 CST,6/0,0,FATAL,57P01,"terminating connection due to administrator command",,,,,,,,,"pgbench"
    2018-11-30 10:16:41.001 CST,"postgres","pgbenchdb",7561,"[local]",5c009d79.1d89,4,"idle",2018-11-30 10:16:25 CST,,0,LOG,00000,"disconnection: session time: 0:00:15.731 user=postgres database=pgbenchdb host=[local]",,,,,,,,,"pgbench"
    2018-11-30 10:16:41.010 CST,,,7156,,5c009735.1bf4,7,,2018-11-30 09:49:41 CST,,0,LOG,00000,"shutting down",,,,,,,,,""
    2018-11-30 10:16:41.209 CST,,,7156,,5c009735.1bf4,8,,2018-11-30 09:49:41 CST,,0,LOG,00000,"checkpoint starting: shutdown immediate",,,,,,,,,""
    2018-11-30 10:16:47.623 CST,,,7156,,5c009735.1bf4,9,,2018-11-30 09:49:41 CST,,0,LOG,00000,"checkpoint complete: wrote 29357 buffers (89.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=5.931 s, sync=0.399 s, total=6.418 s; sync files=53, longest=0.179 s, average=0.007 s; distance=311294 kB, estimate=311294 kB",,,,,,,,,""
    2018-11-30 10:16:47.683 CST,"repl","",7227,"192.168.56.90:52556",5c009795.1c3b,3,"streaming 0/16000098",2018-11-30 09:51:17 CST,,0,LOG,00000,"disconnection: session time: 0:25:30.149 user=repl database= host=192.168.56.90 port=52556",,,,,,,,,"walreceiver"
    2018-11-30 10:16:47.730 CST,,,7153,,5c009735.1bf1,5,,2018-11-30 09:49:41 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,""

Note the second-to-last entry, streaming 0/16000098. It shows that when the master shut down, its replication connection had already streamed WAL to the slave up to 0/16000098, so the slave had received everything that precedes END_OF_RECOVERY.
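For anyone who wants to pull that field out programmatically, here is a minimal sketch that parses one csvlog line with Python's csv module. The column names follow the 23-column csvlog layout documented for PostgreSQL 9.6, and the sample is the replication connection's entry from the log above with the quoting written in plain CSV form; `parse_csvlog` is a hypothetical helper.

```python
import csv
import io

# Column order of a csvlog line, assuming the PostgreSQL 9.6 layout.
CSVLOG_COLUMNS = [
    "log_time", "user_name", "database_name", "process_id",
    "connection_from", "session_id", "session_line_num", "command_tag",
    "session_start_time", "virtual_transaction_id", "transaction_id",
    "error_severity", "sql_state_code", "message", "detail", "hint",
    "internal_query", "internal_query_pos", "context", "query",
    "query_pos", "location", "application_name",
]

SAMPLE = (
    '2018-11-30 10:16:47.683 CST,"repl","",7227,"192.168.56.90:52556",'
    '5c009795.1c3b,3,"streaming 0/16000098",2018-11-30 09:51:17 CST,,0,'
    'LOG,00000,"disconnection: session time: 0:25:30.149 user=repl '
    'database= host=192.168.56.90 port=52556",,,,,,,,,"walreceiver"'
)


def parse_csvlog(text):
    """Parse csvlog text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(CSVLOG_COLUMNS, row)) for row in reader]


entry = parse_csvlog(SAMPLE)[0]
print(entry["command_tag"], entry["application_name"])
```

For this connection the command_tag column carries the process status "streaming 0/16000098", which is the position discussed above.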
Log on the old slave

    2018-11-30 10:16:47.660 CST,,,7256,,5c009795.1c58,2,,2018-11-30 09:51:17 CST,,0,LOG,00000,"replication terminated by primary server","End of WAL reached on timeline 1 at 0/16000098.",,,,,,,,""
    2018-11-30 10:16:47.660 CST,,,7256,,5c009795.1c58,3,,2018-11-30 09:51:17 CST,,0,FATAL,XX000,"could not send end-of-streaming message to primary: no COPY in progress
    ",,,,,,,,,""
    2018-11-30 10:16:47.660 CST,,,7255,,5c009795.1c57,5,,2018-11-30 09:51:17 CST,1/0,0,LOG,00000,"invalid record length at 0/16000098: wanted 24, got 0",,,,,,,,,""
    2018-11-30 10:16:47.854 CST,,,7443,,5c009d8f.1d13,1,,2018-11-30 10:16:47 CST,,0,FATAL,XX000,"could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host ""192.168.56.119"" and accepting
        TCP/IP connections on port 5432?
    ",,,,,,,,,""
    2018-11-30 10:16:52.668 CST,,,7444,,5c009d94.1d14,1,,2018-11-30 10:16:52 CST,,0,FATAL,XX000,"could not connect to the primary server: could not connect to server: Connection refused
        Is the server running on host ""192.168.56.119"" and accepting
        TCP/IP connections on port 5432?
    ",,,,,,,,,""
    2018-11-30 10:16:56.875 CST,,,7255,,5c009795.1c57,6,,2018-11-30 09:51:17 CST,1/0,0,LOG,00000,"received promote request",,,,,,,,,""
    2018-11-30 10:16:56.875 CST,,,7255,,5c009795.1c57,7,,2018-11-30 09:51:17 CST,1/0,0,LOG,00000,"redo done at 0/16000028",,,,,,,,,""
    2018-11-30 10:16:56.875 CST,,,7255,,5c009795.1c57,8,,2018-11-30 09:51:17 CST,1/0,0,LOG,00000,"last completed transaction was at log time 2018-11-30 10:16:40.986869+08",,,,,,,,,""
    2018-11-30 10:16:56.888 CST,,,7255,,5c009795.1c57,9,,2018-11-30 09:51:17 CST,1/0,0,LOG,00000,"selected new timeline ID: 2",,,,,,,,,""
    2018-11-30 10:16:57.166 CST,,,7255,,5c009795.1c57,10,,2018-11-30 09:51:17 CST,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""
    2018-11-30 10:16:57.267 CST,,,7255,,5c009795.1c57,11,,2018-11-30 09:51:17 CST,1/0,0,LOG,00000,"MultiXact member wraparound protections are now enabled",,,,,,,,,""
    2018-11-30 10:16:57.267 CST,,,7257,,5c009795.1c59,1,,2018-11-30 09:51:17 CST,,0,LOG,00000,"checkpoint starting: force",,,,,,,,,""
    2018-11-30 10:16:57.275 CST,,,7253,,5c009795.1c55,3,,2018-11-30 09:51:17 CST,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
    2018-11-30 10:16:57.276 CST,,,7447,,5c009d99.1d17,1,,2018-11-30 10:16:57 CST,,0,LOG,00000,"autovacuum launcher started",,,,,,,,,""

This log is just as clear: replication ended on timeline 1 at 0/16000098, and after the promote request the slave finished redo and selected timeline 2.
wal_retrieve_retry_interval = 5s controls how long the slave waits before retrying after it fails to retrieve WAL from the master.
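The effect is visible in the two "could not connect to the primary server" entries of the slave log. As a rough check, assuming the server ran with the 5s value quoted above, this sketch computes the gap between those two timestamps (the helper name `seconds_between` is made up):

```python
from datetime import datetime


def seconds_between(first, second, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Elapsed seconds between two log timestamps (time zone suffix removed)."""
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    return delta.total_seconds()


# Timestamps of the two failed connection attempts in the slave log.
gap = seconds_between("2018-11-30 10:16:47.854", "2018-11-30 10:16:52.668")
print(round(gap, 3))  # 4.814
```

The gap of about 4.8 seconds is close to the configured interval.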
The above is based on personal experience and is offered as a reference. If anything is wrong or incomplete, corrections are welcome.
Original link: https://www.cnblogs.com/o-to-s/articles/6881646.html