
Patroni and ETCD: Upgrade from PostgreSQL 10 to PostgreSQL 14

Begin by installing PostgreSQL 14. In this case we're installing all PostgreSQL 14 packages.

yum install 'postgresql14*'

Check version after installation:

[root@psql04 ~]# /usr/pgsql-14/bin/psql --version
psql (PostgreSQL) 14.5
[root@psql04 ~]#

Then point bin_dir in /etc/patroni.yml at the PostgreSQL 14 binaries:

[root@psql04 ~]# grep -Ei bin_dir /etc/patroni.yml
bin_dir: /usr/pgsql-14/bin
[root@psql04 ~]#

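Fix the configuration file. PostgreSQL 14 refuses to start because the generated postgresql.conf still contains a parameter it no longer recognizes: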

Oct 29 23:32:06 psql09.nix.mds.xyz patroni[24559]: 2022-10-30 03:32:06.506 GMT [24740] LOG: unrecognized configuration parameter "wal_keep_segments" in file "/data/patroni/postgresql.conf" line 17
Oct 29 23:32:06 psql09.nix.mds.xyz patroni[24559]: 2022-10-30 03:32:06.506 GMT [24740] FATAL: configuration file "/data/patroni/postgresql.conf" contains errors

https://www.depesz.com/2020/07/27/waiting-for-postgresql-14-rename-wal_keep_segments-to-wal_keep_size/

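wal_keep_segments was renamed to wal_keep_size, which is specified in megabytes instead of WAL segments. With the default 16 MB segment size, multiply the old value by 16: wal_keep_segments = 8 becomes wal_keep_size = 128MB.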
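The conversion is simple arithmetic. A minimal bash sketch; the function name segments_to_wal_keep_size is hypothetical, and it assumes the default 16 MB WAL segment size unless you pass another one (pg_controldata reports the real value as "Bytes per WAL segment"):

```shell
#!/usr/bin/env bash
# Convert a wal_keep_segments value into the equivalent wal_keep_size.
# Hypothetical helper. Assumes 16 MB WAL segments unless a second
# argument (segment size in MB) is given.
segments_to_wal_keep_size() {
  local segments=$1 seg_mb=${2:-16}
  # Reject anything that is not a non-negative integer.
  case $segments in ''|*[!0-9]*) echo "usage: segments_to_wal_keep_size N [SEG_MB]" >&2; return 1 ;; esac
  case $seg_mb in ''|*[!0-9]*) echo "usage: segments_to_wal_keep_size N [SEG_MB]" >&2; return 1 ;; esac
  # 10# forces base 10 so values like "08" are not read as octal.
  echo "$(( 10#$segments * 10#$seg_mb ))MB"
}

# wal_keep_segments: 8 from /etc/patroni.yml
segments_to_wal_keep_size 8    # prints 128MB
```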

Add the old value under postgresql.parameters in /etc/patroni.yml:

[root@psql07 patroni]# vi /etc/patroni.yml
[root@psql07 patroni]# cat /etc/patroni.yml

postgresql:
  parameters:
    wal_keep_segments: 8

[root@psql07 patroni]#

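Patroni will recalculate wal_keep_size from this value. If it does not, one option is to change PG_VERSION in the data directory. This turned out to be a mistake: editing the version marker of an existing data directory does not upgrade it, and is not the way to proceed with this upgrade: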

[root@psql07 patroni]# cat PG_VERSION
10
[root@psql07 patroni]# vi PG_VERSION
[root@psql07 patroni]# cat PG_VERSION
14
[root@psql07 patroni]#

The next error you'll get is:

Oct 30 10:26:09 psql07.nix.mds.xyz patroni[24601]: 2022-10-30 10:26:09.282 EDT [24774] FATAL: database files are incompatible with server
Oct 30 10:26:09 psql07.nix.mds.xyz patroni[24601]: 2022-10-30 10:26:09.282 EDT [24774] DETAIL: The database cluster was initialized with PG_CONTROL_VERSION 1002, but the server was compiled with PG_CONTROL_VERSION 1300.
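Editing PG_VERSION only changes a marker file; the on-disk format is still PostgreSQL 10, which is what the PG_CONTROL_VERSION error reports. A small pre-flight check before starting Patroni with new binaries can catch this earlier. This is a sketch with hypothetical helper names (datadir_major, version_string_major, versions_match), assuming PostgreSQL 10 or later, where the major version is the number before the first dot:

```shell
#!/usr/bin/env bash
# Print the major version recorded in DATA_DIR/PG_VERSION (e.g. "10").
datadir_major() {
  local data_dir=$1
  [ -r "$data_dir/PG_VERSION" ] || { echo "no PG_VERSION in $data_dir" >&2; return 1; }
  head -n 1 "$data_dir/PG_VERSION"
}

# "postgres (PostgreSQL) 14.5" -> "14": take the last word, cut at the first dot.
version_string_major() {
  local ver=${1##* }
  echo "${ver%%.*}"
}

# Return 0 only if DATA_DIR was initialized by the same major version as BIN_DIR.
versions_match() {
  local data_dir=$1 bin_dir=$2 have want ver_out
  have=$(datadir_major "$data_dir") || return 1
  ver_out=$("$bin_dir/postgres" --version) || return 1
  want=$(version_string_major "$ver_out")
  if [ "$have" != "$want" ]; then
    echo "data dir is PostgreSQL $have, binaries are $want: use pg_upgrade" >&2
    return 1
  fi
}

# Usage (commented out so loading this file has no side effects):
# versions_match /data/patroni /usr/pgsql-14/bin
```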

Stop Patroni, etcd and PostgreSQL (if running):

systemctl stop etcd patroni

Initialize a new PostgreSQL 14 cluster with initdb. An empty data directory is needed:

-bash-4.2$ /usr/pgsql-14/bin/initdb -D /data/patroni/
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /data/patroni ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/Toronto
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

/usr/pgsql-14/bin/pg_ctl -D /data/patroni/ -l logfile start

-bash-4.2$
-bash-4.2$ pwd
/var/lib/pgsql
-bash-4.2$ whoami
postgres
-bash-4.2$

Try the upgrade again. This was again a mistake, since an in-place upgrade into the same directory does NOT work:

-bash-4.2$ /usr/pgsql-14/bin/pg_upgrade --old-datadir=/data/patroni-10/ --new-datadir=/data/patroni --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-14/bin --check
Performing Consistency Checks
-----------------------------
Checking cluster versions ok

old cluster uses data checksums but the new one does not
Failure, exiting
-bash-4.2$ pwd
/data/patroni
-bash-4.2$

Retry initdb with checksums enabled. This was also a mistake, since it again reinitializes the existing data directory:

-bash-4.2$ /usr/pgsql-14/bin/initdb -D /data/patroni/ --data-checksums
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

Success. You can now start the database server using:

/usr/pgsql-14/bin/pg_ctl -D /data/patroni/ -l logfile start

-bash-4.2$

Create a new data directory called /data/patroni-14, with the correct permissions, on the replicas as well. Focus on the leader for now; the replicas will copy from it once it is ready. The following failure suggested that PostgreSQL has to be running, so the old Patroni database was kept online while a new, empty directory was initialized for PostgreSQL 14:

*failure*
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.

connection to server on socket "/var/lib/pgsql/.s.PGSQL.50432" failed: No such file or directory
Is the server running locally and accepting connections on that socket?

could not connect to source postmaster started with the command:
"/usr/pgsql-10/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/data/patroni-10" -o "-p 50432 -b -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/pgsql'" start
Failure, exiting
-bash-4.2$

In other words, create a new PostgreSQL 14 directory and upgrade while Patroni is running. Initialize the new directory:

-bash-4.2$ /usr/pgsql-14/bin/initdb -D /data/patroni-14/ --data-checksums

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are enabled.

fixing permissions on existing directory /data/patroni-14 ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/Toronto
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

/usr/pgsql-14/bin/pg_ctl -D /data/patroni-14/ -l logfile start

-bash-4.2$
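pg_upgrade refused the earlier attempt because the old cluster uses data checksums and the new one did not. Instead of guessing, the initdb flag can be derived from the old cluster's pg_controldata output, where a "Data page checksum version" of 0 means checksums are off. A minimal sketch; the helper name is hypothetical and it assumes English (LC_ALL=C) pg_controldata output:

```shell
#!/usr/bin/env bash
# Read pg_controldata output on stdin and print the initdb flag the new
# cluster needs to match the old one's checksum setting (or nothing).
checksum_flag_from_controldata() {
  local v
  v=$(awk -F: '/^Data page checksum version/ { gsub(/[[:space:]]/, "", $2); print $2 }')
  case $v in
    '') echo "checksum line not found" >&2; return 1 ;;
    0)  ;;                          # checksums disabled: no flag needed
    *)  echo "--data-checksums" ;;  # any non-zero version means enabled
  esac
}

# Usage (not executed here). $flag is left unquoted on purpose so that an
# empty value adds no argument:
#   flag=$(LC_ALL=C /usr/pgsql-10/bin/pg_controldata /data/patroni-10 | checksum_flag_from_controldata)
#   /usr/pgsql-14/bin/initdb -D /data/patroni-14/ $flag
```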

Change /etc/patroni.yml to point back to the PostgreSQL 10 data directory and binaries:

postgresql:
  data_dir: /data/patroni-10
  bin_dir: /usr/pgsql-10/bin

This time it's getting further:

-bash-4.2$ /usr/pgsql-14/bin/pg_upgrade --old-datadir=/data/patroni-10/ --new-datadir=/data/patroni-14 --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-14/bin --check
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions ok
could not translate host name "." to address: No address associated with hostname
Failure, exiting
-bash-4.2$

Rerun the check with --verbose:

/usr/pgsql-14/bin/pg_upgrade --old-datadir=/data/patroni-10/ --new-datadir=/data/patroni-14 --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-14/bin --check --verbose

First log segment after reset: 000000010000000000000002
could not translate host name "." to address: Temporary failure in name resolution

After disabling firewalld:

/usr/pgsql-14/bin/pg_upgrade --old-datadir=/data/patroni-10/ --new-datadir=/data/patroni-14 --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-14/bin --check --verbose

First log segment after reset: 000000010000000000000002
could not translate host name "." to address: No address associated with hostname

If you get either of the two errors above, the real cause is in pg_upgrade_server.log:

$ view pg_upgrade_server.log

-----------------------------------------------------------------
pg_upgrade run on Sun Oct 30 22:03:51 2022
-----------------------------------------------------------------

command: "/usr/pgsql-10/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/data/patroni-10" -o "-p 50432 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000 -c listen_addresses='' -c unix_socket_permissions=0700" start >> "pg_upgrade_server.log" 2>&1
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2022-10-30 22:03:51.552 EDT [5592] FATAL: lock file "postmaster.pid" already exists
2022-10-30 22:03:51.552 EDT [5592] HINT: Is another postmaster (PID 1499) running in data directory "/data/patroni-10"?
stopped waiting
pg_ctl: could not start server
Examine the log output.

pg_upgrade starts the old server itself, so the old cluster must not already be running. Stop Patroni on all nodes (which also stops PostgreSQL), then rerun the check:

/usr/pgsql-14/bin/pg_upgrade --old-datadir=/data/patroni-10/ --new-datadir=/data/patroni-14 --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-14/bin --check --verbose
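The lock file error means the old postmaster was still up when pg_upgrade tried to start its own. A quick guard before each pg_upgrade run might look like this. It is a Linux-only sketch (liveness is checked via /proc), the function name server_stopped is hypothetical, and it only inspects postmaster.pid, so stopping Patroni remains the actual fix:

```shell
#!/usr/bin/env bash
# Return 0 if no live postmaster owns DATA_DIR, 1 if one is still running.
# Uses /proc instead of kill -0, which fails with EPERM for processes
# owned by another user and would wrongly report them as stopped.
server_stopped() {
  local pidfile=$1/postmaster.pid pid
  [ -e "$pidfile" ] || return 0            # no pid file: not running
  pid=$(head -n 1 "$pidfile")              # first line is the postmaster PID
  if [ -n "$pid" ] && [ -d "/proc/$pid" ]; then
    echo "PostgreSQL (PID $pid) is still running on $1" >&2
    return 1
  fi
  echo "stale $pidfile (PID $pid not running)" >&2
  return 0
}

# Usage (not executed here), after stopping Patroni on all nodes:
#   server_stopped /data/patroni-10 && /usr/pgsql-14/bin/pg_upgrade ... --check
```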

Now change the /etc/patroni.yml configuration from:

[root@psql08 data]# grep -Ei "\-10|postgresql:" /etc/patroni.yml
postgresql:
bin_dir: /usr/pgsql-10/bin
data_dir: /data/patroni-10
unix_socket_directories: /data/patroni-10
[root@psql08 data]#

to:

[root@psql08 data]# grep -Ei "\-14|postgresql:" /etc/patroni.yml
postgresql:
bin_dir: /usr/pgsql-14/bin
data_dir: /data/patroni-14
unix_socket_directories: /data/patroni-14
[root@psql08 data]#

on all nodes (assuming /data/patroni-14 has been created with the right permissions but is empty). Don't start or restart Patroni yet; that happens after the upgrade below.
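If you'd rather not hand-edit each node, the path switch can be done with sed. A sketch, assuming the exact paths used in this post; switch_patroni_paths is a hypothetical name, and the original file is kept as /etc/patroni.yml.bak:

```shell
#!/usr/bin/env bash
# Rewrite PostgreSQL 10 paths to PostgreSQL 14 paths in a Patroni config.
# sed -i.bak keeps a backup copy next to the edited file.
switch_patroni_paths() {
  local conf=${1:-/etc/patroni.yml}
  sed -i.bak \
    -e 's#/usr/pgsql-10/bin#/usr/pgsql-14/bin#g' \
    -e 's#/data/patroni-10#/data/patroni-14#g' \
    "$conf"
}

# Usage (not executed here), on every node, then confirm with grep:
#   switch_patroni_paths /etc/patroni.yml && grep -Ei "\-14|postgresql:" /etc/patroni.yml
```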

Now drop --check and run the actual upgrade:

/usr/pgsql-14/bin/pg_upgrade --old-datadir=/data/patroni-10/ --new-datadir=/data/patroni-14 --old-bindir=/usr/pgsql-10/bin --new-bindir=/usr/pgsql-14/bin --verbose

A result similar to this should appear:

[...]
ok
"/usr/pgsql-14/bin/pg_ctl" -w -D "/data/patroni-14" -o "" -m smart stop >> "pg_upgrade_server.log" 2>&1

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
/usr/pgsql-14/bin/vacuumdb --all --analyze-in-stages

Running this script will delete the old cluster's data files:
./delete_old_cluster.sh
-bash-4.2$
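The check-then-upgrade sequence can be wrapped so that the real run only happens when --check passes. A sketch with hypothetical names (upgrade_cluster, run, and the OLD_/NEW_ variables), defaulting to this post's paths; DRY_RUN=1 prints the commands instead of running them. Run it as postgres from a writable directory, since pg_upgrade in PostgreSQL 14 writes its logs and delete_old_cluster.sh into the current directory (here /var/lib/pgsql):

```shell
#!/usr/bin/env bash
# Paths default to the ones used in this post; override via environment.
OLD_BIN=${OLD_BIN:-/usr/pgsql-10/bin}
NEW_BIN=${NEW_BIN:-/usr/pgsql-14/bin}
OLD_DATA=${OLD_DATA:-/data/patroni-10}
NEW_DATA=${NEW_DATA:-/data/patroni-14}

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Run pg_upgrade --check first; only upgrade if the check succeeds.
upgrade_cluster() {
  local args=(
    --old-datadir="$OLD_DATA" --new-datadir="$NEW_DATA"
    --old-bindir="$OLD_BIN" --new-bindir="$NEW_BIN"
  )
  run "$NEW_BIN/pg_upgrade" "${args[@]}" --check || return 1
  run "$NEW_BIN/pg_upgrade" "${args[@]}"
}

# Usage (not executed here), as postgres with Patroni stopped on all nodes:
#   cd ~ && upgrade_cluster
```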

Deleting the old cluster, as suggested, is fine once you have verified that the database upgraded without issues:

[root@psql08 data]# cat /var/lib/pgsql/delete_old_cluster.sh
#!/bin/sh

rm -rf '/data/patroni-10'
[root@psql08 data]#

Start Patroni, beginning with the leader, and let the data replicate naturally. Do NOT run pg_upgrade on the replicas; it is not necessary, since Patroni will copy the data. First, check etcd: since this is an upgrade, the database has a new system ID, and Patroni reports a mismatch with the one stored in etcd:

2022-10-30 23:20:21,416 CRITICAL: system ID mismatch, node psql08 belongs to a different cluster: 6617627977882355208 != 7160491406504506775
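The new ID can be read from the upgraded data directory with pg_controldata to confirm that the mismatch is expected. A sketch; the helper name is hypothetical, and the etcd key in the usage comment is an assumption based on this post's /db namespace and postgres scope (etcd v2 API, as used by etcdctl here):

```shell
#!/usr/bin/env bash
# Print "Database system identifier" from pg_controldata output on stdin.
# Exits non-zero if the line is missing.
system_id_from_controldata() {
  awk -F: '/^Database system identifier/ { gsub(/[[:space:]]/, "", $2); print $2; found=1 }
           END { exit !found }'
}

# Usage (not executed here; assumes English pg_controldata output):
#   LC_ALL=C /usr/pgsql-14/bin/pg_controldata /data/patroni-14 | system_id_from_controldata
# Compare with the value Patroni stored for the old cluster (assumed key):
#   etcdctl get /db/postgres/initialize
```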

Then clear the etcd data so that it reinitializes:

[root@psql07 patroni-14]# systemctl stop etcd
[root@psql07 patroni-14]# cd /var/lib/etcd/
[root@psql07 etcd]# ls -altri
total 4
134299846 drwxr-xr-x. 40 root root 4096 Mar 9 2022 ..
134580878 drwxr-xr-x. 3 etcd etcd 25 Mar 9 2022 .
69151445 drwx------. 3 etcd etcd 19 Oct 30 21:33 default.etcd
[root@psql07 etcd]# mv default.etcd/ default.etcd-backup01
[root@psql07 etcd]# systemctl restart etcd
[root@psql07 etcd]# patronictl --config-file=/etc/patroni.yml list
+ Cluster: postgres (uninitialized) +-----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
[root@psql07 etcd]# etcdctl cluster-health
member 5f18ca2eace413eb is healthy: got healthy result from http://192.168.0.139:2379
member 932bb7207c3efe0a is healthy: got healthy result from http://192.168.0.245:2379
member b791c1a05993c35f is healthy: got healthy result from http://192.168.0.177:2379
cluster is healthy
[root@psql07 etcd]#

Now restart Patroni (systemctl restart patroni), starting with the leader, and give it a minute to initialize as the leader. If it does not, compare the new pg_hba.conf with the old one:

[root@psql08 patroni-14]# diff pg_hba.conf /data/patroni-10/pg_hba.conf

>
> host replication replicator 127.0.0.1/32 md5
> host replication replicator 192.168.0.108/0 md5
> host replication replicator 192.168.0.124/0 md5
> host replication replicator 192.168.0.118/0 md5
> host all all 0.0.0.0/0 md5

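and add the missing entries to the new pg_hba.conf to allow replication. Note that a /0 netmask matches every address, so entries like 192.168.0.108/0 are equivalent to 0.0.0.0/0; /32 restricts an entry to a single host. Then restart Patroni on the node again.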
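Copying the missing replication lines from the old pg_hba.conf can be scripted so that re-running it does no harm. A sketch with a hypothetical helper name; if your patroni.yml manages pg_hba through a postgresql.pg_hba section, Patroni may write pg_hba.conf from that, so make the change there instead:

```shell
#!/usr/bin/env bash
# Append "host replication ..." lines present in OLD but missing from NEW.
# Idempotent: a second run adds nothing.
copy_replication_hba() {
  local old=$1 new=$2 line
  grep -E '^[[:space:]]*host[[:space:]]+replication[[:space:]]' "$old" |
  while IFS= read -r line; do
    # -x whole-line match, -F fixed string (dots in IPs stay literal)
    grep -qxF -- "$line" "$new" || printf '%s\n' "$line" >> "$new"
  done
}

# Usage (not executed here), as postgres, before restarting Patroni:
#   copy_replication_hba /data/patroni-10/pg_hba.conf /data/patroni-14/pg_hba.conf
```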

You should now have a Leader, or a Standby Leader in the case of a regionally replicated cluster. Restart the replicas one by one, verifying that data is replicated. If you get messages like this:

Oct 30 23:50:18 psql09 patroni: pg_basebackup: removing contents of data directory "/data/patroni-14"
Oct 30 23:50:23 psql09 patroni: pg_basebackup: error: could not initiate base backup: ERROR: WAL generated with full_page_writes=off was replayed since last restartpoint
Oct 30 23:50:23 psql09 patroni: HINT: This means that the backup being taken on the standby is corrupt and should not be used. Enable full_page_writes and run CHECKPOINT on the primary, and then try an online backup again.
Oct 30 23:50:23 psql09 patroni: pg_basebackup: removing contents of data directory "/data/patroni-14"

and the node is a Standby Leader, promote this cluster to a Leader instead by removing standby_cluster from the dynamic configuration:

[root@psql08 log]# patronictl -c /etc/patroni.yml edit-config
Not changed
[root@psql08 log]# curl -s -XPATCH -d '{"standby_cluster":null}' psql08:8008/config
{"ttl": 30, "loop_wait": 10, "retry_timeout": 10, "maximum_lag_on_failover": 1048576, "postgresql": {"use_pg_rewind": true}}[root@psql08 log]#
[root@psql08 log]#

Then restart the replicas.
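When restarting replicas one at a time, a small retry loop can wait until each node is back before moving to the next. A sketch: wait_until is a hypothetical helper, and the usage assumes Patroni's REST API on port 8008 (as in the curl call above) and its /replica endpoint, which answers HTTP 200 only while the node is running as a replica:

```shell
#!/usr/bin/env bash
# Retry a command once per second until it succeeds or TIMEOUT seconds pass.
wait_until() {
  local timeout=$1; shift
  local waited=0
  until "$@"; do
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Usage (not executed here), on a replica after: systemctl restart patroni
#   wait_until 120 curl -sf -o /dev/null http://localhost:8008/replica
```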

Once you have a working cluster, reinitialize the former Leader (it should now become the Standby Leader) after copying over the initial database, which carries the new system ID. Before creating the tar.gz, stop the replica you are copying the data from; otherwise constant writes will corrupt the files in the archive:

[root@psql07 patroni-14]# tar -zcvf dual-site-cluster01.tar.gz *

[root@psql07 patroni-14]# scp dual-site-cluster01.tar.gz root@psql04:/data/patroni-14/
Password:
dual-site-cluster01.tar.gz 100% 702MB 42.1MB/s 00:16
[root@psql07 patroni-14]#
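Before extracting on psql04, it is worth confirming the archive survived the copy. This sketch (hypothetical helper name) only detects a damaged or truncated file; it cannot tell whether the data directory was consistent when it was archived, which is why the source replica must be stopped first:

```shell
#!/usr/bin/env bash
# gzip -t validates the compressed stream; tar -tzf walks every entry.
archive_ok() {
  local archive=$1
  gzip -t "$archive" 2>/dev/null || { echo "corrupt gzip stream: $archive" >&2; return 1; }
  tar -tzf "$archive" > /dev/null 2>&1 || { echo "unreadable tar: $archive" >&2; return 1; }
}

# Usage (not executed here): compare checksums on both hosts, then test.
#   sha256sum dual-site-cluster01.tar.gz      # run on psql07 and on psql04
#   archive_ok dual-site-cluster01.tar.gz && tar -zxf dual-site-cluster01.tar.gz
```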

Ensure the configuration matches on the secondary cluster, which is about to become the Standby Leader. Example of the directory changes:

[root@psql04 patroni-14]# grep -Ei "[il]-1|postgresql:" /etc/patroni.yml
postgresql:
postgresql:
bin_dir: /usr/pgsql-14/bin
data_dir: /data/patroni-14
unix_socket_directories: /data/patroni-14
[root@psql04 patroni-14]#

Extract the file after copying:

[root@psql04 patroni-14]# ls -altri
total 719016
520076 drwxr-xr-x. 4 root root 49 Oct 31 00:20 ..
204583251 drwx------. 2 postgres postgres 39 Oct 31 00:21 .
204583252 -rw-r--r--. 1 root root 736268709 Oct 31 00:21 dual-site-cluster01.tar.gz
[root@psql04 patroni-14]# tar -zxf dual-site-cluster01.tar.gz

On the soon-to-be Standby Leader cluster, remove the old initialization from etcd, set the etcd cluster state to new and reinitialize:

[root@psql04 patroni-14]# etcdctl rm -r /db
[root@psql04 patroni-14]# etcdctl ls -r
[root@psql04 patroni-14]#
[root@psql04 patroni-14]#
[root@psql04 patroni-14]#
[root@psql04 patroni-14]# vi /etc/etcd/etcd.conf
[root@psql04 patroni-14]# systemctl restart etcd
[root@psql04 patroni-14]# vi /etc/etcd/etcd.conf
ETCD_LISTEN_PEER_URLS="http://192.168.0.202:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://192.168.0.202:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.0.202:2380"
ETCD_INITIAL_CLUSTER="etcd04=http://192.168.0.202:2380,etcd05=http://192.168.0.103:2380,etcd06=http://192.168.0.186:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.0.202:2379"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-c02"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_NAME="etcd04"

Once done, change ETCD_INITIAL_CLUSTER_STATE="new" above to ETCD_INITIAL_CLUSTER_STATE="existing" and restart etcd again. etcd should show no keys:

[root@psql06 data]# etcdctl ls -r /
[root@psql06 data]#
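The switch from new to existing, done by hand above, can also be scripted per etcd node. A sketch; the helper name is hypothetical, it keeps a .bak copy, and it assumes the exact ETCD_INITIAL_CLUSTER_STATE="new" line shown in the etcd.conf above:

```shell
#!/usr/bin/env bash
# Flip ETCD_INITIAL_CLUSTER_STATE from "new" to "existing" in etcd.conf,
# then confirm the new value is present.
set_etcd_state_existing() {
  local conf=${1:-/etc/etcd/etcd.conf}
  sed -i.bak 's/^ETCD_INITIAL_CLUSTER_STATE="new"$/ETCD_INITIAL_CLUSTER_STATE="existing"/' "$conf"
  grep -q '^ETCD_INITIAL_CLUSTER_STATE="existing"$' "$conf"
}

# Usage (not executed here), on each etcd node of the standby cluster:
#   set_etcd_state_existing /etc/etcd/etcd.conf && systemctl restart etcd
```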

Start Patroni on the primary node of this cluster, the node where you copied the tar.gz. As before, the replicas will copy the data on their own. Now, after starting etcd on all standby cluster nodes, set this cluster up as a standby cluster so that its Standby Leader copies from the leader cluster:

curl -s -XPATCH -d '{"standby_cluster":{"host":"psql-c03","port":5432}}' psql04:8008/config

Once the above has run, replication from the primary (Leader) cluster should start:

2022-10-31 00:33:06.468 EDT [6121] LOG: starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
2022-10-31 00:33:06.474 EDT [6121] LOG: listening on IPv4 address "192.168.0.202", port 5432
2022-10-31 00:33:06.476 EDT [6121] LOG: listening on Unix socket "./.s.PGSQL.5432"
2022-10-31 00:33:06.488 EDT [6125] LOG: database system was interrupted while in recovery at log time 2022-10-31 00:33:00 EDT
2022-10-31 00:33:06.488 EDT [6125] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2022-10-31 00:33:06.511 EDT [6127] FATAL: the database system is starting up
2022-10-31 00:33:06.537 EDT [6129] FATAL: the database system is starting up
2022-10-31 00:33:06.820 EDT [6125] LOG: entering standby mode
2022-10-31 00:33:06.828 EDT [6125] LOG: redo starts at 32/62000060
2022-10-31 00:33:06.832 EDT [6125] LOG: consistent recovery state reached at 32/62000230
2022-10-31 00:33:06.832 EDT [6125] LOG: invalid record length at 32/62000230: wanted 24, got 0
2022-10-31 00:33:06.833 EDT [6121] LOG: database system is ready to accept read-only connections
2022-10-31 00:33:06.856 EDT [6134] LOG: started streaming WAL from primary at 32/62000000 on timeline 2
^C
[root@psql04 log]# tail -f postgresql-Mon.log ^C
[root@psql04 log]#
[root@psql04 log]# pwd
/data/patroni-14/log
[root@psql04 log]#

There should not be any spectacular issues. Start the replicas on the standby cluster. If you see the following error:

[root@psql06 data]# tail -f /var/log/messages
Oct 31 00:37:02 psql06 automount[30482]: st_ready: st_ready(): state = 2 path /net
Oct 31 00:37:03 psql06 patroni: pg_basebackup: error: could not create directory "/data/patroni-14": Permission denied
Oct 31 00:37:08 psql06 patroni: pg_basebackup: error: could not create directory "/data/patroni-14": Permission denied
Oct 31 00:37:12 psql06 automount[30482]: st_expire: state 1 path /misc
Oct 31 00:37:12 psql06 automount[30482]: expire_proc: exp_proc = 139742978586368 path /misc
Oct 31 00:37:12 psql06 automount[30482]: expire_cleanup: got thid 139742978586368 path /misc stat 0
Oct 31 00:37:12 psql06 automount[30482]: expire_cleanup: sigchld: exp 139742978586368 finished, switching from 2 to 1
Oct 31 00:37:12 psql06 automount[30482]: st_ready: st_ready(): state = 2 path /misc
Oct 31 00:37:13 psql06 patroni: pg_basebackup: error: could not create directory "/data/patroni-14": Permission denied
Oct 31 00:37:18 psql06 patroni: pg_basebackup: error: could not create directory "/data/patroni-14": Permission denied
Oct 31 00:37:23 psql06 patroni: pg_basebackup: error: could not create directory "/data/patroni-14": Permission denied

Set the ownership of the parent directory to postgres:postgres so the data directory can be recreated automatically after a failure:

[root@psql06 data]# pwd
/data
[root@psql06 data]# ls -altri
total 8
128 dr-xr-xr-x. 25 root root 4096 Mar 6 2022 ..
135582283 drwx------. 21 postgres postgres 4096 Oct 30 23:46 patroni-10-backup01
69165434 drwxr-xr-x. 3 root root 32 Oct 31 00:11 .
[root@psql06 data]# chown postgres.postgres .
[root@psql06 data]# ls -altri
total 8
128 dr-xr-xr-x. 25 root root 4096 Mar 6 2022 ..
135582283 drwx------. 21 postgres postgres 4096 Oct 30 23:46 patroni-10-backup01
69165434 drwxr-xr-x. 3 postgres postgres 32 Oct 31 00:11 .
[root@psql06 data]#
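To spot this before Patroni starts retrying, check who owns the parent of the data directory on each standby replica. As the log above shows, the data directory's contents can be removed after a failed base backup, and pg_basebackup then has to create the directory itself. A sketch assuming GNU stat (Linux); the helper name check_parent_owner is hypothetical:

```shell
#!/usr/bin/env bash
# Print the owner of DATA_DIR's parent; return non-zero unless it is USER.
check_parent_owner() {
  local data_dir=$1 user=${2:-postgres} parent owner
  parent=$(dirname "$data_dir")
  owner=$(stat -c %U "$parent") || return 1   # GNU stat format option
  echo "$parent is owned by $owner"
  [ "$owner" = "$user" ]
}

# Usage (not executed here), as root on each standby replica:
#   check_parent_owner /data/patroni-14 postgres || chown postgres:postgres /data
```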

You should now see a recreated replica.

Finished!

This entry was posted on Monday, October 31st, 2022 at 12:41 am and is filed under NIX Posts.


Thoughts and Scribbles | MicroDevSys.com

	Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved. This work is licensed under a Creative Commons Attribution 3.0 Unported License Privacy / Use / Terms / Disclaimer Policy.
