

Unexpected error. Unable to verify database connection. / Cannot find CDH's bigtop-detect-javahome.

If you get these errors when testing database connections in Cloudera Manager:

Unexpected error. Unable to verify database connection. 

Checking the logs, we see the following:


2019-06-24 23:42:18,402 INFO scm-web-118667:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/clusters/2/healthStatusBar
2019-06-24 23:42:18,404 INFO scm-web-119745:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/clusters/2/statusContent
2019-06-24 23:42:18,412 INFO scm-web-118667:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/clusters/2/healthStatusBar, Status:200
2019-06-24 23:42:18,420 INFO scm-web-119745:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/clusters/2/statusContent, Status:200
2019-06-24 23:42:19,481 INFO scm-web-119111:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/testConnection
2019-06-24 23:42:19,489 INFO scm-web-119111:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing host command HiveTestDatabaseConnection with sensitive arguments. Host: DbHost{id=28, hostId=a787204f-931d-47e0-8aef-534ab6355a7a, hostName=cm-r01en01.mws.mds.xyz}
2019-06-24 23:42:19,500 INFO scm-web-119111:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing host command OozieTestDatabaseConnection with sensitive arguments. Host: DbHost{id=28, hostId=a787204f-931d-47e0-8aef-534ab6355a7a, hostName=cm-r01en01.mws.mds.xyz}
2019-06-24 23:42:19,506 INFO scm-web-119111:com.cloudera.cmf.service.ServiceHandlerRegistry: Executing host command HueTestDatabaseConnection with sensitive arguments. Host: DbHost{id=28, hostId=a787204f-931d-47e0-8aef-534ab6355a7a, hostName=cm-r01en01.mws.mds.xyz}
2019-06-24 23:42:19,809 INFO scm-web-119111:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/testConnection, Status:200
2019-06-24 23:42:19,826 INFO avro-servlet-hb-processor-1:com.cloudera.server.common.AgentAvroServlet: (45 skipped) AgentAvroServlet: heartbeat processing stats: average=12ms, min=8ms, max=79ms.
2019-06-24 23:42:21,847 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:21,854 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:23,890 INFO scm-web-120477:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:23,896 INFO scm-web-120477:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:24,373 WARN avro-servlet-hb-processor-0:com.cloudera.server.cmf.AgentProtocolImpl: Received Process Heartbeat for unknown (or duplicate) process. Ignoring. This is expected to happen once after old process eviction or process deletion (as happens in restarts). id=242 name=null host=a787204f-931d-47e0-8aef-534ab6355a7a/cm-r01en01.mws.mds.xyz
2019-06-24 23:42:24,374 WARN avro-servlet-hb-processor-0:com.cloudera.server.cmf.AgentProtocolImpl: Received Process Heartbeat for unknown (or duplicate) process. Ignoring. This is expected to happen once after old process eviction or process deletion (as happens in restarts). id=243 name=null host=a787204f-931d-47e0-8aef-534ab6355a7a/cm-r01en01.mws.mds.xyz
2019-06-24 23:42:24,374 WARN avro-servlet-hb-processor-0:com.cloudera.server.cmf.AgentProtocolImpl: Received Process Heartbeat for unknown (or duplicate) process. Ignoring. This is expected to happen once after old process eviction or process deletion (as happens in restarts). id=244 name=null host=a787204f-931d-47e0-8aef-534ab6355a7a/cm-r01en01.mws.mds.xyz
2019-06-24 23:42:25,935 INFO scm-web-119807:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:25,942 INFO scm-web-119807:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:28,049 INFO scm-web-119778:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:28,056 INFO scm-web-119778:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:29,383 INFO scm-web-120247:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/clusters/2/healthStatusBar
2019-06-24 23:42:29,394 INFO scm-web-120247:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/clusters/2/healthStatusBar, Status:200
2019-06-24 23:42:29,395 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/clusters/2/statusContent
2019-06-24 23:42:29,422 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/clusters/2/statusContent, Status:200
2019-06-24 23:42:30,084 INFO scm-web-118667:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:30,091 INFO scm-web-118667:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:32,124 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:32,131 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:33,205 INFO CommandPusher:com.cloudera.cmf.service.AbstractOneOffHostCommand: Unsuccessful 'HueTestDatabaseConnection'
2019-06-24 23:42:33,207 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: Command exited with code: 1
2019-06-24 23:42:33,207 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: [24/Jun/2019 23:42:28 +0000] 30846 MainThread redactor     INFO     Started launcher: /opt/cloudera/cm-agent/service/hue/hue.sh is_db_alive
[24/Jun/2019 23:42:28 +0000] 30846 MainThread redactor     ERROR    Redaction rules file doesn't exist, not redacting logs. file: redaction-rules.json, directory: /run/cloudera-scm-agent/process/247-HUE-test-db-connection
[24/Jun/2019 23:42:28 +0000] 30846 MainThread redactor     INFO     Re-exec watcher: /opt/cloudera/cm-agent/bin/cm proc_watcher 30867
+ date
+ date
Mon Jun 24 23:42:30 EDT 2019
++ dirname /opt/cloudera/cm-agent/service/hue/hue.sh
+ cloudera_config=/opt/cloudera/cm-agent/service/hue
++ cd /opt/cloudera/cm-agent/service/hue/../common
++ pwd
+ cloudera_config=/opt/cloudera/cm-agent/service/common
+ . /opt/cloudera/cm-agent/service/common/cloudera-config.sh
++ : /opt/cloudera/cm
++ export CLOUDERA_DIR
++ set -x
+ source_parcel_environment
+ '[' '!' -z '' ']'
+ locate_cdh_java_home
+ '[' -z '' ']'
+ '[' -z /usr/libexec/bigtop-utils ']'
+ local BIGTOP_DETECT_JAVAHOME=
+ for candidate in '"${JSVC_HOME}"' '"${JSVC_HOME}/.."' '"/usr/lib/bigtop-utils"' '"/usr/libexec"'
+ '[' -e /usr/libexec/bigtop-utils/bigtop-detect-javahome ']'
+ for candidate in '"${JSVC_HOME}"' '"${JSVC_HOME}/.."' '"/usr/lib/bigtop-utils"' '"/usr/libexec"'
+ '[' -e /usr/libexec/bigtop-utils/../bigtop-detect-javahome ']'
+ for candidate in '"${JSVC_HOME}"' '"${JSVC_HOME}/.."' '"/usr/lib/bigtop-utils"' '"/usr/libexec"'
+ '[' -e /usr/lib/bigtop-utils/bigtop-detect-javahome ']'
+ for candidate in '"${JSVC_HOME}"' '"${JSVC_HOME}/.."' '"/usr/lib/bigtop-utils"' '"/usr/libexec"'
+ '[' -e /usr/libexec/bigtop-detect-javahome ']'
+ '[' -z '' ']'
+ echo 'Cannot find CDH'\''s bigtop-detect-javahome.'
+ exit 1

2019-06-24 23:42:33,207 ERROR CommandPusher:com.cloudera.cmf.model.DbCommand: Command 742(HueTestDatabaseConnection) has completed. finalstate:FINISHED, success:false, msg:Unexpected error. Unable to verify database connection.
2019-06-24 23:42:34,168 INFO scm-web-119807:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:34,175 INFO scm-web-119807:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:36,228 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:36,234 INFO scm-web-120167:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:37,481 INFO CommandPusher:com.cloudera.cmf.service.AbstractOneOffHostCommand: Successful 'OozieTestDatabaseConnection'
2019-06-24 23:42:37,483 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: Command exited with code: 0
2019-06-24 23:42:37,483 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: + local MISCJAVA_HOME_CANDIDATES
+ case ${BIGTOP_JAVA_MAJOR} in
+ JAVA_HOME_CANDIDATES=(${JAVA8_HOME_CANDIDATES[@]} ${MISCJAVA_HOME_CANDIDATES[@]} ${OPENJAVA8_HOME_CANDIDATES[@]} ${JAVA11_HOME_CANDIDATES[@]} ${OPENJAVA11_HOME_CANDIDATES[@]})
+ '[' -z '' ']'
+ for candidate_regex in '${JAVA_HOME_CANDIDATES[@]}'
++ ls -rvd /usr/java/jdk1.8.0_181-cloudera
+ for candidate in '`ls -rvd ${candidate_regex}* 2>/dev/null`'
+ '[' -e /usr/java/jdk1.8.0_181-cloudera/bin/java ']'
+ export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
+ JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
+ break 2
+ verify_java_home
+ '[' -z /usr/java/jdk1.8.0_181-cloudera ']'
+ echo JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
+ source_parcel_environment
+ '[' '!' -z '' ']'
+ JAVA=/usr/java/jdk1.8.0_181-cloudera/bin/java
+ [[ -z /opt/cloudera/cm ]]
+ MGMT_CLASSPATH='/opt/cloudera/cm/lib/*'
+ JDBC_JARS=/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar
+ MGMT_CLASSPATH='/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*'
++ pwd
+ MGMT_CLASSPATH='/run/cloudera-scm-agent/process/246-OOZIE.OOZIE_SERVER-test-db-connection:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*'
+ MGMT_JAVA_OPTS='-Djava.net.preferIPv4Stack=true '
+ exec /usr/java/jdk1.8.0_181-cloudera/bin/java -Djava.net.preferIPv4Stack=true -Djava.security.egd=file:///dev/urandom -cp '/run/cloudera-scm-agent/process/246-OOZIE.OOZIE_SERVER-test-db-connection:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*' com.cloudera.enterprise.dbutil.DbCommandExecutor db.properties

2019-06-24 23:42:37,483 INFO CommandPusher:com.cloudera.cmf.model.DbCommand: Command 741(OozieTestDatabaseConnection) has completed. finalstate:FINISHED, success:true, msg:Successful
2019-06-24 23:42:37,497 INFO CommandPusher:com.cloudera.cmf.service.AbstractOneOffHostCommand: Successful 'HiveTestDatabaseConnection'
2019-06-24 23:42:37,498 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: Command exited with code: 0
2019-06-24 23:42:37,498 INFO CommandPusher:com.cloudera.cmf.service.AbstractDbConnectionTestCommand: + MISCJAVA_HOME_CANDIDATES=('/Library/Java/Home' '/usr/java/default' '/usr/lib/jvm/default-java' '/usr/lib/jvm/java-openjdk' '/usr/lib/jvm/jre-openjdk')
+ local MISCJAVA_HOME_CANDIDATES
+ case ${BIGTOP_JAVA_MAJOR} in
+ JAVA_HOME_CANDIDATES=(${JAVA8_HOME_CANDIDATES[@]} ${MISCJAVA_HOME_CANDIDATES[@]} ${OPENJAVA8_HOME_CANDIDATES[@]} ${JAVA11_HOME_CANDIDATES[@]} ${OPENJAVA11_HOME_CANDIDATES[@]})
+ '[' -z '' ']'
+ for candidate_regex in '${JAVA_HOME_CANDIDATES[@]}'
++ ls -rvd /usr/java/jdk1.8.0_181-cloudera
+ for candidate in '`ls -rvd ${candidate_regex}* 2>/dev/null`'
+ '[' -e /usr/java/jdk1.8.0_181-cloudera/bin/java ']'
+ export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
+ JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
+ break 2
+ verify_java_home
+ '[' -z /usr/java/jdk1.8.0_181-cloudera ']'
+ echo JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
+ source_parcel_environment
+ '[' '!' -z '' ']'
+ JAVA=/usr/java/jdk1.8.0_181-cloudera/bin/java
+ [[ -z /opt/cloudera/cm ]]
+ MGMT_CLASSPATH='/opt/cloudera/cm/lib/*'
+ JDBC_JARS=/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar
+ MGMT_CLASSPATH='/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*'
++ pwd
+ MGMT_CLASSPATH='/run/cloudera-scm-agent/process/245-HIVE-test-db-connection:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*'
+ MGMT_JAVA_OPTS='-Djava.net.preferIPv4Stack=true '
+ exec /usr/java/jdk1.8.0_181-cloudera/bin/java -Djava.net.preferIPv4Stack=true -Djava.security.egd=file:///dev/urandom -cp '/run/cloudera-scm-agent/process/245-HIVE-test-db-connection:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/cm/lib/postgresql-42.1.4.jre7.jar:/usr/share/java/oracle-connector-java.jar:/opt/cloudera/cm/lib/*' com.cloudera.enterprise.dbutil.DbCommandExecutor db.properties

2019-06-24 23:42:37,498 INFO CommandPusher:com.cloudera.cmf.model.DbCommand: Command 740(HiveTestDatabaseConnection) has completed. finalstate:FINISHED, success:true, msg:Successful
2019-06-24 23:42:38,336 INFO scm-web-118667:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult
2019-06-24 23:42:38,351 INFO scm-web-118667:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/dbTestConn/checkConnectionResult, Status:200
2019-06-24 23:42:40,393 INFO scm-web-120237:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/clusters/2/healthStatusBar
2019-06-24 23:42:40,418 INFO scm-web-119778:com.cloudera.enterprise.JavaMelodyFacade: Entering HTTP Operation: Method:POST, Path:/clusters/2/statusContent
2019-06-24 23:42:40,421 INFO scm-web-120237:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/clusters/2/healthStatusBar, Status:200
2019-06-24 23:42:40,435 INFO scm-web-119778:com.cloudera.enterprise.JavaMelodyFacade: Exiting HTTP Operation: Method:POST, Path:/clusters/2/statusContent, Status:200
^C
[root@cm-r01nn01 ~]#

Verify you have the right bigtop packages installed:

[root@cm-r01nn01 cloudera-scm-agent]#
[root@cm-r01nn01 cloudera-scm-agent]#
[root@cm-r01nn01 cloudera-scm-agent]# ls -altri /usr/libexec/bigtop-detect-javahome
ls: cannot access /usr/libexec/bigtop-detect-javahome: No such file or directory
[root@cm-r01nn01 cloudera-scm-agent]# find / -iname bigtop-detect-javahome*
[root@cm-r01nn01 cloudera-scm-agent]# ssh cm-r01nn02
Password:
Last login: Tue Jun 25 23:21:59 2019 from cm-r01nn01.mws.mds.xyz
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]# find / -iname bigtop-detect-javahome*
/usr/bin/bigtop-detect-javahome
/usr/lib/bigtop-utils/bigtop-detect-javahome
[root@cm-r01nn02 ~]# rpm -qf /usr/bin/bigtop-detect-javahome
bigtop-utils-0.7.0+cdh6.2.0-967373.el7.noarch
[root@cm-r01nn02 ~]# rpm -qf /usr/lib/bigtop-utils/bigtop-detect-javahome
bigtop-utils-0.7.0+cdh6.2.0-967373.el7.noarch
[root@cm-r01nn02 ~]#
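
Since bigtop-detect-javahome is shipped by the bigtop-utils package (as the rpm -qf output above shows), installing that package on the affected host should restore the script. A rough sketch, assuming the CDH 6 yum repository is already configured on cm-r01nn01:

# On the host missing the script (cm-r01nn01):
yum install -y bigtop-utils
ls -l /usr/lib/bigtop-utils/bigtop-detect-javahome    # verify the script is now present

Then re-run the database connection test from Cloudera Manager.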

 

Thx,
TK

ERROR: cannot execute CREATE TABLE in a read-only transaction

If you're getting the following on your patroni cluster:

Jun 24 23:30:50 psql02 etcd: failed to send out heartbeat on time (exceeded the 250ms timeout for 1.229927109s)
Jun 24 23:30:50 psql02 etcd: server is likely overloaded
Jun 24 23:30:50 psql02 etcd: failed to send out heartbeat on time (exceeded the 250ms timeout for 1.22987947s)
Jun 24 23:30:50 psql02 etcd: server is likely overloaded

ERROR:  cannot execute CREATE TABLE in a read-only transaction

then you have likely commented out the HAProxy health-check lines, shown commented out in the backend below:

[root@psql03 ~]# cat /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local0
    stats       socket /var/run/haproxy.sock mode 0600 level admin
    user        haproxy
    group       haproxy
    daemon
    debug
    maxconn 1024

defaults
    mode tcp
    log global
    option                  dontlognull
    option                  redispatch
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

frontend psql-in
    mode tcp
    bind psql-c01:5432
    option tcplog
    default_backend             psql-back


backend psql-back
    mode        tcp
    option      tcplog
#    option     httpchk
#    http-check         expect status 200

    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server      psql01.nix.mds.xyz    psql01.nix.mds.xyz:5432 maxconn 1024 check port 8008
    server      psql02.nix.mds.xyz    psql02.nix.mds.xyz:5432 maxconn 1024 check port 8008
    server      psql03.nix.mds.xyz    psql03.nix.mds.xyz:5432 maxconn 1024 check port 8008

listen stats
    bind :9000
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /haproxy-stats
    stats auth admin:nopassword
[root@psql03 ~]#

Uncomment the two health-check lines above (option httpchk and http-check expect status 200) to resolve the issue. Without the HTTP check against port 8008 (the Patroni REST API), HAProxy treats every node as healthy and happily sends writes to read-only replicas, which is what produces the read-only transaction error.
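
For reference, the backend stanza should look like this once the health-check lines are re-enabled (same values as above):

backend psql-back
    mode        tcp
    option      tcplog
    option      httpchk
    http-check          expect status 200

    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server      psql01.nix.mds.xyz    psql01.nix.mds.xyz:5432 maxconn 1024 check port 8008
    server      psql02.nix.mds.xyz    psql02.nix.mds.xyz:5432 maxconn 1024 check port 8008
    server      psql03.nix.mds.xyz    psql03.nix.mds.xyz:5432 maxconn 1024 check port 8008

With the check in place, only the node whose Patroni REST API returns 200 (the current primary) is marked up, so writes no longer land on a read-only replica.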

Thx,
TK

 

User cannot run DDL statements on the specified database. Attempt to create and drop a table failed.

In case you receive the following:


cloudera-scm-server.log:2019-06-21 01:16:57,763 ERROR CommandPusher:com.cloudera.cmf.model.DbCommand: Command 609(HiveTestDatabaseConnection) has completed. finalstate:FINISHED, success:false, msg:User cannot run DDL statements on the specified database. Attempt to create and drop a table failed.

check the cloudera-scm-server logs, where you may see related read-only transaction errors:

cloudera-scm-server.log:Caused by: org.postgresql.util.PSQLException: ERROR: cannot execute UPDATE in a read-only transaction
cloudera-scm-server.log:Caused by: java.sql.BatchUpdateException: Batch entry 0 update COMMANDS_DETAIL set OPTIMISTIC_LOCK_VERSION=4, INTERNAL_STATE=? where COMMAND_ID=618 and OPTIMISTIC_LOCK_VERSION=3 was aborted: ERROR: cannot execute UPDATE in a read-only transaction  Call getNextException to see other errors in the batch.
cloudera-scm-server.log:Caused by: org.postgresql.util.PSQLException: ERROR: cannot execute UPDATE in a read-only transaction
cloudera-scm-server.log:Caused by: java.sql.BatchUpdateException: Batch entry 0 update COMMANDS_DETAIL set OPTIMISTIC_LOCK_VERSION=6, INTERNAL_STATE=? where COMMAND_ID=618 and OPTIMISTIC_LOCK_VERSION=5 was aborted: ERROR: cannot execute UPDATE in a read-only transaction  Call getNextException to see other errors in the batch.
cloudera-scm-server.log:Caused by: org.postgresql.util.PSQLException: ERROR: cannot execute UPDATE in a read-only transaction
[root@cm-r01nn01 cloudera-scm-server]#

Then try creating a test table yourself:

hue_mws01=>
hue_mws01=> CREATE TABLE account(
   user_id serial PRIMARY KEY,
   username VARCHAR (50) UNIQUE NOT NULL,
   password VARCHAR (50) NOT NULL,
   email VARCHAR (355) UNIQUE NOT NULL,
   created_on TIMESTAMP NOT NULL,
   last_login TIMESTAMP
);
ERROR:  cannot execute CREATE TABLE in a read-only transaction
hue_mws01=>

In our case the above was resolved by restarting HAProxy, Keepalived and Patroni on psql02 (node 2) of our PostgreSQL cluster.
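
A rough sketch of the restart sequence (assuming the services are managed by systemd under these unit names, which may differ on your install):

psql02 # systemctl restart haproxy
psql02 # systemctl restart keepalived
psql02 # systemctl restart patroni

We then tested and received: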

[root@psql01 ~]# sudo su - postgres
Last login: Sat Jun 22 00:16:39 EDT 2019 on pts/0
-bash-4.2$ psql -h psql-c01.nix.mds.xyz -p 5432 -W -U hue_mws01 -d hue_mws01
Password for user hue_mws01:
psql (10.5)
Type "help" for help.

hue_mws01=> CREATE TABLE account(
   user_id serial PRIMARY KEY,
   username VARCHAR (50) UNIQUE NOT NULL,
   password VARCHAR (50) NOT NULL,
   email VARCHAR (355) UNIQUE NOT NULL,
   created_on TIMESTAMP NOT NULL,
   last_login TIMESTAMP
);
CREATE TABLE
hue_mws01=>

Tell us if this solved your issue.

Cheers,
TK

 

Cannot read XML: (41) Specification mandate value for attribute data-pjax-transient [Line: 38 | Column: 40].

When you get this:

Cannot read XML: (41) Specification mandate value for attribute data-pjax-transient [Line: 38 | Column: 40].

ensure you download the XML (or any other file) from GitHub using the clone or "Download ZIP" option, or the Raw view, instead of the browser's Save As / Save Link As, which saves the surrounding HTML page rather than the file itself.
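
As a rough example (the repository path here is hypothetical; substitute your own), pulling the raw file directly avoids the HTML wrapper entirely:

curl -LO https://raw.githubusercontent.com/<user>/<repo>/master/your-file.xml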

Thx,
TK

[ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out) at gcomm/src/pc.cpp:connect():158

Another issue we can run into is the following set of messages:

2019-06-08T04:56:24.518538Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2019-06-08T04:56:24.518591Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2019-06-08T04:56:24.518764Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'galera_cluster1' at 'gcomm://192.168.0.126,192.168.0.107,192.168.0.114': -110 (Connection timed out)
2019-06-08T04:56:24.518793Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2019-06-08T04:56:24.518812Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.0.126,192.168.0.107,192.168.0.114) failed: 7
2019-06-08T04:56:24.518835Z 0 [ERROR] Aborting

To solve this, take a look at your /etc/my.cnf file. The following fields have to match the server that you are on:

wsrep_node_address="192.168.0.126"
wsrep_node_name="mysql01"
server_id=1
bind-address=192.168.0.126

If they don't, the above error is thrown. server_id must also be unique for each node. This mismatch typically happens when you restore a database from one node to another, or follow recovery steps that copy /etc/my.cnf over to another host.
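
For example, the same fields on the second node would look something like this (illustrative values only, assuming mysql02 owns 192.168.0.107 from the gcomm list above):

wsrep_node_address="192.168.0.107"
wsrep_node_name="mysql02"
server_id=2
bind-address=192.168.0.107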

 

Another solution involves copying the data directory from the master (or the most current host) to the target host that we want to become the new master.  We did this because we wanted to test running with only 2 of 3 nodes while the master was offline.  Here is our situation:

mysql01 GOOD
mysql02 BAD
mysql03 BAD

Copy the data dir from mysql01 to mysql02:

mysql02 # cd /var/lib/mysql; scp -rp mysql01:/var/lib/mysql/  .

Edit grastate.dat and set the safe_to_bootstrap flag to 1 (a quick way to do this is sketched after the file listing below):

cat grastate.dat
# GALERA saved state
version: 2.1
uuid:    f25fc12b-8a0b-11e9-b58d-bfb801e3b36d
seqno:   -1
safe_to_bootstrap: 1
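
If the flag is still 0, a quick way to flip it (a sketch; it assumes the file lives at /var/lib/mysql/grastate.dat):

mysql02 # sed -i 's/^safe_to_bootstrap: .*/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat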

Bootstrap this node:

mysql02 # /usr/bin/mysqld_bootstrap

On the third node, mysql03, remove all files from the /var/lib/mysql folder because we'll let it sync up from mysql02:

mysql03 # cd /var/lib/mysql; rm -rf *

Start MySQL on mysql03 so it syncs from mysql02:

mysql03 # systemctl start mysqld

Let it sync.  You should have an accessible 2/3 node cluster at this point.
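
To confirm the cluster came back with the expected member count, check the wsrep status on any running node (expect 2 here, since we are deliberately running without the original master):

mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"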

Thx,
TK

PostgreSQL Pull Backup

Let's set up some PostgreSQL backups.

In this post, we'll set up a pull methodology in order to take backups of our PostgreSQL cluster.  The backups will be saved remotely, so any failure in the cluster is independent of our PostgreSQL backups.

First, let's prepare the postgres account so we can log in remotely without a password.  This will allow us to run the pg_dump command remotely.  Generate a key pair in whatever account you choose and exchange the keys with the remote hosts like this:

[root@mbpc-pc .ssh]# cat id_rsa.pub
ssh-rsa <SECRET KEY TEXT>
[root@mbpc-pc .ssh]#
[root@mbpc-pc .ssh]#
[root@mbpc-pc .ssh]# ssh postgres@psql01
FIPS integrity verification test failed.
Last login: Sun Jun  2 18:46:53 2019
-bash-4.2$ cat .ssh/authorized_keys
ssh-rsa <SECRET KEY TEXT>
-bash-4.2$ logout
Connection to psql01 closed.
[root@mbpc-pc .ssh]# ssh postgres@psql02
FIPS integrity verification test failed.
Last login: Sun Jun  2 17:15:30 2019 from mbpc-pc.nix.mds.xyz
-bash-4.2$ logout
Connection to psql02 closed.
[root@mbpc-pc .ssh]# ssh postgres@psql03
FIPS integrity verification test failed.
Last login: Sun Jun  2 18:30:51 2019 from psql01.nix.mds.xyz
-bash-4.2$

Once you have the keys exchanged, you'll need a .pgpass file in the home directory of the postgres account on each of the above cluster hosts:

-bash-4.2$ cat .pgpass
psql-c01.nix.mds.xyz:5432:*:postgres:<SECRET>
psql01.nix.mds.xyz:5432:*:postgres:<SECRET>
psql02.nix.mds.xyz:5432:*:postgres:<SECRET>
psql03.nix.mds.xyz:5432:*:postgres:<SECRET>
-bash-4.2$
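
Note that libpq ignores .pgpass unless it is readable only by its owner, so lock the permissions down on each host:

# Run as the postgres user on each cluster host:
chmod 0600 ~/.pgpass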

Next, we'll write a short script to login to a postgres node and take a backup saving the gzip file locally on the calling server:

[root@mbpc-pc .ssh]# cat /mnt/postgres-backup.sh
#!/bin/bash

PSQLH="";
PSQLR="";

# For a reason I've yet to investigate, the cluster IP doesn't work here, so we determine the node with a running instance the crude way.
for KEY in $( echo psql-c01 psql01 psql02 psql03 ); do
        PSQLH=$( ssh postgres@$KEY "hostname" 2>/dev/null );
        PSQLR=$( ssh postgres@$KEY "ps -ef|grep -Ei \"pgsql-10.*postgres\"|grep -v grep" 2>/dev/null);
        [[ "$PSQLR" != "" ]] && {
                echo $PSQLH"|"$PSQLR; break;
        };
done

[[ "$PSQLH" == "" ]] && {
        echo "ERROR: PSQLH var was empty.  Should be a hostname.";
        exit 1;
};

ssh postgres@psql-c01.nix.mds.xyz "pg_dumpall -U postgres -h $PSQLH -p 5432 | gzip -vc" > ./psql-c01.sql.$(date +%s).gz && find /mnt/SomeBigDisk/psql-backup/ -type f -name '*.sql.*.gz' -mtime +180 -exec rm {} \;
[root@mbpc-pc .ssh]#

 

Notice the find line in the above.  It will clear out any backups older than 180 days.

Schedule this via cron or any other scheduling software you prefer:

[root@mbpc-pc .ssh]# crontab -l|grep postgres
30 3 * * * /mnt/postgres-backup.sh
[root@mbpc-pc .ssh]#

The beauty of this method is that pg_dumpall runs on the cluster host itself, so it always matches the PostgreSQL version running there, and it leaves no intermediate files behind.  You won't need to keep a matching pg_dump / pg_dumpall version on the backup host.

Thx,
TK

PANIC:  replication checkpoint has wrong magic 0 instead of 307747550

So we ran into a little problem getting our PostgreSQL Patroni w/ ETCD cluster going after a rather serious failure.

# sudo su - postgres

$ /usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.0.118 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=100 --hot_standby=on --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 -d 5

This resulted in one of the 3 messages below; hence this post.  If I can start a single instance, I should be fine, since I could then 1) replicate over to the other two nodes, or 2) simply take a dump, reinitialize all the databases and then restore the dump.

Using the above procedure I get one of three error messages when using the data files of each node:

[ PSQL01 ]
postgres: postgres: startup process waiting for 000000010000000000000008

[ PSQL02 ]
PANIC:  replication checkpoint has wrong magic 0 instead of 307747550

[ PSQL03 ]
FATAL:  syntax error in history file: f2W

 

Unfortunately, we couldn't do anything with PSQL03 and PSQL01, the standbys, since their database base/ folders were way out of sync, meaning there were no tables there:

[ PSQL03 ]

[root@psql03 base]# ls -altri
total 40
    42424 drwx------.  2 postgres postgres 8192 Oct 29  2018 1
 67714749 drwx------.  2 postgres postgres 8192 Oct 29  2018 13805
202037206 drwx------.  5 postgres postgres   38 Oct 29  2018 .
134312175 drwx------.  2 postgres postgres 8192 May 22 01:55 13806
    89714 drwxr-xr-x. 20 root     root     4096 May 22 22:43 ..
[root@psql03 base]#

 

[ PSQL02 ]

 [root@psql02 base]# ls -altri

total 412
201426668 drwx------.  2 postgres postgres  8192 Oct 29  2018 1
   743426 drwx------.  2 postgres postgres  8192 Mar 24 03:47 13805
135326327 drwx------.  2 postgres postgres 16384 Mar 24 20:15 40970
   451699 drwx------.  2 postgres postgres 40960 Mar 25 19:47 16395
  1441696 drwx------.  2 postgres postgres  8192 Mar 31 15:09 131137
 68396137 drwx------.  2 postgres postgres  8192 Mar 31 15:09 131138
135671065 drwx------.  2 postgres postgres  8192 Mar 31 15:09 131139
204353100 drwx------.  2 postgres postgres  8192 Mar 31 15:09 131140
135326320 drwx------. 17 postgres postgres  4096 Apr 14 10:08 .
 68574415 drwx------.  2 postgres postgres 12288 Apr 28 06:06 131142
   288896 drwx------.  2 postgres postgres 16384 Apr 28 06:06 131141
203015232 drwx------.  2 postgres postgres  8192 Apr 28 06:06 131136
135326328 drwx------.  2 postgres postgres 40960 May  5 22:09 24586
 67282461 drwx------.  2 postgres postgres  8192 May  5 22:09 13806
 67640961 drwx------.  2 postgres postgres 20480 May  5 22:09 131134
203500274 drwx------.  2 postgres postgres 16384 May  5 22:09 155710
134438257 drwxr-xr-x. 20 root     root      4096 May 22 01:44 ..
[root@psql02 base]# pwd
/root/postgres-patroni-backup/base
[root@psql02 base]#

 

[ PSQL01 ]

[root@psql01 base]# ls -altri
total 148
134704615 drwx------.  2 postgres postgres  8192 Oct 29  2018 1
201547700 drwx------.  2 postgres postgres  8192 Oct 29  2018 13805
   160398 drwx------.  2 postgres postgres  8192 Feb 24 23:53 13806
 67482137 drwx------.  7 postgres postgres    62 Feb 24 23:54 .
135909671 drwx------.  2 postgres postgres 24576 Feb 24 23:54 24586
134444555 drwx------.  2 postgres postgres 24576 Feb 24 23:54 16395
 67178716 drwxr-xr-x. 20 root     root      4096 May 22 01:53 ..
[root@psql01 base]# pwd
/root/postgresql-patroni-etcd/base
[root@psql01 base]#

So we could only work with PSQL02, the original primary node.  Every other node has essentially nothing.

Looks like our replorigin_checkpoint file is at fault, resulting in a rather nasty replication error:

open("pg_wal/000000BE000000000000004C", O_RDONLY) = 5
open("pg_wal/000000BE000000000000004C", O_RDONLY) = 5
openat(AT_FDCWD, "base", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
openat(AT_FDCWD, "pg_tblspc", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
openat(AT_FDCWD, "pg_replslot", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
openat(AT_FDCWD, "pg_replslot", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 6
open("pg_logical/replorigin_checkpoint", O_RDONLY) = 6
write(2, "2019-06-02 14:50:34.777 EDT [283"..., 1062019-06-02 14:50:34.777 EDT [28362] PANIC:  replication checkpoint has wrong magic 0 instead of 307747550
-bash-4.2$ cat pg_logical/replorigin_checkpoint
cat: pg_logical/replorigin_checkpoint: No such file or directory
-bash-4.2$ pwd
/data/patroni/tmp
-bash-4.2$ cd ..
-bash-4.2$ cat pg_logical/replorigin_checkpoint
øÉKíÛ0ðð bø{nð- Ðð à Ð à4Ø4-bash-4.2$ PuTTY
-bash: PuTTY: command not found
-bash-4.2$
-bash-4.2$ strings pg_logical/replorigin_checkpoint
-bash-4.2$ ls -altri pg_logical/replorigin_checkpoint
67894871 -rw-------. 1 postgres postgres 16384 Oct 29  2018 pg_logical/replorigin_checkpoint
-bash-4.2$ ls -altri pg_logical/
total 20
 67894871 -rw-------.  1 postgres postgres 16384 Oct 29  2018 replorigin_checkpoint
136946383 drwx------.  2 postgres postgres     6 Oct 29  2018 snapshots
204367784 drwx------.  2 postgres postgres     6 Oct 29  2018 mappings
 67894870 drwx------.  4 postgres postgres    65 Apr 28 06:06 .
135326272 drwx------. 21 postgres postgres  4096 Jun  2 14:50 ..
-bash-4.2$


So let's copy a good one from another host (I guess we could delete it but I haven't tried):


[root@psql03 pg_logical]#
[root@psql03 pg_logical]# ls -altri
total 8
 68994432 drwx------.  2 postgres postgres    6 Oct 29  2018 snapshots
134984156 drwx------.  2 postgres postgres    6 Oct 29  2018 mappings
   566745 -rw-------.  1 postgres postgres    8 May 22 01:55 replorigin_checkpoint
   566731 drwx------.  4 postgres postgres   65 May 22 01:55 .
    89714 drwxr-xr-x. 20 root     root     4096 May 22 22:43 ..
[root@psql03 pg_logical]#
[root@psql03 pg_logical]#
[root@psql03 pg_logical]#
[root@psql03 pg_logical]# scp replorigin_checkpoint psql02:/data/patroni/pg_logical/
Password:
replorigin_checkpoint                                                                                 100%    8    10.1KB/s   00:00
[root@psql03 pg_logical]#
[root@psql03 pg_logical]#


Now we can get to the backend in standalone mode:


-bash-4.2$
-bash-4.2$ /usr/pgsql-10/bin/postgres --single -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=off --listen_addresses=192.168.0.124 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=20 -d 5 2>&1
2019-06-02 15:00:48.981 EDT [29057] DEBUG:  invoking IpcMemoryCreate(size=144687104)
2019-06-02 15:00:48.982 EDT [29057] DEBUG:  mmap(144703488) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2019-06-02 15:00:48.993 EDT [29057] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2019-06-02 15:00:48.993 EDT [29057] DEBUG:  removing file "pg_notify/0000"
2019-06-02 15:00:48.993 EDT [29057] DEBUG:  dynamic shared memory system will support 128 segments
2019-06-02 15:00:48.994 EDT [29057] DEBUG:  created dynamic shared memory control segment 1025202362 (3088 bytes)
2019-06-02 15:00:48.994 EDT [29057] DEBUG:  InitPostgres
2019-06-02 15:00:48.994 EDT [29057] DEBUG:  my backend ID is 1
2019-06-02 15:00:48.994 EDT [29057] LOG:  database system was interrupted; last known up at 2019-04-28 06:06:24 EDT
2019-06-02 15:00:49.265 EDT [29057] LOG:  invalid record length at 0/4C35CDF8: wanted 24, got 0
2019-06-02 15:00:49.266 EDT [29057] LOG:  invalid primary checkpoint record
2019-06-02 15:00:49.266 EDT [29057] LOG:  using previous checkpoint record at 0/4C34EDA8
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  redo record is at 0/4C34ED70; shutdown FALSE
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  next transaction ID: 0:1409831; next OID: 237578
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  next MultiXactId: 48; next MultiXactOffset: 174
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  oldest unfrozen transaction ID: 549, in database 1
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  oldest MultiXactId: 1, in database 1
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  transaction ID wrap limit is 2147484196, limited by database with OID 1
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  starting up replication slots
2019-06-02 15:00:49.266 EDT [29057] DEBUG:  starting up replication origin progress state
2019-06-02 15:00:49.266 EDT [29057] LOG:  database system was not properly shut down; automatic recovery in progress
2019-06-02 15:00:49.267 EDT [29057] DEBUG:  resetting unlogged relations: cleanup 1 init 0
2019-06-02 15:00:49.269 EDT [29057] LOG:  redo starts at 0/4C34ED70
2019-06-02 15:00:49.273 EDT [29057] DEBUG:  attempting to remove WAL segments newer than log file 000000BE000000000000004C
2019-06-02 15:00:49.273 EDT [29057] LOG:  invalid record length at 0/4C35CDC0: wanted 24, got 0
2019-06-02 15:00:49.273 EDT [29057] LOG:  redo done at 0/4C35CD90
2019-06-02 15:00:49.273 EDT [29057] LOG:  last completed transaction was at log time 2019-04-28 06:05:44.017446-04
2019-06-02 15:00:49.273 EDT [29057] DEBUG:  resetting unlogged relations: cleanup 0 init 1
2019-06-02 15:00:49.280 EDT [29057] DEBUG:  performing replication slot checkpoint
2019-06-02 15:00:49.288 EDT [29057] DEBUG:  attempting to remove WAL segments older than log file 000000000000000000000043
2019-06-02 15:00:49.289 EDT [29057] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-06-02 15:00:49.290 EDT [29057] DEBUG:  oldest MultiXactId member is at offset 1
2019-06-02 15:00:49.290 EDT [29057] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2019-06-02 15:00:49.292 EDT [29057] DEBUG:  StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
2019-06-02 15:00:49.302 EDT [29057] DEBUG:  CommitTransaction(1) name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0

PostgreSQL stand-alone backend 10.5
backend>


But we chose not to use the standalone backend at this time.  We'll start the database as Patroni would, using the following command:


-bash-4.2$ /usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=off --listen_addresses=192.168.0.124 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=20 -d 5 2>&1
2019-06-02 15:11:55.379 EDT [29789] DEBUG:  postgres: PostmasterMain: initial environment dump:
2019-06-02 15:11:55.380 EDT [29789] DEBUG:  -----------------------------------------
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      XDG_SESSION_ID=171
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      HOSTNAME=psql02.nix.mds.xyz
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      SHELL=/bin/bash
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      TERM=xterm
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      HISTSIZE=1000
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      USER=postgres
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      MAIL=/var/spool/mail/postgres
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/pgsql-10/bin/
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      PWD=/data/patroni/pg_logical
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LANG=en_US.UTF-8
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      HISTCONTROL=ignoredups
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      SHLVL=1
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      HOME=/var/lib/pgsql
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LOGNAME=postgres
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      PGDATA=/data/patroni
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LESSOPEN=||/usr/bin/lesspipe.sh %s
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      _=/usr/pgsql-10/bin/postgres
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      OLDPWD=/data/patroni
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      PGLOCALEDIR=/usr/pgsql-10/share/locale
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      PGSYSCONFDIR=/etc/sysconfig/pgsql
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LC_COLLATE=en_US.UTF-8
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LC_CTYPE=en_US.UTF-8
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LC_MESSAGES=en_US.UTF-8
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LC_MONETARY=C
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LC_NUMERIC=C
2019-06-02 15:11:55.380 EDT [29789] DEBUG:      LC_TIME=C
2019-06-02 15:11:55.380 EDT [29789] DEBUG:  -----------------------------------------
2019-06-02 15:11:55.383 EDT [29789] LOG:  listening on IPv4 address "192.168.0.124", port 5432
2019-06-02 15:11:55.385 EDT [29789] LOG:  listening on Unix socket "./.s.PGSQL.5432"
2019-06-02 15:11:55.386 EDT [29789] DEBUG:  invoking IpcMemoryCreate(size=144687104)
2019-06-02 15:11:55.387 EDT [29789] DEBUG:  mmap(144703488) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2019-06-02 15:11:55.398 EDT [29789] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2019-06-02 15:11:55.398 EDT [29789] DEBUG:  removing file "pg_notify/0000"
2019-06-02 15:11:55.398 EDT [29789] DEBUG:  dynamic shared memory system will support 128 segments
2019-06-02 15:11:55.398 EDT [29789] DEBUG:  created dynamic shared memory control segment 721092148 (3088 bytes)
2019-06-02 15:11:55.401 EDT [29789] DEBUG:  max_safe_fds = 985, usable_fds = 1000, already_open = 5
2019-06-02 15:11:55.404 EDT [29789] LOG:  redirecting log output to logging collector process
2019-06-02 15:11:55.404 EDT [29789] HINT:  Future log output will appear in directory "log".

And voila!  We are in our database and can see all of our databases:

-bash-4.2$ psql -h psql02 -p 5432 -W
Password:
psql (10.5)
Type "help" for help.

postgres=# \l
                                          List of databases
      Name       |    Owner     | Encoding |   Collate   |    Ctype    |      Access privileges
-----------------+--------------+----------+-------------+-------------+-----------------------------
 amon_mws01      | amon_mws01   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 awx             | awx          | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 confluence      | postgres     | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres               +
                 |              |          |             |             | postgres=CTc/postgres      +
                 |              |          |             |             | confluenceuser=CTc/postgres
 hue_mws01       | hue_mws01    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 metastore_mws01 | hive_mws01   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 nav_mws01       | nav_mws01    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 navms_mws01     | navms_mws01  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 oozie_mws01     | oozie_mws01  | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres        | postgres     | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 rman_mws01      | rman_mws01   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 scm_mws01       | scm_mws01    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 sentry_mws01    | sentry_mws01 | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0       | postgres     | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres                +
                 |              |          |             |             | postgres=CTc/postgres
 template1       | postgres     | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres                +
                 |              |          |             |             | postgres=CTc/postgres
 twr             | postgres     | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres               +
                 |              |          |             |             | postgres=CTc/postgres      +
                 |              |          |             |             | twr=CTc/postgres
(15 rows)

postgres=#

And checking afterwards to ensure PostgreSQL / Patroni replicated everything just fine:

[root@psql01 base]# ls -altri
total 364
 67518653 drwx------.  2 postgres postgres  8192 Jun  2 20:35 1
134766555 drwx------.  2 postgres postgres  8192 Jun  2 20:35 13805
   152733 drwx------.  2 postgres postgres 24576 Jun  2 20:35 16395
 68698149 drwx------.  2 postgres postgres 24576 Jun  2 20:35 24586
134741283 drwx------.  2 postgres postgres 16384 Jun  2 20:35 40970
202922441 drwx------.  2 postgres postgres 16384 Jun  2 20:35 131134
   871098 drwx------.  2 postgres postgres  8192 Jun  2 20:35 131136
 68026687 drwx------.  2 postgres postgres  8192 Jun  2 20:35 131137
135079123 drwx------.  2 postgres postgres  8192 Jun  2 20:35 131138
202874795 drwx------.  2 postgres postgres  8192 Jun  2 20:35 131139
   871469 drwx------.  2 postgres postgres  8192 Jun  2 20:35 131140
 68280133 drwx------.  2 postgres postgres 16384 Jun  2 20:35 131141
135080185 drwx------.  2 postgres postgres 12288 Jun  2 20:35 131142
   152732 drwx------. 17 postgres postgres  4096 Jun  2 20:35 .
202879025 drwx------.  2 postgres postgres 16384 Jun  2 20:35 155710
 67482133 drwx------. 21 postgres postgres  4096 Jun  2 20:36 ..
201711623 drwx------.  2 postgres postgres  8192 Jun  2 20:36 13806
[root@psql01 base]#
[root@psql01 base]#
[root@psql01 base]#
[root@psql01 base]# pwd
/data/patroni/base
[root@psql01 base]#

 

[root@psql02 base]# ls -altri
total 368
204367267 drwx------.  2 postgres postgres  8192 Mar 24 03:47 13805
 68669097 drwx------. 17 postgres postgres  4096 Apr 14 10:08 .
204362619 drwx------.  2 postgres postgres 16384 Jun  2 20:31 40970
134473951 drwx------.  2 postgres postgres 24576 Jun  2 20:31 24586
 68669102 drwx------.  2 postgres postgres 24576 Jun  2 20:31 16395
138812710 drwx------.  2 postgres postgres  8192 Jun  2 20:31 1
204366769 drwx------.  2 postgres postgres 12288 Jun  2 20:31 131142
136945631 drwx------.  2 postgres postgres 16384 Jun  2 20:31 131141
 67894451 drwx------.  2 postgres postgres  8192 Jun  2 20:31 131140
  1403920 drwx------.  2 postgres postgres  8192 Jun  2 20:31 131139
204366412 drwx------.  2 postgres postgres  8192 Jun  2 20:31 131138
136945273 drwx------.  2 postgres postgres  8192 Jun  2 20:31 131137
 67894080 drwx------.  2 postgres postgres  8192 Jun  2 20:31 131136
  1403182 drwx------.  2 postgres postgres 16384 Jun  2 20:31 131134
  1404278 drwx------.  2 postgres postgres 16384 Jun  2 20:31 155710
 11395780 drwx------.  2 postgres postgres  8192 Jun  2 20:31 13806
135326272 drwx------. 21 postgres postgres  4096 Jun  2 20:31 ..
[root@psql02 base]# pwd
/data/patroni/base
[root@psql02 base]#

 

[root@psql03 audit]# cd /data/patroni/base/
[root@psql03 base]# ls -altri
total 372
 67130854 drwx------.  2 postgres postgres  8192 Jun  2 20:37 1
134446297 drwx------.  2 postgres postgres  8192 Jun  2 20:37 13805
    79298 drwx------.  2 postgres postgres 24576 Jun  2 20:37 16395
 69209007 drwx------.  2 postgres postgres 24576 Jun  2 20:37 24586
135152677 drwx------.  2 postgres postgres 16384 Jun  2 20:37 40970
201954381 drwx------.  2 postgres postgres 16384 Jun  2 20:37 131134
    80500 drwx------.  2 postgres postgres  8192 Jun  2 20:37 131136
 68241705 drwx------.  2 postgres postgres  8192 Jun  2 20:37 131137
134443358 drwx------.  2 postgres postgres  8192 Jun  2 20:37 131138
201808206 drwx------.  2 postgres postgres  8192 Jun  2 20:37 131139
    80871 drwx------.  2 postgres postgres  8192 Jun  2 20:37 131140
 68242063 drwx------.  2 postgres postgres 16384 Jun  2 20:37 131141
134443716 drwx------.  2 postgres postgres 12288 Jun  2 20:37 131142
    79297 drwx------. 17 postgres postgres  4096 Jun  2 20:37 .
201828372 drwx------.  2 postgres postgres 16384 Jun  2 20:37 155710
134812989 drwx------. 21 postgres postgres  4096 Jun  2 20:38 ..
201807458 drwx------.  2 postgres postgres  8192 Jun  2 20:38 13806
[root@psql03 base]# pwd
/data/patroni/base
[root@psql03 base]#

And that should get you up and running again.  Don't forget to get some decent PostgreSQL backups.

ALTERNATE

We have not tried this but the above could potentially be resolved by pg_resetwal as well:

[root@psql03 ~]# find / -iname pg_resetwal
/usr/pgsql-10/bin/pg_resetwal
[root@psql03 ~]#
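
A sketch of how that would be invoked (untested in this incident, and pg_resetwal can discard recent transactions, so treat it as a last resort), assuming the data directory is /data/patroni as above:

# Run as the postgres user with PostgreSQL / Patroni stopped on the node:
/usr/pgsql-10/bin/pg_resetwal -D /data/patroni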

 

Thx,
TK

FATAL:  the database system is starting up

If you are receiving the following when PostgreSQL (w/ Patroni) is starting up:

2019-04-04 14:59:15.715 EDT [26025] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000390000000000000008 has already been removed
2019-04-04 14:59:16.420 EDT [26029] FATAL:  the database system is starting up

consider running the individual postgres line separately in debug mode like this to reveal the true cause:

-bash-4.2$ /usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.0.108 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=100 -d 5
2019-05-23 08:40:23.585 EDT [10792] DEBUG:  postgres: PostmasterMain: initial environment dump:
2019-05-23 08:40:23.586 EDT [10792] DEBUG:  -----------------------------------------
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      XDG_SESSION_ID=25
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HOSTNAME=psql01.nix.mds.xyz
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      SHELL=/bin/bash
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      TERM=xterm
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HISTSIZE=1000
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      USER=postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      MAIL=/var/spool/mail/postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/pgsql-10/bin/
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PWD=/data/patroni
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LANG=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HISTCONTROL=ignoredups
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      SHLVL=1
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HOME=/var/lib/pgsql
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LOGNAME=postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PGDATA=/data/patroni
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LESSOPEN=||/usr/bin/lesspipe.sh %s
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      _=/usr/pgsql-10/bin/postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      OLDPWD=/data
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PGLOCALEDIR=/usr/pgsql-10/share/locale
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PGSYSCONFDIR=/etc/sysconfig/pgsql
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_COLLATE=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_CTYPE=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_MESSAGES=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_MONETARY=C
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_NUMERIC=C
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_TIME=C
2019-05-23 08:40:23.586 EDT [10792] DEBUG:  -----------------------------------------
2019-05-23 08:40:23.589 EDT [10792] DEBUG:  registering background worker "logical replication launcher"
2019-05-23 08:40:23.590 EDT [10792] LOG:  listening on IPv4 address "192.168.0.108", port 5432
2019-05-23 08:40:23.595 EDT [10792] LOG:  listening on Unix socket "./.s.PGSQL.5432"
2019-05-23 08:40:23.597 EDT [10792] DEBUG:  invoking IpcMemoryCreate(size=148545536)
2019-05-23 08:40:23.598 EDT [10792] DEBUG:  mmap(148897792) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2019-05-23 08:40:23.619 EDT [10792] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2019-05-23 08:40:23.619 EDT [10792] DEBUG:  removing file "pg_notify/0000"
2019-05-23 08:40:23.619 EDT [10792] DEBUG:  dynamic shared memory system will support 288 segments
2019-05-23 08:40:23.620 EDT [10792] DEBUG:  created dynamic shared memory control segment 499213675 (6928 bytes)
2019-05-23 08:40:23.623 EDT [10792] DEBUG:  max_safe_fds = 985, usable_fds = 1000, already_open = 5
2019-05-23 08:40:23.626 EDT [10792] LOG:  redirecting log output to logging collector process
2019-05-23 08:40:23.626 EDT [10792] HINT:  Future log output will appear in directory "log".
^C2019-05-23 08:41:04.346 EDT [10793] DEBUG:  logger shutting down
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  proc_exit(0): 0 callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  exit(0)
-bash-4.2$ 2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  proc_exit(-1): 0 callbacks to make

-bash-4.2$
 

-bash-4.2$ free
              total        used        free      shared  buff/cache   available
Mem:        3881708      218672     1687436      219292     1975600     3113380
Swap:       4063228           0     4063228
-bash-4.2$

 

The "mmap ... with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory" line above indicates a lack of system memory on this VM, caused by a lack of memory on the underlying physical host (overcommitment).  You'll need to a) assign more memory to the VM if the physical host has plenty, b) purchase more memory for the physical host, or c) relocate the VM to a host with more memory.  If this doesn't solve the problem, we need to look deeper and check the running process using strace:

[root@psql01 ~]# ps -ef|grep -Ei "patroni|postgres"
root      2217  2188  0 00:38 pts/1    00:00:00 tail -f postgresql-Thu.log
postgres  2512     1  4 00:42 ?        00:00:01 /usr/bin/python2 /bin/patroni /etc/patroni.yml
postgres  2533     1  0 00:42 ?        00:00:00 /usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=on --listen_addresses=192.168.0.108 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=100
postgres  2535  2533  0 00:42 ?        00:00:00 postgres: postgres: logger process
postgres  2536  2533  0 00:42 ?        00:00:00 postgres: postgres: startup process   waiting for 000000010000000000000008
root      2664  2039  0 00:42 pts/0    00:00:00 grep --color=auto -Ei patroni|postgres
[root@psql01 ~]#

Then tracing the startup process above (PID 2536, the one waiting for WAL segment 000000010000000000000008):

[root@psql01 ~]# strace -p 2536
read(5, 0x7fff9cb4eb87, 1)              = -1 EAGAIN (Resource temporarily unavailable)
read(5, 0x7fff9cb4eb87, 1)              = -1 EAGAIN (Resource temporarily unavailable)
open("pg_wal/00000098.history", O_RDONLY) = -1 ENOENT (No such file or directory)
epoll_create1(EPOLL_CLOEXEC)            = 3
epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=16954624, u64=16954624}}) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=16954648, u64=16954648}}) = 0
epoll_wait(3, ^Cstrace: Process 2536 detached
 <detached …>
[root@psql01 ~]# 

If you end up copying WAL or history files into pg_wal, ensure you set ownership and permissions on the copied files as well, or you may receive this:

[root@psql01 pg_wal]# tail -f ../log/postgresql-Fri.log
2019-05-24 01:22:32.979 EDT [13127] LOG:  aborting startup due to startup process failure
2019-05-24 01:22:32.982 EDT [13127] LOG:  database system is shut down
2019-05-24 01:22:33.692 EDT [13146] LOG:  database system was shut down in recovery at 2019-05-24 01:15:31 EDT
2019-05-24 01:22:33.693 EDT [13146] WARNING:  recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command
2019-05-24 01:22:33.693 EDT [13146] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2019-05-24 01:22:33.693 EDT [13146] FATAL:  could not open file "pg_wal/0000003A.history": Permission denied
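
A sketch of fixing that up, assuming the data directory is /data/patroni and the file should belong to the postgres user:

chown postgres:postgres /data/patroni/pg_wal/0000003A.history
chmod 0600 /data/patroni/pg_wal/0000003A.history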

 

Thx,
TK

psql01 etcd: read wal error (walpb: crc mismatch) and cannot be repaired

To fix:

May 22 00:29:31 psql01 etcd: read wal error (walpb: crc mismatch) and cannot be repaired

Do the following.  First copy the old wal files out of the way:

[root@psql01 wal]# ls -altri
total 375092
201347741 -rw-------. 1 etcd etcd 64000056 Mar 30 15:44 0000000000000027-000000000181bbc9.wal
201347715 -rw-------. 1 etcd etcd 64000104 Apr  1 18:46 0000000000000028-000000000188798c.wal
201347727 -rw-------. 1 etcd etcd 64000056 Apr  3 18:02 0000000000000029-00000000018f2f2c.wal
201347690 -rw-------. 1 etcd etcd 64000040 Apr 22 11:24 000000000000002a-0000000001959a44.wal
201547677 -rw-------. 1 etcd etcd 64000000 Apr 22 11:24 1.tmp
201528077 -rw-------. 1 etcd etcd 64000000 Apr 28 06:06 000000000000002b-0000000001aace2a.wal
 69149887 drwx------. 4 etcd etcd       27 May 22 00:29 ..
201547666 drwx------. 2 etcd etcd     4096 May 22 00:29 .
[root@psql01 wal]# systemctl stop etcd
[root@psql01 wal]# mkdir /root/etcd-backup
[root@psql01 wal]# mv * /root/etcd-backup/

 

Next, start etcd on the other 2 members.  Once those two etcd servers are up, start etcd on psql01 (the first cluster member, or whichever member was failing in your cluster); see the sketch below.
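
A minimal sketch of that sequence, assuming etcd runs as a systemd service on all three nodes:

psql02 # systemctl start etcd
psql03 # systemctl start etcd
psql01 # systemctl start etcd     # start the previously failing member last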

etcd should now be restarted and synced up from its donors (the other cluster members).

Thx,
TK

Linux LVM: Adding Disk Space to Virtual or Physical Drives


In this writeup, we will aim to increase the size of the root drive that has:

1) Standard drive partitioning using fdisk and no LVM: /dev/sda1
2) Has LVM for the OS and files: /dev/sda2

This procedure will help you avoid the "can't find the centos-root logical volume" problem.



     