FATAL:  the database system is starting up

If you see the following when PostgreSQL (with Patroni) is starting up:

2019-04-04 14:59:15.715 EDT [26025] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000390000000000000008 has already been removed
2019-04-04 14:59:16.420 EDT [26029] FATAL:  the database system is starting up

consider running the postgres command line on its own in debug mode (-d 5), like this, to reveal the true cause:

-bash-4.2$ /usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --listen_addresses=192.168.0.108 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=100 -d 5
2019-05-23 08:40:23.585 EDT [10792] DEBUG:  postgres: PostmasterMain: initial environment dump:
2019-05-23 08:40:23.586 EDT [10792] DEBUG:  -----------------------------------------
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      XDG_SESSION_ID=25
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HOSTNAME=psql01.nix.mds.xyz
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      SHELL=/bin/bash
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      TERM=xterm
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HISTSIZE=1000
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      USER=postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      MAIL=/var/spool/mail/postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/pgsql-10/bin/
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PWD=/data/patroni
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LANG=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HISTCONTROL=ignoredups
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      SHLVL=1
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      HOME=/var/lib/pgsql
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LOGNAME=postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PGDATA=/data/patroni
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LESSOPEN=||/usr/bin/lesspipe.sh %s
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      _=/usr/pgsql-10/bin/postgres
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      OLDPWD=/data
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PGLOCALEDIR=/usr/pgsql-10/share/locale
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      PGSYSCONFDIR=/etc/sysconfig/pgsql
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_COLLATE=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_CTYPE=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_MESSAGES=en_US.UTF-8
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_MONETARY=C
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_NUMERIC=C
2019-05-23 08:40:23.586 EDT [10792] DEBUG:      LC_TIME=C
2019-05-23 08:40:23.586 EDT [10792] DEBUG:  -----------------------------------------
2019-05-23 08:40:23.589 EDT [10792] DEBUG:  registering background worker "logical replication launcher"
2019-05-23 08:40:23.590 EDT [10792] LOG:  listening on IPv4 address "192.168.0.108", port 5432
2019-05-23 08:40:23.595 EDT [10792] LOG:  listening on Unix socket "./.s.PGSQL.5432"
2019-05-23 08:40:23.597 EDT [10792] DEBUG:  invoking IpcMemoryCreate(size=148545536)
2019-05-23 08:40:23.598 EDT [10792] DEBUG:  mmap(148897792) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2019-05-23 08:40:23.619 EDT [10792] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2019-05-23 08:40:23.619 EDT [10792] DEBUG:  removing file "pg_notify/0000"
2019-05-23 08:40:23.619 EDT [10792] DEBUG:  dynamic shared memory system will support 288 segments
2019-05-23 08:40:23.620 EDT [10792] DEBUG:  created dynamic shared memory control segment 499213675 (6928 bytes)
2019-05-23 08:40:23.623 EDT [10792] DEBUG:  max_safe_fds = 985, usable_fds = 1000, already_open = 5
2019-05-23 08:40:23.626 EDT [10792] LOG:  redirecting log output to logging collector process
2019-05-23 08:40:23.626 EDT [10792] HINT:  Future log output will appear in directory "log".
^C2019-05-23 08:41:04.346 EDT [10793] DEBUG:  logger shutting down
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(0): 0 before_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(0): 0 on_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  proc_exit(0): 0 callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  exit(0)
-bash-4.2$ 2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2019-05-23 08:41:04.346 EDT [10793] DEBUG:  proc_exit(-1): 0 callbacks to make

-bash-4.2$
 

-bash-4.2$ free
              total        used        free      shared  buff/cache   available
Mem:        3881708      218672     1687436      219292     1975600     3113380
Swap:       4063228           0     4063228
-bash-4.2$
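
As a rough check, the `free` output above can be reduced to a single number. The sketch below assumes the procps-ng column layout shown in the listing, where "available" is the seventh field of the `Mem:` line. The function name is made up for illustration.

```shell
# Print "available" memory as a whole-number percentage of total,
# reading `free` output (procps-ng column layout) on stdin.
mem_available_pct() {
    awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'
}

# Usage on a live host:
#   free | mem_available_pct
```

Note that this only reflects what the guest sees. Overcommitment on the physical host has to be checked on the hypervisor itself.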

 

The "Cannot allocate memory" line in the debug output above points to a lack of system memory on this VM, caused by a lack of memory on the underlying physical host (overcommitment).  You'll need to a) assign more memory to the VM, if the physical host has plenty, b) purchase more memory for the physical host, or c) relocate the VM to a host with more memory.  If this doesn't solve the problem, look deeper and check the running process using strace:

[root@psql01 ~]# ps -ef|grep -Ei "patroni|postgres"
root      2217  2188  0 00:38 pts/1    00:00:00 tail -f postgresql-Thu.log
postgres  2512     1  4 00:42 ?        00:00:01 /usr/bin/python2 /bin/patroni /etc/patroni.yml
postgres  2533     1  0 00:42 ?        00:00:00 /usr/pgsql-10/bin/postgres -D /data/patroni --config-file=/data/patroni/postgresql.conf --hot_standby=on --listen_addresses=192.168.0.108 --max_worker_processes=8 --max_locks_per_transaction=64 --wal_level=replica --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 --max_replication_slots=10 --max_connections=100
postgres  2535  2533  0 00:42 ?        00:00:00 postgres: postgres: logger process
postgres  2536  2533  0 00:42 ?        00:00:00 postgres: postgres: startup process   waiting for 000000010000000000000008
root      2664  2039  0 00:42 pts/0    00:00:00 grep --color=auto -Ei patroni|postgres
[root@psql01 ~]#

Then trace the startup process from the listing above (PID 2536, waiting for a WAL segment):

[root@psql01 ~]# strace -p 2536
read(5, 0x7fff9cb4eb87, 1)              = -1 EAGAIN (Resource temporarily unavailable)
read(5, 0x7fff9cb4eb87, 1)              = -1 EAGAIN (Resource temporarily unavailable)
open("pg_wal/00000098.history", O_RDONLY) = -1 ENOENT (No such file or directory)
epoll_create1(EPOLL_CLOEXEC)            = 3
epoll_ctl(3, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=16954624, u64=16954624}}) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=16954648, u64=16954648}}) = 0
epoll_wait(3, ^Cstrace: Process 2536 detached
 <detached …>
[root@psql01 ~]# 
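
The `ps` listing shows the startup process waiting for a specific WAL segment, so a natural next step is to check whether that segment is actually present in pg_wal. The sketch below is one way to do that. The function names are hypothetical, and the pg_wal path is assumed from the `-D /data/patroni` argument above.

```shell
# Extract the WAL segment a startup process is waiting for,
# reading `ps -ef` lines on stdin.
waiting_segment() {
    sed -n 's/.*startup process *waiting for \([0-9A-F]\{24\}\).*/\1/p'
}

# Succeed if the named segment exists in the given pg_wal directory.
wal_segment_present() {
    # $1 = pg_wal directory, $2 = segment file name
    [ -f "$1/$2" ]
}

# Usage on a live host (path assumed from the listing above):
#   seg=$(ps -ef | waiting_segment)
#   wal_segment_present /data/patroni/pg_wal "$seg" || echo "missing: $seg"
```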

If you copy WAL or history files into pg_wal, make sure their ownership and permissions are correct as well, or you may receive this:

[root@psql01 pg_wal]# tail -f ../log/postgresql-Fri.log
2019-05-24 01:22:32.979 EDT [13127] LOG:  aborting startup due to startup process failure
2019-05-24 01:22:32.982 EDT [13127] LOG:  database system is shut down
2019-05-24 01:22:33.692 EDT [13146] LOG:  database system was shut down in recovery at 2019-05-24 01:15:31 EDT
2019-05-24 01:22:33.693 EDT [13146] WARNING:  recovery command file "recovery.conf" specified neither primary_conninfo nor restore_command
2019-05-24 01:22:33.693 EDT [13146] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2019-05-24 01:22:33.693 EDT [13146] FATAL:  could not open file "pg_wal/0000003A.history": Permission denied
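
A quick way to spot the files behind a "Permission denied" like the one above is to list anything in pg_wal that is not owned by the account running PostgreSQL. This is a sketch only: the function name is made up, and the `postgres` user and group in the usage comments are assumptions based on the `ps` listing earlier.

```shell
# List regular files directly under a directory that are NOT owned by
# the expected user. An empty result means ownership looks consistent.
files_not_owned_by() {
    # $1 = directory, $2 = expected owner (user name or numeric uid)
    find "$1" -maxdepth 1 -type f ! -user "$2"
}

# Usage on a live host (owner and path are assumptions):
#   files_not_owned_by /data/patroni/pg_wal postgres
# and, if anything is listed, as root:
#   chown postgres:postgres /data/patroni/pg_wal/0000003A.history
#   chmod 600 /data/patroni/pg_wal/0000003A.history
```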

 

Thx,
TK

psql01 etcd: read wal error (walpb: crc mismatch) and cannot be repaired

To fix:

May 22 00:29:31 psql01 etcd: read wal error (walpb: crc mismatch) and cannot be repaired

Do the following.  First copy the old wal files out of the way:

[root@psql01 wal]# ls -altri
total 375092
201347741 -rw-------. 1 etcd etcd 64000056 Mar 30 15:44 0000000000000027-000000000181bbc9.wal
201347715 -rw-------. 1 etcd etcd 64000104 Apr  1 18:46 0000000000000028-000000000188798c.wal
201347727 -rw-------. 1 etcd etcd 64000056 Apr  3 18:02 0000000000000029-00000000018f2f2c.wal
201347690 -rw-------. 1 etcd etcd 64000040 Apr 22 11:24 000000000000002a-0000000001959a44.wal
201547677 -rw-------. 1 etcd etcd 64000000 Apr 22 11:24 1.tmp
201528077 -rw-------. 1 etcd etcd 64000000 Apr 28 06:06 000000000000002b-0000000001aace2a.wal
 69149887 drwx------. 4 etcd etcd       27 May 22 00:29 ..
201547666 drwx------. 2 etcd etcd     4096 May 22 00:29 .
[root@psql01 wal]# systemctl stop etcd
[root@psql01 wal]# mkdir /root/etcd-backup
[root@psql01 wal]# mv * /root/etcd-backup/

 

Next, start ETCD on the other two members.  Once they are up, start ETCD on psql01 (the first cluster member, or whichever member was failing in your cluster).

ETCD should now be restarted and synced up from its donors (the other cluster members).
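
The "move the old WAL files out of the way" step can be made a little safer by moving only the WAL and temp files into a fresh, timestamped directory, so repeated attempts never overwrite an earlier backup. The sketch below assumes that approach. The function name and the backup naming scheme are made up, and the WAL path in the usage comment is a placeholder.

```shell
# Move every *.wal and *.tmp file from an etcd WAL directory into a
# fresh, timestamped backup directory and print that directory's path.
backup_etcd_wal() {
    # $1 = WAL directory, $2 = parent directory for backups
    dest="$2/etcd-wal-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    find "$1" -maxdepth 1 -type f \( -name '*.wal' -o -name '*.tmp' \) \
        -exec mv {} "$dest"/ \; || return 1
    printf '%s\n' "$dest"
}

# Usage, with etcd stopped first (the WAL path is a placeholder):
#   systemctl stop etcd
#   backup_etcd_wal /path/to/etcd/member/wal /root
```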

Thx,
TK

Linux LVM: Adding Disk Space to Virtual of Physical Drives

In this writeup, we will aim to increase the size of the root drive that has:

1) Standard drive partitioning using fdisk and no LVM: /dev/sda1
2) Has LVM for the OS and files: /dev/sda2

This procedure will help to avoid the "can't find the centos-root logical volume" error.

DNS issue: Can’t ping but nslookup works

You can do several things in this case.  Open Services, then restart the DHCP Client service.  Running ipconfig /flushdns and netsh int ip reset resettcpip.txt can also fix this temporarily.

I've elected to simply stop the DHCP Client service and let the system do all lookups against my internal DNS servers.

This still leaves open why the DHCP Client was not working correctly, which I'm not 100% sure about.

Event Viewer would be the place to look for the cause, but there was nothing in it for this issue.

Cheers,
TK

REF: https://merabheja.com/fix-nslookup-works-but-ping-fails-in-windows-10/ 

Setup a USB Null Modem for Kernel Dump Captures

We will set up a serial null modem cable to connect to and administer one physical machine from another, so that we can:

1) Capture kernel crashes and dumps.
2) Log in to the machine remotely from another Linux box to do things like restart the network.

For this we will need:

1) One DB9 RS232 serial null modem cable, F/F
2) Two USB to RS232 serial adapters (DB9, 9-pin, male)

Connect a USB to serial adapter to each system, then apply the tty-specific settings to ttyUSB0:

stty -F /dev/ttyUSB0 115200 cs8 -cstopb -parenb
stty -F /dev/ttyUSB0 -a

 

Test the serial connection by running the following:

/sbin/agetty -L 115200 ttyUSB0
 

Use minicom from the connecting Linux host.  While /sbin/agetty -L 115200 ttyUSB0 is running on the target machine, you should see a login prompt:

[root@rfc1178-01 ~]# minicom

Welcome to minicom 2.6.2

OPTIONS: I18n
Compiled on Jun 25 2013, 10:33:48.
Port /dev/ttyUSB0, 11:30:08

Press CTRL-A Z for help on special keys

Scientific Linux release 6.10 (Carbon)
Kernel 4.18.19 on an x86_64

mbpc-pc login: root
Password:
Last login: Fri Apr 19 12:51:19 from 192.168.0.76
0;root@mbpc-pc:~[root@mbpc-pc ~]#
0;root@mbpc-pc:~[root@mbpc-pc ~]#
0;root@mbpc-pc:~[root@mbpc-pc ~]#
0;root@mbpc-pc:~[root@mbpc-pc ~]# uptime
 13:03:19 up 14 min,  1 user,  load average: 0.06, 0.13, 0.18
0;root@mbpc-pc:~[root@mbpc-pc ~]#

 

You should be able to log in as above, confirming that the physical layer (USB to Serial -> Null Modem Female-to-Female -> Serial to USB) functions correctly and that root is allowed to log in.  Next, configure the kernel to send messages to the tty by adding console=ttyUSB0,115200n8 to the kernel line:

title Scientific Linux (4.18.19)
        root (hd0,0)
        kernel /vmlinuz-4.18.19 ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M pci=nomsi nohpet clocksource=rtc console=ttyUSB0,115200n8 console=tty0
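
After the next reboot, one way to confirm the kernel actually picked up the console parameters is to look for `console=` entries in `/proc/cmdline`. A minimal sketch follows, with a made-up function name.

```shell
# Print each console= argument found on a kernel command line (stdin).
console_args() {
    tr ' ' '\n' | grep '^console='
}

# Usage on the rebooted machine:
#   console_args < /proc/cmdline
```

With the kernel line above, both `console=ttyUSB0,115200n8` and `console=tty0` should be listed.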

Allow root logins on ttyUSB0 in /etc/securetty and add an Upstart job that keeps agetty running on it:

[root@mbpc-pc ~]# cat /etc/securetty |grep USB
ttyUSB0
[root@mbpc-pc ~]# cat /etc/init/ttyUSB0.conf
# ttyUSB0 - agetty
#
# This service maintains a agetty on ttyUSB0.

stop on runlevel [S06]
start on runlevel [12435]

respawn
exec agetty -L /dev/ttyUSB0 115200
[root@mbpc-pc ~]#

 

Configure the minicom settings on the external host (press CTRL-A followed by Z and look for the option cOnfigure Minicom..O, or go there directly with CTRL-A followed by O):

+-----[configuration]------+
| Filenames and paths      |
| File transfer protocols  |
| Serial port setup        |
| Modem and dialing        |
| Screen and keyboard      |
| Save setup as dfl        |
| Save setup as..          |
| Exit                     |
+--------------------------+

Followed by the settings below:

+-----------------------------------------------------------------------+
| A -    Serial Device      : /dev/ttyUSB0                              |
|                                                                       |
| C -   Callin Program      :                                           |
| D -  Callout Program      :                                           |
| E -    Bps/Par/Bits       : 115200 8N1                                |
| F - Hardware Flow Control : No                                        |
| G - Software Flow Control : Yes                                       |
|                                                                       |
|    Change which setting?                                              |
+-----------------------------------------------------------------------+

Hit ESC when done and save the configuration:

| Save setup as dfl        |

Restart the server to ensure the changes take effect.  You should now see messages in the minicom terminal on the secondary system:

Welcome to minicom 2.6.2

OPTIONS: I18n
Compiled on Jun 25 2013, 10:33:48.
Port /dev/ttyUSB0, 12:03:52

Press CTRL-A Z for help on special keys


Scientific Linux release 6.10 (Carbon)
Kernel 4.18.19 on an x86_64

mbpc-pc login:

Next, test restart with the console connected to see restart messages being printed:

Linux version 4.18.19 (root@mbpc-pc) (gcc version 4.4.7 201209
Command line: ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_8
x86/fpu: x87 FPU will use FXSAVE
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x0000000000093fff] usable
BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x00000000dfceffff] usable
BIOS-e820: [mem 0x00000000dfcf0000-0x00000000dfcf0fff] ACPI NVS
BIOS-e820: [mem 0x00000000dfcf1000-0x00000000dfcfffff] ACPI data
BIOS-e820: [mem 0x00000000dfd00000-0x00000000dfdfffff] reserved
BIOS-e820: [mem 0x00000000e0000000-0x00000000efffffff] reserved
BIOS-e820: [mem 0x00000000fec00000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000011fffffff] usable
NX (Execute Disable) protection: active
SMBIOS 2.4 present.
DMI: Gigabyte Technology Co., Ltd. GA-890XA-UD3/GA-890XA-UD3, BIOS FC 08/02/2010
AGP: No AGP bridge found

 

Testing can be done using this:

[root@mbpc-pc cores]# echo "This is a ttyUSB0 test from mbpc-pc." > /dev/ttyUSB0
[root@mbpc-pc cores]#

 

Result on the console is:

[root@mbpc-pc ~]# This is a ttyUSB0 test from mbpc-pc.
CTRL-A Z for help |115200 8N1 | NOR | Minicom 2.6.2  | VT102 | Online 08:12

 

If you get a prompt but no kernel messages, ensure you compile the following options into the kernel:

CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_CONSOLE=y
CONFIG_USB_SERIAL_EDGEPORT_TI=y
CONFIG_USB_SERIAL_MOS7840=y

You can find these in the driver sections of make menuconfig.  Press forward slash ( / ) and search for CONFIG_USB_SERIAL to get the path of the option:


  |   Location:                                               |
  |     -> Device Drivers                                     |
  |       -> USB support (USB_SUPPORT [=y])                   |
  |         -> USB Serial Converter support (USB_SERIAL [=y]) |
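
To check an existing kernel configuration without opening menuconfig, the options listed above can be looked up directly in the `.config` file. The sketch below reports any option that is not built in (`=y`). The function name is hypothetical.

```shell
# Print each requested option that is not set to "y" in a kernel config file.
missing_kconfig() {
    # $1 = config file, remaining args = option names
    cfg=$1
    shift
    for opt in "$@"; do
        grep -qxF "${opt}=y" "$cfg" || printf '%s\n' "$opt"
    done
}

# Usage from the kernel source tree:
#   missing_kconfig .config CONFIG_USB_SERIAL CONFIG_USB_SERIAL_CONSOLE \
#       CONFIG_USB_SERIAL_EDGEPORT_TI CONFIG_USB_SERIAL_MOS7840
```

An option built as a module (`=m`) is reported as missing too, which matches the `=y` settings listed above.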

 

If you get kernel messages but no prompt (after enabling additional kernel parameters above) then try adding the following additional parameters:

[root@mbpc-pc linux-4.18.19]# cat /etc/init/ttyUSB0.conf
# ttyUSB0 - agetty
#
# This service maintains a agetty on ttyUSB0.

stop on runlevel [S06] and (
            not-container or
            container CONTAINER=lxc or
            container CONTAINER=lxc-libvirt)

start on runlevel [12435]

respawn
exec agetty -L /dev/ttyUSB0 115200 vt100
[root@mbpc-pc linux-4.18.19]#

 

However, for us it was just a matter of restarting again, since agetty didn't come up the first time.  If, with the additions above (the container conditions on the stop stanza and the vt100 terminal type), you now get a console, all is good and you should be all set to capture the kernel messages when crashes happen!

REF: https://wiki.freepbx.org/display/PC/Capturing+Kernel+Panic+via+Serial+Port

Cheers,
TK

com.cloudera.cmf.service.CommandException: java.io.IOException: Cannot create command directory: /var/lib/cloudera-scm-server/temp/commands/114

Getting this?

com.cloudera.cmf.service.CommandException: java.io.IOException: Cannot create command directory: /var/lib/cloudera-scm-server/temp/commands/114

It's because the folder was blown away.  Reinstall the packages:

[root@cm-r01nn01 ~]# yum reinstall cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server -y

Thx,
TK

Kernel Panic and Disabling HPET

Recent kernel panics have pointed to an issue with the HPET timer on some motherboards.  To disable, add the following to the kernel line:

nohpet clocksource=rtc

Then also disable HPET from BIOS.
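
After rebooting with the new kernel line, it may be worth confirming that the change took effect. The sketch below reads the active clocksource from sysfs and checks the command line for `nohpet`. The function names are made up, and the optional path argument exists only to make the function easy to try against a test file.

```shell
# Print the clocksource the running kernel is using.
# $1 optionally overrides the sysfs path.
current_clocksource() {
    cat "${1:-/sys/devices/system/clocksource/clocksource0/current_clocksource}"
}

# Succeed if "nohpet" appears as a word on a kernel command line (stdin).
has_nohpet() {
    tr ' ' '\n' | grep -xF 'nohpet' > /dev/null
}

# Usage on the rebooted machine:
#   current_clocksource
#   has_nohpet < /proc/cmdline && echo "nohpet is active on the command line"
```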

Thx,
TK

 

locale: Cannot set LC_CTYPE to default locale: No such file or directory

Getting this?

# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

 

Check this:

[root@cm-r01nn02 yum.repos.d]# cat /etc/locale.conf
LANG=en_EN.UTF-8
[root@cm-r01nn02 yum.repos.d]# 

should be:

[root@cm-r01nn01 ~]# cat /etc/locale.conf
LANG="en_US.UTF-8"
[root@cm-r01nn01 ~]#
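
A typo like en_EN.UTF-8 can be caught by comparing the LANG value in /etc/locale.conf against the locales reported by `locale -a`. The sketch below assumes that approach. The function names are hypothetical, and the normalisation step exists because `locale -a` prints names such as en_US.utf8 rather than en_US.UTF-8.

```shell
# Print the LANG value from a locale.conf-style file, without quotes.
locale_conf_lang() {
    # $1 = file (for example /etc/locale.conf)
    sed -n 's/^LANG=//p' "$1" | tr -d '"' | head -n 1
}

# Normalise a locale name so "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeed if the named locale appears in a `locale -a` listing on stdin.
locale_is_installed() {
    want=$(normalize_locale "$1")
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -xF "$want" > /dev/null
}

# Usage:
#   locale -a | locale_is_installed "$(locale_conf_lang /etc/locale.conf)" \
#       || echo "LANG in /etc/locale.conf is not an installed locale"
```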

You may also have empty libraries such as:

[root@cm-r01nn02 ~]# yum reinstall *glibc*
/sbin/ldconfig: File /lib64/libXcursor.so.1.0.2 is empty, not checked.

[root@cm-r01nn02 ~]# ls -altri /lib64/libXcursor.so.1.0.2
203505067 -rwxr-xr-x. 1 root root 0 Oct 30 12:38 /lib64/libXcursor.so.1.0.2
[root@cm-r01nn02 ~]#

What it should be:

[root@cm-r01nn01 ~]# ls -altri /lib64/libXcursor.so.1.0.2
201697422 -rwxr-xr-x. 1 root root 45200 Oct 30 12:38 /lib64/libXcursor.so.1.0.2
[root@cm-r01nn01 ~]#

Run the following to confirm if any files are empty:

[root@cm-r01nn01 ~]# ldconfig
[root@cm-r01nn01 ~]#

on a bad system:

[root@cm-r01nn02 ~]# ldconfig
ldconfig: File /lib64/libdrm.so.2.4.0 is empty, not checked.
ldconfig: File /lib64/libdrm_intel.so.1 is empty, not checked.
ldconfig: File /lib64/libdrm_intel.so.1.0.0 is empty, not checked.
ldconfig: File /lib64/libdrm_nouveau.so.2 is empty, not checked.
ldconfig: File /lib64/libdrm_nouveau.so.2.0.0 is empty, not checked.
ldconfig: File /lib64/libdrm_radeon.so.1 is empty, not checked.
ldconfig: File /lib64/libdrm_radeon.so.1.0.1 is empty, not checked.
ldconfig: File /lib64/libkms.so.1 is empty, not checked.
ldconfig: File /lib64/libkms.so.1.0.0 is empty, not checked.
ldconfig: File /lib64/libdrm.so.2 is empty, not checked.
ldconfig: File /lib64/libdrm_amdgpu.so.1.0.0 is empty, not checked.
ldconfig: File /lib64/libdrm_amdgpu.so.1 is empty, not checked.
ldconfig: File /lib64/libXfixes.so.3 is empty, not checked.
ldconfig: File /lib64/libXfixes.so.3.1.0 is empty, not checked.
ldconfig: File /lib64/libglapi.so.0 is empty, not checked.
ldconfig: File /lib64/libglapi.so.0.0.0 is empty, not checked.
ldconfig: File /lib64/libXdamage.so.1 is empty, not checked.
ldconfig: File /lib64/libXdamage.so.1.1.0 is empty, not checked.
ldconfig: File /lib64/libxshmfence.so.1 is empty, not checked.
ldconfig: File /lib64/libxshmfence.so.1.0.0 is empty, not checked.
ldconfig: File /lib64/libGLdispatch.so.0 is empty, not checked.
ldconfig: File /lib64/libGLdispatch.so.0.0.0 is empty, not checked.
ldconfig: File /lib64/libwayland-server.so.0 is empty, not checked.
ldconfig: File /lib64/libwayland-server.so.0.1.0 is empty, not checked.
ldconfig: File /lib64/libgbm.so.1 is empty, not checked.
ldconfig: File /lib64/libgbm.so.1.0.0 is empty, not checked.
ldconfig: File /lib64/libXcursor.so.1 is empty, not checked.
ldconfig: File /lib64/libXcursor.so.1.0.2 is empty, not checked.
ldconfig: File /lib64/libpcsclite.so.1 is empty, not checked.
ldconfig: File /lib64/libpcsclite.so.1.0.0 is empty, not checked.
ldconfig: File /lib64/libthai.so.0 is empty, not checked.
ldconfig: File /lib64/libthai.so.0.1.6 is empty, not checked.
ldconfig: File /lib64/libgraphite2.so.3 is empty, not checked.
ldconfig: File /lib64/libgraphite2.so.3.0.1 is empty, not checked.
ldconfig: File /lib64/libharfbuzz.so.0 is empty, not checked.
ldconfig: File /lib64/libharfbuzz.so.0.10705.0 is empty, not checked.
[root@cm-r01nn02 ~]#

Query the files using rpm -qf <FILE> then reinstall the package.  Reboot the machine.

This was all due to some XFS corruption that occurred in the past.   Likewise, check if any files on the OS are zero bytes:

for KEY in $( rpm -ql $(rpm -aq) ); do [[ ! -s "$KEY" && -r "$KEY" ]] && echo "$KEY"; done

Reinstall the owning packages if they are.  After a filesystem corruption, many files were zero bytes on our system, and reinstalling the packages that own them restores them.  Compare the file list against another host that is working fine.  You can use this command:

for KEY in $( cat t.txt ); do [[ -s $KEY ]] && echo $KEY; done

NOTE: Copy the file list found from corrupt host to the working host and run the above.  
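
The loops above split on whitespace, so a packaged path containing a space would be mishandled. A variant that reads one path per line avoids that. This is a sketch with a made-up function name, not a replacement for comparing against a healthy host, since some packaged files are legitimately empty.

```shell
# Read file paths on stdin (one per line) and print those that are
# regular files of size zero. Paths containing spaces are handled.
zero_byte_files() {
    while IFS= read -r path; do
        if [ -f "$path" ] && [ ! -s "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Usage on an RPM-based host:
#   rpm -qal | sort -u | zero_byte_files > t.txt
# then, for each hit, find the owning package to reinstall:
#   rpm -qf /lib64/libXcursor.so.1.0.2
```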

In the event that a file is corrupted but its file size is not zero, it may not be easy to find the said file without a direct comparison with another host.  An alternative is to try and reinstall existing packages:

[root@cm-r01nn02 ~]# yum reinstall $(rpm -aq)
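
A lighter alternative to reinstalling everything, and not something the original procedure used, is RPM's built-in verification. `rpm -Va` compares installed files against the package database and prints a flag field in which a `5` in the third position marks a digest mismatch. The sketch below filters that output. The function name is hypothetical, and paths containing spaces are not handled.

```shell
# Filter `rpm -Va` output (stdin) down to files whose digest differs
# from the package database: the third character of the flag field is "5".
digest_mismatches() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}

# Usage (can take a long time on a full system):
#   rpm -Va 2>/dev/null | digest_mismatches
# then map a hit back to its package with: rpm -qf <FILE>
```

Configuration files that were edited on purpose show up here as well, so the list still needs a manual review.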

Look up LANG and LC_CTYPE:

[root@cm-r01nn02 ~]# echo $LANG
C.UTF-8
[root@cm-r01nn02 ~]# echo $LC_CTYPE

[root@cm-r01nn02 ~]#

and in a good environment:

[root@cm-r01nn01 ~]# echo $LANG
en_US.UTF-8
[root@cm-r01nn01 ~]# echo $LC_CTYPE
en_US.UTF-8
[root@cm-r01nn01 ~]#

 

Finally, we copied /usr/lib/locale/locale-archive from a good server to resolve the problem.  But this raises the question: how is /usr/lib/locale/locale-archive generated?  On a healthy system, the strace stops at locale-archive and goes no further, like this:

[root@cm-r01nn02 ~]# strace locale 2>&1|grep -Ei "open|stat|exec"
execve("/bin/locale", ["locale"], [/* 21 vars */]) = 0
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=37662, …}) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=2151672, …}) = 0
mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f371aab0000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106075056, …}) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, …}) = 0
[root@cm-r01nn02 ~]#

 

On the faulty server it did go further, suggesting this file may be generated on the target system.  So I reinstalled glibc-common once more, replacing the good copy from the other server.  This time it worked, despite generating a different locale-archive file:

[root@cm-r01nn02 ~]# history|grep strace
  196  strace -p 3951
  534  strace locale
  690  strace locale 2>&1|grep -Ei "open|stat"
  827  strace locale  | grep -Ei "open|stat"
  828  strace locale  2>&1 | grep -Ei "open|stat"
  832  strace locale  2>&1 | grep -Ei "open|stat"
  852  strace locale
  855  strace locale|grep -Ei "exec|open|access"
  857  strace -e locale
  858  strace -e open locale
  859  strace -e trace=open,read locale
  860  strace -ff -e trace=open locale
  863  strace -ff -o trace  locale
  979  strace locale
  980  strace locale 2>&1|grep -Ei "open|stat|exec"
 1003  strace locale
 1004  history|grep strace
[root@cm-r01nn02 ~]# strace locale 2>&1|grep -Ei "open|stat|exec"
execve("/bin/locale", ["locale"], [/* 21 vars */]) = 0
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=37662, …}) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=2151672, …}) = 0
mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f371aab0000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=106075056, …}) = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, …}) = 0
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]# rpm -qf /usr/lib/locale/locale-archive
glibc-common-2.17-260.el7_6.4.x86_64
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]# sha1sum /usr/lib/locale/locale-archive /root/locale-archive
8698125a0ab14cd3ae969d3c21b867b9cb490227  /usr/lib/locale/locale-archive
8698125a0ab14cd3ae969d3c21b867b9cb490227  /root/locale-archive
[root@cm-r01nn02 ~]# yum reinstall glibc-common -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
epel/x86_64/metalink                                                                      |  16 kB  00:00:00
 * base: mirror.csclub.uwaterloo.ca
 * epel: mirror.csclub.uwaterloo.ca
 * extras: mirror.csclub.uwaterloo.ca
 * updates: mirror.csclub.uwaterloo.ca
base                                                                                      | 3.6 kB  00:00:00
cloudera-manager                                                                          | 2.9 kB  00:00:00
epel                                                                                      | 4.7 kB  00:00:00
extras                                                                                    | 3.4 kB  00:00:00
updates                                                                                   | 3.4 kB  00:00:00
vmware-tools                                                                              |  951 B  00:00:00
(1/2): epel/x86_64/updateinfo                                                             | 983 kB  00:00:00
(2/2): epel/x86_64/primary_db                                                             | 6.7 MB  00:00:01
Resolving Dependencies
--> Running transaction check
---> Package glibc-common.x86_64 0:2.17-260.el7_6.4 will be reinstalled
--> Finished Dependency Resolution

Dependencies Resolved

=================================================================================================================
 Package                     Arch                  Version                          Repository              Size
=================================================================================================================
Reinstalling:
 glibc-common                x86_64                2.17-260.el7_6.4                 updates                 12 M

Transaction Summary
=================================================================================================================
Reinstall  1 Package

Total download size: 12 M
Installed size: 115 M
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
glibc-common-2.17-260.el7_6.4.x86_64.rpm                                                  |  12 MB  00:00:02
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : glibc-common-2.17-260.el7_6.4.x86_64                                                          1/1
  Verifying  : glibc-common-2.17-260.el7_6.4.x86_64                                                          1/1

Installed:
  glibc-common.x86_64 0:2.17-260.el7_6.4

Complete!
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]#
[root@cm-r01nn02 ~]# sha1sum /usr/lib/locale/locale-archive /root/locale-archive
4a40d739c365ddcd3756283b0d4241dfb9b9dfcd  /usr/lib/locale/locale-archive
8698125a0ab14cd3ae969d3c21b867b9cb490227  /root/locale-archive
[root@cm-r01nn02 ~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[root@cm-r01nn02 ~]# reboot
Using username "mds.xyz\tom".
Using keyboard-interactive authentication.
Password:
Last login: Wed Apr 10 07:14:48 2019 from 192.168.0.93
tom@mds.xyz@cm-r01nn02:~] :) $ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
tom@mds.xyz@cm-r01nn02:~] :) $ sudo su -
[sudo] password for tom@mds.xyz:
Last login: Wed Apr 10 07:15:02 EDT 2019 on pts/0
[root@cm-r01nn02 ~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[root@cm-r01nn02 ~]#

If that still doesn't work, consider these two outputs from a correctly working system and an incorrectly working system:

[root@cm-r01nn02 ~]# localectl status
   System Locale: LANG=en_US.UTF-8
       VC Keymap: us
      X11 Layout: us
[root@cm-r01nn02 ~]#

Incorrectly working one:

[root@awx01 ~]# localectl status
   System Locale: n/a

       VC Keymap: us
      X11 Layout: us
[root@awx01 ~]#

Set the system locale:

[root@awx01 ~]# localectl set-locale LANG=en_US.UTF-8

restart, if necessary, then run:

locale

checking further still we see this:

[root@awx01 locale]# strings locale-archive|grep -Ei en_us.utf8
en_US.utf8
[root@awx01 locale]# ls -altri /etc/profile
134299888 -rw-r--r--. 1 root root 1795 Nov  5  2016 /etc/profile
[root@awx01 locale]# scp cm-r01nn01:/etc/profile /etc/profile-cm-r01nn01
profile                                                                100% 1819   280.5KB/s   00:00
[root@awx01 locale]# diff /etc/profile /etc/profile-cm-r01nn01
65c65
< for i in /etc/profile.d/*.sh ; do
---
> for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
[root@awx01 locale]#

So we update the system, but the issue remains.  Perl checks the locale settings at startup and warns when they cannot be applied, so we check the following:

[root@awx01 etc]# perl -v
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "C.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 39 registered patches, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

[root@awx01 etc]#

vs a working system:

[root@cm-r01nn01 locale]# perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
(with 39 registered patches, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

[root@cm-r01nn01 locale]#

Let's look for the locale explicitly:

[root@awx01 etc]# find / -iname en_US.UTF-8
[root@awx01 etc]#

vs a working system:

[root@cm-r01nn01 locale]# find / -iname en_US.UTF-8
/usr/share/X11/locale/en_US.UTF-8
[root@cm-r01nn01 locale]#

Find which package installs that file:

[root@cm-r01nn01 locale]# rpm -qf /usr/share/X11/locale/en_US.UTF-8
libX11-common-1.6.5-2.el7.noarch
[root@cm-r01nn01 locale]#

and install (or reinstall) that package:

[root@awx01 etc]# yum install libX11-common.noarch

Reboot and check whether the locale assignment worked.  If it still doesn't, revisit the steps above: a grep over * skips hidden files, so a LANG setting in a dotfile can be missed, as here:

[root@awx01 ~]# cat .bash_profile |grep LANG
# export LANG="C.UTF-8"
[root@awx01 ~]# grep -ER LANG= *
[root@awx01 ~]#

To avoid the above issue, consider running greps in this manner:

[root@awx01 ~]# grep -rER "LANG=" * .[^.]*
.bash_profile:# export LANG="C.UTF-8"
[root@awx01 ~]# vi .bash_profile
[root@awx01 ~]#
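
The difference comes from the shell, not from grep: * expands to visible names only, so grep never receives the dotfiles.  The following is a small sketch that reproduces this in a scratch directory.  The directory contents are made up for illustration, and it assumes the shell's default globbing (no dotglob).  It also shows that starting the recursion at . is another way to include hidden files.

```shell
#!/bin/sh
# Scratch directory standing in for a home directory (hypothetical contents).
demo_dir=$(mktemp -d)
printf '# export LANG="C.UTF-8"\n' > "$demo_dir/.bash_profile"
printf 'nothing relevant here\n' > "$demo_dir/notes.txt"

# The shell expands * to visible names only, so the dotfile is never searched.
star_hits=$(cd "$demo_dir" && grep -r 'LANG=' * 2>/dev/null || true)

# Starting the recursion at . lets grep walk every entry, hidden or not.
dot_hits=$(cd "$demo_dir" && grep -r 'LANG=' . 2>/dev/null || true)

printf 'with *: %s\n' "${star_hits:-<no match>}"
printf 'with .: %s\n' "$dot_hits"

rm -rf "$demo_dir"
```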

 

And your issue should be solved!    🙂  

Thx,
TK

ERROR scm-web-216:com.cloudera.cmf.model.DbCommand: Command null(clusterHostInspector) has completed. finalstate:FINISHED, success:false, msg:Can only run host inspector when host is healthy.

When receiving this error, look into the cloudera-scm-agent log on the worker to determine why.  In our case it was:

[29/Mar/2019 00:11:47 +0000] 800 MainThread agent        ERROR    Error, CM server guid updated, expected f2f1e171-d20d-4425-afe9-58b567b51397, received 18757343-bd5c-4b15-a104-91fd432ebc82

Then follow this page to resolve the above.  
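
To spot this condition quickly on a worker, the two GUIDs can be pulled out of the agent log.  The following is only a sketch: the function name extract_guid_mismatch is made up for illustration, and the log path is an assumption about a default install, so adjust it to wherever your agent logs.

```shell
#!/bin/sh
# Print the expected and received CM server GUIDs found in agent log lines
# read from stdin. (Hypothetical helper; the pattern follows the log line
# quoted in this post.)
extract_guid_mismatch() {
    sed -n 's/.*CM server guid updated, expected \([0-9a-f-]*\), received \([0-9a-f-]*\).*/expected=\1 received=\2/p'
}

# Assumed log location; change it if your agent logs elsewhere.
agent_log=/var/log/cloudera-scm-agent/cloudera-scm-agent.log
if [ -r "$agent_log" ]; then
    # Show only the most recent mismatch.
    extract_guid_mismatch < "$agent_log" | tail -n 1
fi
```

If the two values differ, the agent is talking to a different CM server instance than the one it has on record.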

Thx,
TK

Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535.

If you are getting the following:

2019-03-21 01:04:17,021 FATAL main:org.hsqldb.cmdline.SqlFile: SQL Error at 'UTF-8' line 6:
"alter table SETTINGS
    add column LDAP_USER_SEARCH_BASE varchar(1024),
    add column LDAP_USER_SEARCH_FILTER varchar(1024),
    add column LDAP_GROUP_SEARCH_BASE varchar(1024),
    add column LDAP_GROUP_SEARCH_FILTER varchar(1024)"
Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
2019-03-21 01:04:17,021 FATAL main:org.hsqldb.cmdline.SqlFile: Rolling back SQL transaction.
2019-03-21 01:04:17,023 ERROR main:com.cloudera.enterprise.dbutil.SqlFileRunner: Exception while executing ddl scripts.
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

it's probably because you're running these SQL commands to set up the Cloudera database:

CREATE DATABASE scm DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm';

Or these:

CREATE DATABASE scm DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm';

Instead of something like this:

create database scm DEFAULT CHARACTER SET utf8;
grant all privileges on scm.* to 'scm'@'%' identified by 'scm';

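It appears that utf8mb4, with either the unicode or the general collation, can cause the cloudera-scm-server to throw this error on install.  utf8mb4 reserves up to 4 bytes per character against the 65535-byte row limit, where utf8 reserves 3, so the same varchar(1024) columns take up more of the row.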
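
The arithmetic behind the character set difference can be sketched as below.  This is a rough worst-case estimate, not MySQL's exact accounting: it assumes 3 bytes per character for utf8, 4 for utf8mb4, and a 2-byte length prefix per column, and the helper name varchar_bytes is made up for illustration.  The size of the existing SETTINGS row is not known from the log, so only the bytes added by the four new columns are computed.

```shell
#!/bin/sh
# Worst-case bytes one VARCHAR(n) column counts toward the row-size limit:
# n characters times bytes per character, plus an assumed 2-byte length prefix.
# (Hypothetical helper.)
varchar_bytes() {
    chars=$1
    bytes_per_char=$2
    echo $(( chars * bytes_per_char + 2 ))
}

columns=4   # the four LDAP_* columns in the failing ALTER TABLE
utf8_total=$(( columns * $(varchar_bytes 1024 3) ))
utf8mb4_total=$(( columns * $(varchar_bytes 1024 4) ))

echo "utf8:    $utf8_total bytes added"
echo "utf8mb4: $utf8mb4_total bytes added"
echo "extra:   $(( utf8mb4_total - utf8_total )) bytes"
```

Every other varchar column already in the table grows by the same ratio under utf8mb4, which is how a row that fits with utf8 can end up over the limit.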

Thx,
TK


     
  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License