Forbidden You don’t have permission to access /repos/ on this server.

So you get the following message after installing and configuring your HTTPD server? Despite a seemingly correct configuration, you still receive:

Forbidden

You don't have permission to access /repos/ on this server.


1765328228 Cannot contact any KDC for realm

When seeing this:

krb5_child.log:(Tue May 22 02:06:15 2018) [[sssd[krb5_child[1605]]]] [map_krb5_error] (0x0020): 1657: [-1765328228][Cannot contact any KDC for realm 'MDS.XYZ']

Access denied
Using keyboard-interactive authentication.
Password:

Reverse the order of the nameservers in /etc/resolv.conf to this:

[root@cm-r01dn07 sssd]# cat /etc/resolv.conf
search mds.xyz nix.mds.xyz
nameserver 192.168.0.224
nameserver 192.168.0.44
nameserver 192.168.0.45
[root@cm-r01dn07 sssd]#

from this:

[root@cm-r01dn07 sssd]# cat /etc/resolv.conf
search mds.xyz nix.mds.xyz
nameserver 192.168.0.44
nameserver 192.168.0.45
nameserver 192.168.0.224

[root@cm-r01dn07 sssd]#

And that solved it.
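A quick way to confirm the fix took hold is to check that the nameserver now listed first actually answers for the Kerberos realm (a hedged sketch; the realm and the 192.168.0.224 address come from the log and resolv.conf above, and someuser is just a placeholder principal):

# Does the first nameserver serve the Kerberos SRV records for the realm?
dig +short -t SRV _kerberos._udp.mds.xyz @192.168.0.224
# A ticket request should now reach a KDC instead of timing out:
kinit -V someuser@MDS.XYZ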

Cheers,
TK

sssd krb5_child Key table entry not found

When you get this message:

May 21 00:13:31 nfs03.nix.mds.xyz [sssd[krb5_child[1822]]][1822]: Key table entry not found

followed by:

[[sssd[krb5_child[1752]]]] [k5c_setup_fast] (0x0020): 2628: [-1765328203][Key table entry not found]

or similar, dig into the logs further to see this:

(Mon May 21 00:13:33 2018) [[sssd[krb5_child[1824]]]] [find_principal_in_keytab] (0x0400): No principal matching host/nfs02.nix.mds.xyz@NIX.MDS.XYZ found in keytab.
(Mon May 21 00:13:33 2018) [[sssd[krb5_child[1824]]]] [check_fast_ccache] (0x0080): find_principal_in_keytab failed for principal host/nfs02.nix.mds.xyz@NIX.MDS.XYZ.
[root@nfs03 sssd]#

Then check your /etc/krb5.conf file:

[root@nfs03 etc]# grep -Ei nfs02 *
krb5.conf:  nfs02.nix.mds.xyz = NIX.MDS.XYZ

And also here:

[root@nfs03 etc]# grep -EiR nfs02 * 2>/dev/null
sssd/sssd.conf:ipa_hostname = nfs02.nix.mds.xyz
sssd/sssd.conf-new:ipa_hostname = nfs02.nix.mds.xyz
[root@nfs03 etc]#

Change these to the correct hostname. The issue resulted from copying the configuration files over from another host, nfs02.
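A minimal sketch of verifying and correcting that, assuming the affected host is nfs03 as in the log paths above (the sed targets are simply the two files the greps found):

klist -kt /etc/krb5.keytab        # list the principals actually present in the keytab
hostname -f                       # should print nfs03.nix.mds.xyz, not nfs02
# Point the copied configs at the right host, then restart sssd:
sed -i 's/nfs02\.nix\.mds\.xyz/nfs03.nix.mds.xyz/g' /etc/krb5.conf /etc/sssd/sssd.conf
systemctl restart sssd
# If the keytab itself was also copied from nfs02, it has to be re-created for this
# host as well (e.g. with ipa-getkeytab on an IPA-enrolled machine).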

Cheers,
TK

 

Saving random seed failed. / No kdump initial ramdisk found. / Failed to run mkdumprd

Kdump doesn't start?

[root@mbpc-pc grub]# service kdump restart
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Stopping kdump:                                            [FAILED]
No kdump initial ramdisk found.                            [WARNING]
Rebuilding /boot/initrd-4.8.4kdump.img
Saving random seed failed.
Failed to run mkdumprd
[root@mbpc-pc grub]#

Then create the random-seed file like this:

dd if=/dev/urandom of=/var/lib/random-seed bs=1024 count=1

Run

bash -x /etc/init.d/kdump start

to see the underlying command it runs, or simply run:

[root@mbpc-pc grub]# /sbin/mkdumprd -d -f --allow-missing /boot/initrd-4.8.4kdump.img 4.8.4
[root@mbpc-pc grub]# ls -altri /boot/initrd-4.8.4kdump.img
71 -rw-------. 1 root root 8852315 Apr 22 12:46 /boot/initrd-4.8.4kdump.img
[root@mbpc-pc grub]#

to get an initial kdump.img going.  Try to restart the kdump daemon:

[root@mbpc-pc grub]# /etc/init.d/kdump restart
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Stopping kdump:                                            [FAILED]
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Starting kdump:                                            [FAILED]
[root@mbpc-pc grub]#

So I checked /boot/grub/grub.conf and had this:

crashkernel=256M

Instead of:

crashkernel=256M@32M

Make the change and reboot, since the kernel must be reloaded for the new crashkernel parameter to take effect. Also be careful which value you pick, as many sites suggest variations:

Picking:

crashkernel=256M@16M

results in:

crashkernel reservation failed - memory is in use.

Using:

crashkernel=auto

results in:

kexec_core: crashkernel: memory value expected

Specifying plain:

crashkernel=256M

produces this at boot:

Reserving 256MB of memory at 592MB for crashkernel (System RAM: 4092MB)

which auto-allocates a free 256MB chunk at an auto-determined offset. However, that still gets us the familiar startup failure above, so we need to dig deeper. We try bash -x /etc/init.d/kdump start to see what the issue is and, to our surprise, things start up just fine:

[root@mbpc-pc ~]# /etc/init.d/kdump start
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Starting kdump:                                            [FAILED]
[root@mbpc-pc ~]#
[root@mbpc-pc ~]#
[root@mbpc-pc ~]#
[root@mbpc-pc ~]# bash -x /etc/init.d/kdump start
+ . /etc/init.d/functions
++ TEXTDOMAIN=initscripts
++ umask 022
++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
++ export PATH
++ '[' -z '' ']'
++ COLUMNS=80
++ '[' -z '' ']'
+++ /sbin/consoletype
++ CONSOLETYPE=pty
++ '[' -f /etc/sysconfig/i18n -a -z '' -a -z '' ']'
++ . /etc/profile.d/lang.sh
++ unset LANGSH_SOURCED
++ '[' -z '' ']'
++ '[' -f /etc/sysconfig/init ']'
++ . /etc/sysconfig/init
+++ BOOTUP=color
+++ RES_COL=60
+++ MOVE_TO_COL='echo -en \033[60G'
+++ SETCOLOR_SUCCESS='echo -en \033[0;32m'
+++ SETCOLOR_FAILURE='echo -en \033[0;31m'
+++ SETCOLOR_WARNING='echo -en \033[0;33m'
+++ SETCOLOR_NORMAL='echo -en \033[0;39m'
+++ PROMPT=yes
+++ AUTOSWAP=no
+++ ACTIVE_CONSOLES='/dev/tty[1-6]'
+++ SINGLE=/sbin/sushell
++ '[' pty = serial ']'
++ __sed_discard_ignored_files='/\(~\|\.bak\|\.orig\|\.rpmnew\|\.rpmorig\|\.rpmsave\)$/d'
+++ cat /proc/cmdline
++ strstr 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi' rc.debug
++ '[' 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi' = 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi' ']'
++ return 1
+ KEXEC=/sbin/kexec
+ BOOTDIR=/boot
+ KDUMP_KERNELVER=
+ KDUMP_COMMANDLINE=
+ KDUMP_IDE_NOPROBE_COMMANDLINE=
+ KEXEC_ARGS=
+ KDUMP_CONFIG_FILE=/etc/kdump.conf
+ MEM_RESERVED=
+ MKDUMPRD_ARGS=
+ CLUSTER_CONFIG_FILE=/etc/cluster/cluster.conf
+ FENCE_KDUMP_CONFIG=/etc/sysconfig/fence_kdump
+ SSH_KEY_LOCATION=/root/.ssh/kdump_id_rsa
+ DEFAULT_DUMP_MODE=kdump
+ LOGGER='/usr/bin/logger -p info -t kdump'
+ standard_kexec_args=-p
+ '[' -f /etc/sysconfig/kdump ']'
+ . /etc/sysconfig/kdump
++ KDUMP_KERNELVER=
++ KDUMP_COMMANDLINE=
++ KDUMP_COMMANDLINE_APPEND='irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug'
++ MKDUMPRD_ARGS=--allow-missing
++ KEXEC_ARGS=
++ KDUMP_BOOTDIR=/boot
++ KDUMP_IMG=vmlinuz
++ KDUMP_IMG_EXT=
+ single_instance_lock
+ exec
+ flock 9
+ determine_dump_mode
+ fadump_enabled_sys_node=/sys/kernel/fadump_enabled
+ '[' -f /sys/kernel/fadump_enabled ']'
+ case "$1" in
+ '[' kdump == fadump ']'
+ '[' -s /proc/vmcore ']'
+ start
+ sestatus
+ grep -q 'SELinux status.*enabled'
+ selinux_relabel
+ local _path _i _attr
++ path_to_be_relabeled
++ local _path _target _mnt=/ _rmnt
++ is_dump_target_configured
++ local _target
+++ egrep '^ext[234]|^xfs|^btrfs|^raw|^ssh|^nfs|^nfs4|^net' /etc/kdump.conf
++ _target=
++ '[' -n '' ']'
+++ get_save_path
++++ grep '^path' /etc/kdump.conf
++++ awk '{print $2}'
+++ local _save_path=/var/crash
+++ '[' -z /var/crash ']'
+++ echo /var/crash
++ _path=/var/crash
+++ df ///var/crash
+++ tail -1
+++ awk '{ print $NF }'
++ _rmnt=/
++ [[ / == \/ ]]
++ echo ///var/crash
+ _path=///var/crash
+ '[' -z ///var/crash ']'
+ '[' -d ///var/crash ']'
++ find ///var/crash
+ for _i in '$(find $_path)'
++ getfattr -m security.selinux ///var/crash
+ _attr='# file: var/crash
security.selinux'
+ '[' -z '# file: var/crash
security.selinux' ']'
+ save_raw
++ awk '$1 ~ /^raw$/ { print $2; }' /etc/kdump.conf
+ local raw_part=
+ local kdump_dir
+ '[' '' ']'
+ return 0
+ '[' 0 -ne 0 ']'
+ status
+ '[' kdump == fadump ']'
+ '[' '!' -e /sys/kernel/kexec_crash_loaded ']'
+ in_xen_pv_guest
+ grep -q 'xen-percpu-virq  *timer0' /proc/interrupts
+ in_xen_hvm_guest
+ grep -q xen /sys/hypervisor/type
++ cat /sys/kernel/kexec_crash_loaded
+ rc=0
+ '[' 0 == 1 ']'
+ return 1
+ rc=1
+ '[' 1 == 2 ']'
+ '[' 1 == 0 ']'
+ '[' kdump '!=' fadump ']'
+ check_kernel_parameter
+ '[' -z '' ']'
++ cat /proc/cmdline
+ KDUMP_COMMANDLINE='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi'
++ cat /sys/kernel/kexec_crash_size
+ MEM_RESERVED=268435456
+ '[' 268435456 -eq 0 ']'
+ return 0
+ '[' 0 '!=' 0 ']'
+ check_config
+ '[' kdump == fadump ']'
+ check_kdump_config
+ local modified_files=
+ local force_rebuild=0
+ MKDUMPRD='/sbin/mkdumprd -d -f --allow-missing'
++ grep '^force_rebuild' /etc/kdump.conf
++ cut '-d ' -f2
+ force_rebuild=
+ '[' -n '' ']'
+ '[' -z '' ']'
++ uname -r
+ local running_kernel=4.8.4
++ echo 4.8.4
++ sed s/smp//g
+ kdump_kver=4.8.4
+ kdump_kernel=/boot/vmlinuz-4.8.4
+ kdump_initrd=/boot/initrd-4.8.4kdump.img
+ '[' '!' -f /boot/vmlinuz-4.8.4 ']'
+ '[' '!' -f /boot/initrd-4.8.4kdump.img ']'
+ '[' -z '' ']'
++ stat -c %Y /boot/initrd-4.8.4kdump.img
+ image_time=1524415601
++ grep '^kdump_post' /etc/kdump.conf
++ cut '-d ' -f2
+ EXTRA_FILES=
++ grep '^kdump_pre' /etc/kdump.conf
++ cut '-d ' -f2
+ CHECK_FILE=
+ EXTRA_FILES=' '
++ grep '^extra_modules' /etc/kdump.conf
++ cut '-d ' -f2-
+ CHECK_FILE=
+ EXTRA_FILES='  '
++ grep '^extra_bins' /etc/kdump.conf
++ cut '-d ' -f2-
+ CHECK_FILE=
+ EXTRA_FILES='   '
++ grep '^extra_modules' /etc/kdump.conf
+ FORCE_REBUILD=
+ files='/etc/kdump.conf /boot/vmlinuz-4.8.4    '
+ grep -q '^fence_kdump_nodes' /etc/kdump.conf
+ '[' -f /etc/cluster/cluster.conf ']'
+ for file in '$files'
+ time_stamp=0
+ '[' -f /etc/kdump.conf ']'
++ stat -c %Y /etc/kdump.conf
+ time_stamp=1524414829
+ '[' 1524414829 -gt 1524415601 ']'
+ for file in '$files'
+ time_stamp=0
+ '[' -f /boot/vmlinuz-4.8.4 ']'
++ stat -c %Y /boot/vmlinuz-4.8.4
+ time_stamp=1477845416
+ '[' 1477845416 -gt 1524415601 ']'
+ '[' -n '' -a '!= ' ']'
+ '[' -n '' -a '!= ' ']'
+ in_xen_hvm_guest
+ grep -q xen /sys/hypervisor/type
+ return 0
+ return 0
+ '[' 0 '!=' 0 ']'
+ start_dump
+ '[' kdump == fadump ']'
+ load_kdump
++ uname -m
+ ARCH=x86_64
++ awk '/Slab:.*/ {print $2}' /proc/meminfo
+ KMEMINUSE=152036
++ dc '-e268435456 1024 / p'
+ MEM_RESERVED=262144
++ dc '-e262144 .7 * 10 * 10 / p'
+ MEM_RESERVED=183500
+ '[' x86_64 '!=' i686 -a x86_64 '!=' i386 -a x86_64 '!=' x86_64 ']'
+ '[' x86_64 == i686 -o x86_64 == i386 ']'
+ '[' -f /sys/firmware/efi/systab ']'
+ echo 'irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug'
+ grep -q nr_cpus
++ uname -r
+ ver=4.8.4
++ echo 4.8.4
++ cut -d- -f1
+ maj=4.8.4
++ echo 4.8.4
++ cut -d- -f2
+ min=4.8.4
+ min=4
+ '[' 4.8.4 = 2.6.32 ']'
++ prepare_cmdline
++ local cmdline
++ '[' -z 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi' ']'
++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi'
+++ remove_cmdline_param 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi' crashkernel mem hugepages hugepagesz
+++ local 'cmdline=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi'
+++ shift
+++ for arg in '$@'
++++ echo ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi
++++ sed -e 's/\bcrashkernel=[^ ]*\b//g' -e 's/\bcrashkernel\b//g'
+++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on  pci=nomsi'
+++ for arg in '$@'
++++ echo ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi
++++ sed -e 's/\bmem=[^ ]*\b//g' -e 's/\bmem\b//g'
+++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi'
+++ for arg in '$@'
++++ echo ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi
++++ sed -e 's/\bhugepages=[^ ]*\b//g' -e 's/\bhugepages\b//g'
+++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi'
+++ for arg in '$@'
++++ echo ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi
++++ sed -e 's/\bhugepagesz=[^ ]*\b//g' -e 's/\bhugepagesz\b//g'
+++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi'
+++ echo ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi
++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi'
++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug'
++ avoid_cdrom_drive
++ local DRIVE=
++ local MEDIA=
++ IDE_DRIVES=(`echo hd{a,b,c,d}`)
+++ echo hda hdb hdc hdd
++ local IDE_DRIVES
++ local COUNTER=0
++ for DRIVE in '${IDE_DRIVES[@]}'
+++ echo 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi'
+++ grep -q hda=
++ '[' -f /proc/ide/hda/media ']'
++ for DRIVE in '${IDE_DRIVES[@]}'
+++ echo 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi'
+++ grep -q hdb=
++ '[' -f /proc/ide/hdb/media ']'
++ for DRIVE in '${IDE_DRIVES[@]}'
+++ echo 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi'
+++ grep -q hdc=
++ '[' -f /proc/ide/hdc/media ']'
++ for DRIVE in '${IDE_DRIVES[@]}'
+++ echo 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi'
+++ grep -q hdd=
++ '[' -f /proc/ide/hdd/media ']'
++ '[' 0 -eq 0 ']'
++ KDUMP_IDE_NOPROBE_COMMANDLINE=
++ KDUMP_COMMANDLINE='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on crashkernel=256M pci=nomsi '
+++ get_bootcpu_initial_apicid
+++ awk '                                                       \
        BEGIN { CPU = "-1"; }                                   \
        $1=="processor" && $2==":"      { CPU = $NF; }          \
        CPU=="0" && /initial apicid/    { print $NF; }          \
        ' /proc/cpuinfo
++ local id=0
++ '[' '!' -z 0 ']'
+++ append_cmdline 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug' disable_cpu_apicid 0
+++ local 'cmdline=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug'
+++ local 'newstr=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug'
+++ '[' 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug' == 'ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug' ']'
+++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0'
+++ echo ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0
++ cmdline='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0'
++ echo ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0
+ KDUMP_COMMANDLINE='ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0'
+ grep -q /sys/kernel/debug /proc/mounts
+ mount -t debugfs debug /sys/kernel/debug
+ MNTDEBUG=/sys/kernel/debug
+ /sbin/kexec -p '--command-line=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0' --initrd=/boot/initrd-4.8.4kdump.img /boot/vmlinuz-4.8.4
+ '[' 0 == 0 ']'
+ umount /sys/kernel/debug
+ /usr/bin/logger -p info -t kdump 'kexec: loaded kdump kernel'
+ return 0
+ return 0
+ '[' 0 '!=' 0 ']'
+ echo -n 'Starting kdump:'
Starting kdump:+ success
+ '[' color '!=' verbose -a -z '' ']'
+ echo_success
+ '[' color = color ']'
+ echo -en '\033[60G'
                                                           + echo -n '['
[+ '[' color = color ']'
+ echo -en '\033[0;32m'
+ echo -n '  OK  '
  OK  + '[' color = color ']'
+ echo -en '\033[0;39m'

+ echo -n ']'
]+ echo -ne '\r'
+ return 0
+ return 0
+ echo

+ /usr/bin/logger -p info -t kdump 'started up'
+ exit 0
[root@mbpc-pc ~]#

Weird. What's going on, then? This is likely a bash conditional or environment issue somewhere; why else would it work only when we invoke it with bash -x? Let's investigate further:

/var/log/messages
Apr 22 15:30:21 mbpc-pc kdump: kexec: failed to load kdump kernel
Apr 22 15:30:21 mbpc-pc kdump: failed to start up
Apr 22 15:30:49 mbpc-pc kdump: kexec: failed to load kdump kernel
Apr 22 15:30:49 mbpc-pc kdump: failed to start up

shows us that kexec failed to load the kdump kernel when running without bash -x.

[root@mbpc-pc ~]# cat /etc/init.d/kdump|grep kexec
KEXEC=/sbin/kexec
standard_kexec_args="-p"
        MEM_RESERVED=`cat /sys/kernel/kexec_crash_size`
        $KEXEC $KEXEC_ARGS $standard_kexec_args \
                $LOGGER "kexec: loaded kdump kernel"
                $LOGGER "kexec: failed to load kdump kernel"
        if [ ! -e /sys/kernel/kexec_crash_loaded ]
        rc=`cat /sys/kernel/kexec_crash_loaded`
                $LOGGER "kexec: failed to unload kdump kernel"
        $LOGGER "kexec: unloaded kdump kernel"
[root@mbpc-pc ~]# cat /sys/kernel/kexec_crash_size
268435456
[root@mbpc-pc ~]#
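Before blaming the init script, it's worth confirming the reservation really is in place; the cmdline and kexec_crash_size above already suggest it is. A quick sanity check using standard kernel interfaces:

grep -o 'crashkernel=[^ ]*' /proc/cmdline    # parameter the running kernel was booted with
cat /sys/kernel/kexec_crash_size             # bytes actually reserved; 0 means no reservation
dmesg | grep -i crashkernel                  # the kernel's own log line about the reservation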

After a few more hours of digging, we're not much further along. So we employ a dirty fix for the time being, until we can spend more time and figure out the rest. Here's the result of the testing:

[root@mbpc-pc ~]# /etc/init.d/kdump start
start(): Calling save_raw() …
KDUMP_COMMANDLINE=
MEM_RESERVED=268435456
Running check_config …
start_dump(): DEFAULT_DUMP_MODE=kdump
load_kdump(): KDUMP_COMMANDLINE_APPEND=irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug
load_kdump(): Running /sbin/kexec  -p --command-line=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0 --initrd=/boot/initrd-4.8.4kdump.img /boot/vmlinuz-4.8.4 ...
+ /sbin/kexec -p '--command-line=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0' --initrd=/boot/initrd-4.8.4kdump.img /boot/vmlinuz-4.8.4
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Then try loading kdump kernel
+ RETV=1
+ set +x
load_kdump(): RETV=1
Starting kdump:                                            [FAILED]
[root@mbpc-pc ~]#
[root@mbpc-pc ~]#
[root@mbpc-pc ~]# bash -x /etc/init.d/kdump status 2>/dev/null
Kdump is not operational
[root@mbpc-pc ~]# /sbin/kexec -p '--command-line=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0' --initrd=/boot/initrd-4.8.4kdump.img /boot/vmlinuz-4.8.4
[root@mbpc-pc ~]#
[root@mbpc-pc ~]# bash -x /etc/init.d/kdump status 2>/dev/null
Kdump is operational
[root@mbpc-pc ~]#
[root@mbpc-pc ~]#
[root@mbpc-pc ~]# bash -x /etc/init.d/kdump stop 2>/dev/null
Stopping kdump:                                            [  OK  ]
[root@mbpc-pc ~]# bash -x /etc/init.d/kdump status 2>/dev/null
Kdump is not operational
[root@mbpc-pc ~]#
[root@mbpc-pc ~]#
[root@mbpc-pc ~]# bash -x /etc/init.d/kdump start 2>/dev/null
start(): Calling save_raw() …
KDUMP_COMMANDLINE=
MEM_RESERVED=268435456
Running check_config …
start_dump(): DEFAULT_DUMP_MODE=kdump
load_kdump(): KDUMP_COMMANDLINE_APPEND=irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug
load_kdump(): Running /sbin/kexec  -p --command-line=ro root=/dev/mapper/mbpcvg-rootlv rd_LVM_LV=mbpcvg/rootlv rd_LVM_LV=VGEntertain/olv_swap rd_LVM_LV=mbpcvg/swaplv rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb nomodeset irqpoll pcie_aspm=off amd_iommu=on pci=nomsi irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off acpi_no_memhotplug disable_cpu_apicid=0 --initrd=/boot/initrd-4.8.4kdump.img /boot/vmlinuz-4.8.4 ...
load_kdump(): RETV=0
Starting kdump:                                            [  OK  ]
[root@mbpc-pc ~]#
[root@mbpc-pc ~]# bash -x /etc/init.d/kdump status 2>/dev/null
Kdump is operational
[root@mbpc-pc ~]#

 

And a vimdiff of the strace output from each invocation gives:

  + /usr/bin/strace /sbin/kexec -p '–command-line=ro root=/dev/mapper/|  + /usr/bin/strace /sbin/kexec -p –command-line=ro root=/dev/mapper/m
  execve("/sbin/kexec", ["/sbin/kexec", "-p", "--command-line=ro root=/|  execve("/sbin/kexec", ["/sbin/kexec", "-p", "--command-line=ro", "roo
  brk(0)                                  = 0x1eea000                  |  brk(0)                                  = 0x1e76000
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,|  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
  access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or |  access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or
  open("/etc/ld.so.cache", O_RDONLY)      = 3                          |  open("/etc/ld.so.cache", O_RDONLY)      = 3
  fstat(3, {st_mode=S_IFREG|0644, st_size=114634, ...}) = 0            |  fstat(3, {st_mode=S_IFREG|0644, st_size=114634, ...}) = 0
  mmap(NULL, 114634, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0fc92b5000    |  mmap(NULL, 114634, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5b26cfb000
  close(3)                                = 0                          |  close(3)                                = 0
  open("/lib64/libz.so.1", O_RDONLY)      = 3                          |  open("/lib64/libz.so.1", O_RDONLY)      = 3
  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 !\240z3\0\0\0|  read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 !\240z3\0\0\0
  fstat(3, {st_mode=S_IFREG|0755, st_size=91096, ...}) = 0             |  fstat(3, {st_mode=S_IFREG|0755, st_size=91096, ...}) = 0
  mmap(0x337aa00000, 2183696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENY|  mmap(0x337aa00000, 2183696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENY
  mprotect(0x337aa15000, 2093056, PROT_NONE) = 0                       |  mprotect(0x337aa15000, 2093056, PROT_NONE) = 0
+ +--  4 lines: mmap(0x337ac14000, 8192, PROT_READ|PROT_WRITE, MAP_PRIV|+ +--  4 lines: mmap(0x337ac14000, 8192, PROT_READ|PROT_WRITE, MAP_PRIV
  fstat(3, {st_mode=S_IFREG|0755, st_size=1926480, ...}) = 0           |  fstat(3, {st_mode=S_IFREG|0755, st_size=1926480, ...}) = 0
  mmap(0x3379600000, 3750152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENY|  mmap(0x3379600000, 3750152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENY
  mprotect(0x337978a000, 2097152, PROT_NONE) = 0                       |  mprotect(0x337978a000, 2097152, PROT_NONE) = 0
  mmap(0x337998a000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|  mmap(0x337998a000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED
  mmap(0x337998f000, 18696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|  mmap(0x337998f000, 18696, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED
  close(3)                                = 0                          |  close(3)                                = 0
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,|  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,|  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,|  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
  arch_prctl(ARCH_SET_FS, 0x7f0fc92b3700) = 0                          |  arch_prctl(ARCH_SET_FS, 0x7f5b26cf9700) = 0
  mprotect(0x337ac14000, 4096, PROT_READ) = 0                          |  mprotect(0x337ac14000, 4096, PROT_READ) = 0
  mprotect(0x337998a000, 16384, PROT_READ) = 0                         |  mprotect(0x337998a000, 16384, PROT_READ) = 0
  mprotect(0x337941f000, 4096, PROT_READ) = 0                          |  mprotect(0x337941f000, 4096, PROT_READ) = 0
  munmap(0x7f0fc92b5000, 114634)          = 0                          |  munmap(0x7f5b26cfb000, 114634)          = 0
  brk(0)                                  = 0x1eea000                  |  brk(0)                                  = 0x1e76000
  brk(0x1f0b000)                          = 0x1f0b000                  |  brk(0x1e97000)                          = 0x1e97000
  open("/proc/iomem", O_RDONLY)           = 3                          |  open("/proc/iomem", O_RDONLY)           = 3
  fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0                 |  fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,|  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
  read(3, "00000000-00000fff : reserved\n000"..., 1024) = 1024         |  read(3, "00000000-00000000 : reserved\n000"..., 1024) = 1024
  read(3, "08\n    fc000000-fc7fffff : 0000:"..., 1024) = 1024         |  read(3, "08\n    00000000-00000000 : 0000:"..., 1024) = 1024
  read(3, "       fdb40000-fdb7ffff : 0000:"..., 1024) = 1024          |  read(3, "       00000000-00000000 : 0000:"..., 1024) = 1024
  read(3, "-fe02afff : ohci_hcd\n  fe02b000-"..., 1024) = 610          |  read(3, "-00000000 : ohci_hcd\n  00000000-"..., 1024) = 608
  read(3, "", 1024)                       = 0                          |  read(3, "", 1024)                       = 0
  close(3)                                = 0                          |  close(3)                                = 0
  munmap(0x7f0fc92d0000, 4096)            = 0                          |  munmap(0x7f5b26d16000, 4096)            = 0
  open("/boot/vmlinuz-4.8.4", O_RDONLY)   = 3                          |  fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
  fstat(3, {st_mode=S_IFREG|0644, st_size=5045696, ...}) = 0           |  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,|  write(1, "Memory for crashkernel is not re"..., 39Memory for crashker
  read(3, "\352\5\0\300\7\214\310\216\330\216\300\216\3201\344\373\374\|  ) = 39
  lseek(3, 0, SEEK_CUR)                   = 16384                      |  write(1, "Please reserve memory by passing"..., 75Please reserve memo
  read(3, "1\300\216\330\216\300\216\320\216\340\216\350H\215-\355\375\|  ) = 75
a.txt                                                26,1           Top b.txt                                                26,1           Top

 

And a better visual in image format:

Kexec Strace and Vimdiff

So a temporary solution for now is the following in /etc/rc.local:

[root@mbpc-pc ~]# cat /etc/rc.local |grep -Ei "kdump|random-seed"
# Create a random seed for kdump. - Tom K.
dd if=/dev/urandom of=/var/lib/random-seed bs=1024 count=1
# kexec of kdump can't start when bash -x isn't used.  So this is a hack.
bash -x /etc/init.d/kdump start 2>/dev/null;
bash -x /etc/init.d/kdump status 2>/dev/null
[root@mbpc-pc ~]#

But unfortunately, that didn't stick either. That left only one choice, which would also resolve a kernel panic seen earlier on my system: install the latest 4.x kernel from ELRepo:

[root@mbpc-pc ~]# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
[root@mbpc-pc yum.repos.d]# yum --enablerepo=elrepo-kernel install kernel-ml

Cheers,
Tom

NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client should begin new session)

Getting this? Mounts freezing?  Keep reading:

tcpdump -i eth0 -s 0 -w dump.dat
tcpdump -r dump.dat |grep -Ei "psql02|nfs-c01"

02:55:48.731360 IP psql02.nix.mds.xyz.33991 > nfs-c01.nix.mds.xyz.nfs: Flags [P.], seq 1:693, ack 1, win 229, options [nop,nop,TS val 166990 ecr 5681495], length 692: NFS request xid 3844308326 688 null
02:55:48.731483 IP nfs-c01.nix.mds.xyz.nfs > psql02.nix.mds.xyz.33991: Flags [.], ack 693, win 238, options [nop,nop,TS val 5681498 ecr 166990], length 0
02:55:48.732644 IP nfs-c01.nix.mds.xyz.nfs > psql02.nix.mds.xyz.33991: Flags [P.], seq 1:25, ack 693, win 238, options [nop,nop,TS val 5681499 ecr 166990], length 24: NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client should begin new session)
02:55:48.732670 IP psql02.nix.mds.xyz.33991 > nfs-c01.nix.mds.xyz.nfs: Flags [.], ack 25, win 229, options [nop,nop,TS val 166991 ecr 5681499], length 0

Try this patch to bring nfs-utils-1.3.0-0.48.el7_4.1.x86_64 up to nfs-utils-1.3.0-0.48.el7_4.2.x86_64:

http://download.rhn.redhat.com/errata/RHBA-2018-0422.html

Update and enjoy? Nope! So let's keep digging. After a more exhaustive search, the fix was to add the following firewall rules and restart autofs. It appears autofs didn't start properly because of the missing firewall ports, which caused everything else to freeze, including any additional mounts:


[root@ovirt01 sssd]# firewall-cmd --zone=public --permanent --add-port=111/udp
success
[root@ovirt01 sssd]# firewall-cmd --zone=public --permanent --add-port=2049/udp
success
[root@ovirt01 sssd]# firewall-cmd --reload
success
[root@ovirt01 sssd]# systemctl restart autofs
[root@ovirt01 sssd]# mount nfs-c01:/n /m
[root@ovirt01 sssd]# umount /m
[root@ovirt01 sssd]#
[root@ovirt01 sssd]#
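To confirm the client can now actually reach the portmapper and NFS services through the firewall, something like this works (nfs-c01 and the mount point are the same ones used above):

rpcinfo -p nfs-c01                 # port 111 must answer for autofs/mount to get anywhere
showmount -e nfs-c01               # exercises mountd (20048)
mount nfs-c01:/n /m && umount /m   # end-to-end test, same as above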

The following fix was also used in combination with the above:

https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/408756/

[root@nfs02 ~]# /bin/ganesha.nfsd -v
NFS-Ganesha Release = V2.7-dev.10
ganesha.nfsd compiled on Apr 30 2018 at 02:21:35
Release comment = GANESHA file server is 64 bits compliant and supports NFS v3,4.0,4.1 (pNFS) and 9P
Git HEAD = 9cf00dccc9ab92ea4a6ec6f7f1f2c043bdc20a4b
Git Describe = V2.7-dev.10-0-g9cf00dc
[root@nfs02 ~]#

On top of the above, also ensure the following gluster errors are handled:

[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket

 

[root@nfs02 glusterfs]# netstat -pnlt|grep gluster
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1108/glusterd
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      1432/glusterfsd
[root@nfs02 glusterfs]#


[ CORRECT ]

[root@nfs02 glusterfs]# firewall-cmd --zone=dmz --list-all
dmz
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: ssh
  ports: 2049/tcp 111/tcp 24007-24008/tcp 38465-38469/tcp 111/udp 22/tcp 22/udp 49000-59999/udp 49000-59999/tcp 20048/tcp 20048/udp 49152/tcp 4501/tcp 4501/udp 10000/tcp 9000/udp 9000/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

[root@nfs02 glusterfs]#


[ INCORRECT ]

[root@nfs01 /]# firewall-cmd --zone=public --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: ssh dhcpv6-client haproxy
  ports: 24007-24008/tcp 49152/tcp 38465-38469/tcp 111/tcp 111/udp 2049/tcp 4501/tcp 4501/udp 20048/udp 20048/tcp 22/tcp 22/udp 10000/tcp 49000-59999/udp 49000-59999/tcp 9000/udp 9000/tcp 137/udp 138/udp 2049/udp
  protocols:
  masquerade: no
  forward-ports:
  source-ports: 49000-59999/tcp
  icmp-blocks:
  rich rules:

[root@nfs01 /]#


The fix was to remove the source-ports entry, either by editing /etc/firewalld/zones/public.xml directly, or by running:

firewall-cmd --zone=public --permanent --remove-source-port=49000-59999/udp
firewall-cmd --zone=public --permanent --remove-source-port=49000-59999/tcp
firewall-cmd --reload
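After the reload, the zone should no longer list any source-ports; a quick check with stock firewall-cmd options:

firewall-cmd --zone=public --list-source-ports   # should print nothing now
firewall-cmd --zone=public --list-ports          # the regular port list stays intact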


Also ensure haproxy is running on both hosts:


[root@nfs02 systemd]# systemctl status haproxy -l
* haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-01 23:21:44 EDT; 20s ago
 Main PID: 2405 (haproxy-systemd)
   CGroup: /system.slice/haproxy.service
           |-2405 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           |-2406 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
           `-2407 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

May 01 23:21:44 nfs02.nix.mds.xyz systemd[1]: Started HAProxy Load Balancer.
May 01 23:21:44 nfs02.nix.mds.xyz systemd[1]: Starting HAProxy Load Balancer…
May 01 23:21:44 nfs02.nix.mds.xyz haproxy-systemd-wrapper[2405]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
[root@nfs02 systemd]# sysctl -p
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.min_free_kbytes = 1048560
[root@nfs02 systemd]#

 

[root@nfs01 ~]# systemctl status haproxy -l
* haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-01 23:21:53 EDT; 7s ago
 Main PID: 21707 (haproxy-systemd)
   CGroup: /system.slice/haproxy.service
           |-21707 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           |-21708 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
           `-21709 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

May 01 23:21:53 nfs01.nix.mds.xyz systemd[1]: Started HAProxy Load Balancer.
May 01 23:21:53 nfs01.nix.mds.xyz systemd[1]: Starting HAProxy Load Balancer…
May 01 23:21:53 nfs01.nix.mds.xyz haproxy-systemd-wrapper[21707]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
[root@nfs01 ~]# sysctl -p
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.min_free_kbytes = 1048560
[root@nfs01 ~]#

The other issue was that the server did not have proper DNS and PTR records. Add them on the IPA server. An NXDOMAIN answer like the one below indicates either an IPA replication issue or that the PTR records were never created:

[root@psql01 ~]# dig -x psql01

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> -x psql01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29853
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;psql01.in-addr.arpa.           IN      PTR

;; AUTHORITY SECTION:
in-addr.arpa.           900     IN      SOA     b.in-addr-servers.arpa. nstld.iana.org. 2018013362 1800 900 604800 3600

;; Query time: 95 msec
;; SERVER: 192.168.0.44#53(192.168.0.44)
;; WHEN: Tue May 01 23:39:52 EDT 2018
;; MSG SIZE  rcvd: 116

[root@psql01 ~]#
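Adding the missing records in IPA looks roughly like this; the reverse zone and the host's final octet below are assumptions for illustration, so substitute your own:

# Create the reverse zone if it does not exist yet (assumed 192.168.0.0/24 subnet):
ipa dnszone-add --name-from-ip=192.168.0.0/24
# Add the PTR record for the client (assumed last octet of 145):
ipa dnsrecord-add 0.168.192.in-addr.arpa. 145 --ptr-rec=psql01.nix.mds.xyz.
# Verify:
dig -x 192.168.0.145 +short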

But that didn't fix it either. 

FINAL SOLUTION

Nothing fully worked until we started to look at auditd, which showed SELinux denials for ganesha.nfsd:

type=AVC msg=audit(1526965320.850:4094): avc:  denied  { write } for  pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4094): arch=c000003e syscall=2 success=no exit=-13 a0=7f23b0003150 a1=2 a2=180 a3=2 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4094): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54
type=AVC msg=audit(1526965320.850:4095): avc:  denied  { unlink } for  pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4095): arch=c000003e syscall=87 success=no exit=-13 a0=7f23b0004100 a1=7f23b0000050 a2=7f23b0004100 a3=5 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4095): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54

Generating and loading a local SELinux policy module from those AVC denials:

grep AVC /var/log/audit/audit.log | audit2allow -M systemd-allow


semodule -i systemd-allow.pp

solved the issue for us. The error thrown on the client also included this kernel trace:

May 21 23:53:13 psql01 kernel: CPU: 3 PID: 2273 Comm: mount.nfs Tainted: G             L ————   3.10.0-693.21.1.el7.x86_64 #1
May 21 23:53:13 psql01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
May 21 23:53:13 psql01 kernel: task: ffff880136335ee0 ti: ffff8801376b0000 task.ti: ffff8801376b0000
May 21 23:53:13 psql01 kernel: RIP: 0010:[]  [] _raw_spin_unlock_irqrestore+0x15/0x20
May 21 23:53:13 psql01 kernel: RSP: 0018:ffff8801376b3a60  EFLAGS: 00000206
May 21 23:53:13 psql01 kernel: RAX: ffffffffc05ab078 RBX: ffff880036973928 RCX: dead000000000200
May 21 23:53:13 psql01 kernel: RDX: ffffffffc05ab078 RSI: 0000000000000206 RDI: 0000000000000206
May 21 23:53:13 psql01 kernel: RBP: ffff8801376b3a60 R08: ffff8801376b3ab8 R09: ffff880137de1200
May 21 23:53:13 psql01 kernel: R10: ffff880036973928 R11: 0000000000000000 R12: ffff880036973928
May 21 23:53:13 psql01 kernel: R13: ffff8801376b3a58 R14: ffff88013fd98a40 R15: ffff8801376b3a58
May 21 23:53:13 psql01 kernel: FS:  00007fab48f07880(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
May 21 23:53:13 psql01 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 21 23:53:13 psql01 kernel: CR2: 00007f99793d93cc CR3: 000000013761e000 CR4: 00000000000007e0
May 21 23:53:13 psql01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 21 23:53:13 psql01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 21 23:53:13 psql01 kernel: Call Trace:
May 21 23:53:13 psql01 kernel: [] finish_wait+0x56/0x70
May 21 23:53:13 psql01 kernel: [] nfs_wait_client_init_complete+0xa1/0xe0 [nfs]
May 21 23:53:13 psql01 kernel: [] ? wake_up_atomic_t+0x30/0x30
May 21 23:53:13 psql01 kernel: [] nfs_get_client+0x22b/0x470 [nfs]
May 21 23:53:13 psql01 kernel: [] nfs4_set_client+0x98/0x130 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_create_server+0x13e/0x3b0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
May 21 23:53:13 psql01 kernel: [] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_try_mount+0x44/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] ? get_nfs_version+0x27/0x90 [nfs]
May 21 23:53:13 psql01 kernel: [] nfs_fs_mount+0x4cb/0xda0 [nfs]
May 21 23:53:13 psql01 kernel: [] ? nfs_clone_super+0x140/0x140 [nfs]
May 21 23:53:13 psql01 kernel: [] ? param_set_portnr+0x70/0x70 [nfs]
May 21 23:53:13 psql01 kernel: [] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [] do_mount+0x233/0xaf0
May 21 23:53:13 psql01 kernel: [] SyS_mount+0x96/0xf0
May 21 23:53:13 psql01 kernel: [] system_call_fastpath+0x1c/0x21
May 21 23:53:13 psql01 kernel: [] ? system_call_after_swapgs+0xae/0x146
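With the policy module loaded, a quick way to confirm the module is active and the denials are gone (standard SELinux/audit tooling, same hosts as above):

semodule -l | grep systemd-allow    # the locally generated module should be listed
ausearch -m AVC -ts recent          # should no longer show ganesha.nfsd denials
# then retry the mount from the client:
mount nfs-c01:/n /m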

 

Good Luck!

Cheers,
TK

rpc mount export: RPC: Unable to receive; errno = Connection refused

For the below errors:

[root@psql02 log]# showmount -e nfs02
rpc mount export: RPC: Unable to receive; errno = Connection refused
[root@psql02 log]#

Apr 16 01:12:37 nfs02 kernel: FINAL_REJECT: IN=eth0 OUT= MAC=00:50:56:86:2d:21:00:50:56:86:3c:c7:08:00 SRC=192.168.0.124 DST=192.168.0.119 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=44729 DF PROTO=TCP SPT=978 DPT=20048 WINDOW=29200 RES=0x00 SYN URGP=0

[root@nfs02 log]#
[root@nfs02 log]#
[root@nfs02 log]#
[root@nfs02 log]# firewall-cmd --zone=public --list-all
public
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: haproxy
  ports: 20048/udp 2049/tcp 111/tcp 111/udp 24007-24008/tcp 38465-38469/tcp 4501/tcp 4501/udp 22/tcp 22/udp 49000-59999/udp 49000-59999/tcp 9000/tcp 9000/udp 137/udp 138/udp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

[root@nfs02 log]#

Ensure you have port 20048 TCP added to your firewall:

  995  firewall-cmd --zone=public --permanent --add-port=20048/tcp
  996  firewall-cmd --reload
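And verify from both sides that mountd is now reachable (same hosts as in the capture above):

firewall-cmd --zone=public --list-ports | grep 20048   # on nfs02
showmount -e nfs02                                     # from psql02; should list exports instead of "Connection refused"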

Cheers,
TK

psql: error while loading shared libraries: libpq.so.rh-postgresql95-5: cannot open shared object file: No such file or directory

Well heck:

-bash-4.2$ psql
psql: error while loading shared libraries: libpq.so.rh-postgresql95-5: cannot open shared object file: No such file or directory
-bash-4.2$

So let's see what's going on:

[root@ovirt01 ~]# find / -iname libpq.so*
/usr/lib64/libpq.so.5
/usr/lib64/libpq.so.5.5
/opt/rh/rh-postgresql95/root/usr/lib64/libpq.so.rh-postgresql95-5
/opt/rh/rh-postgresql95/root/usr/lib64/libpq.so.rh-postgresql95-5.8
[root@ovirt01 ~]#

So we can see it's under the SCL lib64 path. And within the postgres root folder, we see that the usr/lib directory that lib points to is empty:

[root@ovirt01 root]# find / -iname psql
/opt/rh/rh-postgresql95/root/usr/bin/psql
[root@ovirt01 root]# pwd
/opt/rh/rh-postgresql95/root
[root@ovirt01 root]# ls -altrid lib
201829110 lrwxrwxrwx. 1 root root 7 Feb 12 11:08 lib -> usr/lib
[root@ovirt01 root]# pwd
/opt/rh/rh-postgresql95/root
[root@ovirt01 root]# ls -altri usr/lib/
total 4
134420487 dr-xr-xr-x.  2 root root    6 Feb 16  2016 .
 67638534 drwxr-xr-x. 13 root root 4096 Feb 12 11:08 ..
[root@ovirt01 root]#

So obviously, nothing that uses usr/lib/ will get anything useful out of it. But lib64 under the same folder has lots of useful things:

[root@ovirt01 root]# ls -altrid lib64
201829111 lrwxrwxrwx. 1 root root 9 Feb 12 11:08 lib64 -> usr/lib64
[root@ovirt01 root]#

Since the lib folder doesn't have anything useful in it, a simple solution is to link lib to usr/lib64 instead.  So let's do that.  Sure enough:

201829110 lrwxrwxrwx.  1 root root    9 Apr 15 00:56 lib -> usr/lib64

And here we go again:

-bash-4.2$ strace psql
execve("/opt/rh/rh-postgresql95/root/usr/bin/psql", ["psql"], [/* 21 vars */]) = 0
brk(NULL)                               = 0x17b5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa7a1ff5000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=42118, …}) = 0
mmap(NULL, 42118, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa7a1fea000
close(3)                                = 0
open("/lib64/tls/x86_64/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib64/tls/x86_64", 0x7ffcdfe3c4a0) = -1 ENOENT (No such file or directory)
open("/lib64/tls/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib64/tls", {st_mode=S_IFDIR|0555, st_size=6, …}) = 0
open("/lib64/x86_64/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib64/x86_64", 0x7ffcdfe3c4a0)   = -1 ENOENT (No such file or directory)
open("/lib64/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/lib64", {st_mode=S_IFDIR|0555, st_size=40960, …}) = 0
open("/usr/lib64/tls/x86_64/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls/x86_64", 0x7ffcdfe3c4a0) = -1 ENOENT (No such file or directory)
open("/usr/lib64/tls/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls", {st_mode=S_IFDIR|0555, st_size=6, …}) = 0
open("/usr/lib64/x86_64/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/x86_64", 0x7ffcdfe3c4a0) = -1 ENOENT (No such file or directory)
open("/usr/lib64/libpq.so.rh-postgresql95-5", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64", {st_mode=S_IFDIR|0555, st_size=40960, …}) = 0
writev(2, [{"psql", 4}, {": ", 2}, {"error while loading shared libra"..., 36}, {": ", 2}, {"libpq.so.rh-postgresql95-5", 26}, {": ", 2}, {"cannot open shared object file", 30}, {": ", 2}, {"No such file or directory", 25}, {"\n", 1}], 10psql: error while loading shared libraries: libpq.so.rh-postgresql95-5: cannot open shared object file: No such file or directory
) = 130
exit_group(127)                         = ?
+++ exited with 127 +++
-bash-4.2$

 

So we need to add it to the default library path.  Easy enough:

[root@ovirt01 ld.so.conf.d]#
[root@ovirt01 ld.so.conf.d]# cat postgres-x86_64.conf
/opt/rh/rh-postgresql95/root/lib
/opt/rh/rh-postgresql95/root/lib64
[root@ovirt01 ld.so.conf.d]# ldconfig
[root@ovirt01 ld.so.conf.d]# strings /etc/ld.so.cache |grep -Ei postgresql95
libpq.so.rh-postgresql95-5
/opt/rh/rh-postgresql95/root/lib64/libpq.so.rh-postgresql95-5
libpgtypes.so.rh-postgresql95-3
/opt/rh/rh-postgresql95/root/lib64/libpgtypes.so.rh-postgresql95-3
libecpg_compat.so.rh-postgresql95-3
/opt/rh/rh-postgresql95/root/lib64/libecpg_compat.so.rh-postgresql95-3
libecpg.so.rh-postgresql95-6
/opt/rh/rh-postgresql95/root/lib64/libecpg.so.rh-postgresql95-6
[root@ovirt01 ld.so.conf.d]#

And let's try again.  And sure enough, we have a winner:

-bash-4.2$
-bash-4.2$ psql
psql (9.5.9)
Type "help" for help.

postgres=# \l
                                             List of databases
         Name         |        Owner         | Encoding |   Collate   |    Ctype    |   Access privileges
----------------------+----------------------+----------+-------------+-------------+-----------------------
 engine               | engine               | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 ovirt_engine_history | ovirt_engine_history | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres             | postgres             | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0            | postgres             | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
                      |                      |          |             |             | postgres=CTc/postgres
 template1            | postgres             | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
                      |                      |          |             |             | postgres=CTc/postgres
(5 rows)

postgres=#
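As an aside, the more conventional way to run a Software Collections binary is to let scl set up the environment for you, which avoids touching ld.so.conf at all; a hedged alternative to the approach above:

scl enable rh-postgresql95 -- psql     # one-off invocation with the collection's paths
scl enable rh-postgresql95 bash        # or open a shell with the collection's PATH/LD_LIBRARY_PATH already set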

Cheers,
TK

 

NFS reply xid reply ERR 20: Auth Invalid failure code 13 and AD logins become hung when using NFS home directories.

You get the following error message:

15:27:35.430633 IP 192.168.0.80.nfs > 192.168.0.145.843: Flags [P.], seq 29:53, ack 417, win 235, options [nop,nop,TS val 6635947 ecr 5159126], length 24: NFS reply xid 2938911306 reply ERR 20: Auth Invalid failure code 13

Checking the nfs01 / nfs02 servers, we see a lot of the following:

[2018-04-08 17:19:39.014330] E [MSGID: 108006] [afr-common.c:5006:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.

[2018-04-08 17:32:21.714643] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.

[2018-04-08 17:32:21.734187] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null

[2018-04-08 17:32:21.734437] I [MSGID: 104039] [glfs-resolve.c:935:__glfs_active_subvol] 0-gv01: first lookup on graph 6e667330-322e-6e69-782e-6d64732e7879 (0) failed (Transport endpoint is not connected) [Transport endpoint is not connected]

Looks like the backend store for our NFS mount is not looking too hot right now.  So we go into troubleshooting Gluster.
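A minimal first pass at that, using standard gluster commands and the gv01 volume named in the logs:

gluster peer status            # both nodes should show State: Peer in Cluster (Connected)
gluster volume status gv01     # brick processes and their ports must be online
gluster volume heal gv01 info  # anything pending heal once the subvolumes come back
systemctl status glusterd      # the management daemon itself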

 

Cheers,
TK

/usr/bin/ganesha.nfsd: /lib64/libntirpc.so.1.6: version `NTIRPC_1.6.1' not found (required by /usr/bin/ganesha.nfsd)

What to do when you get this message below:


[root@nfs01 ganesha]# systemctl status nfs-ganesha -l
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sun 2018-04-08 13:22:32 EDT; 1min 8s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 2033 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=1/FAILURE)

Apr 08 13:22:32 nfs01.nix.mds.xyz systemd[1]: Starting NFS-Ganesha file server…
Apr 08 13:22:32 nfs01.nix.mds.xyz bash[2033]: /usr/bin/ganesha.nfsd: /lib64/libntirpc.so.1.6: version `NTIRPC_1.6.1' not found (required by /usr/bin/ganesha.nfsd)
Apr 08 13:22:32 nfs01.nix.mds.xyz systemd[1]: nfs-ganesha.service: control process exited, code=exited status=1
Apr 08 13:22:32 nfs01.nix.mds.xyz systemd[1]: Failed to start NFS-Ganesha file server.
Apr 08 13:22:32 nfs01.nix.mds.xyz systemd[1]: Unit nfs-ganesha.service entered failed state.
Apr 08 13:22:32 nfs01.nix.mds.xyz systemd[1]: nfs-ganesha.service failed.
[root@nfs01 ganesha]#

I simply went into my custom build of NFS Ganesha and reinstalled the binaries:

[root@nfs02 nfs-ganesha]# make install
[ 16%] Built target ntirpc
[ 17%] Built target log
[ 19%] Built target config_parsing
[ 22%] Built target cidr
[ 23%] Built target avltree
[ 23%] Built target hashtable
[ 28%] Built target sal
[ 29%] Built target rpcal
[ 29%] Built target nfs4callbacks
[ 53%] Built target nfsproto
[ 55%] Built target nfs_mnt_xdr
[ 58%] Built target nlm
[ 58%] Built target gos
[ 58%] Built target string_utils
[ 60%] Built target rquota
[ 68%] Built target 9p
[ 69%] Built target sm_notify.ganesha
[ 70%] Built target hash
[ 70%] Built target netgroup_cache
[ 75%] Built target support
[ 75%] Built target uid2grp
[ 77%] Built target fsalnull
[ 80%] Built target fsalmdcache
[ 81%] Built target fsalpseudo
[ 82%] Built target fsalproxy
[ 87%] Built target fsalgpfs
[ 89%] Built target fsalgluster
[ 90%] Built target fsalmem
[ 90%] Built target idmap
[ 93%] Built target FsalCore
[ 96%] Built target MainServices
[100%] Built target ganesha.nfsd
Install the project…
-- Install configuration: "Debug"
-- Skipping: /etc/ganesha/ganesha.conf (already exists)
Files "/root/ganesha/nfs-ganesha/src/config_samples/ganesha.conf.example" to "/etc/ganesha/ganesha.conf" are different.
-- Installing: /etc/ganesha/ganesha.conf.example
-- Installing: /usr/share/doc/ganesha/config_samples
-- Up-to-date: /usr/share/doc/ganesha/config_samples/ds.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/export.txt
-- Up-to-date: /usr/share/doc/ganesha/config_samples/ganesha.conf.example
-- Up-to-date: /usr/share/doc/ganesha/config_samples/gluster.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/gpfs.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/gpfs.ganesha.exports.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/gpfs.ganesha.log.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/gpfs.ganesha.main.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/gpfs.ganesha.nfsd.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/logging.txt
-- Up-to-date: /usr/share/doc/ganesha/config_samples/logrotate_fsal_gluster
-- Up-to-date: /usr/share/doc/ganesha/config_samples/logrotate_ganesha
-- Up-to-date: /usr/share/doc/ganesha/config_samples/mem.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/rgw.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/rgw_bucket.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/vfs.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/xfs.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/README
-- Up-to-date: /usr/share/doc/ganesha/config_samples/ceph.conf
-- Up-to-date: /usr/share/doc/ganesha/config_samples/config.txt
-- Installing: /var/run/ganesha
-- Installing: /usr/lib64/pkgconfig/libntirpc.pc
-- Installing: /usr/include/ntirpc
-- Installing: /usr/include/ntirpc/fpmath.h
-- Installing: /usr/include/ntirpc/getpeereid.h
-- Installing: /usr/include/ntirpc/intrinsic.h
-- Installing: /usr/include/ntirpc/libc_private.h
-- Installing: /usr/include/ntirpc/misc
-- Installing: /usr/include/ntirpc/misc/abstract_atomic.h
-- Installing: /usr/include/ntirpc/misc/bsd_epoll.h
-- Installing: /usr/include/ntirpc/misc/city.h
-- Installing: /usr/include/ntirpc/misc/citycrc.h
-- Installing: /usr/include/ntirpc/misc/event.h
-- Installing: /usr/include/ntirpc/misc/opr.h
-- Installing: /usr/include/ntirpc/misc/opr_queue.h
-- Installing: /usr/include/ntirpc/misc/os_epoll.h
-- Installing: /usr/include/ntirpc/misc/portable.h
-- Installing: /usr/include/ntirpc/misc/queue.h
-- Installing: /usr/include/ntirpc/misc/rbtree.h
-- Installing: /usr/include/ntirpc/misc/rbtree_x.h
-- Installing: /usr/include/ntirpc/misc/socket.h
-- Installing: /usr/include/ntirpc/misc/stdint.h
-- Installing: /usr/include/ntirpc/misc/stdio.h
-- Installing: /usr/include/ntirpc/misc/timespec.h
-- Installing: /usr/include/ntirpc/misc/wait_queue.h
-- Installing: /usr/include/ntirpc/misc/winpthreads.h
-- Installing: /usr/include/ntirpc/namespace.h
-- Installing: /usr/include/ntirpc/netconfig.h
-- Installing: /usr/include/ntirpc/reentrant.h
-- Installing: /usr/include/ntirpc/rpc
-- Installing: /usr/include/ntirpc/rpc/auth_inline.h
-- Installing: /usr/include/ntirpc/rpc/auth_stat.h
-- Installing: /usr/include/ntirpc/rpc/clnt_stat.h
-- Installing: /usr/include/ntirpc/rpc/des.h
-- Installing: /usr/include/ntirpc/rpc/des_crypt.h
-- Installing: /usr/include/ntirpc/rpc/gss_internal.h
-- Installing: /usr/include/ntirpc/rpc/nettype.h
-- Installing: /usr/include/ntirpc/rpc/pool_queue.h
-- Installing: /usr/include/ntirpc/rpc/rpc.h
-- Installing: /usr/include/ntirpc/rpc/rpc_cksum.h
-- Installing: /usr/include/ntirpc/rpc/rpc_com.h
-- Installing: /usr/include/ntirpc/rpc/rpc_err.h
-- Installing: /usr/include/ntirpc/rpc/rpc_msg.h
-- Installing: /usr/include/ntirpc/rpc/rpcent.h
-- Installing: /usr/include/ntirpc/rpc/svc_rqst.h
-- Installing: /usr/include/ntirpc/rpc/tirpc_compat.h
-- Installing: /usr/include/ntirpc/rpc/xdr_ioq.h
-- Installing: /usr/include/ntirpc/rpc/auth.h
-- Installing: /usr/include/ntirpc/rpc/auth_gss.h
-- Installing: /usr/include/ntirpc/rpc/auth_unix.h
-- Installing: /usr/include/ntirpc/rpc/clnt.h
-- Installing: /usr/include/ntirpc/rpc/pmap_prot.h
-- Installing: /usr/include/ntirpc/rpc/pmap_rmt.h
-- Installing: /usr/include/ntirpc/rpc/rpcb_clnt.h
-- Installing: /usr/include/ntirpc/rpc/rpcb_prot.h
-- Installing: /usr/include/ntirpc/rpc/rpcb_prot.x
-- Installing: /usr/include/ntirpc/rpc/svc.h
-- Installing: /usr/include/ntirpc/rpc/svc_auth.h
-- Installing: /usr/include/ntirpc/rpc/types.h
-- Installing: /usr/include/ntirpc/rpc/work_pool.h
-- Installing: /usr/include/ntirpc/rpc/xdr.h
-- Installing: /usr/include/ntirpc/rpc/xdr_inline.h
-- Installing: /usr/include/ntirpc/rpcsvc
-- Installing: /usr/include/ntirpc/rpcsvc/crypt.h
-- Installing: /usr/include/ntirpc/un-namespace.h
-- Installing: /usr/include/ntirpc/version.h
-- Up-to-date: /usr/lib64/libntirpc.so.1.6.1
-- Installing: /usr/lib64/libntirpc.so.1.6
-- Installing: /usr/lib64/libntirpc.so
-- Up-to-date: /usr/lib64/ganesha/libfsalnull.so.4.2.0
-- Up-to-date: /usr/lib64/ganesha/libfsalnull.so.4
-- Up-to-date: /usr/lib64/ganesha/libfsalnull.so
-- Up-to-date: /usr/lib64/ganesha/libfsalproxy.so.4.2.0
-- Up-to-date: /usr/lib64/ganesha/libfsalproxy.so.4
-- Up-to-date: /usr/lib64/ganesha/libfsalproxy.so
-- Up-to-date: /usr/lib64/ganesha/libfsalgpfs.so.4.2.0
-- Up-to-date: /usr/lib64/ganesha/libfsalgpfs.so.4
-- Up-to-date: /usr/lib64/ganesha/libfsalgpfs.so
-- Up-to-date: /usr/lib64/ganesha/libfsalgluster.so.4.2.0
-- Up-to-date: /usr/lib64/ganesha/libfsalgluster.so.4
-- Up-to-date: /usr/lib64/ganesha/libfsalgluster.so
-- Up-to-date: /usr/lib64/ganesha/libfsalmem.so.4.2.0
-- Up-to-date: /usr/lib64/ganesha/libfsalmem.so.4
-- Up-to-date: /usr/lib64/ganesha/libfsalmem.so
-- Up-to-date: /usr/bin/ganesha.nfsd
[root@nfs02 nfs-ganesha]#


I should, however, move over to the RPM packages now that they are at Ganesha v2.6.
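For reference, a package-based install would look roughly like the below, assuming the CentOS Storage SIG repositories carry the Ganesha version you're after (repo and package names can vary by release, so treat this as a sketch):

yum -y install centos-release-gluster
yum -y install nfs-ganesha nfs-ganesha-gluster
systemctl enable --now nfs-ganesha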

Cheers,
TK

kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO and mount hangs

The mount still hangs, and restarting NFS and autofs does not resolve the error below:

Apr  8 12:12:46 ovirt01 systemd: Stopping Automounts filesystems on demand…
Apr  8 12:12:46 ovirt01 kernel: NFS: nfs4_discover_server_trunking unhandled error -512. Exiting with error EIO
Apr  8 12:12:47 ovirt01 automount[1487]: umount_autofs_indirect: ask umount returned busy /n
Apr  8 12:12:48 ovirt01 ovsdb-server: ovs|111192|stream_ssl|ERR|Private key must be configured to use SSL
Apr  8 12:12:48 ovirt01 ovsdb-server: ovs|111193|stream_ssl|ERR|Certificate must be configured to use SSL
Apr  8 12:12:49 ovirt01 systemd: Starting Automounts filesystems on demand…
Apr  8 12:12:49 ovirt01 automount[2165]: lookup_read_map: lookup(sss): getautomntent_r: No such file or directory
Apr  8 12:12:49 ovirt01 systemd: Started Automounts filesystems on demand.

This looks like a potential kernel bug, as indicated through this thread, and this is the RH document that speaks to more of it.  Upgrade the kernel and try again.  See the next dated post for more info.  Despite the kernel upgrade, however, the issue continued, even through reboots; the upgrade didn't appear to do very much at all.  So we investigated by attempting to mount the share remotely using both participating nodes and the VIP:

[root@ipaclient01 /]# mount nfs02:/n /n -v
mount.nfs: timeout set for Sun Apr  8 19:58:13 2018
mount.nfs: trying text-based options 'vers=4.1,addr=192.168.0.119,clientaddr=192.168.0.236'
mount.nfs: mount(2): No such file or directory
mount.nfs: trying text-based options 'addr=192.168.0.119'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.0.119 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.0.119 prog 100005 vers 3 prot UDP port 20048
mount.nfs: mount(2): Permission denied
mount.nfs: access denied by server while mounting nfs02:/n

[root@ipaclient01 /]# showmount -e nfs02
Export list for nfs02:
/n (everyone)
[root@ipaclient01 /]#

Next we try with the following, remembering to unmount /n right after each attempt:

mount nfs02:/n /n -v
mount nfs01:/n /n -v
mount nfs-c01:/n /n -v
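Or, as a quick one-liner that cycles through the same three targets and unmounts in between:

for h in nfs02 nfs01 nfs-c01; do mount -v $h:/n /n && ls /n; umount /n; done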

[root@ipaclient01 /]# showmount -e nfs02
Export list for nfs02:
/n (everyone)
[root@ipaclient01 /]#


We're still not 100% sure which exact change fixed this, but we did recompile NFS Ganesha and reinstall it, took the Gluster FS offline and online numerous times, did the same to NFS Ganesha, and made sure HAPROXY and keepalived were working on both nodes.  We also added HAPROXY statistics to the configuration with these lines at the end of each /etc/haproxy/haproxy.cfg file:

[root@nfs02 ganesha]# cat /etc/haproxy/haproxy.cfg
global
    log         127.0.0.1 local2 debug
    stats       socket /var/run/haproxy.sock mode 0600 level admin
    # stats     socket /var/lib/haproxy/stats
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    debug

defaults
    mode                    tcp
    log                     global
    option                  dontlognull
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000


frontend nfs-in
    bind nfs-c01:2049
    mode tcp
    option tcplog
    default_backend             nfs-back


backend nfs-back
    balance     roundrobin
    server      nfs02.nix.mds.xyz    nfs02.nix.mds.xyz:2049 check
    server      nfs01.nix.mds.xyz    nfs01.nix.mds.xyz:2049 check

listen stats
    bind :9000
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /haproxy-stats
    stats auth admin:passw0rd
[root@nfs02 ganesha]#
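With the stats listener in place, backend health can also be checked from the command line (credentials as configured above); appending ;csv to the stats URI returns machine-readable output:

curl -su admin:passw0rd "http://nfs-c01:9000/haproxy-stats;csv"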

Sure enough, once this was done, the mounts worked like a charm from all the other hosts.  Another issue was that we were calling dhclient within a certain script at boot time.  This allocated a secondary IP to the interface, potentially confusing the NFS servers, though this is unconfirmed.  Nonetheless, the host was reconfigured with only one static IP.
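A quick way to spot a stray secondary address handed out by dhclient (the interface name eth0 is assumed here; substitute your own):

ip -4 addr show dev eth0
ps -ef | grep [d]hclient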

Digging some more:


==> n.log <==
[2018-04-09 05:08:13.704156] I [MSGID: 100030] [glusterfsd.c:2556:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.13.2 (args: /usr/sbin/glusterfs --process-name fuse --volfile-server=nfs01 --volfile-id=/gv01 /n)
[2018-04-09 05:08:13.711255] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2018-04-09 05:08:13.728297] W [socket.c:3216:socket_connect] 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"
[2018-04-09 05:08:13.729025] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2018-04-09 05:08:13.737757] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2018-04-09 05:08:13.738114] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
[2018-04-09 05:08:13.738203] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2018-04-09 05:08:13.738324] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 5
[2018-04-09 05:08:13.738330] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 6
[2018-04-09 05:08:13.738655] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 7
[2018-04-09 05:08:13.738742] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 8
[2018-04-09 05:08:13.739460] W [MSGID: 101174] [graph.c:363:_log_if_unknown_option] 0-gv01-readdir-ahead: option 'parallel-readdir' is not recognized
[2018-04-09 05:08:13.739787] I [MSGID: 114020] [client.c:2360:notify] 0-gv01-client-0: parent translators are ready, attempting connect on transport
[2018-04-09 05:08:13.747040] W [socket.c:3216:socket_connect] 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"
[2018-04-09 05:08:13.747372] I [MSGID: 114020] [client.c:2360:notify] 0-gv01-client-1: parent translators are ready, attempting connect on transport
[2018-04-09 05:08:13.747883] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-04-09 05:08:13.748026] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 0-gv01-client-0: disconnected from gv01-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2018-04-09 05:08:13.748070] W [MSGID: 108001] [afr-common.c:5391:afr_notify] 0-gv01-replicate-0: Client-quorum is not met
[2018-04-09 05:08:13.754493] W [socket.c:3216:socket_connect] 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"
Final graph:
+------------------------------------------------------------------------------+
  1: volume gv01-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host nfs01
  5:     option remote-subvolume /bricks/0/gv01
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username 916ccf06-dc1d-467f-bc3d-f00a7449618f
  9:     option password a44739e0-9587-411f-8e6a-9a6a4e46156c
 10:     option event-threads 8
 11:     option transport.tcp-user-timeout 0
 12:     option transport.socket.keepalive-time 20
 13:     option transport.socket.keepalive-interval 2
 14:     option transport.socket.keepalive-count 9
 15:     option send-gids true
 16: end-volume
 17:
 18: volume gv01-client-1
 19:     type protocol/client
 20:     option ping-timeout 42
 21:     option remote-host nfs02
 22:     option remote-subvolume /bricks/0/gv01
 23:     option transport-type socket
 24:     option transport.address-family inet
 25:     option username 916ccf06-dc1d-467f-bc3d-f00a7449618f
 26:     option password a44739e0-9587-411f-8e6a-9a6a4e46156c
 27:     option event-threads 8
 28:     option transport.tcp-user-timeout 0
 29:     option transport.socket.keepalive-time 20
 30:     option transport.socket.keepalive-interval 2
 31:     option transport.socket.keepalive-count 9
 32:     option send-gids true
 33: end-volume
 34:
 35: volume gv01-replicate-0
 36:     type cluster/replicate
 37:     option afr-pending-xattr gv01-client-0,gv01-client-1
 38:     option quorum-type auto
 39:     option use-compound-fops off
 40:     subvolumes gv01-client-0 gv01-client-1
 41: end-volume
 42:
 43: volume gv01-dht
 44:     type cluster/distribute
 45:     option lock-migration off
 46:     subvolumes gv01-replicate-0
 47: end-volume
 48:
 49: volume gv01-write-behind
 50:     type performance/write-behind
 51:     option cache-size 8MB
 52:     subvolumes gv01-dht
 53: end-volume
 54:
 55: volume gv01-read-ahead
 56:     type performance/read-ahead
 57:     subvolumes gv01-write-behind
 58: end-volume
 59:
 60: volume gv01-readdir-ahead
 61:     type performance/readdir-ahead
 62:     option parallel-readdir off
 63:     option rda-request-size 131072
 64:     option rda-cache-limit 10MB
 65:     subvolumes gv01-read-ahead
 66: end-volume
 67:
 68: volume gv01-io-cache
 69:     type performance/io-cache
 70:     option cache-size 1GB
 71:     subvolumes gv01-readdir-ahead
 72: end-volume
 73:
 74: volume gv01-quick-read
 75:     type performance/quick-read
 76:     option cache-size 1GB
 77:     subvolumes gv01-io-cache
 78: end-volume
 79:
 80: volume gv01-open-behind
 81:     type performance/open-behind
 82:     subvolumes gv01-quick-read
 83: end-volume
 84:
 85: volume gv01-md-cache
 86:     type performance/md-cache
 87:     subvolumes gv01-open-behind
 88: end-volume
 89:
 90: volume gv01
 91:     type debug/io-stats
 92:     option log-level INFO
 93:     option latency-measurement off
 94:     option count-fop-hits off
 95:     subvolumes gv01-md-cache
 96: end-volume
 97:
 98: volume meta-autoload
 99:     type meta
100:     subvolumes gv01
101: end-volume
102:
+------------------------------------------------------------------------------+
[2018-04-09 05:08:13.922631] E [socket.c:2374:socket_connect_finish] 0-gv01-client-1: connection to 192.168.0.119:24007 failed (No route to host); disconnecting socket
[2018-04-09 05:08:13.922690] E [MSGID: 108006] [afr-common.c:5164:__afr_handle_child_down_event] 0-gv01-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2018-04-09 05:08:13.926201] I [fuse-bridge.c:4205:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2018-04-09 05:08:13.926245] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 0
[2018-04-09 05:08:13.926518] I [MSGID: 108006] [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up
[2018-04-09 05:08:13.926671] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null
[2018-04-09 05:08:13.926762] E [fuse-bridge.c:4271:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected)
[2018-04-09 05:08:13.927207] I [MSGID: 108006] [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up
[2018-04-09 05:08:13.927262] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null
[2018-04-09 05:08:13.927301] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2018-04-09 05:08:13.927339] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 2: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
[2018-04-09 05:08:13.931497] I [MSGID: 108006] [afr-common.c:5444:afr_local_init] 0-gv01-replicate-0: no subvolumes up
[2018-04-09 05:08:13.931558] E [MSGID: 101046] [dht-common.c:1501:dht_lookup_dir_cbk] 0-gv01-dht: dict is null
[2018-04-09 05:08:13.931599] W [fuse-resolve.c:132:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)
[2018-04-09 05:08:13.931623] E [fuse-bridge.c:900:fuse_getattr_resume] 0-glusterfs-fuse: 3: GETATTR 1 (00000000-0000-0000-0000-000000000001) resolution failed
[2018-04-09 05:08:13.937258] I [fuse-bridge.c:5093:fuse_thread_proc] 0-fuse: initating unmount of /n
[2018-04-09 05:08:13.938043] W [glusterfsd.c:1393:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fb80b05ae25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560b52471675] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560b5247149b] ) 0-: received signum (15), shutting down
[2018-04-09 05:08:13.938086] I [fuse-bridge.c:5855:fini] 0-fuse: Unmounting '/n'.
[2018-04-09 05:08:13.938106] I [fuse-bridge.c:5860:fini] 0-fuse: Closing fuse connection to '/n'.

==> glusterd.log <==
[2018-04-09 05:08:15.118078] W [socket.c:3216:socket_connect] 0-management: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"

==> glustershd.log <==
[2018-04-09 05:08:15.282192] W [socket.c:3216:socket_connect] 0-gv01-client-0: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"
[2018-04-09 05:08:15.289508] W [socket.c:3216:socket_connect] 0-gv01-client-1: Error disabling sockopt IPV6_V6ONLY: "Protocol not available"

We see that when one brick of the replica pair is offline, Gluster takes the whole volume offline as well since client quorum is not met.  We might need a third server?
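If a third node were available (nfs03 below is purely hypothetical), converting the two-way replica into an arbiter volume would let quorum survive a single node going down; roughly:

gluster peer probe nfs03
gluster volume add-brick gv01 replica 3 arbiter 1 nfs03:/bricks/0/gv01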

[root@nfs01 /]# mount -t glusterfs nfs01:/gv01 /n
Mount failed. Please check the log file for more details.
[root@nfs01 /]#

We can't mount the volume, and starting it fails on quorum:

[root@nfs01 /]# gluster volume start gv01
volume start: gv01: failed: Quorum not met. Volume operation not allowed.
[root@nfs01 /]#

In this case we set two gluster tunables to allow the single brick to start:

[root@nfs01 glusterfs]# gluster volume set VOL cluster.server-quorum-type none
volume set: failed: Volume VOL does not exist
[root@nfs01 glusterfs]# gluster volume set gv01 cluster.server-quorum-type none
volume set: success
[root@nfs01 glusterfs]# gluster volume set gv01 cluster.quorum-type none
volume set: success
[root@nfs01 glusterfs]#
[root@nfs01 glusterfs]#
[root@nfs01 glusterfs]#
[root@nfs01 glusterfs]# gluster volume status
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick nfs01:/bricks/0/gv01                  49152     0          Y       28139
Self-heal Daemon on localhost               N/A       N/A        Y       28026

Task Status of Volume gv01
------------------------------------------------------------------------------
There are no active volume tasks

[root@nfs01 glusterfs]#
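To confirm the options actually took effect, gluster can print the volume's effective settings:

gluster volume get gv01 all | grep -Ei quorum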

And set the same in /etc/glusterfs/glusterd.vol:

[root@nfs01 glusterfs]# cat /etc/glusterfs/glusterd.vol
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 0
    option event-threads 1
    option cluster.quorum-type none
    option cluster.server-quorum-type none
#   option lock-timer 180
#   option transport.address-family inet6
#   option base-port 49152
#   option max-port  65535
end-volume
[root@nfs01 glusterfs]#

Then you can mount remotely again.  But alas, that didn't resolve the original issue either.
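Keep in mind that disabling quorum like this removes the split-brain protection, so once both nodes are healthy again you'll likely want to revert the volume-level options (the lines added to glusterd.vol need to be removed by hand):

gluster volume reset gv01 cluster.quorum-type
gluster volume reset gv01 cluster.server-quorum-type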

ULTIMATE SOLUTION THAT WORKED

Until we started to look at auditd:

type=AVC msg=audit(1526965320.850:4094): avc:  denied  { write } for  pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4094): arch=c000003e syscall=2 success=no exit=-13 a0=7f23b0003150 a1=2 a2=180 a3=2 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4094): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54
type=AVC msg=audit(1526965320.850:4095): avc:  denied  { unlink } for  pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4095): arch=c000003e syscall=87 success=no exit=-13 a0=7f23b0004100 a1=7f23b0000050 a2=7f23b0004100 a3=5 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4095): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54

A few lines like this:

grep AVC /var/log/audit/audit.log | audit2allow -M systemd-allow

semodule -i systemd-allow.pp

solved the issue for us.  The error thrown was also accompanied by this kernel stack trace from the hung mount.nfs task:

May 21 23:53:13 psql01 kernel: CPU: 3 PID: 2273 Comm: mount.nfs Tainted: G             L ------------   3.10.0-693.21.1.el7.x86_64 #1
May 21 23:53:13 psql01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
May 21 23:53:13 psql01 kernel: task: ffff880136335ee0 ti: ffff8801376b0000 task.ti: ffff8801376b0000
May 21 23:53:13 psql01 kernel: RIP: 0010:[]  [] _raw_spin_unlock_irqrestore+0x15/0x20
May 21 23:53:13 psql01 kernel: RSP: 0018:ffff8801376b3a60  EFLAGS: 00000206
May 21 23:53:13 psql01 kernel: RAX: ffffffffc05ab078 RBX: ffff880036973928 RCX: dead000000000200
May 21 23:53:13 psql01 kernel: RDX: ffffffffc05ab078 RSI: 0000000000000206 RDI: 0000000000000206
May 21 23:53:13 psql01 kernel: RBP: ffff8801376b3a60 R08: ffff8801376b3ab8 R09: ffff880137de1200
May 21 23:53:13 psql01 kernel: R10: ffff880036973928 R11: 0000000000000000 R12: ffff880036973928
May 21 23:53:13 psql01 kernel: R13: ffff8801376b3a58 R14: ffff88013fd98a40 R15: ffff8801376b3a58
May 21 23:53:13 psql01 kernel: FS:  00007fab48f07880(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
May 21 23:53:13 psql01 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 21 23:53:13 psql01 kernel: CR2: 00007f99793d93cc CR3: 000000013761e000 CR4: 00000000000007e0
May 21 23:53:13 psql01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 21 23:53:13 psql01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 21 23:53:13 psql01 kernel: Call Trace:
May 21 23:53:13 psql01 kernel: [] finish_wait+0x56/0x70
May 21 23:53:13 psql01 kernel: [] nfs_wait_client_init_complete+0xa1/0xe0 [nfs]
May 21 23:53:13 psql01 kernel: [] ? wake_up_atomic_t+0x30/0x30
May 21 23:53:13 psql01 kernel: [] nfs_get_client+0x22b/0x470 [nfs]
May 21 23:53:13 psql01 kernel: [] nfs4_set_client+0x98/0x130 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_create_server+0x13e/0x3b0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
May 21 23:53:13 psql01 kernel: [] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [] nfs_do_root_mount+0x86/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_try_mount+0x44/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] ? get_nfs_version+0x27/0x90 [nfs]
May 21 23:53:13 psql01 kernel: [] nfs_fs_mount+0x4cb/0xda0 [nfs]
May 21 23:53:13 psql01 kernel: [] ? nfs_clone_super+0x140/0x140 [nfs]
May 21 23:53:13 psql01 kernel: [] ? param_set_portnr+0x70/0x70 [nfs]
May 21 23:53:13 psql01 kernel: [] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [] ? __alloc_percpu+0x15/0x20
May 21 23:53:13 psql01 kernel: [] vfs_kern_mount+0x67/0x110
May 21 23:53:13 psql01 kernel: [] do_mount+0x233/0xaf0
May 21 23:53:13 psql01 kernel: [] SyS_mount+0x96/0xf0
May 21 23:53:13 psql01 kernel: [] system_call_fastpath+0x1c/0x21
May 21 23:53:13 psql01 kernel: [] ? system_call_after_swapgs+0xae/0x146
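As a follow-up to the audit2allow step above, it's worth confirming the module loaded and that no fresh denials show up (ausearch ships with auditd):

semodule -l | grep systemd-allow
ausearch -m avc -ts recent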

Best of luck!

Cheers,
TK

