Header Shadow Image


NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client should begin new session)

Getting this? Mounts freezing?  Keep reading:

tcpdump -i eth0 -s 0 -w dump.dat
tcpdump -r dump.dat |grep -Ei "psql02|nfs-c01"

02:55:48.731360 IP psql02.nix.mds.xyz.33991 > nfs-c01.nix.mds.xyz.nfs: Flags [P.], seq 1:693, ack 1, win 229, options [nop,nop,TS val 166990 ecr 5681495], length 692: NFS request xid 3844308326 688 null
02:55:48.731483 IP nfs-c01.nix.mds.xyz.nfs > psql02.nix.mds.xyz.33991: Flags [.], ack 693, win 238, options [nop,nop,TS val 5681498 ecr 166990], length 0
02:55:48.732644 IP nfs-c01.nix.mds.xyz.nfs > psql02.nix.mds.xyz.33991: Flags [P.], seq 1:25, ack 693, win 238, options [nop,nop,TS val 5681499 ecr 166990], length 24: NFS reply xid 3844308326 reply ERR 20: Auth Rejected Credentials (client should begin new session)
02:55:48.732670 IP psql02.nix.mds.xyz.33991 > nfs-c01.nix.mds.xyz.nfs: Flags [.], ack 25, win 229, options [nop,nop,TS val 166991 ecr 5681499], length 0

Try this patch to bring nfs-utils-1.3.0-0.48.el7_4.1.x86_64 up to nfs-utils-1.3.0-0.48.el7_4.2.x86_64:

http://download.rhn.redhat.com/errata/RHBA-2018-0422.html

Update and enjoy?  Nope!  So let's keep digging some more.  After more of an exhaustive search, the result was to add the following firewall lines and restart autofs.  Appears autofs didn't properly start on account of the missing firewall ports causing everything else to freeze, including any additional mounts:


[root@ovirt01 sssd]# firewall-cmd –zone=public –permanent –add-port=111/udp
success
[root@ovirt01 sssd]# firewall-cmd –zone=public –permanent –add-port=2049/ufp
success
[root@ovirt01 sssd]# firewall-cmd –reload
success
[root@ovirt01 sssd]# systemctl restart autofs
[root@ovirt01 sssd]# mount nfs-c01:/n /m
[root@ovirt01 sssd]# umount /m
[root@ovirt01 sssd]#
[root@ovirt01 sssd]#

The following fix was also used in combination with above: 

https://review.gerrithub.io/#/c/ffilz/nfs-ganesha/+/408756/

[root@nfs02 ~]# /bin/ganesha.nfsd -v
NFS-Ganesha Release = V2.7-dev.10
ganesha.nfsd compiled on Apr 30 2018 at 02:21:35
Release comment = GANESHA file server is 64 bits compliant and supports NFS v3,4.0,4.1 (pNFS) and 9P
Git HEAD = 9cf00dccc9ab92ea4a6ec6f7f1f2c043bdc20a4b
Git Describe = V2.7-dev.10-0-g9cf00dc
[root@nfs02 ~]#

On top of the above, also ensure the following gluster errors are handled:

[2018-05-01 22:43:06.412067] E [MSGID: 114058] [client-handshake.c:1571:client_query_portmap_cbk] 0-gv01-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2018-05-01 22:43:55.554833] E [socket.c:2374:socket_connect_finish] 0-gv01-client-0: connection to 192.168.0.131:49152 failed (Connection refused); disconnecting socket

 

[root@nfs02 glusterfs]# netstat -pnlt|grep gluster
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      1108/glusterd
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      1432/glusterfsd
[root@nfs02 glusterfs]#


[ CORRECT ]

[root@nfs02 glusterfs]# firewall-cmd –zone=dmz –list-all
dmz
  target: default
  icmp-block-inversion: no
  interfaces:
  sources:
  services: ssh
  ports: 2049/tcp 111/tcp 24007-24008/tcp 38465-38469/tcp 111/udp 22/tcp 22/udp 49000-59999/udp 49000-59999/tcp 20048/tcp 20048/udp 49152/tcp 4501/tcp 4501/udp 10000/tcp 9000/udp 9000/tcp
  protocols:
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

[root@nfs02 glusterfs]#


[ INCORRECT ]

[root@nfs01 /]# firewall-cmd –zone=public –list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: eth0
  sources:
  services: ssh dhcpv6-client haproxy
  ports: 24007-24008/tcp 49152/tcp 38465-38469/tcp 111/tcp 111/udp 2049/tcp 4501/tcp 4501/udp 20048/udp 20048/tcp 22/tcp 22/udp 10000/tcp 49000-59999/udp 49000-59999/tcp 9000/udp 9000/tcp 137/udp 138/udp 2049/udp
  protocols:
  masquerade: no
  forward-ports:
  source-ports: 49000-59999/tcp
  icmp-blocks:
  rich rules:

[root@nfs01 /]#


Fix was to remove the source-port by either editing /etc/firewalld/zones/public.xml and removing 

firewall-cmd –zone=public –permanent –remove-source-port=49000-59999/udp
firewall-cmd –zone=public –permanent –remove-source-port=49000-59999/tcp
firewall-cmd –reload


Also ensure haproxy is running on both hosts:


[root@nfs02 systemd]# systemctl status haproxy -l
* haproxy.service – HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-01 23:21:44 EDT; 20s ago
 Main PID: 2405 (haproxy-systemd)
   CGroup: /system.slice/haproxy.service
           |-2405 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           |-2406 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
           `-2407 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

May 01 23:21:44 nfs02.nix.mds.xyz systemd[1]: Started HAProxy Load Balancer.
May 01 23:21:44 nfs02.nix.mds.xyz systemd[1]: Starting HAProxy Load Balancer…
May 01 23:21:44 nfs02.nix.mds.xyz haproxy-systemd-wrapper[2405]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
[root@nfs02 systemd]# sysctl -p
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.min_free_kbytes = 1048560
[root@nfs02 systemd]#

 

[root@nfs01 ~]# systemctl status haproxy -l
â haproxy.service – HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-01 23:21:53 EDT; 7s ago
 Main PID: 21707 (haproxy-systemd)
   CGroup: /system.slice/haproxy.service
           ââ21707 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           ââ21708 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
           ââ21709 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

May 01 23:21:53 nfs01.nix.mds.xyz systemd[1]: Started HAProxy Load Balancer.
May 01 23:21:53 nfs01.nix.mds.xyz systemd[1]: Starting HAProxy Load Balancer…
May 01 23:21:53 nfs01.nix.mds.xyz haproxy-systemd-wrapper[21707]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
[root@nfs01 ~]# sysctl -p
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.min_free_kbytes = 1048560
[root@nfs01 ~]#

The other issue that existed was that you did not have a proper PTR and DNS records for the server.  Add them in IPA server.  This indicates that either there is an IPA server replication issue or PTR records are not created:

[root@psql01 ~]# dig -x psql01

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> -x psql01
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29853
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;psql01.in-addr.arpa.           IN      PTR

;; AUTHORITY SECTION:
in-addr.arpa.           900     IN      SOA     b.in-addr-servers.arpa. nstld.iana.org. 2018013362 1800 900 604800 3600

;; Query time: 95 msec
;; SERVER: 192.168.0.44#53(192.168.0.44)
;; WHEN: Tue May 01 23:39:52 EDT 2018
;; MSG SIZE  rcvd: 116

[root@psql01 ~]#

But that didn't fix it either. 

FINAL SOLUTION

Until we started to look at auditd:

type=AVC msg=audit(1526965320.850:4094): avc:  denied  { write } for  pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4094): arch=c000003e syscall=2 success=no exit=-13 a0=7f23b0003150 a1=2 a2=180 a3=2 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4094): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54
type=AVC msg=audit(1526965320.850:4095): avc:  denied  { unlink } for  pid=8714 comm="ganesha.nfsd" name="nfs_0" dev="dm-0" ino=201547689 scontext=system_u:system_r:ganesha_t:s0 tcontext=system_u:object_r:krb5_host_rcache_t:s0 tclass=file
type=SYSCALL msg=audit(1526965320.850:4095): arch=c000003e syscall=87 success=no exit=-13 a0=7f23b0004100 a1=7f23b0000050 a2=7f23b0004100 a3=5 items=0 ppid=1 pid=8714 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ganesha.nfsd" exe="/usr/bin/ganesha.nfsd" subj=system_u:system_r:ganesha_t:s0 key=(null)
type=PROCTITLE msg=audit(1526965320.850:4095): proctitle=2F7573722F62696E2F67616E657368612E6E667364002D4C002F7661722F6C6F672F67616E657368612F67616E657368612E6C6F67002D66002F6574632F67616E657368612F67616E657368612E636F6E66002D4E004E49565F4556454E54

A few lines like this:

grep AVC /var/log/audit/audit.log | audit2allow -M systemd-allow


semodule -i systemd-allow.pp

solved the issue for us.  The error thrown also included this:

May 21 23:53:13 psql01 kernel: CPU: 3 PID: 2273 Comm: mount.nfs Tainted: G             L ————   3.10.0-693.21.1.el7.x86_64 #1
May 21 23:53:13 psql01 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014
May 21 23:53:13 psql01 kernel: task: ffff880136335ee0 ti: ffff8801376b0000 task.ti: ffff8801376b0000
May 21 23:53:13 psql01 kernel: RIP: 0010:[]  [] _raw_spin_unlock_irqrestore+0×15/0×20
May 21 23:53:13 psql01 kernel: RSP: 0018:ffff8801376b3a60  EFLAGS: 00000206
May 21 23:53:13 psql01 kernel: RAX: ffffffffc05ab078 RBX: ffff880036973928 RCX: dead000000000200
May 21 23:53:13 psql01 kernel: RDX: ffffffffc05ab078 RSI: 0000000000000206 RDI: 0000000000000206
May 21 23:53:13 psql01 kernel: RBP: ffff8801376b3a60 R08: ffff8801376b3ab8 R09: ffff880137de1200
May 21 23:53:13 psql01 kernel: R10: ffff880036973928 R11: 0000000000000000 R12: ffff880036973928
May 21 23:53:13 psql01 kernel: R13: ffff8801376b3a58 R14: ffff88013fd98a40 R15: ffff8801376b3a58
May 21 23:53:13 psql01 kernel: FS:  00007fab48f07880(0000) GS:ffff88013fd80000(0000) knlGS:0000000000000000
May 21 23:53:13 psql01 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 21 23:53:13 psql01 kernel: CR2: 00007f99793d93cc CR3: 000000013761e000 CR4: 00000000000007e0
May 21 23:53:13 psql01 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 21 23:53:13 psql01 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 21 23:53:13 psql01 kernel: Call Trace:
May 21 23:53:13 psql01 kernel: [] finish_wait+0×56/0×70
May 21 23:53:13 psql01 kernel: [] nfs_wait_client_init_complete+0xa1/0xe0 [nfs]
May 21 23:53:13 psql01 kernel: [] ? wake_up_atomic_t+0×30/0×30
May 21 23:53:13 psql01 kernel: [] nfs_get_client+0x22b/0×470 [nfs]
May 21 23:53:13 psql01 kernel: [] nfs4_set_client+0×98/0×130 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_create_server+0x13e/0x3b0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_remote_mount+0x2e/0×60 [nfsv4]
May 21 23:53:13 psql01 kernel: [] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [] ? __alloc_percpu+0×15/0×20
May 21 23:53:13 psql01 kernel: [] vfs_kern_mount+0×67/0×110
May 21 23:53:13 psql01 kernel: [] nfs_do_root_mount+0×86/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] nfs4_try_mount+0×44/0xc0 [nfsv4]
May 21 23:53:13 psql01 kernel: [] ? get_nfs_version+0×27/0×90 [nfs]
May 21 23:53:13 psql01 kernel: [] nfs_fs_mount+0x4cb/0xda0 [nfs]
May 21 23:53:13 psql01 kernel: [] ? nfs_clone_super+0×140/0×140 [nfs]
May 21 23:53:13 psql01 kernel: [] ? param_set_portnr+0×70/0×70 [nfs]
May 21 23:53:13 psql01 kernel: [] mount_fs+0x3e/0x1b0
May 21 23:53:13 psql01 kernel: [] ? __alloc_percpu+0×15/0×20
May 21 23:53:13 psql01 kernel: [] vfs_kern_mount+0×67/0×110
May 21 23:53:13 psql01 kernel: [] do_mount+0×233/0xaf0
May 21 23:53:13 psql01 kernel: [] SyS_mount+0×96/0xf0
May 21 23:53:13 psql01 kernel: [] system_call_fastpath+0x1c/0×21
May 21 23:53:13 psql01 kernel: [] ? system_call_after_swapgs+0xae/0×146

 

Good Luck!

Cheers,
TK

Comments are closed.


     
  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License