

failed command: READ FPDMA QUEUED FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

So my last Seagate SATA drive in my RAID 6 array died spectacularly, taking out my 4.8.4 kernel and locking up my storage to the point where the only way I can get to it is via the kernel boot parameter init=/bin/bash.  The disk lasted about 5.762 years:


GlusterFS: Configuration and Setup w/ NFS-Ganesha for an HA NFS Cluster (Quick Start Guide)

This is a much shorter version of the troubleshooting article on NFS Ganesha we created earlier.  It is meant as a quick start guide for those who just want to get this server up and running quickly.  The point of High Availability is that the best-implemented HA solutions never allow any outage to be noticed by the client.  It's not the client's job to put up with the fallout of a failure; it's the sysadmin's job to ensure they never have to.  In this configuration we will use a 3-node Gluster cluster.  In short, we'll be using the following technologies to set up an HA configuration:

  • GlusterFS
  • NFS Ganesha
  • CentOS 7 
  • HAPROXY
  • keepalived
  • firewalld
  • selinux

Here's a summary configuration for this whole setup.  If you run into this particularly nasty error, visit the solution page here:

HOST SETTING DESCRIPTION
nfs01 / nfs02 / nfs03

Create and reserve some IPs for your hosts.  We are using the FreeIPA project to provide DNS and Kerberos functionality here:

192.168.0.80 nfs-c01 (nfs01, nfs02, nfs03)  VIP DNS Entry

192.168.0.131 nfs01
192.168.0.119 nfs02
192.168.0.125 nfs03

Add the hosts to your DNS server for a clean setup.  Alternatively, add them to /etc/hosts (ugly).
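If you do go the /etc/hosts route, a minimal sketch of the entries on each node might look like the following (the FQDNs are assumed from the nix.mds.xyz domain used throughout this guide):

192.168.0.80    nfs-c01.nix.mds.xyz  nfs-c01
192.168.0.131   nfs01.nix.mds.xyz    nfs01
192.168.0.119   nfs02.nix.mds.xyz    nfs02
192.168.0.125   nfs03.nix.mds.xyz    nfs03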
nfs01 / nfs02 / nfs03

PACKAGES

You can use the packages directly.  Since version 2.6.x, Ganesha supports binding only to specific interfaces, a feature that has been introduced in the latest RPM packages.

yum install nfs-ganesha.x86_64 nfs-ganesha-gluster.x86_64 nfs-ganesha-proxy.x86_64 nfs-ganesha-utils.x86_64 nfs-ganesha-vfs.x86_64 nfs-ganesha-xfs.x86_64 nfs-ganesha-mount-9P.x86_64

COMPILING 

We used this method because we needed the feature that allows binding the service only to specific addresses, at the time only available from the latest source releases.

wget https://github.com/nfs-ganesha/nfs-ganesha/archive/V2.6-.0.tar.gz

[root@nfs01 ~]# ganesha.nfsd -v
NFS-Ganesha Release = V2.6.0
nfs-ganesha compiled on Feb 20 2018 at 08:55:23
Release comment = GANESHA file server is 64 bits compliant and supports NFS v3,4.0,4.1 (pNFS) and 9P
Git HEAD = 97867975b2ee69d475876e222c439b1bc9764a78
Git Describe = V2.6-.0-0-g9786797
[root@nfs01 ~]#

DETAILED INSTRUCTIONS:

https://github.com/nfs-ganesha/nfs-ganesha/wiki/Compiling

https://github.com/nfs-ganesha/nfs-ganesha/wiki/GLUSTER
https://github.com/nfs-ganesha/nfs-ganesha/wiki/XFSLUSTRE

PACKAGES:

yum install glusterfs-api-devel.x86_64
yum install xfsprogs-devel.x86_64
yum install xfsprogs.x86_64
xfsdump-3.1.4-1.el7.x86_64
libguestfs-xfs-1.36.3-6.el7_4.3.x86_64
libntirpc-devel-1.5.4-1.el7.x86_64
libntirpc-1.5.4-1.el7.x86_64

libnfsidmap-devel-0.25-17.el7.x86_64
jemalloc-devel-3.6.0-1.el7.x86_64

COMMANDS

git clone https://github.com/nfs-ganesha/nfs-ganesha.git
cd nfs-ganesha;
git checkout V2.6-stable

git submodule update --init --recursive
yum install gcc-c++
yum install cmake

ccmake /root/ganesha/nfs-ganesha/src/
# Press the c, e, c, g keys to create and generate the config and make files.
make
make install
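If you prefer a non-interactive build over ccmake, a rough sketch follows.  The option names (USE_FSAL_GLUSTER, USE_FSAL_XFS) and the build directory are assumptions on our part; confirm them against the ccmake screen for your release and adjust the source path to wherever you cloned the tree:

mkdir build; cd build
cmake -DCMAKE_BUILD_TYPE=Release -DUSE_FSAL_GLUSTER=ON -DUSE_FSAL_XFS=ON /root/nfs-ganesha/src/
make
make install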

Compile and build NFS-Ganesha 2.6.0+ from source.  (At the time, the RPM packages did not work.)  Install the listed packages before compiling as well.
nfs01 / nfs02 / nfs03
Add a secondary disk to each VM, such as /dev/sdb, for the shared GlusterFS storage.
nfs01 / nfs02 / nfs03

Create the FS on the new disk and mount it and setup Gluster:

mkfs.xfs /dev/sdb
mkdir -p /bricks/0
mount /dev/sdb /bricks/0
# grep brick /etc/fstab
/dev/sdb /bricks/0                              xfs     defaults        0 0
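A quick sanity check of the new brick filesystem using standard xfsprogs / coreutils commands:

xfs_info /bricks/0
df -h /bricks/0
mount | grep /bricks/0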

Gluster currently ships at version 4.1, which won't work with Ganesha here.  Use either the repo below or continue by installing the latest supported version of Gluster:

# cat CentOS-Gluster-3.13.repo
# CentOS-Gluster-3.13.repo
#
# Please see http://wiki.centos.org/SpecialInterestGroup/Storage for more
# information

[centos-gluster313]
name=CentOS-$releasever - Gluster 3.13 (Short Term Maintenance)
baseurl=http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-3.13/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage

[centos-gluster313-test]
name=CentOS-$releasever - Gluster 3.13 Testing (Short Term Maintenance)
baseurl=http://buildlogs.centos.org/centos/$releasever/storage/$basearch/gluster-3.13/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage

Alternatively, use the following to install the latest repo:

yum install centos-release-gluster

Install and enable the rest:

yum -y install glusterfs glusterfs-fuse glusterfs-server glusterfs-api glusterfs-cli
systemctl enable glusterd.service
systemctl start glusterd

On nfs01 ONLY, if creating a brand new volume:

gluster volume create gv01 replica 2 nfs01:/bricks/0/gv01 nfs02:/bricks/0/gv01 

gluster volume info gv01
gluster volume status 
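Once the peers are probed and the volume exists, confirm cluster membership and that the volume is running.  A quick sketch using the standard Gluster CLI (the start command is only needed if the volume isn't already started):

gluster peer status
gluster pool list
gluster volume start gv01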

Replace bricks: 

Unreachable brick:
gluster volume remove-brick gv01 replica X nfs01:/bricks/0/gv01 start
gluster volume remove-brick gv01 replica X nfs01:/bricks/0/gv01 force
gluster peer detach nfs01

Reachable brick:

gluster volume remove-brick gv01 replica X nfs01:/bricks/0/gv01 start
gluster volume remove-brick gv01 replica X nfs01:/bricks/0/gv01 status
gluster volume remove-brick gv01 replica X nfs01:/bricks/0/gv01 commit

gluster peer detach nfs01

Add subsequent bricks: 

(from an existing cluster member)
[root@nfs01 ~]# gluster peer probe nfs03 
[root@nfs01 ~]# gluster volume add-brick gv01 replica 3 nfs03:/bricks/0/gv01
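After adding a brick it's worth confirming that replication catches up.  A hedged sketch using the standard heal commands:

gluster volume heal gv01 info            # entries still pending heal
gluster volume heal gv01 info summary    # may not exist on older releases
gluster volume status gv01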

Mount the storage locally: 

systemctl disable autofs 
mkdir /n 

Example below.  Add to /etc/fstab as well: 

[root@nfs01 ~]# mount -t glusterfs nfs01:/gv01 /n
[root@nfs02 ~]# mount -t glusterfs nfs02:/gv01 /n
[root@nfs03 ~]# mount -t glusterfs nfs03:/gv01 /n

Ex:

nfs01:/gv01 /n    glusterfs defaults      0 0

Ensure the following options are set on the gluster volume:

[root@nfs01 glusterfs]# gluster volume set gv01 cluster.quorum-type auto
volume set: success
[root@nfs01 glusterfs]# gluster volume set gv01 cluster.server-quorum-type server
volume set: success

Here is an example Gluster volume configuration we used (This config is replicated when adding new bricks):

cluster.server-quorum-type: server
cluster.quorum-type: auto
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
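Each of these options is applied with gluster volume set and, on recent Gluster releases, can be read back with gluster volume get.  For example (a sketch, adjust values to taste):

gluster volume set gv01 performance.cache-size 1GB
gluster volume set gv01 server.event-threads 8
gluster volume set gv01 client.event-threads 8
gluster volume get gv01 all | grep -Ei "quorum|cache-size|event-threads"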

 

Configure the GlusterFS filesystem using the above.
nfs01 / nfs02 / nfs03

PACKAGES:
yum install haproxy     # ( 1.5.18-6.el7.x86_64 used in this case )

/etc/haproxy/haproxy.cfg:

global
    log         127.0.0.1 local0 debug
    stats       socket /var/run/haproxy.sock mode 0600 level admin
    # stats     socket /var/lib/haproxy/stats
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    debug

defaults
    mode                    tcp
    log                     global
    option                  dontlognull
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000


frontend nfs-in
    log                         127.0.0.1       local0          debug
#    bind                       nfs02:2049
    bind                        nfs-c01:2049
    mode                        tcp
    option                      tcplog
    default_backend             nfs-back


backend nfs-back
    log         /dev/log local0 debug
    mode        tcp
    balance     source
    server      nfs01.nix.mds.xyz    nfs01.nix.mds.xyz:2049 check
    server      nfs02.nix.mds.xyz    nfs02.nix.mds.xyz:2049 check
    server      nfs03.nix.mds.xyz    nfs03.nix.mds.xyz:2049 check

listen stats
    bind :9000
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /haproxy-stats
    stats auth admin:s3cretp@s$w0rd

Set logging settings for HAProxy:

# cat /etc/rsyslog.d/haproxy.conf
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
local6.* /var/log/haproxy.log
local0.* /var/log/haproxy.log

Configure rsyslogd (/etc/rsyslog.conf):

local0.*             /var/log/haproxy.log
local3.*             /var/log/keepalived.log
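After dropping those files in place, restart rsyslog and HAProxy and confirm the log file is being written.  A quick sketch:

systemctl restart rsyslog
systemctl enable haproxy
systemctl restart haproxy
tail -f /var/log/haproxy.log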

 

Install and configure HAPROXY.  A great source helped with this part.
nfs01 / nfs02 / nfs03

# echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
# echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
# sysctl -p
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
Turn on kernel parameters.  These allow keepalived below to function properly.
nfs01 / nfs02 / nfs03 

PACKAGES:

yum install keepalived    # ( Used 1.3.5-1.el7.x86_64 in this case )

NFS01:

vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_instance nfs-c01 {
        interface eth0                          # interface to monitor
        state MASTER                            # MASTER on haproxy1, BACKUP on haproxy2
        virtual_router_id 80                    # Set to last digit of cluster IP.
        priority 101                            # 101 on haproxy1, 100 on haproxy2

        authentication {
                auth_type PASS
                auth_pass s3cretp@s$w0rd
        }

        virtual_ipaddress {
                delay_loop 12
                lb_algo wrr
                lb_kind DR
                protocol TCP
                192.168.0.80                    # virtual ip address
        }

        track_script {
                chk_haproxy
        }
}

NFS02:

vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_instance nfs-c01 {
        interface eth0                          # interface to monitor
        state BACKUP                            # MASTER on haproxy1, BACKUP on haproxy2
        virtual_router_id 80                    # Set to last digit of cluster IP.
        priority 102                            # 101 on haproxy1, 100 on haproxy2

        authentication {
                auth_type PASS
                auth_pass s3cretp@s$w0rd
        }

        virtual_ipaddress {
                delay_loop 12
                lb_algo wrr
                lb_kind DR
                protocol TCP
                192.168.0.80                    # virtual ip address
        }

        track_script {
                chk_haproxy
        }
}

NFS03:

vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_instance nfs-c01 {
        interface eth0                          # interface to monitor
        state BACKUP                            # MASTER on haproxy1, BACKUP on haproxy2
        virtual_router_id 80                    # Set to last digit of cluster IP.
        priority 103                            # 101 on haproxy1, 100 on haproxy2

        authentication {
                auth_type PASS
                auth_pass s3cretp@s$w0rd
        }

        virtual_ipaddress {
                delay_loop 12
                lb_algo wrr
                lb_kind DR
                protocol TCP
                192.168.0.80                    # virtual ip address
        }

        track_script {
                chk_haproxy
        }
}
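With the configs in place on all three nodes, start keepalived and confirm the VIP lands on the expected node.  A sketch (the VIP should only appear on one host at a time):

systemctl enable keepalived
systemctl restart keepalived
ip addr show eth0 | grep 192.168.0.80
journalctl -u keepalived --no-pager | tail -n 20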

 

Configure keepalived.  A great source helped with this part as well.

nfs01 / nfs02 / nfs03

This step can be made quicker by copying the xml definitions from one host to the other if you already have one defined:

/etc/firewalld/zones/dmz.xml
/etc/firewalld/zones/public.xml

Contents of above:

# cat dmz.xml
<?xml version="1.0" encoding="utf-8"?>
<zone>
  <short>DMZ</short>
  <description>For computers in your demilitarized zone that are publicly-accessible with limited access to your internal network. Only selected incoming connections are accepted.</description>
  <service name="ssh"/>
  <port protocol="tcp" port="2049"/>
  <port protocol="tcp" port="111"/>
  <port protocol="tcp" port="24007-24008"/>
  <port protocol="tcp" port="38465-38469"/>
  <port protocol="udp" port="111"/>
  <port protocol="tcp" port="22"/>
  <port protocol="udp" port="22"/>
  <port protocol="udp" port="49000-59999"/>
  <port protocol="tcp" port="49000-59999"/>
  <port protocol="tcp" port="20048"/>
  <port protocol="udp" port="20048"/>
  <port protocol="tcp" port="49152"/>
  <port protocol="tcp" port="4501"/>
  <port protocol="udp" port="4501"/>
  <port protocol="tcp" port="10000"/>
  <port protocol="udp" port="9000"/>
  <port protocol="tcp" port="9000"/>
  <port protocol="tcp" port="445"/>
  <port protocol="tcp" port="139"/>
  <port protocol="udp" port="138"/>
  <port protocol="udp" port="137"/>
</zone>

 

# cat public.xml
<?xml version="1.0" encoding="utf-8"?>
<zone>
  <short>Public</short>
  <description>For use in public areas. You do not trust the other computers on networks to not harm your computer. Only selected incoming connections are accepted.</description>
  <service name="ssh"/>
  <service name="dhcpv6-client"/>
  <service name="haproxy"/>
  <port protocol="tcp" port="24007-24008"/>
  <port protocol="tcp" port="49152"/>
  <port protocol="tcp" port="38465-38469"/>
  <port protocol="tcp" port="111"/>
  <port protocol="udp" port="111"/>
  <port protocol="tcp" port="2049"/>
  <port protocol="tcp" port="4501"/>
  <port protocol="udp" port="4501"/>
  <port protocol="udp" port="20048"/>
  <port protocol="tcp" port="20048"/>
  <port protocol="tcp" port="22"/>
  <port protocol="udp" port="22"/>
  <port protocol="tcp" port="10000"/>
  <port protocol="udp" port="49000-59999"/>
  <port protocol="tcp" port="49000-59999"/>
  <port protocol="udp" port="9000"/>
  <port protocol="tcp" port="9000"/>
  <port protocol="udp" port="137"/>
  <port protocol="udp" port="138"/>
  <port protocol="udp" port="2049"/>
  <port protocol="tcp" port="445"/>
  <port protocol="tcp" port="139"/>
  <port protocol="udp" port="68"/>
  <port protocol="udp" port="67"/>
</zone>

 

Individual setup:

# cat public.bash

firewall-cmd --zone=public --permanent --add-port=2049/tcp
firewall-cmd --zone=public --permanent --add-port=111/tcp
firewall-cmd --zone=public --permanent --add-port=111/udp
firewall-cmd --zone=public --permanent --add-port=24007-24008/tcp
firewall-cmd --zone=public --permanent --add-port=49152/tcp
firewall-cmd --zone=public --permanent --add-port=38465-38469/tcp
firewall-cmd --zone=public --permanent --add-port=4501/tcp
firewall-cmd --zone=public --permanent --add-port=4501/udp
firewall-cmd --zone=public --permanent --add-port=20048/udp
firewall-cmd --zone=public --permanent --add-port=20048/tcp
firewall-cmd --reload

# cat dmz.bash

firewall-cmd --zone=dmz --permanent --add-port=2049/tcp
firewall-cmd --zone=dmz --permanent --add-port=111/tcp
firewall-cmd --zone=dmz --permanent --add-port=111/udp
firewall-cmd --zone=dmz --permanent --add-port=24007-24008/tcp
firewall-cmd --zone=dmz --permanent --add-port=49152/tcp
firewall-cmd --zone=dmz --permanent --add-port=38465-38469/tcp
firewall-cmd --zone=dmz --permanent --add-port=4501/tcp
firewall-cmd --zone=dmz --permanent --add-port=4501/udp
firewall-cmd --zone=dmz --permanent --add-port=20048/tcp
firewall-cmd --zone=dmz --permanent --add-port=20048/udp
firewall-cmd --reload

#

# On Both

firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -m pkttype --pkt-type multicast -j ACCEPT
firewall-cmd --reload

 

HANDY STUFF:

firewall-cmd --zone=dmz --list-all
firewall-cmd --zone=public --list-all
firewall-cmd --set-log-denied=all
firewall-cmd --permanent --add-service=haproxy
firewall-cmd --list-all
firewall-cmd --runtime-to-permanent

Configure firewalld.  DO NOT disable firewalld.
nfs01 / nfs02 / nfs03

Run any of the following commands, or a combination of them, against deny entries in /var/log/audit/audit.log that may appear as you stop, start or install the above services:

METHOD 1:
grep AVC /var/log/audit/audit.log | audit2allow -M systemd-allow
semodule -i systemd-allow.pp

METHOD 2:
audit2allow -a
audit2allow -a -M ganesha_<NUM>_port
semodule -i ganesha_<NUM>_port.pp

USEFUL THINGS:

ausearch --interpret
aureport
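A couple of additional checks that help while iterating on SELinux denials, using the standard policycoreutils / audit tooling:

getenforce
ausearch -m avc -ts recent
semodule -l | grep -Ei "ganesha|systemd-allow"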

Configure selinux.  Don't disable it.  It actually makes your host safer and is easy to work with using just these commands.
nfs01 / nfs02 / nfs03

NODE 1:

[root@nfs01 ~]# cat /etc/ganesha/ganesha.conf
###################################################

#
# EXPORT
#
# To function, all that is required is an EXPORT
#
# Define the absolute minimal export
#
###################################################

# logging directives–be careful
LOG {
        # Default_Log_Level is unknown token??
        # Default_Log_Level = NIV_FULL_DEBUG;
        Components {
#                 ALL = FULL_DEBUG;
                MEMLEAKS = FATAL;
                FSAL = DEBUG;
                NFSPROTO = FATAL;
                NFS_V4 = FULL_DEBUG;
                EXPORT = DEBUG;
                FILEHANDLE = FATAL;
                DISPATCH = DEBUG;
                CACHE_INODE = FULL_DEBUG;
                CACHE_INODE_LRU = FATAL;
                HASHTABLE = FATAL;
                HASHTABLE_CACHE = FATAL;
                DUPREQ = FATAL;
                INIT = DEBUG;
                MAIN = FATAL;
                IDMAPPER = FULL_DEBUG;
                NFS_READDIR = FULL_DEBUG;
                NFS_V4_LOCK = FULL_DEBUG;
                CONFIG = FULL_DEBUG;
                CLIENTID = FULL_DEBUG;
                SESSIONS = FATAL;
                PNFS = FATAL;
                RW_LOCK = FATAL;
                NLM = FATAL;
                RPC = FULL_DEBUG;
                NFS_CB = FATAL;
                THREAD = FATAL;
                NFS_V4_ACL = FULL_DEBUG;
                STATE = FULL_DEBUG;
#                9P = FATAL;
#                9P_DISPATCH = FATAL;
                FSAL_UP = FATAL;
                DBUS = FATAL;
        }

        Facility {
                name = FILE;
                destination = "/var/log/ganesha/ganesha-rgw.log";
                enable = active;
        }
}

NFSv4 {
    Lease_Lifetime = 20 ;
    IdmapConf = "/etc/idmapd.conf" ;
    DomainName = "nix.mds.xyz" ;
}

NFS_KRB5 {
        PrincipalName = "nfs/nfs01.nix.mds.xyz@NIX.MDS.XYZ" ;
        KeytabPath = /etc/krb5.keytab ;
        Active_krb5 = YES ;
}


NFS_Core_Param {
        Bind_addr=192.168.0.131;
        NFS_Port=2049;
        MNT_Port=20048;
        NLM_Port=38468;
        Rquota_Port=4501;
}

%include "/etc/ganesha/export.conf"
# %include "/etc/ganesha/export-home.conf"

 

[root@nfs01 ~]# cat /etc/ganesha/export.conf

EXPORT {
    Export_Id = 1 ;                             # Export ID unique to each export
    Path = "/n";                                # Path of the volume to be exported. Eg: "/test_volume"

    FSAL {
        name = GLUSTER;
        hostname = "nfs01.nix.mds.xyz";         # IP of one of the nodes in the trusted pool
        volume = "gv01";                        # Volume name. Eg: "test_volume"
    }

    Access_type = RW;                           # Access permissions
    Squash = No_root_squash;                    # To enable/disable root squashing
    Disable_ACL = FALSE;                        # To enable/disable ACL
    Pseudo = "/n";                              # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
    Protocols = "3", "4";                       # NFS protocols supported
    Transports = "UDP", "TCP" ;                 # Transport protocols supported
    SecType = "sys", "krb5","krb5i","krb5p";    # "sys", "krb5","krb5i","krb5p";        # Security flavors supported
}

[root@nfs01 ~]#

 

NODE 2:

# cat /etc/ganesha/ganesha.conf
###################################################
#
# EXPORT
#
# To function, all that is required is an EXPORT
#
# Define the absolute minimal export
#
###################################################

# logging directives–be careful
LOG {
        # Default_Log_Level is unknown token??
        # Default_Log_Level = NIV_FULL_DEBUG;
        Components {
#                 ALL = FULL_DEBUG;
                MEMLEAKS = FATAL;
                FSAL = DEBUG;
                NFSPROTO = FATAL;
                NFS_V4 = FULL_DEBUG;
                EXPORT = DEBUG;
                FILEHANDLE = FATAL;
                DISPATCH = DEBUG;
                CACHE_INODE = FULL_DEBUG;
                CACHE_INODE_LRU = FATAL;
                HASHTABLE = FATAL;
                HASHTABLE_CACHE = FATAL;
                DUPREQ = FATAL;
                INIT = DEBUG;
                MAIN = FATAL;
                IDMAPPER = FULL_DEBUG;
                NFS_READDIR = FULL_DEBUG;
                NFS_V4_LOCK = FULL_DEBUG;
                CONFIG = FULL_DEBUG;
                CLIENTID = FULL_DEBUG;
                SESSIONS = FATAL;
                PNFS = FATAL;
                RW_LOCK = FATAL;
                NLM = FATAL;
                RPC = FULL_DEBUG;
                NFS_CB = FATAL;
                THREAD = FATAL;
                NFS_V4_ACL = FULL_DEBUG;
                STATE = FULL_DEBUG;
#                9P = FATAL;
#                9P_DISPATCH = FATAL;
                FSAL_UP = FATAL;
                DBUS = FATAL;
        }

        Facility {
                name = FILE;
                destination = "/var/log/ganesha/ganesha-rgw.log";
                enable = active;
        }
}

NFSv4 {
    Lease_Lifetime = 20 ;
    IdmapConf = "/etc/idmapd.conf" ;
    DomainName = "nix.mds.xyz" ;
}

NFS_KRB5 {
        PrincipalName = "nfs/nfs02.nix.mds.xyz@NIX.MDS.XYZ" ;
        KeytabPath = /etc/krb5.keytab ;
        Active_krb5 = YES ;
}


NFS_Core_Param {
        Bind_addr=192.168.0.119;
        NFS_Port=2049;
        MNT_Port=20048;
        NLM_Port=38468;
        Rquota_Port=4501;
}

%include "/etc/ganesha/export.conf"
# %include "/etc/ganesha/export-home.conf"
[root@nfs02 glusterfs]#
[root@nfs02 glusterfs]#
[root@nfs02 glusterfs]# cat /etc/ganesha/export.conf
EXPORT {
    Export_Id = 1 ;                             # Export ID unique to each export
    Path = "/n";                                # Path of the volume to be exported. Eg: "/test_volume"

    FSAL {
        name = GLUSTER;
        hostname = "nfs02.nix.mds.xyz";         # IP of one of the nodes in the trusted pool
        volume = "gv01";                        # Volume name. Eg: "test_volume"
    }

    Access_type = RW;                           # Access permissions
    Squash = No_root_squash;                    # To enable/disable root squashing
    Disable_ACL = FALSE;                        # To enable/disable ACL
    Pseudo = "/n";                              # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
    Protocols = "3", "4";                       # NFS protocols supported
    Transports = "UDP", "TCP" ;                 # Transport protocols supported
    SecType = "sys", "krb5","krb5i","krb5p";    # "sys", "krb5","krb5i","krb5p";        # Security flavors supported
}
[root@nfs02 glusterfs]#

 

NODE 3:

[root@nfs03 ~]# cat /etc/ganesha/ganesha.conf
###################################################
#
# EXPORT
#
# To function, all that is required is an EXPORT
#
# Define the absolute minimal export
#
###################################################


# logging directives–be careful
LOG {
        # Default_Log_Level is unknown token??
        # Default_Log_Level = NIV_FULL_DEBUG;
        Components {
#                 ALL = FULL_DEBUG;
                MEMLEAKS = FATAL;
                FSAL = DEBUG;
                NFSPROTO = FATAL;
                NFS_V4 = FULL_DEBUG;
                EXPORT = DEBUG;
                FILEHANDLE = FATAL;
                DISPATCH = DEBUG;
                CACHE_INODE = FULL_DEBUG;
                CACHE_INODE_LRU = FATAL;
                HASHTABLE = FATAL;
                HASHTABLE_CACHE = FATAL;
                DUPREQ = FATAL;
                INIT = DEBUG;
                MAIN = FATAL;
                IDMAPPER = FULL_DEBUG;
                NFS_READDIR = FULL_DEBUG;
                NFS_V4_LOCK = FULL_DEBUG;
                CONFIG = FULL_DEBUG;
                CLIENTID = FULL_DEBUG;
                SESSIONS = FATAL;
                PNFS = FATAL;
                RW_LOCK = FATAL;
                NLM = FATAL;
                RPC = FULL_DEBUG;
                NFS_CB = FATAL;
                THREAD = FATAL;
                NFS_V4_ACL = FULL_DEBUG;
                STATE = FULL_DEBUG;
#                9P = FATAL;
#                9P_DISPATCH = FATAL;
                FSAL_UP = FATAL;
                DBUS = FATAL;
        }

        Facility {
                name = FILE;
                destination = "/var/log/ganesha/ganesha-rgw.log";
                enable = active;
        }
}

NFSv4 {
    Lease_Lifetime = 20 ;
    IdmapConf = "/etc/idmapd.conf" ;
    DomainName = "nix.mds.xyz" ;
}


NFS_KRB5 {
        PrincipalName = "nfs/nfs03.nix.mds.xyz@NIX.MDS.XYZ" ;
        KeytabPath = /etc/krb5.keytab ;
        Active_krb5 = YES ;
}

NFS_Core_Param {
        Bind_addr = 192.168.0.125;
        NFS_Port = 2049;
        MNT_Port = 20048;
        NLM_Port = 38468;
        Rquota_Port = 4501;
}

%include "/etc/ganesha/export.conf"
# %include "/etc/ganesha/export-home.conf"
[root@nfs03 ~]#
[root@nfs03 ~]#
[root@nfs03 ~]# cat /etc/ganesha/export.conf
EXPORT {
        Export_Id = 1 ;                             # Export ID unique to each export
        Path = "/n";                                # Path of the volume to be exported. Eg: "/test_volume"

        FSAL {
                name = GLUSTER;
                hostname = "nfs03.nix.mds.xyz";         # IP of one of the nodes in the trusted pool
                volume = "gv01";                        # Volume name. Eg: "test_volume"
        }

        Access_type = RW;                           # Access permissions
        Squash = No_root_squash;                    # To enable/disable root squashing
        Disable_ACL = FALSE;                        # To enable/disable ACL
        Pseudo = "/n";                              # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
        Protocols = "3", "4";                       # "3", "4" NFS protocols supported
        Transports = "UDP", "TCP" ;                 # "UDP", "TCP" Transport protocols supported
        SecType = "sys","krb5","krb5i","krb5p";     # "sys","krb5","krb5i","krb5p";     # Security flavors supported
}
[root@nfs03 ~]#

 

STARTUP:

systemctl start nfs-ganesha
Only if you did not extract the startup scripts (see below): /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
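Either way, confirm the daemon is up and listening on the ports defined in NFS_Core_Param.  A quick sketch:

systemctl status nfs-ganesha
netstat -pnlt | grep ganesha
showmount -e localhost        # should list /n via the mountd port (20048)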

 

Configure NFS Ganesha
nfs01 / nfs02 / nfs03

 

[root@nfs01 ~]# cat /etc/fstab|grep -Ei "brick|gv01"
/dev/sdb /bricks/0                              xfs     defaults        0 0
nfs01:/gv01 /n                                  glusterfs defaults      0 0
[root@nfs01 ~]#

[root@nfs01 ~]# mount|grep -Ei "brick|gv01"
/dev/sdb on /bricks/0 type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
nfs01:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@nfs01 ~]#

 

[root@nfs01 ~]# ps -ef|grep -Ei "haproxy|keepalived|ganesha"; netstat -pnlt|grep -Ei "haproxy|ganesha|keepalived"
root      1402     1  0 00:59 ?        00:00:00 /usr/sbin/keepalived -D
root      1403  1402  0 00:59 ?        00:00:00 /usr/sbin/keepalived -D
root      1404  1402  0 00:59 ?        00:00:02 /usr/sbin/keepalived -D
root     13087     1  0 01:02 ?        00:00:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy  13088 13087  0 01:02 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy  13089 13088  0 01:02 ?        00:00:01 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
root     13129     1 15 01:02 ?        00:13:11 /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root     19742 15633  0 02:30 pts/2    00:00:00 grep --color=auto -Ei haproxy|keepalived|ganesha
tcp        0      0 192.168.0.80:2049       0.0.0.0:*               LISTEN      13089/haproxy
tcp6       0      0 192.168.0.131:20048     :::*                    LISTEN      13129/ganesha.nfsd
tcp6       0      0 :::564                  :::*                    LISTEN      13129/ganesha.nfsd
tcp6       0      0 192.168.0.131:4501      :::*                    LISTEN      13129/ganesha.nfsd
tcp6       0      0 192.168.0.131:2049      :::*                    LISTEN      13129/ganesha.nfsd
tcp6       0      0 192.168.0.131:38468     :::*                    LISTEN      13129/ganesha.nfsd
[root@nfs01 ~]#

 

Ensure the mounts are done and everything is started up.
nfs01 / nfs02 / nfs03

yumdownloader nfs-ganesha.x86_64
rpm2cpio nfs-ganesha-2.5.5-1.el7.x86_64.rpm | cpio -idmv ./usr/lib/systemd/system/nfs-ganesha-lock.service
rpm2cpio nfs-ganesha-2.5.5-1.el7.x86_64.rpm | cpio -idmv ./usr/lib/systemd/system/nfs-ganesha.service
rpm2cpio nfs-ganesha-2.5.5-1.el7.x86_64.rpm | cpio -idmv ./usr/lib/systemd/system/nfs-ganesha-config.service
rpm2cpio nfs-ganesha-2.5.5-1.el7.x86_64.rpm | cpio -idmv ./usr/libexec/ganesha/nfs-ganesha-config.sh

Copy the files extracted above to the same paths under / instead of ./ :

systemctl enable nfs-ganesha.service
systemctl status nfs-ganesha.service

Since you compiled from source, you don't have the nice startup scripts.  To get them from an existing Ganesha RPM, do the above, then use systemctl to stop and start nfs-ganesha as you would any other service.
 
ANY

Enable dumps:

gluster volume set gv01 server.statedump-path /var/log/glusterfs/
gluster volume statedump gv01

 

Enable state dumps for issue isolation.
Enable Samba / SMB for Windows File Sharing ( Optional )

Packages:

samba-common-4.7.1-6.el7.noarch
samba-client-libs-4.7.1-6.el7.x86_64
libsmbclient-4.7.1-6.el7.x86_64
samba-libs-4.7.1-6.el7.x86_64
samba-4.7.1-6.el7.x86_64
libsmbclient-devel-4.7.1-6.el7.x86_64
samba-common-libs-4.7.1-6.el7.x86_64
samba-common-tools-4.7.1-6.el7.x86_64
samba-client-4.7.1-6.el7.x86_64

# cat /etc/samba/smb.conf|grep NFS -A 12
[NFS]
        comment = NFS Shared Storage
        path = /n
        valid users = root
        public = no
        writable = yes
        read only = no
        browseable = yes
        guest ok = no
        printable = no
        write list = root tom@mds.xyz tomk@nix.mds.xyz
        directory mask = 0775
        create mask = 664

Start the service after enabling it:

systemctl enable smb
systemctl start smb

Grant Samba the SELinux permissions to access NFS and FUSE-backed directories and allow them to be exported.

For fusefs filesystems:

# setsebool -P samba_share_fusefs on
# getsebool samba_share_fusefs
samba_share_fusefs --> on

 

Likewise, you'll need the following to allow sharing out of NFS mounts:

# setsebool -P samba_share_nfs on
# getsebool samba_share_nfs
samba_share_nfs --> on
#

And some firewall ports to go along with it:

firewall-cmd --zone=public --permanent --add-port=445/tcp
firewall-cmd --zone=public --permanent --add-port=139/tcp
firewall-cmd --zone=public --permanent --add-port=138/udp
firewall-cmd --zone=public --permanent --add-port=137/udp
firewall-cmd --reload
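With the share, booleans and firewall ports in place, a quick test from any machine with the samba-client package installed (a sketch; expect to see the [NFS] share listed):

smbclient -L //nfs01 -U root
smbclient //nfs01/NFS -U root -c 'ls'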

 

We can also enable SMB / Samba file sharing on the individual cluster hosts and allow visibility into the GlusterFS / NFS-Ganesha storage from Windows.

nfs01 / nfs02 / nfs03

Referencing this post, we will import a few principals from the master IPA server.  (For the KDC steps, see the referenced post.)

On the IPA server, issue the following to permit retrieval of the principals by the clients:

[root@idmipa01 ~]# ipa service-add nfs/nfs03.nix.mds.xyz

[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs-c01.nix.mds.xyz@NIX.MDS.XYZ --groups=admins
[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs-c01.nix.mds.xyz@NIX.MDS.XYZ --hosts={nfs01.nix.mds.xyz,nfs02.nix.mds.xyz,nfs03.nix.mds.xyz}

[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs01.nix.mds.xyz@NIX.MDS.XYZ --groups=admins
[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs02.nix.mds.xyz@NIX.MDS.XYZ --groups=admins
[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs03.nix.mds.xyz@NIX.MDS.XYZ --groups=admins

[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs03.nix.mds.xyz --hosts=nfs01.nix.mds.xyz
[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs02.nix.mds.xyz --hosts=nfs02.nix.mds.xyz
[root@idmipa01 ~]# ipa service-allow-retrieve-keytab nfs/nfs01.nix.mds.xyz --hosts=nfs03.nix.mds.xyz

On the target client issue the following:

[root@nfs01 ~]# kinit admin    # Or the user you permissioned above.
[root@nfs01 ~]# ipa-getkeytab -s idmipa01.nix.mds.xyz -p nfs/nfs-c01.nix.mds.xyz -k /etc/krb5.keytab -r 

[root@nfs01 ~]# ipa-getkeytab -s idmipa01.nix.mds.xyz -p nfs/nfs01.nix.mds.xyz -k /etc/krb5.keytab -r 
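You can then confirm the principals actually made it into the keytab and are usable, using the standard MIT Kerberos tooling:

klist -kte /etc/krb5.keytab | grep -i nfs
kinit -kt /etc/krb5.keytab nfs/nfs01.nix.mds.xyz
klist
kdestroy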

 

Pull in principals from your IPA / KDC Server.  
nfs01 / nfs02 / nfs03 

Check the HAProxy GUI to see the full status report:

http://nfs-c01:9000/haproxy-stats

Verify the cluster.

TESTING

Now let's do some checks on our NFS HA.  Mount the share using the VIP from a client then create a test file:

[root@ipaclient01 /]# mount -t nfs4 nfs-c01:/n /n
[root@ipaclient01 n]# echo -ne "Hacked It.  Gluster, NFS Ganesha, HAPROXY, keepalived scalable NFS server." > some-people-find-this-awesome.txt

[root@ipaclient01 n]# mount|grep nfs4
nfs-c01:/n on /n type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.236,local_lock=none,addr=192.168.0.80)
[root@ipaclient01 n]#

 

Then check each brick to see if the file was replicated:

[root@nfs01 n]# cat /bricks/0/gv01/some-people-find-this-awesome.txt
Hacked It.  Gluster, NFS Ganesha, HAPROXY, keepalived scalable NFS server.
[root@nfs01 n]# mount|grep -Ei gv01
nfs01:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@nfs01 n]#

[root@nfs02 n]# cat /bricks/0/gv01/some-people-find-this-awesome.txt
Hacked It.  Gluster, NFS Ganesha, HAPROXY, keepalived scalable NFS server.
[root@nfs02 n]# mount|grep -Ei gv01
nfs02:/gv01 on /n type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@nfs02 n]#

Good!  Now let's hard-shutdown one node, nfs01, the primary node.  The expected behaviour is that we see failover to nfs02, and when we bring the nfs01 server back, we see the file replicated.  While we do this, the client ipaclient01 is not supposed to lose its connection to the NFS mount via the VIP.  Here are the results:

[root@nfs02 n]# ps -ef|grep -Ei "haproxy|ganesha|keepalived"
root     12245     1  0 Feb19 ?        00:00:03 /usr/sbin/keepalived -D
root     12246 12245  0 Feb19 ?        00:00:03 /usr/sbin/keepalived -D
root     12247 12245  0 Feb19 ?        00:00:41 /usr/sbin/keepalived -D
root     12409     1 16 Feb20 ?        00:13:05 /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root     17892     1  0 00:37 ?        00:00:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy  17893 17892  0 00:37 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy  17894 17893  0 00:37 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
root     17918 21084  0 00:38 pts/0    00:00:00 grep --color=auto -Ei haproxy|ganesha|keepalived
[root@nfs02 n]# ps -ef|grep -Ei "haproxy|ganesha|keepalived"; netstat -pnlt|grep -Ei ganesha; netstat -pnlt|grep -Ei haproxy; netstat -pnlt|grep -Ei keepalived
root     12245     1  0 Feb19 ?        00:00:03 /usr/sbin/keepalived -D
root     12246 12245  0 Feb19 ?        00:00:03 /usr/sbin/keepalived -D
root     12247 12245  0 Feb19 ?        00:00:41 /usr/sbin/keepalived -D
root     12409     1 16 Feb20 ?        00:13:09 /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
root     17892     1  0 00:37 ?        00:00:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy  17893 17892  0 00:37 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
haproxy  17894 17893  0 00:37 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
root     17947 21084  0 00:38 pts/0    00:00:00 grep --color=auto -Ei haproxy|ganesha|keepalived
tcp6       0      0 192.168.0.119:20048     :::*                    LISTEN      12409/ganesha.nfsd
tcp6       0      0 :::564                  :::*                    LISTEN      12409/ganesha.nfsd
tcp6       0      0 192.168.0.119:4501      :::*                    LISTEN      12409/ganesha.nfsd
tcp6       0      0 192.168.0.119:2049      :::*                    LISTEN      12409/ganesha.nfsd
tcp6       0      0 192.168.0.119:38468     :::*                    LISTEN      12409/ganesha.nfsd
tcp        0      0 192.168.0.80:2049       0.0.0.0:*               LISTEN      17894/haproxy
[root@nfs02 n]#
[root@nfs02 n]#
[root@nfs02 n]#
[root@nfs02 n]# ssh nfs-c01
Password:
Last login: Wed Feb 21 00:37:28 2018 from nfs-c01.nix.mine.dom
[root@nfs02 ~]# logout
Connection to nfs-c01 closed.
[root@nfs02 n]#

From the client we can still see all the files (seamless, with no interruption to the NFS service).  As a bonus, while we started this first test, we noticed that HAPROXY was offline on nfs02.  While trying to list the client files, it appeared hung but still responded, then listed the files right after we started HAPROXY on nfs02:

[root@ipaclient01 n]# ls -altri some-people-find-this-awesome.txt
11782527620043058273 -rw-r--r--. 1 nobody nobody 74 Feb 21 00:26 some-people-find-this-awesome.txt
[root@ipaclient01 n]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
nfs-c01:/n      128G   43M  128G   1% /n
[root@ipaclient01 n]# ssh nfs-c01
Password:
Last login: Wed Feb 21 00:41:06 2018 from nfs-c01.nix.mine.dom
[root@nfs02 ~]#

Checking the gluster volume on nfs02:

[root@nfs02 n]# gluster volume status
Status of volume: gv01
Gluster process                             TCP Port  RDMA Port  Online  Pid
——————————————————————————
Brick nfs02:/bricks/0/gv01                  49152     0          Y       16103
Self-heal Daemon on localhost               N/A       N/A        Y       16094

Task Status of Volume gv01
——————————————————————————
There are no active volume tasks

[root@nfs02 n]#

Now let's bring back the first node and fail the second once nfs01 is up again.  As soon as we bring nfs01 back up, the VIP fails over to nfs01 without any hiccup or manual intervention on the client end:

[root@ipaclient01 n]# ls -altri
total 11
                 128 dr-xr-xr-x. 21 root   root   4096 Feb 18 22:24 ..
11782527620043058273 -rw-r--r--.  1 nobody nobody   74 Feb 21 00:26 some-people-find-this-awesome.txt
                   1 drwxr-xr-x.  3 nobody nobody 4096 Feb 21 00:26 .
[root@ipaclient01 n]#
[root@ipaclient01 n]#
[root@ipaclient01 n]#
[root@ipaclient01 n]# ssh nfs-c01
Password:
Last login: Wed Feb 21 00:59:56 2018
[root@nfs01 ~]#

So now let's fail the second node.  NFS still works:

[root@ipaclient01 ~]# ssh nfs-c01
Password:
Last login: Wed Feb 21 01:31:50 2018
[root@nfs01 ~]# logout
Connection to nfs-c01 closed.
[root@ipaclient01 ~]# cd /n
[root@ipaclient01 n]# ls -altri some-people-find-this-awesome.txt
11782527620043058273 -rw-r--r--. 1 nobody nobody 74 Feb 21 00:26 some-people-find-this-awesome.txt
[root@ipaclient01 n]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
nfs-c01:/n      128G   43M  128G   1% /n
[root@ipaclient01 n]#

So we bring the second node back up.  And that concludes the configuration!  All works like a charm!

You can also check out our guest post for the same on loadbalancer.org!

Good Luck!

Cheers,
Tom K.

Cannot find key for kvno in keytab

If you are getting this:

krb5_child.log:(Tue Mar  6 23:18:46 2018) [[sssd[krb5_child[3193]]]] [map_krb5_error] (0x0020): 1655: [-1765328340][Cannot find key for nfs/nfs01.nix.my.dom@NIX.my.dom kvno 6 in keytab]

Then you can resolve it by copying the old keytab file back (or removing the incorrect entries using ktutil).  In our case we had made a saved copy and re-added the NFS principals to the keytab file.  You can list the current principals in the keytab file using:

klist -kte /etc/krb5.keytab

This was followed up by re-adding the missing keytab keys from the IPA server:

ipa-getkeytab -s idmipa01.nix.my.dom -p nfs/nfs-c01.nix.my.dom -k /etc/krb5.keytab
ipa-getkeytab -s idmipa01.nix.my.dom -p nfs/nfs01.nix.my.dom -k /etc/krb5.keytab

Alternately, create the keytab entries manually using ktutil above.

Cheers,
Tom

 

Name resolution for the name timed out after none of the configured DNS servers responded.

You're getting this: 

Name resolution for the name <URL> timed out after none of the configured DNS servers responded.

One of the resolutions is to adjust a few network parameters: 

netsh interface tcp set global rss=disabled
netsh interface tcp set global autotuninglevel=disabled
netsh int ip set global taskoffload=disabled

Then set these registry options: 

regedit: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
 
EnableTCPChimney=dword:00000000
EnableTCPA=dword:00000000
EnableRSS=dword:00000000

Cheers,
Tom K.

Ping request could not find host HOST. Please check the name and try again .

ping cannot find host but nslookup on a host works just fine:  

Ping request could not find host HOST. Please check the name and try again.

Restart the DNS Client Service in Windows Services to resolve this one.  A few other commands to try:  

ipconfig /flushdns
ipconfig /registerdns

Following this, check Event Viewer to see why it stopped working to begin with.  The service is started using:

C:\Windows\system32\svchost.exe -k NetworkService

Alternately, stopping the caching daemon also works.

Cheers,
Tom K

GlusterFS: Configuration and Setup w/ NFS-Ganesha for an HA NFS Cluster

In this post we will go over how to setup a highly available NFS Cluster using:

  • GlusterFS
  • NFS Ganesha
  • CentOS 7 
  • HAPROXY
  • keepalived
  • firewalld
  • selinux

This post is very lengthy and goes over quite a few details on the way to configuring this setup.  We document virtually every step, including how to build out a GlusterFS filesystem in both physical and virtual environments.  For those interested in a quick setup, please skip to the SUMMARY or TESTING sections at the bottom for a summary of the commands and configuration files used.  If you run into problems, just search the page for the issue you have, as it's likely listed, and read the solution attempted.


Replication bind with GSSAPI auth failed: LDAP error 49 (Invalid credentials) ()

FreeIPA replication fails for about 13 minutes with no activity on the first IdM server.  It's not clear why at first.

Feb 12 10:06:56 idmipa01 named-pkcs11[2529]: zone nix.mds.xyz/IN: sending notifies (serial 1518448016)
Feb 12 10:07:06 idmipa01 named-pkcs11[2529]: error (chase DS servers) resolving 'mds.xyz/DS/IN': 192.168.0.224#53
Feb 12 10:07:14 idmipa01 ns-slapd: [12/Feb/2018:10:07:14.130840773 -0500] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=meToidmipa02.nix.mds.xyz" (idmipa02:389) - Replication bind with GSSAPI auth failed: LDAP error 49 (Invalid credentials) ()
Feb 12 10:20:01 idmipa01 systemd: Created slice user-0.slice.
Feb 12 10:20:01 idmipa01 systemd: Starting user-0.slice.

The problem was again with NTP and time/date settings.

[root@idmipa02 log]# date
Wed Feb 14 00:05:58 EST 2018
[root@idmipa02 log]#

 

[root@idmipa01 log]# date
Wed Feb 14 00:00:14 EST 2018
You have new mail in /var/spool/mail/root
[root@idmipa01 log]#

Over a 5-minute difference.  Checking further, we see the following in the logs:

Feb 12 10:13:00 idmipa02 rc.local: Error resolving ca.pool.ntp.org: Name or service not known (-2)
Feb 12 10:13:00 idmipa02 rc.local: 12 Feb 10:13:00 ntpdate[963]: Can't find host ca.pool.ntp.org: Name or service not known (-2)
Feb 12 10:13:00 idmipa02 rc.local: 12 Feb 10:13:00 ntpdate[963]: no servers can be used, exiting

So we need to keep the time between the two masters in sync, otherwise this replication issue will recur.  But we also need to ensure our NTP servers are resolvable, so we may need to build some extra fallbacks into our NTP setup.  We have:

[root@idmipa01 log]# cat /etc/rc.local |grep -Evi "#"

touch /var/lock/subsys/local
ntpdate -u ca.pool.ntp.org;
[root@idmipa01 log]#

But we should also fall back to plain IPs in case name resolution fails (we are using NLB on our AD DC servers and noted a failure on that host earlier, which we just fixed):

[root@idmipa01 log]# cat /etc/rc.local |grep -Evi "#"

touch /var/lock/subsys/local
ntpdate -u ca.pool.ntp.org || ntpdate -u 206.108.0.132 || ntpdate -u 159.203.8.72;

[root@idmipa01 log]#

This gives us some safety in case the name can't be resolved due to DNS issues.  We will also reconfigure our NTP servers as follows:

[root@idmipa02 log]# grep -Evi "#" /etc/ntp.conf | sed -e "/^$/d"
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
fudge   127.127.1.0 stratum 10
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
restrict 127.0.0.1
restrict ::1
driftfile /var/lib/ntp/ntp.drift
logfile /var/log/ntp.log
server 0.ca.pool.ntp.org prefer
server 1.ca.pool.ntp.org
server 2.ca.pool.ntp.org
server 3.ca.pool.ntp.org

server 198.50.139.209

server 207.210.46.249
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
disable monitor
[root@idmipa02 log]#

and

[root@idmipa01 log]# grep -Evi "#" /etc/ntp.conf|sed -e "/^$/d"
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
fudge   127.127.1.0 stratum 10
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
restrict 127.0.0.1
restrict ::1
driftfile /var/lib/ntp/ntp.drift
logfile /var/log/ntp.log

server 207.210.46.249
server 198.50.139.209
server 0.ca.pool.ntp.org
server 1.ca.pool.ntp.org
server 2.ca.pool.ntp.org
server 3.ca.pool.ntp.org prefer
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
disable monitor
[root@idmipa01 log]#

Notice that the preferred NTP servers are different on each of our NTP servers.  We're attempting to prevent a scenario where the same external NTP server is polled twice from two different servers simultaneously.  There is no clear evidence that this causes an issue, but setting an alternate preferred server for each of our NTP servers prevents it from occurring, just in case it could ever be true.  We also add 2 IPs from one of the pools above in case DNS errors cause us issues; we will be immune to this if it were ever to come up.  The difference is significant:

[root@idmipa02 log]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.          10 l    4   64    1    0.000    0.000   0.000
 k8s-w04.tblflp. 152.2.133.55     2 u    3   64    1   21.943  906.098   0.000
 echo.baxterit.n 213.251.128.249  2 u    2   64    1   39.255  908.220   0.000
 k8s-w01.tblflp. 152.2.133.55     2 u    1   64    1   18.415  903.549   0.000
 portal.switch.c 213.251.128.249  2 u    -   64    1   16.560  901.799   0.000
 mirror3.rafal.c .INIT.          16 u    -   64    0    0.000    0.000   0.000
 198.50.139.209  .INIT.          16 u    -   64    0    0.000    0.000   0.000
[root@idmipa02 log]#

[root@idmipa01 log]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        .LOCL.          10 l   34   64    1    0.000    0.000   0.000
 198.50.139.209  35.73.197.144    2 u   33   64    1   19.071  -84.149   0.000
 mirror3.rafal.c 53.27.192.223    2 u   32   64    1   18.490  -56.439   0.000
 ns522433.ip-158 18.26.4.105      2 u   31   64    1   17.833  -80.900   0.000
 echo.baxterit.n 213.251.128.249  2 u   30   64    1   16.688  -82.694   0.000
 209.115.181.102 206.108.0.133    2 u   29   64    1   72.834  -82.194   0.000
 mongrel.ahem.ca .INIT.          16 u    -   64    0    0.000    0.000   0.000
[root@idmipa01 log]#

Good Luck!

Cheers,
TK

Getting asked for password when using host shortname with kerberos delegation

When trying to ssh into a host using the server's short name, you get challenged for a password.  You need to set the following:

  • First item to set is the following:

dns_canonicalize_hostname = true

in /etc/krb5.conf.  It will then stop asking for a password.  Using the server's FQDN will work without issues.

  • The second item is to ensure your sshd_config contains the following lines (this may or may not necessarily be required, as I haven't tested all the configuration options):

KerberosAuthentication yes
ChallengeResponseAuthentication yes

  • The other important item is to check that you have properly configured /etc/resolv.conf and the ifcfg-eth0 interface.  After configuring the above items, this is what finally got passwordless short-name sign-on to work (DOMAIN is reported to work on certain Linux versions while SEARCH works on others; it doesn't hurt to set both.  In either case the order is important: mds.xyz before nix.mds.xyz):

[root@cm-r01en02 ssh]# cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
nameserver 192.168.0.44
nameserver 192.168.0.45
search mds.xyz nix.mds.xyz

[root@cm-r01en02 ssh]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
NAME=eth0
BOOTPROTO=static
PEERDNS=no
UUID=62904293-0bde-4ea9-b4a1-6a65191663f3
ONBOOT=yes
IPADDR=192.168.0.133
NETMASK="255.255.255.0"
GATEWAY="192.168.0.1"
USERCTL=no
NM_CONTROLLED=no
HOSTNAME=cm-r01en02.nix.mds.xyz
DOMAIN="mds.xyz nix.mds.xyz"
SEARCH="mds.xyz nix.mds.xyz"
DNS1=192.168.0.44
DNS2=192.168.0.45
DNS3=192.168.0.224

[root@cm-r01en02 ssh]#

My entire sshd_config file had the following set:

AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_IDENTIFICATION LC_ALL LANGUAGE
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv XMODIFIERS
AuthorizedKeysCommandUser nobody
AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys
AuthorizedKeysFile      .ssh/authorized_keys
ChallengeResponseAuthentication no
GSSAPIAuthentication yes
GSSAPICleanupCredentials no
HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key
KerberosAuthentication yes
PasswordAuthentication yes
PubkeyAuthentication yes
Subsystem       sftp    /usr/libexec/openssh/sftp-server
SyslogFacility AUTHPRIV
UsePAM yes
X11Forwarding yes

Note that PEERDNS is set to no.  This is important or your config will be overwritten on reboot or network restart.  If you can't set it to no for some other reason, simply set the immutable bit on /etc/resolv.conf using chattr +i /etc/resolv.conf .

Still doesn't work?  You just might need a little bit of patience now:

-sh-4.2$ ssh ipaclient01 -vvvv
debug1: Unspecified GSS failure.  Minor code may provide more information
Clock skew too great

debug3: send packet: type 50

Meaning your NTP daemon hasn't synced up the clock yet.  Give it some time.  Then try again.

Good luck!

Cheers,
TK

 

kinit: Cannot find KDC for realm while getting initial credentials

The problem is that you need

dns_lookup_kdc = true

in your /etc/krb5.conf, under the [libdefaults] section:

[root@mysql01 ~]# kinit tom@mds.xyz
kinit: Cannot find KDC for realm "mds.xyz" while getting initial credentials
[root@mysql01 ~]#
[root@mysql01 ~]# vi /etc/krb5.conf
[root@mysql01 ~]# systemctl restart sssd
[root@mysql01 ~]# kinit tom@mds.xyz
Password for tom@mds.xyz:
[root@mysql01 ~]#
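For reference, a minimal sketch of the relevant [libdefaults] entries; the realm and the extra lookup options shown are assumptions based on the rest of this site, so substitute your own:

[libdefaults]
  default_realm = MDS.XYZ
  dns_lookup_kdc = true
  dns_lookup_realm = true
  dns_canonicalize_hostname = true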

Cheers,
TK

 

8524 The DSA operation is unable to proceed because of a DNS lookup failure.

Reason for the below failure:

The Active Directory Domain Services Installation Wizard (Dcpromo) was unable to establish connection with the following domain controller. 

 
Domain controller:
winad01.mds.xyz 
 
Additional Data 
Error value:
8524 The DSA operation is unable to proceed because of a DNS lookup failure.

and the subsequent failure when promoting the server to an Active Directory Domain Controller was due to the two NICs on each host having DNS settings other than 127.0.0.1.  Two NICs were present on each host, one for the LAN and the other for NLB.  Once fixed, the AD DC promotion went along further but still failed.

This ended up being a DNS issue between the two AD DCs.  The first AD DC also ran a DNS server, so it had to use itself as a DNS server: enter the first DNS server's own IP into its DNS 1 field and the router's (usually 192.168.0.1) into the DNS 2 field.

Likewise for the second server: enter the IP of the second DNS server into the NIC DNS 1 field of this second DNS / AD DC server.  DNS 2 should be the main router, 192.168.0.1.

DNS / AD DC 1:
IP: 192.168.0.123
DNS 1: 192.168.0.123
DNS 2: 192.168.0.1

DNS / AD DC 2:
IP: 192.168.0.124
DNS1: 192.168.0.124
DNS2: 192.168.0.1

Cheers,
TK


     