

LDAP ldapmodify: additional info: attribute “ipaBaseID” not allowed

When modifying LDAP entries, you may get the following error:

[root@idmipa03 ~]# ldapmodify -H ldapi://%2fvar%2frun%2fslapd-MWS-MDS-XYZ.socket << EOF
> dn: cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz
> changetype: modify
> replace: ipaBaseID
> ipaBaseID: 155600000
> EOF
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
modifying entry "cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz"
ldap_modify: Object class violation (65)
        additional info: attribute "ipaBaseID" not allowed

What this means is that the object classes of the entry being modified do not allow the ipaBaseID attribute. How do we find out which object class defines it, and which attributes that object class requires? By looking at the schema with a tool such as JXplorer:

jXplorer Directory Listing

From the above, navigating to the ipaIDrange object class in the schema tells us what it requires:

LDAP Directory Schema

We can see that the required attributes are listed under MUST:

MUST
  • cn
  • ipaBaseID
  • ipaIDRangeSize
  • ipaRangeType

We check the MUST list of the other object class shown as well:

MUST
  • ipaBaseRID
  • ipaNTTrustedDomainSID

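This tells us which attributes we need to include alongside the one value we want to modify. Also, the command below targets the ID range entry itself, cn=MDS.XYZ_id_range,cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz, rather than the cn=ranges container used in the failing attempt. ( NOTE: Since we don't want to modify any of the other values, we simply copy the existing values into the same attribute/value pairs. ):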

[root@idmipa03 ~]# ldapmodify -H ldapi://%2fvar%2frun%2fslapd-MWS-MDS-XYZ.socket << EOF
> dn: cn=MDS.XYZ_id_range,cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz
> changetype: modify
> replace: ipaBaseRID
> ipaBaseRID: 155600000
> -
> replace: ipaBaseID
> ipaBaseID: 155600000
> -
> replace: ipaIDRangeSize
> ipaIDRangeSize: 200000
> -
> replace: ipaNTTrustedDomainSID
> ipaNTTrustedDomainSID: S-1-5-21-1803828911-4163023034-2461700517
> -
> replace: ipaRangeType
> ipaRangeType: ipa-ad-trust-posix
> -
> EOF
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
modifying entry "cn=MDS.XYZ_id_range,cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz"

[root@idmipa03 ~]#
 

And we finally have a successful modification.
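If you build this kind of multi-attribute replace often, generating the LDIF avoids the two mistakes that are easiest to make by hand: a blank line inside the change record, which ends the record early, and a typographic dash in place of the plain - separator. A minimal Python sketch; build_replace_ldif is a hypothetical helper, and it assumes plain ASCII values that need no base64 encoding.

```python
#!/usr/bin/env python3
"""Sketch: build an LDIF 'changetype: modify' record that replaces several attributes."""


def build_replace_ldif(dn, attrs):
    """Hypothetical helper: return an LDIF change record replacing each attribute in attrs.

    Every replace block is closed with a plain '-' line and no blank line is
    emitted inside the record, because a blank line would end the record.
    Assumes plain ASCII values that need no base64 encoding.
    """
    lines = ["dn: %s" % dn, "changetype: modify"]
    for name, value in attrs.items():
        lines.append("replace: %s" % name)
        lines.append("%s: %s" % (name, value))
        lines.append("-")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values copied from the successful ldapmodify above.
    print(build_replace_ldif(
        "cn=MDS.XYZ_id_range,cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz",
        {
            "ipaBaseRID": 155600000,
            "ipaBaseID": 155600000,
            "ipaIDRangeSize": 200000,
            "ipaNTTrustedDomainSID": "S-1-5-21-1803828911-4163023034-2461700517",
            "ipaRangeType": "ipa-ad-trust-posix",
        },
    ), end="")
```

For example: python3 build_ldif.py | ldapmodify -H ldapi://%2fvar%2frun%2fslapd-MWS-MDS-XYZ.socket (ldapmodify reads change records from standard input when no file is given).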

Cheers,
TK

LDAP ldapmodify: additional info: single-valued attribute “ipaBaseRID” has multiple values

You may run into the following when trying to modify the FreeIPA ID Ranges:

[root@ipa03 ~]# ldapmodify -H ldapi://%2fvar%2frun%2fslapd-MWS-MDS-XYZ.socket << EOF
> dn: cn=MDS.XYZ_id_range,cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz
> changetype: modify
> add: ipaBaseRID
> ipaBaseRID: 200000000
> EOF
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
modifying entry "cn=MDS.XYZ_id_range,cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz"
ldap_modify: Object class violation (65)
        additional info: single-valued attribute "ipaBaseRID" has multiple values

The real issue is with the line:

> add: ipaBaseRID

What the error means is that add appends a second ipaBaseRID value instead of replacing the existing one. ipaBaseRID is defined as single-valued in the schema, so the entry cannot hold more than one value for it, and the server rejects the change.

The correct syntax, therefore, is to use the replace tag:

[root@idmipa03 ~]# ldapmodify -H ldapi://%2fvar%2frun%2fslapd-MWS-MDS-XYZ.socket << EOF
> dn: cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz
> changetype: modify
> replace: ipaBaseID
> ipaBaseID: 155600000
> EOF
SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
modifying entry "cn=ranges,cn=etc,dc=mws,dc=mds,dc=xyz"
ldap_modify: Object class violation (65)
        additional info: attribute "ipaBaseID" not allowed

NOTE: However, due to the object class definitions on our FreeIPA server, this produces the different error you see above. That one is solved in the LDAP ldapmodify: additional info: attribute "ipaBaseID" not allowed post.
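The difference between add and replace on a single-valued attribute can be shown with a toy in-memory model. This is not an LDAP client: Entry, SINGLE_VALUED and ObjectClassViolation are hypothetical names, and only the fact that ipaBaseRID is single-valued comes from the error above.

```python
#!/usr/bin/env python3
"""Toy model (not an LDAP client) of modify 'add' vs 'replace' on a single-valued attribute."""

# Hypothetical schema fragment: attributes the server treats as SINGLE-VALUE.
SINGLE_VALUED = {"ipaBaseRID"}


class ObjectClassViolation(Exception):
    """Stands in for the server's 'Object class violation (65)' result."""


class Entry:
    def __init__(self, **attrs):
        # Store every attribute as a list of values, as LDAP does.
        self.attrs = {name: [value] for name, value in attrs.items()}

    def add(self, name, value):
        # 'add' appends a value to whatever is already there.
        values = self.attrs.get(name, []) + [value]
        if name in SINGLE_VALUED and len(values) > 1:
            raise ObjectClassViolation(
                'single-valued attribute "%s" has multiple values' % name)
        self.attrs[name] = values

    def replace(self, name, value):
        # 'replace' discards the existing values and keeps only the new one.
        self.attrs[name] = [value]


if __name__ == "__main__":
    entry = Entry(ipaBaseRID=155600000)  # example starting value
    try:
        entry.add("ipaBaseRID", 200000000)
    except ObjectClassViolation as err:
        print("add failed:", err)
    entry.replace("ipaBaseRID", 200000000)
    print("after replace:", entry.attrs["ipaBaseRID"])
```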

Cheers,
TK

 

FreeIPA Replication Verification Tool

There is a tool available, checkipaconsistency, that verifies replication consistency across each FreeIPA server:

yum install git -y; git clone https://github.com/peterpakos/checkipaconsistency.git

# ./cipa -d mws.mds.xyz -W "SECRET"
+--------------------+------------+-------------+-------+
| FreeIPA servers:   | idmipa03   | idmipa04    | STATE |
+--------------------+------------+-------------+-------+
| Active Users       | 1          | 1           | OK    |
| Stage Users        | 0          | 0           | OK    |
| Preserved Users    | 0          | 0           | OK    |
| Hosts              | 2          | 2           | OK    |
| Services           | 11         | 11          | OK    |
| User Groups        | 10         | 10          | OK    |
| Host Groups        | 1          | 1           | OK    |
| Netgroups          | 0          | 0           | OK    |
| HBAC Rules         | 1          | 1           | OK    |
| SUDO Rules         | 0          | 0           | OK    |
| DNS Zones          | 3          | 3           | OK    |
| Certificates       | 17         | 17          | OK    |
| LDAP Conflicts     | 0          | 0           | OK    |
| Ghost Replicas     | 0          | 0           | OK    |
| Anonymous BIND     | ON         | ON          | OK    |
| Microsoft ADTrust  | True       | False       | FAIL  |
| Replication Status | idmipa04 0 | idmipa03 18 | OK    |
+--------------------+------------+-------------+-------+
#

Cheers,
TK

[sssd[pac]] [accept_fd_handler] (0x0020): Access denied for uid [994]. / [resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers

You may receive the following two errors when doing group lookups with getent group <USER GROUP>:

[sssd[pac]] [accept_fd_handler] (0x0020): Access denied for uid [994]. 

[resolv_discover_srv_done] (0x0040): SRV query failed [11]: Could not contact DNS servers


Feb 17 00:35:37 idmipa04 ns-slapd: [17/Feb/2019:00:35:37.251117736 -0500] - ERR - agmt="cn=meToidmipa03.mws.mds.xyz" (idmipa03:389) - clcache_load_buffer - Can't locate CSN 5c593ee3000200050000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.

When you get this:

Feb 17 00:35:37 idmipa04 ns-slapd: [17/Feb/2019:00:35:37.251117736 -0500] - ERR - agmt="cn=meToidmipa03.mws.mds.xyz" (idmipa03:389) - clcache_load_buffer - Can't locate CSN 5c593ee3000200050000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.

Run this on the replica throwing the above error:

[root@idmipa04 ~]# ipa-replica-manage re-initialize --from idmipa03.mws.mds.xyz
Directory Manager password:

Update in progress, 3 seconds elapsed
Update succeeded

[root@idmipa04 ~]#
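The CSN in this message can be decoded to see when the change missing from the changelog was made and which replica originated it. A minimal Python sketch, assuming the usual 389 Directory Server CSN layout of 20 hex digits: 8 for the Unix timestamp, 4 for a sequence number, 4 for the replica ID and 4 for a sub-sequence number. decode_csn is a hypothetical helper; the replica ID can then be matched against the IDs listed by ipa-replica-manage list-ruv.

```python
#!/usr/bin/env python3
"""Sketch: decode a 389 Directory Server CSN such as the one in the error above."""
from datetime import datetime, timezone


def decode_csn(csn):
    """Hypothetical helper: split a 20-hex-digit CSN into its assumed fields.

    Assumed layout: 8 hex digits of Unix time, 4 of sequence number,
    4 of replica ID and 4 of sub-sequence number.
    """
    if len(csn) != 20:
        raise ValueError("expected 20 hex digits, got %r" % csn)
    timestamp = int(csn[0:8], 16)
    return {
        "time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "seqnum": int(csn[8:12], 16),
        "replica_id": int(csn[12:16], 16),
        "subseqnum": int(csn[16:20], 16),
    }


if __name__ == "__main__":
    for field, value in decode_csn("5c593ee3000200050000").items():
        print(field, value)
```

Under that assumption, 5c593ee3000200050000 comes from replica ID 5 and carries a timestamp of 2019-02-05 07:44:35 UTC, about twelve days before the log entry.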

Cheers,
TK

Zabbix: [Z3001] connection to database ‘zabbix’ failed: [2003] Can’t connect to MySQL server on ‘mysql-01.abc.xyz.123’ (13)

Zabbix error:

[Z3001] connection to database ‘zabbix’ failed: [2003] Can't connect to MySQL server on 'mysql-01.abc.xyz.123' (13)

related to:

audit.log:type=AVC msg=audit(1549949080.977:11328): avc:  denied  { name_connect } for  pid=9115 comm="zabbix_server" dest=3306 scontext=system_u:system_r:zabbix_t:s0 tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket

is solved by:

# grep AVC /var/log/audit/audit.log | audit2allow -M systemd-allow; semodule -i systemd-allow.pp
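Because grep AVC ... | audit2allow turns every AVC denial in the log into an allow rule (here into a module named systemd-allow), it is worth reading what you are about to allow first. A minimal Python sketch that pulls the key fields out of AVC denial records; the regular expressions assume the record layout shown above and are not a complete audit-log parser (ausearch and audit2why are the proper tools), and parse_avc is a hypothetical helper.

```python
#!/usr/bin/env python3
"""Sketch: pull the interesting fields out of SELinux AVC denial records."""
import re
import sys

# Permissions appear between braces, e.g. "avc:  denied  { name_connect }".
PERMS_RE = re.compile(r"avc:\s+denied\s+\{([^}]*)\}")
# key=value pairs; values may be double-quoted (comm="zabbix_server").
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(line):
    """Hypothetical helper: return denied permissions and selected fields, or None."""
    match = PERMS_RE.search(line)
    if match is None:
        return None  # not an AVC denial
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(line)}
    return {
        "perms": match.group(1).split(),
        "comm": fields.get("comm"),
        "scontext": fields.get("scontext"),
        "tcontext": fields.get("tcontext"),
        "tclass": fields.get("tclass"),
        "dest": fields.get("dest"),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 avc_summary.py /var/log/audit/audit.log
    with open(sys.argv[1], errors="replace") as handle:
        for line in handle:
            avc = parse_avc(line)
            if avc:
                print("%s denied %s on %s (%s -> %s)" % (
                    avc["comm"], " ".join(avc["perms"]), avc["tclass"],
                    avc["scontext"], avc["tcontext"]))
```

For the record above it reports zabbix_server being denied name_connect on a tcp_socket labelled mysqld_port_t (dest=3306).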

Cheers,
TK

Zabbix: cannot start preprocessing service: Cannot bind socket to “/var/run/zabbix/zabbix_server_preprocessing.sock”: [98] Address already in use.

Zabbix error:

 10272:20190212:003104.073 cannot start preprocessing service: Cannot bind socket to "/var/run/zabbix/zabbix_server_preprocessing.sock": [98] Address already in use.
 10239:20190212:003104.078 One child process died (PID:10272,exitcode/signal:1). Exiting …

related to:

# cat ../audit/audit.log|grep -Ei denied|tail
type=AVC msg=audit(1549949530.062:12551): avc:  denied  { unlink } for  pid=10521 comm="zabbix_server" name="zabbix_server_preprocessing.sock" dev="tmpfs" ino=3998803 scontext=system_u:system_r:zabbix_t:s0 tcontext=system_u:object_r:zabbix_var_run_t:s0 tclass=sock_file

is solved by:

# grep AVC /var/log/audit/audit.log* | audit2allow -M systemd-allow; semodule -i systemd-allow.pp

Cheers,
TK

Zabbix: cannot set resource limit: [13] Permission denied

Zabbix error:

 10587:20190212:003514.676 using configuration file: /etc/zabbix/zabbix_server.conf
 10587:20190212:003514.676 cannot set resource limit: [13] Permission denied

relates to:

[root@host01 zabbix]# cat ../audit/audit.log|grep -Ei denied|tail
type=AVC msg=audit(1549949714.675:12570): avc:  denied  { setrlimit } for  pid=10587 comm="zabbix_server" scontext=system_u:system_r:zabbix_t:s0 tcontext=system_u:system_r:zabbix_t:s0 tclass=process
[root@host01 zabbix]#

and is solved by:

[root@host01 zabbix]# grep AVC /var/log/audit/audit.log* | audit2allow -M systemd-allow; semodule -i systemd-allow.pp

Cheers,
TK

FreeIPA Quick Setup Guide w/ Replication HA, AD DC Trust, Sudo, Ganesha NFS

In this post, we are setting up an IPA server on a domain separate from the one we configured earlier (nix.mds.xyz). We do so because IPA comes not only with authentication and DNS but also with a built-in KDC, to which we will be connecting various pieces of software that will make changes to it. For this reason we prefer to keep our KDC on a secondary domain, while allowing the same AD users to authenticate via both IPA servers.


Install RabbitMQ in High Availability

In this post we'll install RabbitMQ in High Availability on 3 nodes. We'll do this to share the instance with third-party applications that need it while providing fault tolerance. We will reference the following post, but on CentOS 7 instead.

So let's get started.

HOSTS COMMANDS DESCRIPTION
rmq01 / rmq02 / rmq03
CentOS 7
Create 3 separate VMs to add to your cluster.

rmq01 / rmq02 / rmq03
echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf; echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf; sysctl -p
Set non-local bind and IP forwarding for RabbitMQ, HAProxy and Keepalived.
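Note that the echo ... >> form appends the two lines again every time it runs, and the same command is repeated in a PROBLEM section further down, so /etc/sysctl.conf can end up with duplicate entries. A minimal, idempotent Python sketch that keeps each key exactly once; set_sysctl_keys is a hypothetical helper, and you still run sysctl -p afterwards to apply the values.

```python
#!/usr/bin/env python3
"""Sketch: set sysctl keys in a sysctl.conf-style file without duplicating them."""
import sys


def set_sysctl_keys(path, settings):
    """Hypothetical helper: rewrite path so each key in settings appears exactly once."""
    try:
        with open(path) as handle:
            lines = handle.read().splitlines()
    except FileNotFoundError:
        lines = []

    remaining = dict(settings)
    output = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if "=" in line and not line.lstrip().startswith(("#", ";")) and key in settings:
            if key in remaining:
                # First occurrence of a managed key: write the desired value.
                output.append("%s = %s" % (key, remaining.pop(key)))
            # Later duplicates of a managed key are dropped.
            continue
        output.append(line)
    # Managed keys not present yet are appended at the end.
    output.extend("%s = %s" % (key, value) for key, value in remaining.items())

    with open(path, "w") as handle:
        handle.write("\n".join(output) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 set_sysctl.py /etc/sysctl.conf && sysctl -p
    set_sysctl_keys(sys.argv[1], {
        "net.ipv4.ip_nonlocal_bind": 1,
        "net.ipv4.ip_forward": 1,
    })
```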
rmq01 / rmq02 / rmq03

[root@rmq01 ~]# cat /etc/firewalld/zones/public.xml
<?xml version="1.0" encoding="utf-8"?>
<zone>
  <short>Public</short>
  <description>For use in public areas. You do not trust the other computers on networks to not harm your computer. Only selected incoming connections are accepted.</description>
  <service name="ssh"/>
  <service name="dhcpv6-client"/>
  <port protocol="tcp" port="24007-24008"/>
  <port protocol="tcp" port="49152"/>
  <port protocol="tcp" port="38465-38469"/>
  <port protocol="tcp" port="111"/>
  <port protocol="udp" port="111"/>
  <port protocol="tcp" port="2049"/>
  <port protocol="tcp" port="4501"/>
  <port protocol="udp" port="4501"/>
  <port protocol="udp" port="20048"/>
  <port protocol="tcp" port="20048"/>
  <port protocol="tcp" port="22"/>
  <port protocol="udp" port="22"/>
  <port protocol="tcp" port="10000"/>
  <port protocol="udp" port="49000-59999"/>
  <port protocol="tcp" port="49000-59999"/>
  <port protocol="udp" port="9000"/>
  <port protocol="tcp" port="9000"/>
  <port protocol="udp" port="137"/>
  <port protocol="udp" port="138"/>
  <port protocol="udp" port="2049"/>
  <port protocol="udp" port="2049"/>
  <port protocol="tcp" port="4369"/>
  <port protocol="tcp" port="5671-5672"/>
  <port protocol="tcp" port="25672"/>
  <port protocol="tcp" port="35672-35682"/>
  <port protocol="tcp" port="15672"/>
  <port protocol="tcp" port="61613-61614"/>
  <port protocol="tcp" port="1883"/>
  <port protocol="tcp" port="8883"/>
  <port protocol="tcp" port="15674"/>
  <port protocol="tcp" port="15675"/>

</zone>
[root@rmq01 ~]#

[root@rmq01 ~]# scp /etc/firewalld/zones/public.xml rmq02:/etc/firewalld/zones/public.xml
Password:
public.xml                                                                                       100% 1509     1.0MB/s   00:00
[root@rmq01 ~]# scp /etc/firewalld/zones/public.xml rmq03:/etc/firewalld/zones/public.xml
Password:
public.xml                                                                                       100% 1509   921.3KB/s   00:00
[root@rmq01 ~]#

[root@rmq01 ~]# systemctl restart firewalld; systemctl status firewalld -l
[root@rmq02 ~]# systemctl restart firewalld; systemctl status firewalld -l
[root@rmq03 ~]# systemctl restart firewalld; systemctl status firewalld -l

Prepare the firewall and distribute the configuration to the other nodes. NOTE: The ports needed for RabbitMQ are the ones from the line with 4369 down.
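Before copying the zone file to the other nodes, its port entries can be listed and checked for duplicates; in the file above, for example, udp 2049 appears twice. A minimal Python sketch using only the standard library XML parser; zone_ports and duplicate_ports are hypothetical helper names, and firewall-cmd --zone=public --list-ports still shows what firewalld actually loaded.

```python
#!/usr/bin/env python3
"""Sketch: list port entries in a firewalld zone file and report duplicates."""
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def zone_ports(xml_text):
    """Hypothetical helper: return (protocol, port) tuples in document order."""
    root = ET.fromstring(xml_text)
    return [(p.get("protocol"), p.get("port")) for p in root.findall("port")]


def duplicate_ports(xml_text):
    """Return the (protocol, port) entries that occur more than once, sorted."""
    counts = Counter(zone_ports(xml_text))
    return sorted(entry for entry, count in counts.items() if count > 1)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 zone_dupes.py /etc/firewalld/zones/public.xml
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    for protocol, port in duplicate_ports(data):
        print("duplicate port entry: %s/%s" % (port, protocol))
```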

rmq01 / rmq02 / rmq03

RMQ01

[root@rmq01 ~]# yum install rabbitmq*
(Optional) [root@rmq01 ~]# rabbitmq-server -detached
[root@rmq01 audit]# systemctl restart rabbitmq-server

[root@rmq01 ~]# rabbitmqctl cluster_status
[root@rmq01 ~]# rabbitmqctl add_user roboconf roboconf
[root@rmq01 ~]# rabbitmqctl set_permissions roboconf ".*" ".*" ".*"
[root@rmq01 ~]# rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'

COOKIE:

[root@rmq01 audit]# cat /var/lib/rabbitmq/.erlang.cookie
OMYNAUNDNDCMEOBIYFLU[root@rmq01 audit]#
[root@rmq01 audit]#
[root@rmq01 audit]# scp /var/lib/rabbitmq/.erlang.cookie rmq02:/var/lib/rabbitmq/.erlang.cookie
[root@rmq01 audit]# scp /var/lib/rabbitmq/.erlang.cookie rmq03:/var/lib/rabbitmq/.erlang.cookie

RMQ02

[root@rmq02 ~]# yum install rabbitmq*
(Optional) [root@rmq02 ~]# rabbitmq-server -detached

[root@rmq02 ~]# systemctl restart rabbitmq-server
[root@rmq02 ~]# rabbitmqctl stop_app
[root@rmq02 ~]# rabbitmqctl join_cluster rabbit@rmq01
[root@rmq02 ~]# rabbitmqctl start_app

RMQ03

[root@rmq03 ~]# yum install rabbitmq*
(Optional) [root@rmq03 ~]# rabbitmq-server -detached

[root@rmq03 ~]# systemctl restart rabbitmq-server
[root@rmq03 ~]# rabbitmqctl stop_app
[root@rmq03 ~]# rabbitmqctl join_cluster rabbit@rmq01
[root@rmq03 ~]# rabbitmqctl start_app

Check:

[root@rmq01 audit]# rabbitmqctl status

Install and configure RabbitMQ on the nodes using the commands above.
rmq01 / rmq02 / rmq03

Once RabbitMQ is configured, make sure the service starts and stops correctly using the following:

[root@rmq01 audit]# systemctl restart rabbitmq-server
[root@rmq01 audit]# systemctl status rabbitmq-server -l

You may receive errors. In that case, see the PROBLEM sections below.

Once you're done with the above, stop and start the service in this manner.
rmq01 / rmq02 / rmq03

Run any of the following commands, or a combination of them, against the denied entries in /var/log/audit/audit.log that may appear as you install, start or stop the services above:

METHOD 1:
grep AVC /var/log/audit/audit.log | audit2allow -M systemd-allow;semodule -i systemd-allow.pp

METHOD 2:
audit2allow -a
audit2allow -a -M ganesha_<NUM>_port
semodule -i ganesha_<NUM>_port.pp

USEFUL THINGS:

ausearch --interpret
aureport

Configure SELinux. Don't disable it.

This actually makes your host safer, and it is easy to work with using just these commands.

rmq01 / rmq02 / rmq03

PACKAGES:
yum install haproxy     # ( 1.5.18-6.el7.x86_64 used in this case )

On all the nodes add the following:  

# cat /etc/haproxy/haproxy.cfg

global
    log         127.0.0.1 local0
    stats       socket /var/run/haproxy.sock mode 0600 level admin
    user        haproxy
    group       haproxy
    daemon
    debug
    maxconn 1024

defaults
    mode tcp
    log global
    option                  dontlognull
    option                  redispatch
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    bind :9000
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /haproxy-stats
    stats auth admin:secretPassword

frontend rmq-in
    mode tcp
    bind rmq-c01:15672
    option tcplog
    default_backend             rmq-back


backend rmq-back
    log         /dev/log local0 debug
    option      tcplog
    mode        tcp
    balance     source
    timeout client  3h
    timeout server  3h
    server      rmq01.nix.mds.xyz    rmq01.nix.mds.xyz:5672 check fall 3 rise 2
    server      rmq02.nix.mds.xyz    rmq02.nix.mds.xyz:5672 check fall 3 rise 2
    server      rmq03.nix.mds.xyz    rmq03.nix.mds.xyz:5672 check fall 3 rise 2
#

Set logging settings for HAProxy:

# cat /etc/rsyslog.d/haproxy.conf
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
local6.* /var/log/haproxy.log
local0.* /var/log/haproxy.log

Configure rsyslogd (/etc/rsyslog.conf):

local0.*             /var/log/haproxy.log
local3.*             /var/log/keepalived.log

Install and configure HAProxy. A great source helped with this part.
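The check fall 3 rise 2 options on the server lines decide how quickly a dead RabbitMQ node leaves the pool. No inter is set, so HAProxy's default check interval of 2 seconds applies: a node is marked down after roughly 3 × 2 = 6 seconds of failed checks and back up after about 2 × 2 = 4 seconds of passing ones. A tiny Python sketch of that arithmetic, ignoring check timeouts and the fastinter/downinter options; detection_window is a hypothetical helper.

```python
#!/usr/bin/env python3
"""Sketch: approximate HAProxy health-check reaction times for 'check fall N rise M'."""

DEFAULT_INTER_MS = 2000  # HAProxy's default 'inter' when none is configured


def detection_window(fall, rise, inter_ms=DEFAULT_INTER_MS):
    """Hypothetical helper: (ms until marked DOWN, ms until marked UP), ignoring timeouts."""
    return fall * inter_ms, rise * inter_ms


if __name__ == "__main__":
    down_ms, up_ms = detection_window(fall=3, rise=2)
    print("marked down after ~%.0fs, back up after ~%.0fs" % (down_ms / 1000, up_ms / 1000))
```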
rmq01 / rmq02 / rmq03

PACKAGES:

yum install keepalived    # ( Used 1.3.5-1.el7.x86_64 in this case )

RMQ01:

vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_instance rmq-c01 {
        interface eth0                          # interface to monitor
        state MASTER                            # MASTER on rmq01, BACKUP on rmq02 and rmq03
        virtual_router_id 57                    # Set to the last octet of the cluster IP.
        priority 101                            # 101 on rmq01, 102 on rmq02, 103 on rmq03

        authentication {
                auth_type PASS
                auth_pass s3cretp@s$w0rd
        }

        virtual_ipaddress {
                192.168.0.57                    # virtual ip address
        }

        track_script {
                chk_haproxy
        }
}

RMQ02:

vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_instance rmq-c01 {
        interface eth0                          # interface to monitor
        state BACKUP                            # MASTER on rmq01, BACKUP on rmq02 and rmq03
        virtual_router_id 57                    # Set to the last octet of the cluster IP.
        priority 102                            # 101 on rmq01, 102 on rmq02, 103 on rmq03

        authentication {
                auth_type PASS
                auth_pass s3cretp@s$w0rd
        }

        virtual_ipaddress {
                192.168.0.57                    # virtual ip address
        }

        track_script {
                chk_haproxy
        }
}

RMQ03:

vrrp_script chk_haproxy {
        script "killall -0 haproxy"             # check the haproxy process
        interval 2                              # every 2 seconds
        weight 2                                # add 2 points if OK
}

vrrp_instance rmq-c01 {
        interface eth0                          # interface to monitor
        state BACKUP                            # MASTER on rmq01, BACKUP on rmq02 and rmq03
        virtual_router_id 57                    # Set to the last octet of the cluster IP.
        priority 103                            # 101 on rmq01, 102 on rmq02, 103 on rmq03

        authentication {
                auth_type PASS
                auth_pass s3cretp@s$w0rd
        }

        virtual_ipaddress {
                192.168.0.57                    # virtual ip address
        }

        track_script {
                chk_haproxy
        }
}

Configure keepalived. A great source helped with this as well.
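With the priorities above and weight 2 on chk_haproxy, the VIP does not stay on rmq01 even though it is the only node with state MASTER: keepalived adds a positive weight to a node's priority while its track script succeeds, and with preemption (the default) the highest effective priority holds the address. A small Python sketch of that election under those assumptions; elect_master and effective_priority are illustrative helpers, and ties are not modeled.

```python
#!/usr/bin/env python3
"""Sketch: which node holds the keepalived VIP, given base priorities and a track script weight."""

WEIGHT = 2  # 'weight 2' in vrrp_script chk_haproxy
PRIORITIES = {"rmq01": 101, "rmq02": 102, "rmq03": 103}


def effective_priority(base, haproxy_running, weight=WEIGHT):
    # A positive weight is added to the priority while the track script succeeds.
    return base + weight if haproxy_running else base


def elect_master(priorities, haproxy_up):
    """Illustrative helper: node with the highest effective priority (ties not modeled)."""
    return max(priorities,
               key=lambda node: effective_priority(priorities[node], haproxy_up[node]))


if __name__ == "__main__":
    all_up = {node: True for node in PRIORITIES}
    print("all HAProxy instances up ->", elect_master(PRIORITIES, all_up))
    rmq03_down = dict(all_up, rmq03=False)
    print("HAProxy down on rmq03    ->", elect_master(PRIORITIES, rmq03_down))
```

So with every HAProxy running, rmq03 (103 + 2 = 105) owns 192.168.0.57, and it moves to rmq02 (104) if HAProxy dies on rmq03. If rmq01 is meant to be the preferred owner, as its state MASTER line suggests, it needs the highest priority.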
Any External Node

In the first window, create the following Python code and start it (it needs the pika library installed):

[root@awx-mess01 ~]# cat ./receive-mq.py
#!/usr/bin/env python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('rmq-c01'))
channel = connection.channel()

channel.queue_declare(queue='hello')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)

channel.basic_consume(callback,
                      queue='hello',
                      no_ack=True)

print(' [*] Waiting for messages. To exit press CTRL+C')
channel.start_consuming()

[root@awx-mess01 ~]#

In the second window start the following code:

[root@linsrvj01 ~]# cat ./send-mq.py
#!/usr/bin/env python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('rmq-c01'))
channel = connection.channel()
channel.queue_declare(queue='hello')

channel.basic_publish(exchange='',
                      routing_key='hello',
                      body='Hello World!')
print(" [x] Sent 'Hello World!'")

connection.close()
[root@linsrvj01 ~]#

Now keep rerunning send-mq.py and notice how the receiver retrieves each message:

[root@linsrvj01 ~]# ./send-mq.py
[x] Sent 'Hello World!'
[root@linsrvj01 ~]# ./send-mq.py
[x] Sent 'Hello World!'
[root@linsrvj01 ~]#

In the first window, watch the messages appear:

[root@awx-mess01 ~]# ./receive-mq.py
 [*] Waiting for messages. To exit press CTRL+C
 [x] Received 'Hello World!'
 [x] Received 'Hello World!'

 

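Use the Python snippets above, based on an external tutorial, to test the RabbitMQ setup. Note that pika connects to port 5672 by default, while the HAProxy frontend above listens on rmq-c01:15672; pass port=15672 to ConnectionParameters if you want the test traffic to go through HAProxy.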
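The receiver above uses the pika 0.x call basic_consume(callback, queue='hello', no_ack=True). pika 1.0 reordered that method to basic_consume(queue, on_message_callback, auto_ack=False, ...), so on a current pika the script stops with a TypeError. A minimal sketch of the same receiver for pika 1.x on Python 3, assuming the rmq-c01 name and hello queue from above; on_message and main are illustrative names, and the optional port argument lets you point it at the HAProxy frontend instead of RabbitMQ's default 5672.

```python
#!/usr/bin/env python3
"""Sketch: receive-mq.py rewritten for pika >= 1.0 on Python 3."""
import sys


def on_message(ch, method, properties, body):
    # body arrives as bytes on Python 3; decode it so it prints as 'Hello World!'.
    text = body.decode("utf-8")
    print(" [x] Received %r" % text)
    return text


def main(host="rmq-c01", port=5672):
    import pika  # imported here so the module loads even where pika is not installed

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host, port=port))
    channel = connection.channel()
    channel.queue_declare(queue="hello")
    # pika >= 1.0: queue first, callback as on_message_callback, no_ack renamed auto_ack.
    channel.basic_consume(queue="hello", on_message_callback=on_message, auto_ack=True)
    print(" [*] Waiting for messages. To exit press CTRL+C")
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        channel.stop_consuming()
    finally:
        connection.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 receive-mq.py rmq-c01 [port]
    main(host=sys.argv[1], port=int(sys.argv[2]) if len(sys.argv) > 2 else 5672)
```

Run it as python3 receive-mq.py rmq-c01, or python3 receive-mq.py rmq-c01 15672 to go through HAProxy. send-mq.py works unchanged with pika 1.x.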

PROBLEM:

If you get the below issue:

[root@rmq01 audit]# systemctl status rabbitmq-server -l
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2018-11-24 20:48:47 EST; 8min ago
  Process: 4593 ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
  Process: 4558 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server -detached (code=exited, status=1/FAILURE)
 Main PID: 4558 (code=exited, status=1/FAILURE)

Nov 24 20:48:47 rmq01.nix.mds.xyz systemd[1]: Starting RabbitMQ broker…
Nov 24 20:48:47 rmq01.nix.mds.xyz rabbitmq-server[4558]: Warning: PID file not written; -detached was passed.
Nov 24 20:48:47 rmq01.nix.mds.xyz systemd[1]: rabbitmq-server.service: Got notification message from PID 4571, but reception only permitted for main PID 4558
Nov 24 20:48:47 rmq01.nix.mds.xyz rabbitmq-server[4558]: ERROR: node with name "rabbit" already running on "rmq01"
Nov 24 20:48:47 rmq01.nix.mds.xyz systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE

Nov 24 20:48:47 rmq01.nix.mds.xyz rabbitmqctl[4593]: Stopping and halting node rabbit@rmq01 …
Nov 24 20:48:47 rmq01.nix.mds.xyz systemd[1]: Failed to start RabbitMQ broker.
Nov 24 20:48:47 rmq01.nix.mds.xyz systemd[1]: Unit rabbitmq-server.service entered failed state.
Nov 24 20:48:47 rmq01.nix.mds.xyz systemd[1]: rabbitmq-server.service failed.
[root@rmq01 audit]# ps -ef|grep -Ei rabbit
rabbitmq  1801     1  0 19:31 ?        00:00:00 /usr/lib64/erlang/erts-5.10.4/bin/epmd -daemon
rabbitmq  5102     1  1 20:54 ?        00:00:02 /usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../ebin -noshell -noinput -s rabbit boot -sname rabbit@rmq01 -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@rmq01.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@rmq01-sasl.log"} -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/lib/rabbitmq_server-3.3.5/sbin/../plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@rmq01-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@rmq01" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672 -noshell -noinput
rabbitmq  5146  5102  0 20:54 ?        00:00:00 inet_gethost 4
rabbitmq  5147  5146  0 20:54 ?        00:00:00 inet_gethost 4
root      5389  1671  0 20:56 pts/0    00:00:00 grep --color=auto -Ei rabbit
[root@rmq01 audit]#

It's probably because you need to run the following on all the nodes:

[root@rmq01 audit]# echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf; echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf; sysctl -p
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
[root@rmq01 audit]#

PROBLEM:

[root@rmq02 ~]# rabbitmqctl join_cluster rabbit@rmq01.nix.mds.xyz
Clustering node rabbit@rmq02 with 'rabbit@rmq01.nix.mds.xyz' …
Error: unable to connect to nodes ['rabbit@rmq01.nix.mds.xyz']: nodedown

=ERROR REPORT==== 24-Nov-2018::21:19:50 ===
** System NOT running to use fully qualified hostnames **
** Hostname rmq01.nix.mds.xyz is illegal **

DIAGNOSTICS
===========

attempted to contact: ['rabbit@rmq01.nix.mds.xyz']

rabbit@rmq01.nix.mds.xyz:
  * connected to epmd (port 4369) on rmq01.nix.mds.xyz
  * epmd reports node 'rabbit' running on port 25672
  * TCP connection succeeded but Erlang distribution failed
  * suggestion: hostname mismatch?
  * suggestion: is the cookie set correctly?

current node details:
- node name: rabbitmqctl2977@rmq02
- home dir: /var/lib/rabbitmq
- cookie hash: Cv6uyhSSoLwjp6RqeyCv2Q==

[root@rmq02 ~]#

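The cookie was not distributed correctly, as the diagnostics above suggest. Distribute the cookie and try again. Also note the "System NOT running to use fully qualified hostnames" error: the nodes run with short names (-sname rabbit@rmq01), so join with rabbit@rmq01, as in the steps above, rather than rabbit@rmq01.nix.mds.xyz.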
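Comparing the cookie hash lines from each node is a quick way to confirm the cookies match without printing them. Assuming RabbitMQ 3.x computes that value as the Base64-encoded MD5 digest of the cookie (consistent with the 24-character hashes shown above), a minimal Python sketch can compute it locally; cookie_hash is a hypothetical helper, and the cookie file must be read as root or as the rabbitmq user.

```python
#!/usr/bin/env python3
"""Sketch: compute a RabbitMQ-style 'cookie hash' for an Erlang cookie file."""
import base64
import hashlib
import sys


def cookie_hash(cookie):
    """Hypothetical helper: Base64 of the MD5 digest of the cookie (assumed RabbitMQ 3.x format)."""
    return base64.b64encode(hashlib.md5(cookie.encode("ascii")).digest()).decode("ascii")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 cookie_hash.py /var/lib/rabbitmq/.erlang.cookie
    with open(sys.argv[1]) as handle:
        # Assumption: surrounding whitespace (e.g. a trailing newline) is not part of the cookie.
        print(cookie_hash(handle.read().strip()))
```

Run it on each node; the printed values should be identical.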

PROBLEM:

[root@rmq03 audit]# systemctl start rabbitmq-server
Job for rabbitmq-server.service failed because the control process exited with error code. See "systemctl status rabbitmq-server.service" and "journalctl -xe" for details.
[root@rmq03 audit]# systemctl status rabbitmq-server -l
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2018-11-24 22:29:01 EST; 7s ago
  Process: 3405 ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl stop (code=exited, status=1/FAILURE)
  Process: 3371 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server (code=exited, status=1/FAILURE)
 Main PID: 3371 (code=exited, status=1/FAILURE)

Nov 24 22:29:00 rmq03.nix.mds.xyz rabbitmqctl[3405]: {error_logger,{{2018,11,24},{22,29,0}},crash_report,[[{initial_call,{auth,init,['Argument__1']}},{pid,<0.19.0>},{registered_name,[]},{error_info,{exit,{"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces",[{auth,init_cookie,0,[{file,"auth.erl"},{line,285}]},{auth,init,1,[{file,"auth.erl"},{line,139}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}},{ancestors,[net_sup,kernel_sup,<0.10.0>]},{messages,[]},{links,[<0.17.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,610},{stack_size,27},{reductions,635}],[]]}
Nov 24 22:29:00 rmq03.nix.mds.xyz rabbitmqctl[3405]: {error_logger,{{2018,11,24},{22,29,0}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces",[{auth,init_cookie,0,[{file,"auth.erl"},{line,285}]},{auth,init,1,[{file,"auth.erl"},{line,139}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}},{offender,[{pid,undefined},{name,auth},{mfargs,{auth,start_link,[]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
Nov 24 22:29:00 rmq03.nix.mds.xyz rabbitmqctl[3405]: {error_logger,{{2018,11,24},{22,29,0}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,{shutdown,{failed_to_start_child,auth,{"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces",[{auth,init_cookie,0,[{file,"auth.erl"},{line,285}]},{auth,init,1,[{file,"auth.erl"},{line,139}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}}},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
Nov 24 22:29:00 rmq03.nix.mds.xyz rabbitmqctl[3405]: {error_logger,{{2018,11,24},{22,29,0}},crash_report,[[{initial_call,{application_master,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{pid,<0.9.0>},{registered_name,[]},{error_info,{exit,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,auth,{"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces",[{auth,init_cookie,0,[{file,"auth.erl"},{line,285}]},{auth,init,1,[{file,"auth.erl"},{line,139}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}}}},{kernel,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,133}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}},{ancestors,[<0.8.0>]},{messages,[{'EXIT',<0.10.0>,normal}]},{links,[<0.8.0>,<0.7.0>]},{dictionary,[]},{trap_exit,true},{status,running},{heap_size,610},{stack_size,27},{reductions,149}],[]]}
Nov 24 22:29:00 rmq03.nix.mds.xyz rabbitmqctl[3405]: {error_logger,{{2018,11,24},{22,29,0}},std_info,[{application,kernel},{exited,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,auth,{"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces",[{auth,init_cookie,0,[{file,"auth.erl"},{line,285}]},{auth,init,1,[{file,"auth.erl"},{line,139}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}}}},{kernel,start,[normal,[]]}}},{type,permanent}]}
Nov 24 22:29:01 rmq03.nix.mds.xyz rabbitmqctl[3405]: {"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{{shutdown,{failed_to_start_child,net_sup,{shutdown,{failed_to_start_child,auth,{\"Error when reading /var/lib/rabbitmq/.erlang.cookie: eacces\",[{auth,init_cookie,0,[{file,\"auth.erl\"},{line,285}]},{auth,init,1,[{file,\"auth.erl\"},{line,139}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}}},{kernel,start,[normal,[]]}}}"}
Nov 24 22:29:01 rmq03.nix.mds.xyz systemd[1]: rabbitmq-server.service: control process exited, code=exited status=1
Nov 24 22:29:01 rmq03.nix.mds.xyz systemd[1]: Failed to start RabbitMQ broker.
Nov 24 22:29:01 rmq03.nix.mds.xyz systemd[1]: Unit rabbitmq-server.service entered failed state.
Nov 24 22:29:01 rmq03.nix.mds.xyz systemd[1]: rabbitmq-server.service failed.
[root@rmq03 audit]#

If you get the above, note the cookie access error (eacces when reading /var/lib/rabbitmq/.erlang.cookie). You need to ensure the cookie is owned by rabbitmq:rabbitmq:

[root@rmq03 audit]# ls -altri /var/lib/rabbitmq/.erlang.cookie
67885078 -r--------. 1 root root 20 Nov 24 21:24 /var/lib/rabbitmq/.erlang.cookie
[root@rmq03 audit]# chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
[root@rmq03 audit]# 
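To verify the fix on every node, check the cookie's owner and mode. A minimal Python sketch using only the standard library; cookie_problems is a hypothetical helper, and the expected rabbitmq owner and 0400 mode are taken from the chown and the -r-------- listing above.

```python
#!/usr/bin/env python3
"""Sketch: verify owner and permissions of the Erlang cookie file."""
import os
import pwd
import stat
import sys


def cookie_problems(path, owner="rabbitmq", mode=0o400):
    """Hypothetical helper: return human-readable problems (empty list if the file looks right)."""
    info = os.stat(path)
    problems = []
    actual_owner = pwd.getpwuid(info.st_uid).pw_name
    if actual_owner != owner:
        problems.append("owner is %s, expected %s" % (actual_owner, owner))
    actual_mode = stat.S_IMODE(info.st_mode)
    if actual_mode != mode:
        problems.append("mode is %o, expected %o" % (actual_mode, mode))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_cookie.py /var/lib/rabbitmq/.erlang.cookie
    issues = cookie_problems(sys.argv[1])
    print("\n".join(issues) if issues else "OK")
    sys.exit(1 if issues else 0)
```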

If you get the below later on when restarting:

Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: Stopping and halting node rabbit@rmq01 …
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: Error: unable to connect to node rabbit@rmq01: nodedown
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: DIAGNOSTICS
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: ===========
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: attempted to contact: [rabbit@rmq01]
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: rabbit@rmq01:
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: * unable to connect to epmd (port 4369) on rmq01: address (cannot connect to host/port)
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: current node details:
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: - node name: rabbitmqctl20055@rmq01
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: - home dir: /var/lib/rabbitmq
Jan 20 10:18:32 rmq01.nix.mds.xyz rabbitmqctl[20055]: - cookie hash: 2uvw1wIo+kLjOrVK8HhicQ==
Jan 20 10:18:32 rmq01.nix.mds.xyz systemd[1]: rabbitmq-server.service: control process exited, code=exited status=2
Jan 20 10:18:32 rmq01.nix.mds.xyz systemd[1]: Failed to start RabbitMQ broker.
-- Subject: Unit rabbitmq-server.service has failed

Note that port 4369 (epmd) is held by systemd (PID 1) and listens only on 127.0.0.1, because we had the epmd socket enabled via systemd. We should not have:

[root@rmq01 rabbitmq]# netstat -pnlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 127.0.0.1:4369          0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1059/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1311/master
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::22                   :::*                    LISTEN      1059/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1311/master
[root@rmq01 rabbitmq]#

Disable epmd, since rabbitmq-server will start it automatically:

[root@rmq02 audit]# systemctl disable epmd
Removed symlink /etc/systemd/system/multi-user.target.wants/epmd.service.
Removed symlink /etc/systemd/system/sockets.target.wants/epmd.socket.
[root@rmq02 audit]#

You may need to restart after the above. This is how it should look after a reboot:

[root@rmq01 ~]# netstat -pnlt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:25672           0.0.0.0:*               LISTEN      1028/beam.smp
tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN      1071/haproxy
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 0.0.0.0:4369            0.0.0.0:*               LISTEN      1135/epmd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1030/sshd
tcp        0      0 192.168.0.57:15672      0.0.0.0:*               LISTEN      1071/haproxy
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1328/master
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 :::4369                 :::*                    LISTEN      1135/epmd
tcp6       0      0 :::22                   :::*                    LISTEN      1030/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1328/master
[root@rmq01 ~]#
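To check this on each node without eyeballing the table, a small Python sketch can pick the port 4369 listeners out of saved netstat -pnlt output and report which program owns them and on which address. It assumes the column layout shown above; epmd_listeners is a hypothetical helper.

```python
#!/usr/bin/env python3
"""Sketch: report who listens on the epmd port (4369) in 'netstat -pnlt' output."""
import sys

EPMD_PORT = "4369"


def epmd_listeners(netstat_text):
    """Hypothetical helper: (local address, program) pairs for TCP listeners on port 4369."""
    found = []
    for line in netstat_text.splitlines():
        cols = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(cols) < 7 or not cols[0].startswith("tcp") or cols[5] != "LISTEN":
            continue
        address, _, port = cols[3].rpartition(":")
        if port == EPMD_PORT:
            found.append((address, cols[6].split("/", 1)[-1]))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: netstat -pnlt > netstat.txt; python3 epmd_check.py netstat.txt
    with open(sys.argv[1]) as handle:
        for address, program in epmd_listeners(handle.read()):
            print("%s listening on %s" % (program, address))
            if program == "systemd":
                print("  -> socket-activated epmd; see 'systemctl disable epmd' above")
```

On the first listing above it reports systemd on 127.0.0.1; on the listing after the reboot it reports epmd on 0.0.0.0 and ::.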

 

Enjoy!

REF: https://www.rabbitmq.com/ha.html, http://roboconf.net/en/user-guide/clustered-rabbitmq.html, https://www.rabbitmq.com/networking.html#ports, https://insidethecpu.com/2014/11/17/load-balancing-a-rabbitmq-cluster/, https://github.com/ansible/awx/issues/574, https://github.com/MrMEEE/awx-build/issues/26


Cheers,
TK


     
  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

This work is licensed under a Creative Commons Attribution 3.0 Unported License