Cloudera Manager Installation Issues
If the errors shown below appear during a Cloudera Manager installation on RHEL 7.2+, first verify the date and time configuration on the host, then stop and start the agent service with these commands:
systemctl stop cloudera-scm-agent
systemctl start cloudera-scm-agent
If the messages below still appear and there is no DNS server, also check /etc/hosts and add an entry for the hostname and IP, similar to this:
123.123.123.123 host.domain.xyz host
Or, if you are using DD-WRT, use records such as these:
address=/host1/192.168.0.165
address=/host1.mds.xyz/192.168.0.165
ptr-record=165.0.168.192.in-addr.arpa,"mds-host1.mds.xyz"
ptr-record=165.0.168.192.in-addr.arpa,"mds-host1"
Exact error message received:
Installation failed. Failed to receive heartbeat from agent.
- Ensure that the host's hostname is configured properly.
- Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules).
- Ensure that ports 9000 and 9001 are not in use on the host being added.
- Check agent logs in /var/log/cloudera-scm-agent/ on the host being added. (Some of the logs can be found in the installation details).
- If Use TLS Encryption for Agents is enabled in Cloudera Manager (Administration -> Settings -> Security), ensure that /etc/cloudera-scm-agent/config.ini has use_tls=1 on the host being added. Restart the corresponding agent and click the Retry link here.
>>[14/Sep/2016 15:24:13 +0000] 16330 Dummy-14 agent ERROR Failed to kill process with pid 16358
OSError: [Errno 3] No such process
>>[14/Sep/2016 15:24:13 +0000] 16330 Dummy-14 agent ERROR Shutdown callback failed.
>>OSError: [Errno 9] Bad file descriptor
>>[14/Sep/2016 15:24:13 +0000] 16330 Dummy-14 agent ERROR Shutdown callback failed.
KeyError: 15
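The checks suggested by the error message can be scripted roughly as follows. This is only a sketch: the helper names port_open and agent_use_tls are made up for illustration, CM_SERVER is a placeholder for your Cloudera Manager Server hostname, and the port test relies on bash's /dev/tcp feature plus the coreutils timeout command.

```shell
# Succeeds if a TCP connection to HOST:PORT can be opened within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Prints the value of use_tls (0 or 1) from an agent config file.
# Prints nothing if the setting is missing or commented out.
agent_use_tls() {
    local config="${1:-/etc/cloudera-scm-agent/config.ini}"
    sed -n 's/^[[:space:]]*use_tls[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$config"
}

# Example usage on the host being added (CM_SERVER is a placeholder):
#   port_open "$CM_SERVER" 7182 || echo "port 7182 on $CM_SERVER is not reachable"
#   port_open localhost 9000 && echo "port 9000 is already in use"
#   port_open localhost 9001 && echo "port 9001 is already in use"
#   agent_use_tls
```

If port 7182 is not reachable, the firewall on the Cloudera Manager Server is the first thing to look at.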
Once the records are in place, a reverse lookup should return a response like this:
# nslookup 192.168.0.165
Server: 192.168.0.1
Address: 192.168.0.1#53
165.0.168.192.in-addr.arpa name = mds-host1.
165.0.168.192.in-addr.arpa name = mds-host1.mds.xyz.
#
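When no DNS server is available and /etc/hosts is used instead, a quick check like the one below may help confirm that each host has an entry. It is a sketch only; hosts_ip_for is a hypothetical helper, not part of any Cloudera tooling.

```shell
# Prints the IP address that a hosts-format file maps NAME to (first match).
# Comments (anything after '#') are ignored. Prints nothing if NAME is absent.
hosts_ip_for() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v n="$name" '{
        sub(/#.*/, "")                      # strip comments; fields are re-split
        for (i = 2; i <= NF; i++)
            if ($i == n) { print $1; exit }
    }' "$file"
}

# Example usage:
#   hosts_ip_for host.domain.xyz      # look up the fully qualified name
#   hosts_ip_for host                 # look up the short name
#   getent hosts 192.168.0.165        # what the resolver actually returns
#   hostname; hostname -f             # short and fully qualified local names
```

Running the same check for both the short and the fully qualified name is worthwhile, since the installer compares the name it expects with the name the host resolves for itself.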
Once installation progressed, we got these messages:
Transparent Huge Page Compaction is enabled and can cause significant performance problems. Run "echo never > /sys/kernel/mm/transparent_hugepage/defrag" to disable this, then add the same command to an init script such as /etc/rc.local so it will be set upon system reboot. The following hosts are affected:
mds-host05; mds-host[01-04]
Cloudera recommends setting /proc/sys/vm/swappiness to a maximum of 10. Current setting is 30. Use the sysctl command to change this setting at run time and edit /etc/sysctl.conf for this setting to be saved after a reboot. You can continue with installation, but Cloudera Manager might report that your hosts are unhealthy because they are swapping. The following hosts are affected:
mds-host05; mds-host[01-04]
The following failures were observed in checking hostnames…
Host mds-host01 expected to have name mds-host01 but resolved (InetAddress.getLocalHost().getHostName()) itself to mds-host01.mds.xyz.
Resolve these according to the instructions above and continue. To set the swappiness, run the following:
sysctl -w vm.swappiness=10
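The sysctl -w command only changes the running value. As the installer message notes, /etc/sysctl.conf also needs the setting for it to survive a reboot. A minimal sketch of that step follows; set_sysctl_conf is a hypothetical helper name, and it assumes GNU sed (sed -i -E), which is what RHEL ships.

```shell
# Sets "KEY = VALUE" in a sysctl-style file: replaces an existing line for
# KEY if there is one, otherwise appends a new line.
# Note: dots in KEY are treated as regex wildcards, which is harmless here.
set_sysctl_conf() {
    local key="$1" value="$2" file="${3:-/etc/sysctl.conf}"
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage as root:
#   sysctl -w vm.swappiness=10          # change the running value
#   set_sysctl_conf vm.swappiness 10    # persist it in /etc/sysctl.conf
#   sysctl -p                           # reload and confirm
```

Repeat this on every host listed in the warning.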
And continue with the installation. If you get this error:
/usr/lib64/cmf/service/zookeeper/zkserver.sh: line 41: /var/lib/zookeeper/myid: Permission denied
Supervisor returned FATAL. Please check the role log file, stderr, or stdout. Completed only 0/1 steps. First failure: Command (85) has failed Failed to start role. Completed only 1/2 steps. First failure: Failed to execute command Start on service ZooKeeper
Simply change permissions like this:
101947599 d---------. 2 root root 6 Sep 14 23:52 /var/lib/zookeeper
chmod 755 /var/lib/zookeeper
101947599 drwxr-xr-x. 2 root root 6 Sep 14 23:52 /var/lib/zookeeper
Then continue the install. It is possible that /usr/lib64/cmf/service/zookeeper/zkserver.sh does not set the permissions correctly. Also set the owner and group of the folder to zookeeper.zookeeper:
chmod 755 /var/lib/zookeeper; chown zookeeper.zookeeper /var/lib/zookeeper; ls -altrid /var/lib/zookeeper
And continue. Also set the ACLs on the folder:
# setfacl -m "u:zookeeper:rwx,g:zookeeper:rwx" /var/lib/zookeeper/
# getfacl zookeeper
# file: zookeeper
# owner: zookeeper
# group: zookeeper
user::rwx
user:zookeeper:rwx
group::rwx
group:zookeeper:rwx
mask::rwx
other::rwx
#
This still did not work. Digging deeper, we see:
[root@mds-host01 zookeeper]# pwd
/var/log/zookeeper
[root@mds-host01 zookeeper]# tail -f zookeeper-cmf-zookeeper-SERVER-mds-host01.log -n 10
2016-09-15 01:48:59,395 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.home=/var/lib/zookeeper
2016-09-15 01:48:59,396 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.dir=/run/cloudera-scm-agent/process/31-zookeeper-server
2016-09-15 01:48:59,396 ERROR org.apache.zookeeper.server.ZooKeeperServerMain: Unable to access datadir, exiting abnormally
org.apache.zookeeper.server.persistence.FileTxnSnapLog$DatadirException: Missing data directory /var/lib/zookeeper/version-2, automatic data directory creation is disabled (zookeeper.datadir.autocreate is false). Please create this directory manually.
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:102)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:109)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
So let's create it manually and retry the installation. The full command was:
chmod 755 /var/lib/zookeeper; chown zookeeper.zookeeper /var/lib/zookeeper; ls -altrid /var/lib/zookeeper; setfacl -m "u:zookeeper:rwx,g:zookeeper:rwx" /var/lib/zookeeper/; getfacl /var/lib/zookeeper/; mkdir /var/lib/zookeeper/version-2; chown zookeeper.zookeeper /var/lib/zookeeper/version-2;
This time the above fixed the issue. Retry and continue with the installation.
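For repeating the fix on several hosts, the directory steps can be wrapped in a small function that is safe to rerun. This is a sketch under assumptions: prepare_zk_datadir is a hypothetical name, the default path and owner are taken from the commands above, and the setfacl step is left out since ownership and mode alone may be enough.

```shell
# Creates DATADIR and DATADIR/version-2 if missing, then sets the owner
# and mode 755 on both. Safe to run more than once.
prepare_zk_datadir() {
    local datadir="${1:-/var/lib/zookeeper}"
    local owner="${2:-zookeeper:zookeeper}"
    mkdir -p "$datadir/version-2" &&
    chown -R "$owner" "$datadir" &&
    chmod 755 "$datadir" "$datadir/version-2"
}

# Example usage as root:
#   prepare_zk_datadir
#   ls -altrid /var/lib/zookeeper /var/lib/zookeeper/version-2
```

Using mkdir -p means the function does not fail when version-2 already exists.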
Cheers,
TK
Use of /etc/rc.local is deprecated under RHEL 7. Some other mechanism needs to be used to persistently set "/sys/kernel/mm/transparent_hugepage/defrag" to "never".
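One possible mechanism is a oneshot systemd unit that runs the echo command at boot. The sketch below is only a suggestion, not something from the Cloudera documentation: the unit name disable-thp-defrag.service and the helper write_thp_unit are made up for illustration.

```shell
# Writes a oneshot systemd unit that sets THP defrag to "never" at boot.
# The unit file name is hypothetical; pick any name that suits your site.
write_thp_unit() {
    local unit="${1:-/etc/systemd/system/disable-thp-defrag.service}"
    cat > "$unit" <<'EOF'
[Unit]
Description=Disable Transparent Huge Page compaction (defrag)

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target
EOF
}

# Example usage as root:
#   write_thp_unit
#   systemctl daemon-reload
#   systemctl enable disable-thp-defrag.service
#   systemctl start disable-thp-defrag.service
#   cat /sys/kernel/mm/transparent_hugepage/defrag
```

The enable and start steps are kept separate so the commands do not depend on newer systemctl options.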