ERROR (10 skipped) Error sending messages to firehose (retry): mgmt-HOSTMONITOR
Seeing this error in the Cloudera Manager agent log?
[24/May/2020 23:08:13 +0000] 5385 MonitorDaemon-Reporter throttling_logger ERROR (10 skipped) Error sending messages to firehose (retry): mgmt-HOSTMONITOR-a6c8a202b717eae93da5e0a53f184c3a
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 125, in _send
    self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 487, in read_framed_message
    response = self.conn.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''
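To see exactly which host and port the agent is trying to reach, modify the log line in _send (firehose.py, the first file in the traceback above) slightly. Match the indentation of the surrounding file when you edit it: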
try:
    if self._requestor is None:
        self._transceiver = avro.ipc.HTTPTransceiver(self._address,
                                                     self._port)
        self._requestor = avro.ipc.Requestor(FIREHOSE_MESSAGE_PROTOCOL,
                                             self._transceiver)
    initial_requestor_bytes = self._requestor.get_requestor_bytes_sent()
    self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
    self._last_message_transmit_duration_gauge.set_value(
        (time.time() - start) * 1000)
    self._message_transmit_succeeded_counter.increment()
    self._requestor_bytes_sent.increment(
        self._requestor.get_requestor_bytes_sent() - initial_requestor_bytes)
    return True
except BadStatusLine, ex:
    # We've lost our connection. In practice this usually means the server has
    # closed a connection that we expect to be open because of HTTP keep-alive.
    # We will do a single silent retry. If the problem persists there, we'll
    # log.
    self._reset()
    if retryOnBadStatusLine:
        return self._send(messages, retryOnBadStatusLine=False)
    self._message_transmit_failed_counter.increment()
    # Original log call, kept for reference:
    # THROTTLED_LOG.exception("Error sending messages to firehose (retry): " +
    #                         self.name)
    # Modified log call: also report the address and port being contacted.
    THROTTLED_LOG.exception("Error sending messages to firehose (retry): %s . Address: %s . Port: %s" % (self.name, self._address, self._port))
    return False
except Exception:
    THROTTLED_LOG.exception("Error sending messages to firehose: " + self.name)
    self._reset()
    self._message_transmit_failed_counter.increment()
    return False
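As an aside, the retry-once-then-log behaviour of that except branch can be tried out in isolation. The following is only a minimal Python 3 sketch, not the Cloudera code: the Sender class, its transport callable and the logger name are hypothetical stand-ins, and http.client.BadStatusLine is the Python 3 counterpart of the httplib exception in the traceback.

```python
import http.client
import logging

LOG = logging.getLogger("firehose_sketch")  # hypothetical logger name


class Sender(object):
    """Hypothetical stand-in for the agent's firehose client."""

    def __init__(self, name, address, port, transport):
        self.name = name
        self._address = address
        self._port = port
        self._transport = transport  # callable that sends messages; may raise
        self.failed = 0

    def send(self, messages, retry_on_bad_status_line=True):
        try:
            self._transport(messages)
            return True
        except http.client.BadStatusLine:
            # Possibly a stale keep-alive connection: retry once, silently.
            if retry_on_bad_status_line:
                return self.send(messages, retry_on_bad_status_line=False)
            # Second failure in a row: count it and log where we were sending.
            self.failed += 1
            LOG.exception(
                "Error sending messages to firehose (retry): %s . Address: %s . Port: %s",
                self.name, self._address, self._port)
            return False
        except Exception:
            self.failed += 1
            LOG.exception("Error sending messages to firehose: %s", self.name)
            return False
```

A transport that fails once and then succeeds produces no log output at all, which is why a persistent BadStatusLine in the agent log points at something other than an occasional dropped keep-alive connection.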
With the modified log line in place, start things up again and you'll get more meaningful messages:
[24/May/2020 23:26:07 +0000] 6934 MonitorDaemon-Reporter firehoses INFO Creating a connection to the HOSTMONITOR.
[24/May/2020 23:26:08 +0000] 6934 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose (retry): mgmt-HOSTMONITOR-a6c8a202b717eae93da5e0a53f184c3a . Address: cm-r01en02.mws.mds.xyz . Port: 9995
Traceback (most recent call last):
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/cmf/monitor/firehose.py", line 125, in _send
    self._requestor.request('sendAgentMessages', dict(messages=UNICODE_SANITIZER(messages)))
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 141, in request
    return self.issue_request(call_request, message_name, request_datum)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 254, in issue_request
    call_response = self.transceiver.transceive(call_request)
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 483, in transceive
    result = self.read_framed_message()
  File "/opt/cloudera/cm-agent/lib/python2.7/site-packages/avro/ipc.py", line 487, in read_framed_message
    response = self.conn.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 408, in _read_status
    raise BadStatusLine(line)
BadStatusLine: ''
^C
[root@cm-awn01 pki]# nc -vz cm-r01en02.mws.mds.xyz 9995
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 108.168.115.113:9995.
Ncat: 0 bytes sent, 0 bytes received in 0.05 seconds.
[root@cm-awn01 pki]#
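The nc check only proves that the TCP port accepts connections. It says nothing about whether the peer answers HTTP, which is what BadStatusLine: '' is complaining about. A rough Python 3 sketch of a two-step probe follows. The probe function name and its return strings are made up for illustration, and a real firehose endpoint expects Avro requests, so any HTTP status coming back (even an error status) is enough to show that HTTP is being spoken.

```python
import http.client
import socket


def probe(host, port, timeout=5.0):
    """Describe how host:port reacts to a TCP connect and to one HTTP request."""
    # Step 1: plain TCP connect, roughly what `nc -vz host port` checks.
    try:
        socket.create_connection((host, port), timeout=timeout).close()
    except OSError as exc:
        return "tcp connect failed: %s" % exc

    # Step 2: send one HTTP request and see whether a status line comes back.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        resp = conn.getresponse()
        return "http status %d" % resp.status
    except (http.client.HTTPException, OSError) as exc:
        # Covers an empty reply (the BadStatusLine family), resets and timeouts.
        return "tcp ok, no valid http response: %s" % type(exc).__name__
    finally:
        conn.close()


if __name__ == "__main__":
    # Host and port taken from the log line above.
    print(probe("cm-r01en02.mws.mds.xyz", 9995))
```

A result beginning with "tcp ok, no valid http response" would match the symptom in the log: the connection opens, but nothing that looks like an HTTP status line ever comes back.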
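Notice the address and port at the end of the ERROR line above (cm-r01en02.mws.mds.xyz, port 9995), and that a plain TCP connection to that port succeeds. Keeping that in mind, consider this HAProxy configuration: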
listen cm9995
    log 127.0.0.1:514 local0 debug
    bind srv-c01:9995
    mode tcp
    option tcplog
    server cm-r01en01.mws.mds.xyz cm-r01en01.mws.mds.xyz check
    server cm-r01en02.mws.mds.xyz cm-r01en02.mws.mds.xyz check
Notice that this HAProxy listener runs in mode tcp, but perhaps the Cloudera Manager agent (CMA) expects HTTP? Try setting the mode to http: