
Tuesday, April 8, 2014

Cloudera SCM Agent Error

"This host had been out of contact with Cloudera Manager for too long. The host's Cloudera Manager agent's software version could not be determined."

Today I saw this error pop up on the CM4 hosts monitor. Running /etc/init.d/cloudera-scm-agent status confirmed only that the agent process was running, so I had to review the agent log to find the actual error.

The log for the agent is located at /var/log/cloudera-scm-agent/cloudera-scm-agent.log

The error reported looked like this:

[08/Apr/2014 15:58:09 +0000] 1228 MainThread agent        ERROR    Heartbeating to failed.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/", line 741, in send_heartbeat
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/", line 471, in __init__
  File "/usr/lib64/python2.6/", line 720, in connect
  File "/usr/lib64/python2.6/", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -5] No address associated with hostname
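
The traceback ends in getaddrinfo, so the same lookup can be tried outside the agent. Below is a minimal Python 3 sketch, not the agent's own code: the helper name check_resolution is hypothetical, and port 7182 is only an assumed value for the server's heartbeat port, so substitute whatever your config.ini uses.

```python
import socket


def check_resolution(host, port):
    """Try the same kind of lookup the traceback shows failing.

    Returns (True, [addresses]) when the name resolves,
    or (False, "message (errno N)") when getaddrinfo raises gaierror.
    """
    try:
        results = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror as err:
        return False, "%s (errno %s)" % (err.strerror, err.errno)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the resolved address.
    addresses = sorted({res[4][0] for res in results})
    return True, addresses


if __name__ == "__main__":
    # 7182 is an assumed heartbeat port; the host here is a placeholder.
    print(check_resolution("127.0.0.1", 7182))
```

Running it with the host name the agent is configured to use shows whether that name resolves on the machine at all, independent of the agent.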

The problem was that after the system rebooted, the server_host entry in /etc/cloudera-scm-agent/config.ini had been changed:

# Hostname of Cloudera SCM Server

The DNS server still had an old host name entry for the IP address my Cloudera SCM Server was now using. When the system restarted the agent, I believe a reverse DNS lookup was performed on that IP and returned the old host name. My cluster uses /etc/hosts files for name resolution, so I'm not 100% sure yet why this happened, but I speculate it comes from the Python socket library used by the Cloudera SCM agent.

I resolved this by changing the server_host value back to the name of the host running the SCM server and then restarting the cloudera-scm-agent service.
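
To catch this kind of change after a reboot, the configured value can be read back and inspected. The following is a rough Python 3 sketch under a few assumptions: that config.ini is a standard INI file with server_host and server_port under a [General] section, and that 7182 is a sensible default port. The function name read_server_endpoint and the sample host name are hypothetical.

```python
import configparser

# Sample text standing in for /etc/cloudera-scm-agent/config.ini.
# The [General] section name and the host name are assumptions.
SAMPLE_CONFIG = """
[General]
# Hostname of Cloudera SCM Server
server_host=cm-server.example.com
server_port=7182
"""


def read_server_endpoint(config_text, section="General"):
    """Return (server_host, server_port) parsed from agent config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    host = parser.get(section, "server_host")
    # Fall back to an assumed default port if the option is absent.
    port = parser.getint(section, "server_port", fallback=7182)
    return host, port


if __name__ == "__main__":
    print(read_server_endpoint(SAMPLE_CONFIG))
```

On a real host, the text would come from reading /etc/cloudera-scm-agent/config.ini. Comparing the returned host name with the one you expect shows quickly whether the entry has drifted.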
