2 Replies Latest reply on Dec 15, 2014 10:54 AM by dreschler

    Connection to Agents get lost and cannot recover

    dreschler

      Hi,

       

      We observed the following problem on some of our Linux machines:

       

      The RHQ server sometimes loses its connection to agents on other machines. We can tell because the last availability ping is quite old (hours or days).

      The agents (or sometimes even the server as well) need to be restarted before they connect to the server again. So far we have not found any network issues, and there are no error logs around the time of the last availability ping.

       

      What could be the reason for this? Has anyone else observed this problem?

        • 1. Re: Connection to Agents get lost and cannot recover
          burmanm

          Which version of RHQ are you running? Also, was the agent actually still alive, or only a zombie? (I'm not sure which behaviour you are describing with "no error logs around ..".) Was the agent printing any log lines at all after the last availability time?

          • 2. Re: Connection to Agents get lost and cannot recover
            dreschler

            Sorry for the late answer.

             

            We are using RHQ 4.9. The agent was still printing logs, but we could not find any relevant error messages.

             

            In the meantime we set up additional servers running RHQ. On these servers (different hardware configuration), the failure has not occurred so far.

            So we are investigating what the difference could be. If you have any hints on where to look, that would be great.