18 Replies. Latest reply on Apr 11, 2013 4:26 AM by acmtix

    RHQ Agent losing connection.

    acmtix

      Hi all,

       

      I have recently set up RHQ to monitor our internal application servers. I am running the latest version of RHQ.

       

      I have installed 5 agents by downloading the agent to each server and then using the rhq-agent.bat file to install the agent. The agent then registers with the server and I can see the agent on the dashboard.

       

      Unfortunately the agent does not update the server and I see a red exclamation mark for availability. The agent does not communicate from this point onwards.

       

      I have checked firewalls and anti-virus and turned them off.

       

      Any ideas?

       

      Eventually I would like to set this up to monitor my JVMs, but at the moment it appears I can only inventory certain items on each server. Is there a way to monitor bespoke apps?

        • 1. Re: RHQ Agent losing connection.
          lkrejci

          To investigate your connection issues we'd need your agent logs. Do you see any suspicious error or warning messages in the logs?

           

          There is a way to monitor the JVM of any java application that can be connected to through JMX, so eventually you should be able to achieve your goal.

          • 2. Re: RHQ Agent losing connection.
            pilhuhn

            Do you keep the agent running after the initial setup? Do you import the server + agent?

            After that is done, do you run the agent as a different user (e.g. as a background task)?

            • 3. Re: RHQ Agent losing connection.
              acmtix

              I see a lot of this :

               

              2013-03-19 21:34:51,285 ERROR [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.jboss.remoting.CannotConnectException:Can not connect http client invoker. Connection refused: connect. -> java.net.ConnectException:Connection refused: connect. Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.

              2013-03-19 21:34:52,708 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Tue Mar 19 21:34:52 GMT 2013

              2013-03-19 21:34:52,711 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended   : Tue Mar 19 21:34:52 GMT 2013 : Scan [startTime=1363728892708, endTime=1363728892711, runtime=3, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=11, numScheduledRandomly=0, numPushedByInterval=10, numAvailabilityChanges=0, numDeferToParent=0]

              2013-03-19 21:35:22,713 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Tue Mar 19 21:35:22 GMT 2013

              2013-03-19 21:35:22,713 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended   : Tue Mar 19 21:35:22 GMT 2013 : Scan [startTime=1363728922713, endTime=1363728922713, runtime=0, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=2, numScheduledRandomly=0, numPushedByInterval=1, numAvailabilityChanges=0, numDeferToParent=0]

              2013-03-19 21:35:52,716 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Tue Mar 19 21:35:52 GMT 2013

              2013-03-19 21:35:52,717 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended   : Tue Mar 19 21:35:52 GMT 2013 : Scan [startTime=1363728952716, endTime=1363728952717, runtime=1, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=9, numScheduledRandomly=0, numPushedByInterval=8, numAvailabilityChanges=0, numDeferToParent=0]

              2013-03-19 21:35:53,295 INFO  [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.changing-endpoint}Communicator is changing endpoint from [InvokerLocator [servlet://172.30.45.71/jboss-remoting-servlet-invoker/ServerInvokerServlet]] to [InvokerLocator [servlet://172.30.45.71/jboss-remoting-servlet-invoker/ServerInvokerServlet]]

              2013-03-19 21:35:55,298 WARN  [RHQ Server Polling Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.failover-failed}Failed to failover to another server. Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.

              2013-03-19 21:35:57,303 WARN  [RHQ Server Polling Thread] (org.rhq.enterprise.agent.FailoverFailureCallback)- {AgentMain.too-many-failover-attempts}Too many failover attempts have been made [1]. Exception that triggered the failover: [org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.]

              2013-03-19 21:35:57,303 ERROR [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.jboss.remoting.CannotConnectException:Can not connect http client invoker. Connection refused: connect. -> java.net.ConnectException:Connection refused: connect. Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.

               

              And some of this :

               

              java.lang.IllegalStateException: The sender object is currently not sending commands now. Command not sent: [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.timeout=1800000, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[mergeInventoryReport], targetInterfaceName=org.rhq.core.clientapi.server.discovery.DiscoveryServerService}]]

                        at org.rhq.enterprise.communications.command.client.ClientCommandSender.sendSynch(ClientCommandSender.java:631)

                        at org.rhq.enterprise.communications.command.client.ClientRemotePojoFactory$RemotePojoProxyHandler.invoke(ClientRemotePojoFactory.java:407)

                        at $Proxy3.mergeInventoryReport(Unknown Source)

                        at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:1047)

                        at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.call(AutoDiscoveryExecutor.java:129)

                        at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.run(AutoDiscoveryExecutor.java:91)

                        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

                        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

                        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)

                        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)

                        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)

                        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)

                        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                        at java.lang.Thread.run(Thread.java:662)

               

              And more of this :

               

              2013-03-20 17:11:58,385 INFO  [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Executing server discovery scan...

              2013-03-20 17:11:58,559 INFO  [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Discovered 0 new server(s).

              2013-03-20 17:11:58,559 INFO  [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Sending [server] inventory report to Server...

              2013-03-20 17:11:59,839 INFO  [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Syncing local inventory with Server inventory...

              2013-03-20 17:12:11,710 INFO  [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementSenderRunner)- Measurement collection for [12] metrics took 16ms - sending report to Server...

              2013-03-20 17:12:19,278 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Wed Mar 20 17:12:19 GMT 2013

              2013-03-20 17:12:19,280 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended   : Wed Mar 20 17:12:19 GMT 2013 : Scan [startTime=1363799539278, endTime=1363799539280, runtime=2, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=5, numScheduledRandomly=0, numPushedByInterval=4, numAvailabilityChanges=0, numDeferToParent=0]

              2013-03-20 17:12:49,283 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Wed Mar 20 17:12:49 GMT 2013

              2013-03-20 17:12:49,284 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended   : Wed Mar 20 17:12:49 GMT 2013 : Scan [startTime=1363799569283, endTime=1363799569283, runtime=0, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=4, numScheduledRandomly=0, numPushedByInterval=3, numAvailabilityChanges=0, numDeferToParent=0]

              2013-03-20 17:13:19,286 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Wed Mar 20 17:13:19 GMT 2013

              2013-03-20 17:13:19,292 INFO  [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended   : Wed Mar 20 17:13:19 GMT 2013 : Scan [startTime=1363799599286, endTime=1363799599292, runtime=6, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=7, numScheduledRandomly=0, numPushedByInterval=6, numAvailabilityChanges=0, numDeferToParent=0]

               

              Then after this point it stops working.
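One way to read the log excerpt above is to pull out every endpoint the agent reports in its `InvokerLocator [...]` entries and compare them with the address the RHQ server actually listens on. The following is a minimal sketch, not part of RHQ: the function name `endpoints_in_log` is made up for illustration, and the regular expression only assumes the locator format visible in the `changing-endpoint` log line.

```python
import re

# Matches the start of a locator such as
# "InvokerLocator [servlet://172.30.45.71/jboss-remoting-servlet-invoker/..."
# The port is optional because the log line above does not show one.
LOCATOR_RE = re.compile(r"InvokerLocator \[(\w+)://([^/\]:]+)(?::(\d+))?")


def endpoints_in_log(lines):
    """Return the set of (transport, host, port) tuples found in the log lines.

    port is None when the locator does not include one.
    """
    found = set()
    for line in lines:
        for transport, host, port in LOCATOR_RE.findall(line):
            found.add((transport, host, int(port) if port else None))
    return found
```

To run it against a real log, pass an open file object, for example `endpoints_in_log(open(path_to_agent_log))`, where `path_to_agent_log` is wherever the agent writes its log on that machine. If a host in the result is not the machine the server runs on, the endpoint configured on the server side is worth a second look.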

              • 4. Re: RHQ Agent losing connection.
                acmtix

                The agent is left permanently running in a cmd window. I have imported both server + agent. I have also tried running the agent again as a different user and this makes no difference. The agent seems to work for a bit and then is no longer available.

                • 5. Re: RHQ Agent losing connection.
                  lkrejci

                  Ok, so it seems that the agent fails to connect to the server.

                   

                  When you configure the agent, the following occurs:

                   

                  1) During setup, you provide connection details to the server.

                  2) The agent downloads the list of the servers from this "initial" connection and immediately switches to its "primary" server as defined in the "failover list" that it downloads using the initial connection.

                   

                  So one thing you can try is to look in Administration->Servers and look at the servers mentioned in there. The "endpoint" and "port" addresses form the failover list the agent tries to connect to. If those URLs are unreachable from the agents, they won't be able to work, even though your initial connection details might have been correct.

                  • 6. Re: RHQ Agent losing connection.
                    acmtix

                    Yes Lukas, that is right.

                     

                    I have used telnet to communicate to the agent port and the server port and both are fine.

                     

                    All communication ports seem to be fine, it's just that the agent gives up and becomes unavailable to the server. This happens on all five agents I have installed.

                    • 7. Re: RHQ Agent losing connection.
                      mazz

                      This is usually due to the server being configured with a public endpoint that is not accessible from the agent machines.

                       

                      Please read this section - the yellow box is important.

                       

                      https://docs.jboss.org/author/display/RHQ/High+Availability#HighAvailability-FailoverLists

                      • 8. Re: RHQ Agent losing connection.
                        mazz
                        • 9. Re: RHQ Agent losing connection.
                          acmtix

                          Hi John,

                           

                          I had this problem initially so I moved the server so that it is on the same network as the servers to be monitored. They can all communicate via internal IP addresses. I reinstalled RHQ at this point.

                           

                          The agents still do not communicate properly with the server.

                          • 10. Re: RHQ Agent losing connection.
                            pilhuhn

                            Could that be a DNS issue, where e.g. a reverse lookup of the IP 1.2.3.4 gives server.frobnitz.com, but looking up server.frobnitz.com gives 2.3.4.5?
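The forward/reverse mismatch described above can be checked with a short script run on each machine. This is only a sketch using Python's standard `socket` module; the function name `dns_round_trip` and its return shape are invented for illustration. The resolver functions are passed in as parameters so the logic can be tried without touching real DNS.

```python
import socket


def dns_round_trip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, then forward-resolve the resulting name.

    Returns (name, addresses, consistent):
      name        the host name from the reverse lookup, or None if there is none
      addresses   the IPs the name resolves to (empty if the lookup fails)
      consistent  True if ip is among those addresses, False if it is not,
                  None if there was no reverse record to compare against
    """
    try:
        name = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None, [], None
    try:
        addresses = list(forward(name)[2])
    except OSError:
        addresses = []
    return name, addresses, ip in addresses
```

Calling `dns_round_trip("1.2.3.4")` in the situation described above would return `("server.frobnitz.com", ["2.3.4.5"], False)`, which is the inconsistent case.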

                            • 11. Re: RHQ Agent losing connection.
                              acmtix

                              Reverse DNS isn't an issue because they use internal IP addressing. Thanks anyway.

                              • 12. Re: RHQ Agent losing connection.
                                pilhuhn

                                Is there a configured server on servlet://172.30.45.71/ (port 7080)? Can you telnet into this from the agent machines?

                                The logs list "connection refused", so either the IP is wrong and we have to find out why or a firewall prevents access.

                                Did you perhaps disable http (port 7080) on the server for UI usage and did not tell the agents to use the https/ssl port 7443 as well?
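The telnet test suggested above can also be scripted, which makes it easier to repeat from every agent machine and for both ports. The sketch below uses only the standard `socket` module; `can_connect` and `check_endpoints` are illustrative names, not RHQ tooling. It only checks that a TCP connection is accepted, not that an RHQ server is answering on the other side.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers "connection refused", timeouts and unreachable hosts
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether a TCP connection succeeded."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}
```

For the address and ports mentioned above, a call would look like `check_endpoints([("172.30.45.71", 7080), ("172.30.45.71", 7443)])`. A `False` for port 7080 corresponds to the "Connection refused" entries in the agent log.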


                                • 13. Re: RHQ Agent losing connection.
                                  acmtix

                                  Thanks for your help.

                                   

                                  There is no firewall installed on these servers. They are in the same subnet.

                                   

                                  There is a configured RHQ server on :

                                  172.30.45.79

                                   

                                  and the agent is installed on :

                                  172.30.45.71

                                   

                                  Telnet can communicate between the two servers.

                                  I have not disabled port 7080 and have looked to see if antivirus or any other kind of firewall is blocking ports. Everything seems to be fine and all ports are open. The agents just stop communicating.

                                   

                                  I am using the --cleanconfig switch when I run the agents.

                                   

                                  I am very confused as to why this isn't working.

                                  • 14. Re: RHQ Agent losing connection.
                                    pilhuhn

                                    Can you please paste in what you provide when you start the agent with --clean?

                                     

                                    I am confused here as well, as what you describe all sounds logical.

                                     

                                    With respect to

                                     

                                      Reverse DNS isn't an issue because they use internal IP addressing.

                                     

                                    Can you please check that there is no (wrong) mapping for the two IPs you gave above in all of the /etc/hosts files?
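The hosts-file check asked for above can be done by eye, but with several machines a small script helps. The sketch below parses the text of a hosts file and reports which names are mapped to the given IPs. The function name `hosts_entries` is invented for illustration, and the file location is an assumption: `/etc/hosts` on Unix-like systems, typically `C:\Windows\System32\drivers\etc\hosts` on Windows (the agents in this thread appear to run on Windows, since they are started with `rhq-agent.bat`).

```python
def hosts_entries(text, ips):
    """Return {ip: [names]} for every hosts-file line that maps one of ips.

    text is the content of a hosts file; comments (after '#') and blank
    lines are ignored. IPs without any entry are left out of the result.
    """
    wanted = set(ips)
    found = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        if fields[0] in wanted and len(fields) > 1:
            found.setdefault(fields[0], []).extend(fields[1:])
    return found
```

With the two addresses given earlier, a call would look like `hosts_entries(open(hosts_path).read(), ["172.30.45.79", "172.30.45.71"])`, where `hosts_path` is the location on that machine. An empty result means neither IP has a static mapping in that file.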

                                     

                                    Do you run one RHQ server or multiple? In the case of one RHQ server only, there should be no need for failover anyway.
