-
1. Re: RHQ Agent losing connection.
lkrejci Apr 4, 2013 6:42 AM (in response to acmtix)
To investigate your connection issues we'd need your agent logs. Do you see any suspicious error or warning messages in the logs?
There is a way to monitor the JVM of any Java application that can be connected to through JMX, so eventually you should be able to achieve your goal.
-
2. Re: RHQ Agent losing connection.
pilhuhn Apr 4, 2013 7:08 AM (in response to acmtix)
Do you keep the agent running after the initial setup? Do you import the server + agent?
After that is done, do you run the agent as a different user (e.g. as a background task)?
-
3. Re: RHQ Agent losing connection.
acmtix Apr 4, 2013 7:09 AM (in response to lkrejci)
I see a lot of this:
2013-03-19 21:34:51,285 ERROR [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.jboss.remoting.CannotConnectException:Can not connect http client invoker. Connection refused: connect. -> java.net.ConnectException:Connection refused: connect. Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.
2013-03-19 21:34:52,708 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Tue Mar 19 21:34:52 GMT 2013
2013-03-19 21:34:52,711 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended : Tue Mar 19 21:34:52 GMT 2013 : Scan [startTime=1363728892708, endTime=1363728892711, runtime=3, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=11, numScheduledRandomly=0, numPushedByInterval=10, numAvailabilityChanges=0, numDeferToParent=0]
2013-03-19 21:35:22,713 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Tue Mar 19 21:35:22 GMT 2013
2013-03-19 21:35:22,713 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended : Tue Mar 19 21:35:22 GMT 2013 : Scan [startTime=1363728922713, endTime=1363728922713, runtime=0, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=2, numScheduledRandomly=0, numPushedByInterval=1, numAvailabilityChanges=0, numDeferToParent=0]
2013-03-19 21:35:52,716 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Tue Mar 19 21:35:52 GMT 2013
2013-03-19 21:35:52,717 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended : Tue Mar 19 21:35:52 GMT 2013 : Scan [startTime=1363728952716, endTime=1363728952717, runtime=1, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=9, numScheduledRandomly=0, numPushedByInterval=8, numAvailabilityChanges=0, numDeferToParent=0]
2013-03-19 21:35:53,295 INFO [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.changing-endpoint}Communicator is changing endpoint from [InvokerLocator [servlet://172.30.45.71/jboss-remoting-servlet-invoker/ServerInvokerServlet]] to [InvokerLocator [servlet://172.30.45.71/jboss-remoting-servlet-invoker/ServerInvokerServlet]]
2013-03-19 21:35:55,298 WARN [RHQ Server Polling Thread] (org.rhq.enterprise.agent.AgentMain)- {AgentMain.failover-failed}Failed to failover to another server. Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.
2013-03-19 21:35:57,303 WARN [RHQ Server Polling Thread] (org.rhq.enterprise.agent.FailoverFailureCallback)- {AgentMain.too-many-failover-attempts}Too many failover attempts have been made [1]. Exception that triggered the failover: [org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.]
2013-03-19 21:35:57,303 ERROR [RHQ Server Polling Thread] (enterprise.communications.command.client.JBossRemotingRemoteCommunicator)- {JBossRemotingRemoteCommunicator.init-callback-failed}The initialize callback has failed. It will be tried again. Cause: org.jboss.remoting.CannotConnectException:Can not connect http client invoker. Connection refused: connect. -> java.net.ConnectException:Connection refused: connect. Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker. Connection refused: connect.
And some of this:
java.lang.IllegalStateException: The sender object is currently not sending commands now. Command not sent: [Command: type=[remotepojo]; cmd-in-response=[false]; config=[{rhq.timeout=1800000, rhq.send-throttle=true}]; params=[{invocation=NameBasedInvocation[mergeInventoryReport], targetInterfaceName=org.rhq.core.clientapi.server.discovery.DiscoveryServerService}]]
at org.rhq.enterprise.communications.command.client.ClientCommandSender.sendSynch(ClientCommandSender.java:631)
at org.rhq.enterprise.communications.command.client.ClientRemotePojoFactory$RemotePojoProxyHandler.invoke(ClientRemotePojoFactory.java:407)
at $Proxy3.mergeInventoryReport(Unknown Source)
at org.rhq.core.pc.inventory.InventoryManager.handleReport(InventoryManager.java:1047)
at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.call(AutoDiscoveryExecutor.java:129)
at org.rhq.core.pc.inventory.AutoDiscoveryExecutor.run(AutoDiscoveryExecutor.java:91)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
And more of this:
2013-03-20 17:11:58,385 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Executing server discovery scan...
2013-03-20 17:11:58,559 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.AutoDiscoveryExecutor)- Discovered 0 new server(s).
2013-03-20 17:11:58,559 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Sending [server] inventory report to Server...
2013-03-20 17:11:59,839 INFO [InventoryManager.discovery-1] (rhq.core.pc.inventory.InventoryManager)- Syncing local inventory with Server inventory...
2013-03-20 17:12:11,710 INFO [MeasurementManager.sender-1] (rhq.core.pc.measurement.MeasurementSenderRunner)- Measurement collection for [12] metrics took 16ms - sending report to Server...
2013-03-20 17:12:19,278 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Wed Mar 20 17:12:19 GMT 2013
2013-03-20 17:12:19,280 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended : Wed Mar 20 17:12:19 GMT 2013 : Scan [startTime=1363799539278, endTime=1363799539280, runtime=2, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=5, numScheduledRandomly=0, numPushedByInterval=4, numAvailabilityChanges=0, numDeferToParent=0]
2013-03-20 17:12:49,283 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Wed Mar 20 17:12:49 GMT 2013
2013-03-20 17:12:49,284 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended : Wed Mar 20 17:12:49 GMT 2013 : Scan [startTime=1363799569283, endTime=1363799569283, runtime=0, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=4, numScheduledRandomly=0, numPushedByInterval=3, numAvailabilityChanges=0, numDeferToParent=0]
2013-03-20 17:13:19,286 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Starting: Wed Mar 20 17:13:19 GMT 2013
2013-03-20 17:13:19,292 INFO [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Scan Ended : Wed Mar 20 17:13:19 GMT 2013 : Scan [startTime=1363799599286, endTime=1363799599292, runtime=6, isFull=false, isForced=false, numResources=47, numGetAvailabilityCalls=7, numScheduledRandomly=0, numPushedByInterval=6, numAvailabilityChanges=0, numDeferToParent=0]
Then after this point it stops working.
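A long agent log like the excerpts above can be narrowed down to its WARN and ERROR entries before posting. What follows is only a minimal Python sketch, not an RHQ tool: it assumes the line layout visible in the excerpts (`date time LEVEL [thread] (logger)- message`), and the names `LINE_RE`, `KEY_RE` and `summarize_problems` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts in this thread:
#   2013-03-19 21:35:55,298 WARN [thread name] (logger.name)- message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+"
    r"\((?P<logger>[^)]+)\)-\s*"
    r"(?P<msg>.*)$"
)
# Many messages start with a key in braces, e.g. {AgentMain.failover-failed}
KEY_RE = re.compile(r"^\{([^}]+)\}")


def summarize_problems(lines, levels=("WARN", "ERROR")):
    """Count log entries at the given levels, grouped by (level, key).

    The key is the {...} prefix of the message when present,
    otherwise the logger name. Lines that do not match the assumed
    layout (stack trace frames, for example) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match or match.group("level") not in levels:
            continue
        key = KEY_RE.match(match.group("msg"))
        label = key.group(1) if key else match.group("logger")
        counts[(match.group("level"), label)] += 1
    return counts
```

Called as `summarize_problems(open(path))` with `path` pointing at the agent log, it returns a `Counter` whose `most_common()` list shows which failures dominate.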
-
4. Re: RHQ Agent losing connection.
acmtix Apr 4, 2013 7:24 AM (in response to pilhuhn)
The agent is left permanently running in a cmd window. I have imported both server + agent. I have also tried running the agent again as a different user and this makes no difference. The agent seems to work for a bit and then is no longer available.
-
5. Re: RHQ Agent losing connection.
lkrejci Apr 4, 2013 8:43 AM (in response to acmtix)
Ok, so it seems that the agent fails to connect to the server.
When you configure the agent, the following occurs:
1) During setup, you provide the connection details for the server.
2) The agent downloads the list of servers over this "initial" connection and immediately switches to its "primary" server, as defined in the "failover list" it has just downloaded.
So one thing you can try is to go to Administration->Servers and look at the servers listed there. The "endpoint" and "port" addresses form the failover list the agent tries to connect to. If those URLs are unreachable from the agents, the agents won't be able to work, even though your initial connection details might have been correct.
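Whether the failover-list endpoints are reachable can be checked from each agent machine with a plain TCP connect, which is what a telnet test does by hand. The following is a minimal Python sketch under stated assumptions: `can_connect` and `check_endpoints` are hypothetical helper names, not part of RHQ, and the host and port passed in must be replaced with whatever Administration->Servers actually shows.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any OSError (connection refused, timeout, failed name resolution)
    is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to the result of can_connect."""
    return {(host, port): can_connect(host, port, timeout)
            for host, port in endpoints}
```

For example, `check_endpoints([("rhq-server.example.com", 7080)])` (a placeholder address) returns a dictionary with one `True` or `False` per endpoint. A `False` here corresponds to the "Connection refused" seen in the agent log, but the sketch does not tell a refused connection apart from a timeout or a name that fails to resolve.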
-
6. Re: RHQ Agent losing connection.
acmtix Apr 4, 2013 9:18 AM (in response to lkrejci)
Yes Lukas, that is right.
I have used telnet to connect to the agent port and the server port and both are fine.
All communication ports seem to be fine, it's just that the agent gives up and becomes unavailable to the server. This happens on all five agents I have installed.
-
7. Re: RHQ Agent losing connection.
mazz Apr 4, 2013 9:34 AM (in response to acmtix)
This is usually due to the server being configured with a public endpoint that is not accessible from the agent machines.
Please read this section - the yellow box is important.
https://docs.jboss.org/author/display/RHQ/High+Availability#HighAvailability-FailoverLists
-
8. Re: RHQ Agent losing connection.
mazz Apr 4, 2013 9:36 AM (in response to acmtix)
Oh, also, there is an FAQ that talks about this exact problem:
-
9. Re: RHQ Agent losing connection.
acmtix Apr 4, 2013 9:56 AM (in response to mazz)
Hi John,
I had this problem initially so I moved the server so that it is on the same network as the servers to be monitored. They can all communicate via internal IP addresses. I reinstalled RHQ at this point.
The agents still do not communicate properly with the server.
-
10. Re: RHQ Agent losing connection.
pilhuhn Apr 4, 2013 9:58 AM (in response to acmtix)
Could this be a DNS issue, where e.g. a reverse lookup of the IP 1.2.3.4 gives server.frobnitz.com, but looking up server.frobnitz.com gives 2.3.4.5?
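The mismatch described here can be tested by resolving an IP to a name and then resolving that name back to its addresses. The sketch below is one way to do that in Python under stated assumptions: `reverse_then_forward` is a hypothetical helper name, and the two resolver parameters exist only so the function can be exercised without a live DNS server. By default it uses the standard `socket.gethostbyaddr` and `socket.gethostbyname_ex` calls.

```python
import socket


def reverse_then_forward(ip,
                         reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Reverse-resolve ip to a hostname, then forward-resolve that hostname.

    Returns (hostname, addresses, consistent), where consistent is True
    only if the original ip is among the forward-resolved addresses.
    Resolver errors (socket.herror, socket.gaierror) are not caught, so
    a missing reverse entry shows up as an exception.
    """
    hostname = reverse(ip)[0]          # (hostname, aliases, addresses)[0]
    addresses = forward(hostname)[2]   # (hostname, aliases, addresses)[2]
    return hostname, addresses, ip in addresses
```

Run on both the server and the agent machines for each IP involved, a result whose last element is `False` would point to the kind of inconsistency asked about above.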
-
11. Re: RHQ Agent losing connection.
acmtix Apr 4, 2013 10:32 AM (in response to pilhuhn)
Reverse DNS isn't an issue because they use internal IP addressing. Thanks anyway.
-
12. Re: RHQ Agent losing connection.
pilhuhn Apr 10, 2013 9:03 AM (in response to acmtix)
Is there a configured server on servlet://172.30.45.71/ (port 7080)? Can you telnet to it from the agent machines?
The logs list "connection refused", so either the IP is wrong and we have to find out why, or a firewall prevents access.
Did you perhaps disable http (port 7080) on the server for UI usage and not tell the agents to use the https/ssl port 7443 as well?
-
13. Re: RHQ Agent losing connection.
acmtix Apr 10, 2013 9:09 AM (in response to pilhuhn)
Thanks for your help.
There is no firewall installed on these servers. They are in the same subnet.
There is a configured RHQ server on 172.30.45.79, and the agent is installed on 172.30.45.71. Telnet can communicate between the two servers.
I have not disabled port 7080 and have looked to see if antivirus or any other kind of firewall is blocking ports. Everything seems to be fine and all ports are open. The agents just stop communicating.
I am using the --cleanconfig switch when I run the agents.
I am very confused as to why this isn't working.
-
14. Re: RHQ Agent losing connection.
pilhuhn Apr 10, 2013 9:19 AM (in response to acmtix)
Can you please paste in what you provide when you start the agent with --clean?
I am confused here as well, as what you describe all sounds logical.
With respect to your comment "Reverse DNS isn't an issue because they use internal IP addressing":
Can you please check that there is no (wrong) mapping for the two IPs you gave above in all of the /etc/hosts files?
Do you run one RHQ server or multiple? In the case of one RHQ server only, there should be no need for failover anyway.
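The hosts-file check asked for above can be done by eye, or with a few lines of script when there are several machines to go through. This is a minimal Python sketch under stated assumptions: `hosts_entries` is a hypothetical helper name, it works on the text of a hosts file that the caller has already read, and it assumes the usual `address name [alias...]` layout with `#` comments.

```python
def hosts_entries(text, wanted_ips):
    """Return {ip: [names, ...]} for hosts-file lines that start with a wanted IP.

    Comments (everything after '#') and blank lines are ignored.
    Names from several lines for the same IP are collected in order.
    """
    found = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        if ip in wanted_ips:
            found.setdefault(ip, []).extend(names)
    return found
```

For the two addresses from this thread the call would be `hosts_entries(text, {"172.30.45.79", "172.30.45.71"})`, with `text` read from the hosts file of each machine. An empty result means the file has no mapping for either IP. Any names it does return still have to be compared by hand with what the server and agents are configured to use.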