-
1. Re: Bug 1017961, which has to do with MBeans appearing down
tsegismont Nov 4, 2013 10:08 AM (in response to genman)
Hi Elias,
I finally found a way to reproduce your issue.
I tried with a Tomcat server in inventory, and Tomcat needs to be back up *BEFORE* the agent can notice the "UP -> DOWN" availability change. After some time (depending on how the availability scan interval is configured), the nested resources come back to the UP state.
Can you confirm?
I think it happens when:
* the managed server goes down
* an availability check is executed for a nested resource
* an availability check is executed for the top server resource
How far have you got with your patch? You can attach it (even untested) to the bug report.
Thanks for tracking this. Nice catch!
Thomas
-
2. Re: Bug 1017961, which has to do with MBeans appearing down
genman Nov 4, 2013 10:51 AM (in response to tsegismont)
Yes, the agent has to be up, the managed server has to be restarted, and the checks have to happen to run in a certain order. I created a test harness but it couldn't reproduce the problem on OS X. I think you are right about the ordering issue. I didn't realize the ordering of the checks mattered.
I want to test the patch for a bit. I also need to get it looked at by my company, which takes a few weeks sometimes.
The one downside with my patch is that the EmsBeans are not cached and are looked up on each check. (They are cached anyway by the server, just the Map lookup is done each check.) I don't think the caching is worth the extra risk.
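To make the trade-off above concrete, here is a minimal Java sketch. It is a toy model, not RHQ or EMS code: `ToyServer`, `ToyBeanHandle`, `CachingChecker` and `FreshLookupChecker` are hypothetical names, and the idea that a cached handle stops working once the managed server has been restarted is an assumption made for illustration, not something confirmed in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Demo entry point: bounce the toy server and compare the two strategies. */
final class LookupDemo {
    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        CachingChecker caching = new CachingChecker(server);
        FreshLookupChecker fresh = new FreshLookupChecker(server);
        String name = "java.lang:type=Memory";

        System.out.println("before restart: caching=" + caching.isAvailable(name)
                + " fresh=" + fresh.isAvailable(name));
        server.stop();
        server.start();
        System.out.println("after restart:  caching=" + caching.isAvailable(name)
                + " fresh=" + fresh.isAvailable(name));
    }
}

/** Hypothetical stand-in for a managed JVM; not an RHQ or EMS class. */
final class ToyServer {
    private int generation = 0; // incremented on every restart
    private boolean up = true;

    void stop() { up = false; }
    void start() { generation++; up = true; }
    boolean isUp() { return up; }
    int generation() { return generation; }

    /** Hands out a handle bound to the generation that is current right now. */
    ToyBeanHandle lookup(String objectName) {
        return new ToyBeanHandle(this, generation);
    }
}

/** Hypothetical bean handle; ASSUMPTION: it only works for the generation it was created in. */
final class ToyBeanHandle {
    private final ToyServer server;
    private final int generation;

    ToyBeanHandle(ToyServer server, int generation) {
        this.server = server;
        this.generation = generation;
    }

    boolean isRegistered() {
        return server.isUp() && server.generation() == generation;
    }
}

/** Keeps the first handle it obtains for each name and reuses it forever. */
final class CachingChecker {
    private final ToyServer server;
    private final Map<String, ToyBeanHandle> cache = new HashMap<String, ToyBeanHandle>();

    CachingChecker(ToyServer server) { this.server = server; }

    boolean isAvailable(String objectName) {
        ToyBeanHandle handle = cache.get(objectName);
        if (handle == null) {
            handle = server.lookup(objectName);
            cache.put(objectName, handle);
        }
        return handle.isRegistered();
    }
}

/** Looks the handle up again on every availability check. */
final class FreshLookupChecker {
    private final ToyServer server;

    FreshLookupChecker(ToyServer server) { this.server = server; }

    boolean isAvailable(String objectName) {
        return server.lookup(objectName).isRegistered();
    }
}
```

In this model the caching checker keeps reporting the bean as unavailable after the restart, while the checker that looks the handle up each time recovers. The price is one extra lookup per check.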
-
3. Re: Bug 1017961, which has to do with MBeans appearing down
tsegismont Nov 4, 2013 11:07 AM (in response to genman)
Elias,
Thanks for your answer. I think we can keep caching if we refresh the EmsConnection when the availability check fails. I'll push a fix soon and will tell you about it.
Cheers,
Thomas
-
4. Re: Bug 1017961, which has to do with MBeans appearing down
genman Nov 4, 2013 2:47 PM (in response to tsegismont)
I tried doing a 'refresh' on EmsConnection--it wasn't working right. It also has the unintended side-effect of tossing out your entire cache if one MBean really is gone, so the performance is likely worse than before in many circumstances.
Reviewing the code, there is a lot of cruft that needs to be gotten rid of.
Simpuru izu besto, as the Japanese would say.
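The side effect mentioned in this post (one MBean that really is gone causing the whole cache to be thrown away) can be sketched with a small Java toy model. `ToyConnection`, its `check` method and the lookup counter are hypothetical and only count simulated remote lookups; they are not the EMS API, and real refresh behaviour may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Demo entry point: two availability scans over three beans, one of which is really gone. */
final class RefreshDemo {
    public static void main(String[] args) {
        String[] names = {"a:type=A", "b:type=B", "gone:type=Gone"};
        ToyConnection plain = new ToyConnection("a:type=A", "b:type=B");
        ToyConnection refreshing = new ToyConnection("a:type=A", "b:type=B");

        for (int scan = 0; scan < 2; scan++) {
            for (String name : names) {
                plain.check(name, false);     // keep the cache no matter what
                refreshing.check(name, true); // drop the whole cache when a check fails
            }
        }
        System.out.println("plain cache: " + plain.remoteLookups() + " remote lookups");
        System.out.println("refresh on failure: " + refreshing.remoteLookups() + " remote lookups");
    }
}

/** Hypothetical connection that counts how often a bean has to be loaded remotely. */
final class ToyConnection {
    private final Set<String> registered = new HashSet<String>();
    private final Set<String> loaded = new HashSet<String>(); // names already in the cache
    private int remoteLookups = 0;

    ToyConnection(String... registeredNames) {
        registered.addAll(Arrays.asList(registeredNames));
    }

    /** Returns availability; optionally clears the entire cache when the bean is not registered. */
    boolean check(String name, boolean refreshOnFailure) {
        if (loaded.add(name)) {
            remoteLookups++; // cache miss: simulate one remote lookup
        }
        boolean available = registered.contains(name);
        if (!available && refreshOnFailure) {
            loaded.clear(); // the "refresh" throws away every cached entry
        }
        return available;
    }

    int remoteLookups() { return remoteLookups; }
}
```

With these inputs the plain cache needs 3 simulated lookups over two scans, while refresh-on-failure needs 6, because the permanently missing bean empties the cache on every scan.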
-
5. Re: Bug 1017961, which has to do with MBeans appearing down
tsegismont Nov 12, 2013 8:29 AM (in response to genman)
Hi Elias,
The issue I found with Tomcat turned out to be something unrelated. I created another BZ to track it:
Bug 1029373 - "Tomcat Web Application (WAR)" components stay down when server comes back up
So I'm stuck again because I can't reproduce your issue with Storage Node. You talked about Flume as well. How do you monitor it? With a custom plugin based on the JMX plugin?
Regards,
Thomas
-
6. Re: Bug 1017961, which has to do with MBeans appearing down
genman Nov 12, 2013 12:21 PM (in response to tsegismont)
Flume uses the JMX plugin as base, yes. I've seen the same issue with any component that uses the JMX plugin, including the storage nodes as you can see in the original post.
As for Tomcat, my fix works for the Tomcat HTTP connector, which was appearing down. As for the .war component appearing down, that may be a separate issue like you say.
My issue 1017961 may be a problem specific to the version of Linux distro (EL6) or JVM (1.6.0_38) I'm using. Still, my fix works well and improves the code.
I hope you also consider https://bugzilla.redhat.com/show_bug.cgi?id=971615 as well. I'm getting tired of having to port my fixes to each subsequent RHQ release.
-
7. Re: Bug 1017961, which has to do with MBeans appearing down
tsegismont Nov 12, 2013 12:43 PM (in response to genman)
> Flume uses the JMX plugin as base, yes. I've seen the same issue with any component that uses the JMX plugin, including the storage nodes as you can see in the original post.
> As for Tomcat, my fix works for the Tomcat HTTP connector, which was appearing down. As for the .war component appearing down, that may be a separate issue like you say.
> My issue 1017961 may be a problem specific to the version of Linux distro (EL6) or JVM (1.6.0_38) I'm using. Still, my fix works well and improves the code.
I understand your fix works, but I need to be able to reproduce the problem. That's why I was asking about Flume. Are you using the Oracle JVM or OpenJDK? I will try again with Java 6.
> I hope you also consider https://bugzilla.redhat.com/show_bug.cgi?id=971615 as well. I'm getting tired of having to port my fixes to each subsequent RHQ release.
Mazz has recently re-targeted BZ971615 to RHQ4.10. I think there is a good chance it will be fixed in the next RHQ release.
I hope you will keep on reporting issues and providing patches as you already do. You're a great contributor to our community and we would all be sad to see you move away.
Thomas
-
8. Re: Bug 1017961, which has to do with MBeans appearing down
genman Nov 12, 2013 2:49 PM (in response to tsegismont)
-bash-4.1$ java -version
java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)
It's the Oracle version.
-
9. Re: Re: Bug 1017961, which has to do with MBeans appearing down
tsegismont Nov 14, 2013 12:28 PM (in response to genman)
Elias,
A quick update: I was able to reproduce your problem with a Tomcat server on a Linux machine running OpenJDK6:
2013-11-14 17:41:39,740 WARN [InventoryManager.availability-1] (rhq.core.pc.inventory.AvailabilityExecutor)- Availability collection failed with exception on Resource[id=10206, uuid=c24df09c-0ee0-4171-bce0-e2b6a4721493, type={Tomcat}Memory Pool, key=java.lang:name=CMS Perm Gen,type=MemoryPool, name=CMS Perm Gen, parent=Memory Subsystem], availability will be reported as DOWN
java.lang.reflect.UndeclaredThrowableException
    at sun.proxy.$Proxy113.isRegistered(Unknown Source)
    at org.mc4j.ems.impl.jmx.connection.bean.DMBean.isRegistered(DMBean.java:188)
    at org.rhq.plugins.jmx.MBeanResourceComponent.isMBeanAvailable(MBeanResourceComponent.java:242)
    at org.rhq.plugins.jmx.MBeanResourceComponent.getAvailability(MBeanResourceComponent.java:229)
    at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocation.call(ResourceContainer.java:654)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.rmi.ConnectException: Connection refused to host: 192.168.13.13; nested exception is:
    java.net.ConnectException: Connection refused
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
    at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
    at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
    at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
    at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
    at javax.management.remote.rmi.RMIConnectionImpl_Stub.isRegistered(Unknown Source)
    at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.isRegistered(RMIConnector.java:847)
    at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.mc4j.ems.impl.jmx.connection.support.providers.proxy.JMXRemotingMBeanServerProxy.invoke(JMXRemotingMBeanServerProxy.java:59)
    ... 13 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
    at java.net.Socket.connect(Socket.java:546)
    at java.net.Socket.connect(Socket.java:495)
    at java.net.Socket.<init>(Socket.java:392)
    at java.net.Socket.<init>(Socket.java:206)
    at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40)
    at sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:146)
    at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:613)
    ... 23 more
I hope to close the bug by tomorrow.
Regards,
Thomas
-
10. Re: Bug 1017961, which has to do with MBeans appearing down
tsegismont Nov 15, 2013 12:22 PM (in response to tsegismont)
Hi Elias,
Please have a look at https://bugzilla.redhat.com/show_bug.cgi?id=1017961#c5 and following. Let's continue the discussion there.
Regards,