-
30. Re: Deadlock when using netty NIO acceptor
zohar_melamed Jul 28, 2011 12:14 PM (in response to postmaxin) Tyler, that describes exactly what we do.
-
31. Re: Deadlock when using netty NIO acceptor
clebert.suconic Jul 28, 2011 1:31 PM (in response to zohar_melamed) Check your thread pool size. The system may be starving, since getMessageCount() depends on the executors.
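Also, can someone try this change to QueueImpl.getMessageCount() on Branch_2_2_AS7?
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Goes in QueueImpl: getExecutor(), getInstantMessageCount() and log are QueueImpl's own members.
public long getMessageCount()
{
   final CountDownLatch latch = new CountDownLatch(1);
   final AtomicLong count = new AtomicLong(0);
   // Compute the count on the queue's executor, in order with the queue's other tasks
   getExecutor().execute(new Runnable()
   {
      public void run()
      {
         count.set(getInstantMessageCount());
         latch.countDown();
      }
   });
   try
   {
      // Bounded wait: a starved executor makes this log a warning and return early instead of hanging
      if (!latch.await(10, TimeUnit.SECONDS))
      {
         log.warn("Timed out on waiting for MessageCount");
      }
   }
   catch (InterruptedException e)
   {
      Thread.currentThread().interrupt(); // keep the caller's interrupt status
      log.warn(e.getMessage(), e);
   }
   return count.get();
}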
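The idea behind the patch can be tried outside HornetQ. In this minimal sketch, ExecutorCountDemo is a hypothetical stand-alone class (not a HornetQ type) and a single-thread executor stands in for the queue's executor: every write and the count read are tasks on that executor, and the reader waits on a CountDownLatch with a timeout, the same shape as the patch. Unlike the patch, it throws on timeout instead of only logging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ExecutorCountDemo {
    // Stand-in for queue state; only ever touched on the executor thread.
    private long messages = 0;
    // Hypothetical stand-in for QueueImpl's executor: one thread, tasks run in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public void add() {
        executor.execute(() -> messages++);
    }

    // Run the read on the executor and wait at most timeoutMs for it.
    public long getMessageCount(long timeoutMs) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(1);
        final AtomicLong count = new AtomicLong(-1);
        executor.execute(() -> {
            count.set(messages);
            latch.countDown();
        });
        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("Timed out waiting for message count");
        }
        return count.get();
    }

    public void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorCountDemo q = new ExecutorCountDemo();
        for (int i = 0; i < 5; i++) {
            q.add();
        }
        // The read is queued behind the five adds, so it sees all of them.
        System.out.println(q.getMessageCount(1000)); // prints 5
        q.shutdown();
    }
}
```

The read never touches queue state from the caller's thread, so no lock is shared with the queue; the price is that a busy or starved executor delays the answer, which is why the wait is bounded.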
-
32. Re: Deadlock when using netty NIO acceptor
clebert.suconic Jul 29, 2011 6:15 PM (in response to clebert.suconic) OK, so I see two separate issues here:
One is caused by calling getMessageCount() while the queue is paging. I can produce a patch on Monday. I wonder if Tyler MacDonald would be able to give it a try.
The other is calling createConsumer from multiple threads on the same Connection. We should fix that next week as well (we don't have a patch yet, but Howard Gao should be working on it).
-
33. Re: Deadlock when using netty NIO acceptor
postmaxin Jul 29, 2011 6:43 PM (in response to clebert.suconic) Awesome, thanks Clebert!
I've mitigated it by having my JMX scraper call Thread.sleep(1) on each pass of its loop, so once you have your patches, I'll take the sleeps out and see how it goes.
-
34. Re: Deadlock when using netty NIO acceptor
clebert.suconic Jul 29, 2011 10:54 PM (in response to postmaxin) Tyler, can you send me some code showing what you're doing? (Just to get an idea of how you are accessing management.)
-
35. Re: Deadlock when using netty NIO acceptor
clebert.suconic Jul 29, 2011 11:33 PM (in response to clebert.suconic) Actually, I have already put a fix on this branch:
http://anonsvn.jboss.org/repos/hornetq/branches/Branch_2_2_EAP_cluster_clean2/
I wouldn't put it in production, but please give it a try as a test if you can.
-
36. Re: Deadlock when using netty NIO acceptor
postmaxin Jul 30, 2011 1:01 AM (in response to clebert.suconic) Here you go. It basically just walks JMX and returns every attribute whose value parses as an integer, so we can pass them on to graphs and alarms.
package org.crackerjack.jmx2counters;

import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.Set;
import javax.management.AttributeNotFoundException;
import javax.management.InstanceNotFoundException;
import javax.management.IntrospectionException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanException;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectInstance;
import javax.management.ObjectName;
import javax.management.ReflectionException;
import javax.management.RuntimeMBeanException;
import javax.management.openmbean.CompositeData;

// NOTE: super(name) and super.makeCounters() refer to a base class that is not
// included in this post, so the "extends" clause is missing here.
public abstract class SimpleCounterGrabber {
    private MBeanServer platformServer;

    protected SimpleCounterGrabber(String name) {
        super(name);
        platformServer = ManagementFactory.getPlatformMBeanServer();
    }

    // Make a JMX name safe to use as a counter name.
    protected String scrubCounterName(String key) {
        return key.replaceAll("\\s", "_").replaceAll(",", "-").replaceAll("\"", "'");
    }

    protected void addObjectToCounters(Map counters, String key, Object value) {
        if (value == null) {
            return; // value does not exist
        }
        try {
            counters.put(scrubCounterName(key), Long.parseLong(value.toString()));
        } catch (NumberFormatException e) {
            // value is not an integer, not much I can do here.
        }
    }

    // Flatten a CompositeData attribute into one counter per item.
    protected void addCompositeToCounters(Map counters, String key, CompositeData value)
            throws InterruptedException {
        for (String k : value.getCompositeType().keySet()) {
            addObjectToCounters(counters, key + "." + k, value.get(k));
            Thread.sleep(1);
        }
    }

    protected void addObjectToCounters(Map counters, MBeanAttributeInfo info, String key, Object value)
            throws InterruptedException {
        if (value instanceof CompositeData) {
            addCompositeToCounters(counters, key, (CompositeData) value);
        } else {
            addObjectToCounters(counters, key, value);
        }
    }

    // Read every attribute of one MBean.
    protected void getObjectInstanceCounters(Map counters, ObjectInstance instance)
            throws InstanceNotFoundException, MBeanException, AttributeNotFoundException,
            IntrospectionException, ReflectionException {
        ObjectName objectName = instance.getObjectName();
        MBeanInfo beanInfo = platformServer.getMBeanInfo(objectName);
        for (MBeanAttributeInfo i : beanInfo.getAttributes()) {
            String attributeName = i.getName();
            String key = objectName.getCanonicalName() + ":" + attributeName;
            try {
                Object value = platformServer.getAttribute(objectName, attributeName);
                addObjectToCounters(counters, i, key, value);
                Thread.sleep(1);
            } catch (RuntimeMBeanException e) {
                if (e.getCause() instanceof UnsupportedOperationException) {
                    // some keys will expose themselves in the listing even if they are
                    // not supported, and then rudely throw an exception when you try
                    // to look at them.
                } else {
                    throw e;
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(ie);
            }
        }
    }

    public Map makeCounters() {
        Map counters = super.makeCounters();
        return this.makeCounters(counters);
    }

    // Walk every registered MBean and collect its integer attributes.
    protected Map makeCounters(Map counters) {
        Set<ObjectInstance> mbeans = platformServer.queryMBeans(null, null);
        for (ObjectInstance i : mbeans) {
            try {
                getObjectInstanceCounters(counters, i);
                Thread.sleep(1);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return counters;
    }
}
-
37. Re: Deadlock when using netty NIO acceptor
postmaxin Jul 30, 2011 1:02 AM (in response to postmaxin) Note the Thread.sleep(1) calls: with them, the problem nearly disappears. Without them, we're in a tight loop with no yielding, which I think exacerbates the deadlock.
-
38. Re: Deadlock when using netty NIO acceptor
clebert.suconic Aug 1, 2011 12:42 AM (in response to postmaxin) Can you try the branch I pointed you to? (As a test.)
If it still happens, please send me a thread dump.
(BTW: you are overusing management, but the change should cope better with that.)
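One way to lean on management less is to poll only the attributes you actually graph, on a fixed schedule, instead of walking every MBean in a tight loop. This is a minimal sketch under assumptions: TargetedJmxPoller is a hypothetical class, and it assumes HornetQ core queues are registered under names matching org.hornetq:module=Core,type=Queue,* with a MessageCount attribute (check the real names in jconsole before relying on them).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class TargetedJmxPoller {
    // One pass: read a single attribute from each MBean matching the pattern,
    // keeping only numeric values.
    public static Map<String, Long> pollOnce(MBeanServer server, ObjectName pattern, String attribute)
            throws JMException {
        Map<String, Long> counters = new TreeMap<>();
        for (ObjectName name : server.queryNames(pattern, null)) {
            Object value = server.getAttribute(name, attribute);
            if (value instanceof Number) {
                counters.put(name.getCanonicalName() + ":" + attribute, ((Number) value).longValue());
            }
        }
        return counters;
    }

    // Repeat pollOnce every periodSeconds; the delay runs from the end of one poll
    // to the start of the next, so polls never overlap.
    public static ScheduledExecutorService start(MBeanServer server, ObjectName pattern,
            String attribute, long periodSeconds, Consumer<Map<String, Long>> sink) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                sink.accept(pollOnce(server, pattern, attribute));
            } catch (Exception e) {
                // Report and carry on: an exception escaping the task would cancel later runs.
                System.err.println("JMX poll failed: " + e);
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

For example, TargetedJmxPoller.start(ManagementFactory.getPlatformMBeanServer(), new ObjectName("org.hornetq:module=Core,type=Queue,*"), "MessageCount", 10, System.out::println) would print one map of counters every 10 seconds under the naming assumption above. The fixed delay does the throttling the Thread.sleep(1) calls were doing, but once per round rather than once per attribute.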
-
39. Re: Deadlock when using netty NIO acceptor
postmaxin Aug 5, 2011 9:42 PM (in response to clebert.suconic) I'll give it a shot between now and Monday. FWIW, expiry threads seem to exacerbate the deadlock as well.
-
40. Re: Deadlock when using netty NIO acceptor
postmaxin Aug 8, 2011 12:59 PM (in response to clebert.suconic) Clebert,
That branch appears to alleviate the consumer deadlock somewhat, but there still seems to be something wrong with expiry. I had threads producing to 70 queues, only 65 of which had consumers. Messages were set to expire after 10 seconds, and the expiry check interval was set to 30 seconds. I had a consumer attached to the expiry queue that periodically logged how many messages it had consumed.
Under this setup, the expiry thread only appeared to do its work 3 or 4 times before stopping/deadlocking entirely. I even stopped my producers and consumers and waited a few hours, and the expiry thread never expired another message; I still have thousands of messages that should have expired sitting in my other queues.
Stack trace is here: http://pastebin.com/7PQFzPzR
-
41. Re: Deadlock when using netty NIO acceptor
clebert.suconic Aug 8, 2011 3:21 PM (in response to postmaxin) Did you get any exception logs?
-
42. Re: Deadlock when using netty NIO acceptor
postmaxin Aug 8, 2011 3:38 PM (in response to clebert.suconic) None at all. And if you look at around line 421 of that paste, it appears that the reaper thread is still running.
-
43. Re: Deadlock when using netty NIO acceptor
clebert.suconic Aug 11, 2011 6:07 PM (in response to postmaxin) @tyler: you may try https://svn.jboss.org/repos/hornetq/branches/Branch_2_2_EAP/ if you'd like. Expiration no longer synchronizes on the queue; it just submits work to an executor (actor-like).
I've also fixed the deadlock found by Carl. I think it will be fine now.
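The actor-like expiry Clebert describes (no synchronization on the queue, just work submitted to an executor) can be illustrated with a toy. This is a hypothetical sketch, not HornetQ's QueueImpl: ActorQueue keeps its message list on a single-thread executor, and an expiry scanner calls expireBefore(), which enqueues the scan as one more task instead of taking a lock.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ActorQueue {
    // A message with an absolute expiration time in ms; 0 means it never expires.
    public static final class Message {
        final String body;
        final long expiration;

        public Message(String body, long expiration) {
            this.body = body;
            this.expiration = expiration;
        }
    }

    // Only ever touched from the executor thread, so it needs no lock.
    private final Deque<Message> messages = new ArrayDeque<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public void add(Message m) {
        executor.execute(() -> messages.addLast(m));
    }

    // Called from an expiry-scanner thread: the scan runs on the queue's executor,
    // after any work already queued, and the caller gets a future for the result.
    public CompletableFuture<Integer> expireBefore(long now) {
        return CompletableFuture.supplyAsync(() -> {
            int expired = 0;
            Iterator<Message> it = messages.iterator();
            while (it.hasNext()) {
                Message m = it.next();
                if (m.expiration != 0 && m.expiration <= now) {
                    it.remove(); // a broker would route it to an expiry address here
                    expired++;
                }
            }
            return expired;
        }, executor);
    }

    public CompletableFuture<Integer> size() {
        return CompletableFuture.supplyAsync(messages::size, executor);
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

Because the scanner only enqueues work, it never holds a queue lock while waiting on something else, which removes one way for expiry and consumer threads to deadlock against each other. The trade-off is that a slow task on the executor delays everything queued behind it, including the expiry scan.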