43 Replies. Latest reply on Aug 11, 2011 6:07 PM by clebert.suconic
      • 30. Re: Deadlock when using netty NIO acceptor
        zohar_melamed

        Tyler, that describes exactly what we do.

        • 31. Re: Deadlock when using netty NIO acceptor
          clebert.suconic

          Check your thread pool size. The system may be starving, since getMessageCount() goes through the executors.

           

           

          Also, can someone try this change on QueueImpl.getMessageCount()

           

           

          Use Branch_2_2_AS7:

           

           

             public long getMessageCount()
             {
                final CountDownLatch latch = new CountDownLatch(1);
                final AtomicLong count = new AtomicLong(0);

                getExecutor().execute(new Runnable()
                {
                   public void run()
                   {
                      count.set(getInstantMessageCount());
                      latch.countDown();
                   }
                });

                try
                {
                   if (!latch.await(10, TimeUnit.SECONDS))
                   {
                      throw new IllegalStateException("Timed out on waiting for MessageCount");
                   }
                }
                catch (Exception e)
                {
                   log.warn(e.getMessage(), e);
                }

                return count.get();
             }

           

           

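          You will need the proper imports (java.util.concurrent.CountDownLatch, java.util.concurrent.TimeUnit, and java.util.concurrent.atomic.AtomicLong).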
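For anyone who wants to see the pattern in isolation, here is a minimal, self-contained sketch of the same idea: the count is read on the executor that owns the state, and the caller waits on a latch with a timeout. The class name LatchedCountSketch, the single-thread executor, and the messages field are all assumptions made for illustration; none of this is HornetQ code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LatchedCountSketch {
    // Hypothetical stand-in for the queue's executor: one thread, FIFO order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Only ever read or written from the executor thread.
    private long messages = 0;

    // Enqueue a state change instead of locking the queue.
    public void add() {
        executor.execute(() -> messages++);
    }

    // Read the count on the executor thread and wait (bounded) for the answer.
    public long getMessageCount() {
        final CountDownLatch latch = new CountDownLatch(1);
        final AtomicLong count = new AtomicLong(0);

        executor.execute(() -> {
            count.set(messages);
            latch.countDown();
        });

        try {
            if (!latch.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for message count");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted waiting for message count", e);
        }
        return count.get();
    }

    public void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        LatchedCountSketch queue = new LatchedCountSketch();
        queue.add();
        queue.add();
        queue.add();
        System.out.println(queue.getMessageCount());
        queue.shutdown();
    }
}
```

One thing to keep in mind with this pattern: if getMessageCount() is ever called from the executor thread itself, the read task can never run while the caller is blocked, so the call only returns via the timeout.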

          • 32. Re: Deadlock when using netty NIO acceptor
            clebert.suconic

            Ok, so I see two separate issues here:

             

             

            One is due to using getMessageCount() while paging. I can produce a patch on Monday. I wonder if Tyler MacDonald would be able to give it a try.

             

             

            The other is calling createConsumer on multiple threads using the same Connection. We should fix that next week as well. (We don't have a patch yet, but Howard Gao should be working on it.)

            • 33. Re: Deadlock when using netty NIO acceptor
              postmaxin

              Awesome, thanks Clebert!

               

              It's mitigated by having my JMX scraper call Thread.sleep(1) on each cycle of its loop, so once you have your patches, I'll take the sleeps out and see how it goes.

              • 34. Re: Deadlock when using netty NIO acceptor
                clebert.suconic

                Tyler, can you provide me some code for what you're doing? (Just to get an idea of how you are accessing management.)

                • 35. Re: Deadlock when using netty NIO acceptor
                  clebert.suconic

                  Actually, I have already placed a fix on this branch:

                   

                  http://anonsvn.jboss.org/repos/hornetq/branches/Branch_2_2_EAP_cluster_clean2/

                   

                   

                   

                  I wouldn't put it in production, but you could use it for a test.

                  • 36. Re: Deadlock when using netty NIO acceptor
                    postmaxin

                    Here you go. It basically just walks JMX and returns any integer keys so we can pass them on to graphs and alarms.

                     

                     

                      package org.crackerjack.jmx2counters;
                      
                      import java.util.Set;
                      import java.util.Map;
                      import java.lang.Long;
                      import java.lang.NumberFormatException;
                      import java.lang.management.ManagementFactory;
                      
                      import javax.management.MBeanAttributeInfo;
                      import javax.management.MBeanServer;
                      import javax.management.MBeanInfo;
                      import javax.management.ObjectName;
                      import javax.management.ObjectInstance;
                      import javax.management.InstanceNotFoundException;
                      import javax.management.MBeanException;
                      import javax.management.IntrospectionException;
                      import javax.management.AttributeNotFoundException;
                      import javax.management.ReflectionException;
                      import javax.management.openmbean.CompositeData;
                      import javax.management.RuntimeMBeanException;
                      
                      public abstract class SimpleCounterGrabber {
                        private MBeanServer platformServer;
                      
                        protected SimpleCounterGrabber(String name) {
                          super(name);
                          platformServer = ManagementFactory.getPlatformMBeanServer();
                        }
                      
                        protected String scrubCounterName(String key) {
                          return key.replaceAll("\\s", "_").replaceAll(",", "-").replaceAll("\"", "'");
                        }
                      
                        protected void addObjectToCounters(Map counters, String key, Object value) {
                          try {
                            // Long.parseLong avoids allocating a boxed Long just to unbox it.
                            long longValue = Long.parseLong(value.toString());
                            counters.put(scrubCounterName(key), longValue);
                          } catch(NumberFormatException e) {
                            // value cannot be parsed as a long, not much I can do here.
                          } catch(NullPointerException e) {
                            // value does not exist
                          }
                        }
                      
                        protected void addCompositeToCounters(Map counters, String key, CompositeData value) throws InterruptedException {
                          for(String k : value.getCompositeType().keySet()) {
                            addObjectToCounters(counters, key + "." + k, value.get(k));
                            Thread.sleep(1);
                          }
                        }
                      
                        protected void addObjectToCounters(Map counters, MBeanAttributeInfo info, String key, Object value) throws InterruptedException {
                          if(value instanceof CompositeData) {
                            addCompositeToCounters(counters, key, (CompositeData)value);
                          } else {
                            addObjectToCounters(counters, key, value);
                          }
                        }
                      
                        protected void getObjectInstanceCounters(Map counters, ObjectInstance instance)
                          throws InstanceNotFoundException, MBeanException, AttributeNotFoundException, IntrospectionException, ReflectionException
                        {
                          ObjectName objectName = instance.getObjectName();
                          MBeanInfo beanInfo = platformServer.getMBeanInfo(objectName);
                      
                          for(MBeanAttributeInfo i : beanInfo.getAttributes()) {
                            String attributeName = i.getName();
                            String key = objectName.getCanonicalName() + ":" + attributeName;
                            try {
                              Object value = platformServer.getAttribute(objectName, attributeName);
                              addObjectToCounters(counters, i, key, value);
                              Thread.sleep(1);
                            } catch (RuntimeMBeanException e) {
                              if(e.getCause() instanceof UnsupportedOperationException) {
                                // some keys will expose themselves in the listing even if they are
                                // not supported, and then rudely throw an exception when you try
                                // to look at them.
                              } else {
                                throw e;
                              }
                            } catch (InterruptedException ie) {
                              throw new RuntimeException(ie);
                            }
                          }
                        }
                      
                        public Map makeCounters() {
                          Map counters = super.makeCounters();
                          return this.makeCounters(counters);
                        }
                      
                        protected Map makeCounters(Map counters) {
                          Set<ObjectInstance> mbeans = platformServer.queryMBeans(null, null);
                      
                          for(ObjectInstance i : mbeans) {
                            try {
                              getObjectInstanceCounters(counters, i);
                              Thread.sleep(1);
                            } catch(Exception e) {
                              throw new RuntimeException(e);
                            }
                          }
                      
                          return counters;
                        }
                      }
                      
                    
                    • 37. Re: Deadlock when using netty NIO acceptor
                      postmaxin

                      Note the Thread.sleep(1) calls. With those, the problem nearly disappears. Without them, we're in a tight loop without any yielding, which I think exacerbates the deadlock.

                      • 38. Re: Deadlock when using netty NIO acceptor
                        clebert.suconic

                        Can you try the branch I mentioned? (as a test)

                         

                         

                        If it's still happening, please provide me a thread dump.

                         

                        (BTW: you are overusing management, but the change should cope better with this.)
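To make the "overusing management" point concrete, here is a rough sketch of a narrower scrape: instead of walking every MBean with queryMBeans(null, null), it queries only the names matching an ObjectName pattern and keeps the numeric attributes. The class name TargetedJmxPoll and the poll method are made up for illustration, and the pattern to use for the broker's own MBeans is not given in this thread, so the example below polls a JVM built-in instead.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class TargetedJmxPoll {
    // Collect numeric attributes only from MBeans whose name matches the pattern.
    public static Map<String, Long> poll(String pattern) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Long> counters = new TreeMap<>();
        try {
            Set<ObjectName> names = server.queryNames(new ObjectName(pattern), null);
            for (ObjectName name : names) {
                for (MBeanAttributeInfo info : server.getMBeanInfo(name).getAttributes()) {
                    if (!info.isReadable()) {
                        continue;
                    }
                    try {
                        Object value = server.getAttribute(name, info.getName());
                        if (value instanceof Number) {
                            String key = name.getCanonicalName() + ":" + info.getName();
                            counters.put(key, ((Number) value).longValue());
                        }
                    } catch (JMException | RuntimeException e) {
                        // Some attributes are listed but throw when read; skip them.
                    }
                }
            }
        } catch (JMException e) {
            throw new IllegalArgumentException("Could not query " + pattern, e);
        }
        return counters;
    }

    public static void main(String[] args) {
        // Polls a single JVM MBean rather than the whole server.
        System.out.println(poll("java.lang:type=Memory"));
    }
}
```

Polling a short, fixed list of patterns on a timer keeps the number of management calls per cycle bounded, which is a different trade-off from sleeping between calls in a full walk.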

                        • 39. Re: Deadlock when using netty NIO acceptor
                          postmaxin

                          I'll give it a shot between now and Monday. FWIW, expiry threads seem to exacerbate the deadlock as well.

                          • 40. Re: Deadlock when using netty NIO acceptor
                            postmaxin

                            Clebert,

                                That branch appears to alleviate the consumer deadlock somewhat, but there still seems to be something wrong with expiry. I had threads producing to 70 queues, only 65 of which had consumers. Messages were set to expire after 10 seconds, and the expiry check interval was set to 30 seconds. I had a consumer attached to the expiry thread that periodically logged how many messages it had consumed.

                             

                               Under this setup, the expiry thread only appeared to do its work 3 or 4 times before stopping or deadlocking entirely. I even stopped my producers and consumers and waited a few hours, and the expiry thread never expired another message. I still have thousands of messages that should have expired sitting in my other queues.

                             

                               Stack trace is here: http://pastebin.com/7PQFzPzR

                            • 41. Re: Deadlock when using netty NIO acceptor
                              clebert.suconic

                              Did you get any exception logs?

                              • 42. Re: Deadlock when using netty NIO acceptor
                                postmaxin

                                None at all. And if you look at line #421 of that paste, it appears that the reaper thread is still running.

                                • 43. Re: Deadlock when using netty NIO acceptor
                                  clebert.suconic

                                  @tyler: you may try https://svn.jboss.org/repos/hornetq/branches/Branch_2_2_EAP/ if you'd like. There's no longer any synchronization on the queue during expiration; it just uses an executor (actor-like).

                                   

                                   

                                  I've also fixed the deadlock found by Carl. I think it will be fine now.
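For readers unfamiliar with the "actor-like" approach, a minimal sketch might look like the following: every operation on the queue, including expiry, is submitted to one single-threaded executor, so no synchronized blocks are needed and there is no lock ordering to get wrong. ActorQueueSketch, Msg, and expireAndWait are hypothetical names for illustration only; this is not the code on the branch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ActorQueueSketch {
    // Hypothetical message type: a body plus an absolute expiry timestamp.
    private static final class Msg {
        final String body;
        final long expiresAt;

        Msg(String body, long expiresAt) {
            this.body = body;
            this.expiresAt = expiresAt;
        }
    }

    // All queue state is confined to this one thread, so no locks are taken.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Deque<Msg> messages = new ArrayDeque<>();

    public void add(String body, long expiresAt) {
        executor.execute(() -> messages.add(new Msg(body, expiresAt)));
    }

    // Expiry is just another task on the same executor; returns how many were removed.
    public Future<Integer> expire(long now) {
        return executor.submit(() -> {
            int before = messages.size();
            messages.removeIf(m -> m.expiresAt <= now);
            return before - messages.size();
        });
    }

    // Convenience wrapper that blocks for the result and hides checked exceptions.
    public int expireAndWait(long now) {
        try {
            return expire(now).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        ActorQueueSketch queue = new ActorQueueSketch();
        queue.add("a", 10L);
        queue.add("b", 20L);
        queue.add("c", 30L);
        System.out.println(queue.expireAndWait(20L));
        queue.shutdown();
    }
}
```

Because the expiry scan and the adds run strictly one after another on the same thread, a periodic expiry checker only needs to submit a task and never has to hold a lock on the queue.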
