7 Replies Latest reply on Feb 14, 2012 10:53 PM by dhirwinjr

    Certain JMS topics just stop responding

    dhirwinjr

      All,

       

      I'm using a standalone instance of HornetQ 2.2.5 in RHEL in a non-J2EE environment. I'm having a problem where after a certain period of time (could be days, could be a week or more) clients are unable to publish to a JMS topic (our application has 40+ JMS topics defined in the hornetq-jms.xml file accessible via JNDI). When this happens, however, often other clients have no problem publishing to other JMS topics (i.e. the whole HornetQ server doesn't become hosed). When the problem develops the clients sending to the topic in question end up blocking on the call messageProducer.send(objMessage). With the current client failure check set to 5 minutes I see the client unable to publish for approximately 5 minutes (I have a timeout when trying to publish) and I then see this error:

       

      ERROR [2012-02-12 14:19:03,693] [Thread-2] [protocol.jms.lib.ATMSContext$ReconnectExceptionListener] -- Received JMS connection exception [errorCode: DISCONNECT]: javax.jms.JMSException: HornetQException[errorCode=3 message=Did not receive data from server for org.hornetq.core.remoting.impl.netty.NettyConnection@686490[local= /172.30.1.11:60603, remote=/172.30.1.10:10049]]
      javax.jms.JMSException: HornetQException[errorCode=3 message=Did not receive data from server for org.hornetq.core.remoting.impl.netty.NettyConnection@686490[local= /172.30.1.11:60603, remote=/172.30.1.10:10049]]
              at org.hornetq.jms.client.HornetQConnection$JMSFailureListener.connectionFailed(HornetQConnection.java:643)
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl.callFailureListeners(ClientSessionFactoryImpl.java:818)
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:605)
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:482)
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl.access$800(ClientSessionFactoryImpl.java:78)
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1318)
              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:528)
              at org.hornetq.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:298)
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$PingRunnable$1.run(ClientSessionFactoryImpl.java:1376)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at java.lang.Thread.run(Unknown Source)
      Caused by: HornetQException[errorCode=3 message=Did not receive data from server for org.hornetq.core.remoting.impl.netty.NettyConnection@686490[local= /172.30.1.11:60603, remote=/172.30.1.10:10049]]
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$PingRunnable.run(ClientSessionFactoryImpl.java:1366)
              at org.hornetq.core.client.impl.ClientSessionFactoryImpl$ActualScheduledPinger.run(ClientSessionFactoryImpl.java:1337)
              at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
              at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
              at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
              ... 3 more
      WARN [2012-02-12 14:19:03,694] [Thread-2] [protocol.jms.lib.ATMSContext] -- Reconnecting...

       

       

      Our particular application has no requirements for persistence, so I've disabled all persistence via <persistence-enabled>false</persistence-enabled>. I've also explicitly not defined a dead-letter-address or expiry-address so that these messages are (or should be) just dropped. Our requirements don't need all messages to be delivered, so I've set the address-full-policy to DROP.

       

      Currently we're using ObjectMessages when publishing (we at times tend to publish rather long arrays of Serializable objects) although the objects that do get published aren't overly complicated (just beans that are Serializable).

       

      As an example here's the code we currently use to publish to a particular JMS topic:

       

          /**
           * Attempt to publish a message to the given JMS topic.
           *
           * @param jmsTopic
           * @param messageName
           * @param obj object to publish
           * @throws JMSException
           * @throws RemoteException
           * @throws NamingException
           */
          public synchronized static final void publishMessage(String jmsTopic, String messageName,
                  Serializable obj) throws JMSException, RemoteException, NamingException {
              logger.debug("Publishing [topic: " + jmsTopic + ", messageName: " + messageName + "]");

              if (jmsTopic == null) {
                  logger.warn("Null topic...ignoring publish request");
                  return;
              }

              // use a cached version of the JMS session object
              Session session = sessionMap.get(jmsTopic);
              if (session == null) {
                  session = ATMSContext.getProducingSession();
                  sessionMap.put(jmsTopic, session);
              }

              // get the message producer, creating one lazily as needed
              MessageProducer messageProducer = producerMap.get(jmsTopic);
              if (messageProducer == null) {
                  messageProducer = session.createProducer(ATMSContext.getDestination(jmsTopic));

                  // set the message expiration time
                  messageProducer.setTimeToLive(messageTimeToLive);

                  producerMap.put(jmsTopic, messageProducer);
              }

              ObjectMessage objMessage = session.createObjectMessage(obj);
              objMessage.setStringProperty(ATMSContext.MESSAGE_NAME, messageName);
              messageProducer.send(objMessage);

              logger.debug("Successfully published [topic: " + jmsTopic + ", messageName: " + messageName
                      + "]");
          }
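
Since the call to messageProducer.send(objMessage) is where clients end up blocking, one way to bound the wait is to run the send on a worker thread and wait on a Future with a timeout. The following is only a minimal sketch using the JDK: BoundedSend and runWithTimeout are hypothetical names, not part of HornetQ or of the code above, and the Callable stands in for the real send call. A JMS Session is a single-threaded context, so a real version would need to keep each session and its producers on the same worker thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long a caller waits for a blocking send.
final class BoundedSend {
    // One worker thread, so every submitted call runs on the same thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /**
     * Runs sendCall on the worker thread.
     * Returns true if it finished within timeoutMillis, false if it timed out
     * or the waiting thread was interrupted.
     */
    boolean runWithTimeout(Callable<Void> sendCall, long timeoutMillis) {
        Future<Void> future = worker.submit(sendCall);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupt the worker; a call that ignores interruption stays stuck.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The call itself threw (for a real send, e.g. a JMSException).
            throw new IllegalStateException("send failed", e.getCause());
        }
    }

    void shutdown() {
        worker.shutdownNow();
    }
}
```

A false return value only tells the caller that the send did not complete in time; it does not explain why the server stopped responding, so it would complement the reconnection handling rather than replace it.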

       

       

      I unfortunately don't have any test cases that will immediately reproduce the problem (which is frustrating on my end). I'm including our current configurations as a reference. Any suggestions as to why sometimes only particular JMS topics are getting hosed?

       

      Thanks,

      Dave

        • 1. Re: Certain JMS topics just stop responding
          clebert.suconic

          It seems you are having connection issues.

           

          You should probably configure reconnection, and maybe window-size so it will confirm and reattach your topics.

           

          This is well documented. But if you have a specific question about how to do it just drop us a line and we can help you.

          • 2. Re: Certain JMS topics just stop responding
            dhirwinjr

            Thank you, Clebert, for your response. It would seem that we are having network connection issues, but these JMS clients are on servers that are all connected to the messaging server via one of two Cisco switches on a GigE network, and to my knowledge we're not having network connection issues elsewhere. Still, I'd like to implement your suggestion, so I read the "Client Reconnection and Session Reattachment" section of the manual and followed the "reattach-node" example. Per the documentation I'm going to update my hornetq-jms.xml file to include the re-attachment properties (the connection-ttl and other related properties were already present):

             

            <!-- These are new -->
            <retry-interval>2000</retry-interval>
            
            <retry-interval-multiplier>1.0</retry-interval-multiplier>
            
            <reconnect-attempts>-1</reconnect-attempts>
            
            <confirmation-window-size>1048576</confirmation-window-size>
            
            <!-- These existed previously -->
            <connection-ttl>120000</connection-ttl>
            
            <client-failure-check-period>60000</client-failure-check-period>
            
            <min-large-message-size>5242880</min-large-message-size>
            

             

            I know that settings/configurations are specific to one's setup but do these seem to make sense in general?
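
To see what the retry settings above would mean in practice, here is a rough sketch of the delay before each reconnect attempt. It assumes the model described in the manual, where the delay starts at retry-interval, is multiplied by retry-interval-multiplier after each attempt, and is capped by max-retry-interval. RetrySchedule and its parameters are hypothetical names for illustration, and the cap is passed in explicitly because max-retry-interval is not set in the configuration shown.

```java
import java.util.Arrays;

// Hypothetical illustration of a multiplied, capped retry delay schedule.
final class RetrySchedule {

    /** Returns the delay in milliseconds before each of the first 'attempts' reconnect attempts. */
    static long[] delays(long retryIntervalMs, double multiplier, long maxIntervalMs, int attempts) {
        long[] result = new long[attempts];
        double interval = retryIntervalMs;
        for (int i = 0; i < attempts; i++) {
            // Cap each delay at the maximum interval.
            result[i] = Math.min((long) interval, maxIntervalMs);
            interval *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        // Values from the configuration above: 2000 ms interval, multiplier 1.0.
        System.out.println(Arrays.toString(delays(2000, 1.0, 2000, 4)));
        // prints [2000, 2000, 2000, 2000]

        // A made-up exponential variant, for comparison only.
        System.out.println(Arrays.toString(delays(500, 2.0, 8000, 6)));
        // prints [500, 1000, 2000, 4000, 8000, 8000]
    }
}
```

With a multiplier of 1.0 and reconnect-attempts set to -1, the client would simply retry every 2 seconds indefinitely.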

             

            Additionally, the user manual says that when the client reconnects or re-attaches, any registered JMS ExceptionListener will be called. In our own internal JMS connection manager I do have a registered ExceptionListener that previously would manually (i.e. handled by our application) attempt to unsubscribe all MessageListeners and then re-subscribe to all the previously subscribed topics (as well as re-binding any RMI objects, but that's separate) whether the jmsException.getErrorCode() was FAILOVER or DISCONNECT. I'm assuming that if the client successfully re-attaches the topics and calls the ExceptionListener with a FAILOVER error code, I then don't want to go and attempt to re-subscribe my topics (b/c they were just automatically re-attached). Does that make sense?
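
The decision being asked about could be isolated in a small helper so it is easy to change once the actual behaviour is confirmed. This is a sketch only: ReconnectDecision and its Action values are hypothetical names, and the assumption that a FAILOVER code means the subscriptions were already re-attached is exactly the open question in this post, not confirmed behaviour. The error code strings are the ones mentioned above.

```java
// Hypothetical helper: maps the error code of a connection exception to an action.
final class ReconnectDecision {

    enum Action { KEEP_SUBSCRIPTIONS, RESUBSCRIBE_ALL }

    /**
     * ASSUMPTION: "FAILOVER" means the client re-attached on its own, so the
     * existing subscriptions are still valid. Anything else (including
     * "DISCONNECT", null, or an unknown code) is treated as a lost connection.
     */
    static Action forErrorCode(String errorCode) {
        if ("FAILOVER".equals(errorCode)) {
            return Action.KEEP_SUBSCRIPTIONS;
        }
        return Action.RESUBSCRIBE_ALL;
    }
}
```

Inside the ExceptionListener, the result of forErrorCode(jmsException.getErrorCode()) would then decide whether the manual unsubscribe and re-subscribe logic runs.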

             

            Thanks again,

            Dave

            • 3. Re: Certain JMS topics just stop responding
              clebert.suconic

              I'm not 100% sure.. I would have to recheck the code. But I believe that if you are doing a regular reconnection, you won't get the ExceptionListener called.

               

              We may take a look if you're having issues.

               

               

              @AndyTaylor: do you remember this detail off the top of your head? If not, I will have to read the code to refresh my memory.

              • 4. Re: Certain JMS topics just stop responding
                clebert.suconic

                Maybe also some intermittent issue with large GCs. Why don't you take a look at GCs? Are you using any paging?

                • 5. Re: Certain JMS topics just stop responding
                  dhirwinjr

                  Clebert Suconic wrote:

                   

                  Maybe also some intermittent issue with large GCs. Why don't you take a look at GCs? Are you using any paging?

                   

                  I'll take a look at the GC activity but last I looked there's not a lot going on. BTW, I have all persistence disabled as it's not really a requirement for our particular application (I have <persistence-enabled>false</persistence-enabled> in the hornetq-configuration.xml file so by this I'm assuming that all paging is disabled; there's no paging files that I can find).

                   

                  Thanks,

                  Dave

                  • 6. Re: Certain JMS topics just stop responding
                    clebert.suconic

                    If you have an inactive subscription, or a lazy subscription, messages may build up and use more memory. As a result, there is more work to be done by the VM.

                     

                    Also: look at the VM settings. Parallel Garbage Collections... etc....
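
As a starting point for looking at GC activity, the standard java.lang.management API exposes per-collector counts and cumulative times. The sketch below is only an illustration; GcSnapshot is a hypothetical name, and it reports on the JVM it runs in, so for the standalone server the same beans would have to be read over JMX, or GC logging enabled with a flag such as -verbose:gc.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarizes garbage collector activity for the current JVM.
final class GcSnapshot {

    /** Returns one line per collector with its collection count and total time spent. */
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means "undefined" for that collector.
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

Logging this periodically and comparing successive snapshots would show whether long collections line up with the times a client stops being able to publish.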

                    • 7. Re: Certain JMS topics just stop responding
                      dhirwinjr

                      Clebert Suconic wrote:

                       

                      It seems you are having connection issues.

                       

                      You should probably configure reconnection, and maybe window-size so it will confirm and reattach your topics.

                       

                      This is well documented. But if you have a specific question about how to do it just drop us a line and we can help you.

                       

                      Clebert,

                       

                      Just wondering if there's any other possible problem other than a connection issue. I say this because we have multiple individual Java processes that are all running on the same physical server, yet when we experience this issue it doesn't happen to all Java processes and JMS clients; it usually only happens to one of the Java processes. If it were a network issue I would expect to see this problem across all the Java processes (each of which is a JMS client), not just one of them. Any further thoughts?

                       

                      Thanks,

                      Dave