9 Replies Latest reply on Oct 11, 2010 11:04 PM by clebert.suconic

    consumer/producer can't reconnect to queue after hornet restart

    hughbragg

      When I restart hornet, the consumer and the producer both try to reconnect automatically but both fail.

       

      I have the producer and consumer in a loop trying to reconnect to hornet if the connection fails for any reason. They both do a full startup each loop.

      When hornet comes back up they think they have reconnected but sending fails and receiving detects no messages.

      Hornet thinks it has no consumers but the consumer actually does a full reconnect after a minute of idle time and no errors are detected.

      The only way to get things rolling again is to restart both producer and consumer.

       

      This code runs each time an error is caught, or after 1 minute of idle time on the consumer. Even when hornet is back up it still won't connect, and hornet doesn't see a new consumer connecting. No exception is thrown.

      {code}

      private void SetupPTP() throws JMSException {
          // Step 1. Directly instantiate the JMS Queue object.
          queue = HornetQJMSClient.createQueue(cfg.CONFIG_QUEUE);

          // Step 2. Instantiate the TransportConfiguration object, which contains
          // the knowledge of what transport to use.
          Map<String, Object> connectionParams = new HashMap<String, Object>();
          connectionParams.put(TransportConstants.PORT_PROP_NAME, cfg.CONFIG_PORT);
          connectionParams.put(TransportConstants.HOST_PROP_NAME, cfg.CONFIG_HOST);

          TransportConfiguration transportConfiguration =
              new TransportConfiguration(
                  NettyConnectorFactory.class.getName(),
                  connectionParams);

          // Step 3. Directly instantiate the JMS ConnectionFactory
          // object using that TransportConfiguration.
          ConnectionFactory cf = HornetQJMSClient
                  .createConnectionFactory(transportConfiguration);

          // Step 4. Create a JMS Connection.
          queueConnection = (QueueConnection) cf.createConnection();

          // Step 5. Create a JMS Session.
          queueSession = (QueueSession) queueConnection
                  .createSession(true, Session.AUTO_ACKNOWLEDGE);

          // Step 6. Create the consumer.
          consumer = this.jmsSession.createConsumer(queue);
      }

      {code}

       

      Can anyone offer an explanation as to why?

       

      Is there a way to recommence processing after hornet goes down without restarting the consumer/producer?

       

      Cheers

        • 1. Re: consumer/producer can't reconnect to queue after hornet restart
          timfox

          There's a whole chapter in the user manual on reconnection. HornetQ can do this automatically (if you want).

           

          There are also fully working examples in the distro demonstrating both automatic and manual reconnection.

          • 2. Re: consumer/producer can't reconnect to queue after hornet restart
            hughbragg

            That's right but it doesn't seem to cover specifically what I'm doing.

            In one case I have no control over the HornetQ server and it doesn't have the configuration I'd like.

            I want to create a JMS MessageReceiver which handles restarts/network interruptions without any intervention.

            There is no failover and I'm using JMS directly, not over JNDI. Notwithstanding that, is there any problem with doing this:

             

            {code}

            HornetQConnectionFactory hqcf = HornetQJMSClient
                    .createConnectionFactory(transportConfiguration);
            hqcf.setClientFailureCheckPeriod(cfg.CONFIG_IDLE_TIMEOUT_SECS * 1000);
            hqcf.setRetryInterval(2000);          // 2 seconds before the first retry
            hqcf.setRetryIntervalMultiplier(1.5); // 1.5 times longer between retries
            hqcf.setMaxRetryInterval(20000);      // wait at most 20 seconds between retries
            hqcf.setReconnectAttempts(-1);        // retry forever

            {code}

             

            I can't seem to find the API docs for this one, and the documentation seems to be referring to this in the context of the server.

            Anyway, all my tests so far indicate that this is fine. It seems to work, but I can't look up the API for the public methods so I'm a bit unsure what to expect.
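For anyone wondering what those retry settings add up to, here is a rough sketch of the wait schedule they would produce. It assumes each wait is the previous one times the multiplier, capped at the maximum interval, which is how the user manual describes these parameters; the exact rounding inside HornetQ may differ. `RetrySchedule` and `delays` are made-up names for illustration and are not part of HornetQ.

```java
import java.util.ArrayList;
import java.util.List;

public class RetrySchedule {

    /**
     * Returns the wait in milliseconds before each of the first `attempts`
     * retries, assuming: next wait = previous wait * multiplier, never more
     * than maxRetryInterval. (Assumed model, not HornetQ's own code.)
     */
    public static List<Long> delays(long retryInterval, double multiplier,
                                    long maxRetryInterval, int attempts) {
        List<Long> result = new ArrayList<>();
        double next = retryInterval;
        for (int i = 0; i < attempts; i++) {
            result.add(Math.min((long) next, maxRetryInterval));
            next = Math.min(next * multiplier, maxRetryInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same values as the snippet above: 2 s start, x1.5, 20 s cap.
        System.out.println(delays(2000, 1.5, 20000, 8));
        // prints [2000, 3000, 4500, 6750, 10125, 15187, 20000, 20000]
    }
}
```

So with these values the client would reach the 20 second ceiling on the seventh retry and stay there for as long as the server is away.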

            • 3. Re: consumer/producer can't reconnect to queue after hornet restart
              timfox

              I'm not really sure what you're trying to achieve here. If you post a self contained test program with config demonstrating the issue, someone can take a look.

              • 4. Re: consumer/producer can't reconnect to queue after hornet restart
                hughbragg

                Well, I may have to, but I thought I'd explained it clearly. I just wanted to get some feedback on whether the code segment I posted was valid. The context isn't so important, I wouldn't have thought. Is there an API Javadoc for this class?

                 

                Anyway, I've run into a problem which looks far more serious, but it may be related to all the problems I've found and not been able to diagnose so far.

                I don't think HornetQ's implementation of javax.jms.QueueReceiver.receiveNoWait() works reliably.

                 

                Sometimes when my consumer system is supposed to shut down, the JMS process hangs. I couldn't reproduce it reliably. It just seems to occur randomly. Then, while I was debugging some unrelated code, I triggered a shutdown and the JMS thread was the only one which remained active. I ran jstack on it and I've added the trace.

                 

                It looks like the HornetQ-client-global-scheduled-threads group is waiting for a java.util.concurrent.locks.AbstractQueuedSynchronizer while my Thread-2 is running receiveNoWait. JMS reports there is a consumer on this queue.

                 

                I ran jstack several times but there was no change. As soon as I put a message onto the queue, the process got the message and tried to commit but got this exception:

                 

                {code}

                javax.jms.TransactionRolledBackException The transaction was rolled back on failover to a backup server
                        org.hornetq.core.client.impl.ClientSessionImpl.rollbackOnFailover(ClientSessionImpl.java:497)
                        org.hornetq.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:507)
                        org.hornetq.core.client.impl.DelegatingSession.commit(DelegatingSession.java:156)
                        org.hornetq.jms.client.HornetQSession.commit(HornetQSession.java:229)
                        com.agilityapplications.adapt.JmsQ.commit(JmsQ.java:61)
                        com.agilityapplications.adapt.simpsons.Homer.acknowledgeBatch(Homer.java:65)
                        com.agilityapplications.adapt.simpsons.Homer.myTask(Homer.java:45)
                        com.agilityapplications.adapt.simpsons.Simpson.run(Simpson.java:41)

                {code}

                 

                And then it finally exited.

                 

                This seems to be fairly random so I don't know how to reproduce it.

                 

                My question is, how can receiveNoWait() block?

                • 5. Re: consumer/producer can't reconnect to queue after hornet restart
                  timfox

                  We would really need to see:

                   

                  a) a self contained code example (with config)

                  b) What your expectations are

                  c) What you actually observed

                   

                  Otherwise it's hard to guess what you're trying to do, whether your expectations are correct, and whether there is a bug. We don't have time to diligently study every post to try and reconstruct what's going on.

                   

                  http://community.jboss.org/wiki/Howtoreportabugissue

                   

                  Regarding the TransactionRolledBackException: this is normal if failover occurred in the middle of a transaction. See the user manual for a detailed description of this and how to handle it. There is also an example in the distro that shows this.

                   

                  There is no javadoc for HornetQConnectionFactory currently, but the methods just map to the connection factory config params as described in the user manual. So it should be trivial to match them up:

                   

                  http://hornetq.sourceforge.net/docs/hornetq-2.1.2.Final/user-manual/en/html/configuration-index.html#d0e12990

                  • 6. Re: consumer/producer can't reconnect to queue after hornet restart
                    hughbragg

                    That's fair enough Tim.

                     

                    I've uploaded a mostly self-contained example. The config is the out-of-the-box standalone non-clustered one, except that I've commented out the JNDI stuff and added a queue called databus. Is it possible for you to run a remote server with that config and dump some messages on it? I don't know how to put this together into a self-contained unit. I can post you my jmsSender too if you want, but I expect you have something better that can do the same job.

                     

                    What I expected was for the call to QueueReceiver.receiveNoWait() to return immediately or throw an error, even when the server became unavailable.

                    I want to be able to take control back and do something like shutdown or try to reconnect.

                     

                    The specification is quite clear about this:

                    [http://download.oracle.com/docs/cd/E17802_01/products/products/jms/javadoc-102a/javax/jms/MessageConsumer.html#receiveNoWait%28%29]

                     

                    What I observe is that if the connection drops out before this is called then this call waits for it to failover.

                    The other problem I see is that when the server does return, the client still waits there.

                    When a message is placed on the queue, the receiveNoWait call finally returns with a valid message, but when committing this message a TransactionRolledBackException is thrown.

                     

                    I realise I've set up an auto failover, but that shouldn't affect the return on receiveNoWait.

                     

                     

                    What I intend to do now is implement something to do all the reconnection inside the application so I can stay responsive to shutdown requests without having to worry about the status of the remote server.

                    • 7. Re: consumer/producer can't reconnect to queue after hornet restart
                      timfox

                      It would be expected that commit throws TransactionRolledBackException if the transaction spans a failover. See user manual for a detailed description of why this is the case (or consult the example in the distro).
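As a minimal sketch of the usual way to cope with that, the whole unit of work (receive, process, commit) can be run again when the commit reports a rollback. To keep the sketch compilable without a JMS jar, `RolledBack` stands in for javax.jms.TransactionRolledBackException, and `UnitOfWork` and `commitWithRetry` are hypothetical names. Whether a retry is safe depends on the work being repeatable.

```java
public class RetryOnRollback {

    /** Stand-in for javax.jms.TransactionRolledBackException (assumption). */
    public static class RolledBack extends Exception {
        public RolledBack(String message) {
            super(message);
        }
    }

    /** One transaction: do the work, then commit. Hypothetical interface. */
    public interface UnitOfWork {
        void runAndCommit() throws RolledBack;
    }

    /**
     * Runs the unit of work until it commits or maxAttempts is used up.
     * Returns the attempt number that succeeded, or -1 if none did.
     */
    public static int commitWithRetry(UnitOfWork work, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                work.runAndCommit();
                return attempt;
            } catch (RolledBack e) {
                // The transaction was rolled back, so nothing from this
                // attempt was committed. Loop round and redo the whole unit.
            }
        }
        return -1;
    }
}
```

In real code the body of `runAndCommit` would be the receive, the processing and `session.commit()`, and the catch clause would name the JMS exception directly.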

                       

                      It's not right that receiveNoWait stalls though, there should be a timeout there. If you add a JIRA for that, it can be tracked.

                       

                      If you don't want reconnect but want transparent reattach you need to set confirmation-window-size to > 0, as described in the user manual.

                      • 8. Re: consumer/producer can't reconnect to queue after hornet restart
                        hughbragg

                        Thank you for helping.

                         

                        I got this working using the application managed restore. receiveNoWait functions as I'd expect it to in that case.

                         

                        I understand about the TransactionRolledBackException. Since it stalls in receiveNoWait this behaviour makes no sense.

                         

                        Can you please add a JIRA for me? I believe you can do that. I'm still finding my way around this site.

                         

                        Perhaps my confirmation-window-size setting is related to this problem.

                         

                        The problem was that during transparent re-attachment I couldn't easily regain control of that thread, so the JVM wouldn't shut down properly until that call returned. This forced me to kill the instance uncleanly. Now that I manage the reconnection in the application, I have a loop which keeps checking for a shutdown message. It works well for me now.
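The loop described above could look roughly like the following. This is only a sketch of the idea, not the code actually used: `Link` is a hypothetical stand-in for the JMS plumbing (`connect()` for the SetupPTP() steps, `poll()` for receiveNoWait(), `close()` for closing the connection), and the shutdown request is modelled as a simple flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ReconnectLoop {

    /** Hypothetical wrapper around the JMS connection, session and consumer. */
    public interface Link {
        void connect() throws Exception; // full setup, as in SetupPTP()
        String poll() throws Exception;  // non-blocking receive; null if nothing waiting
        void close();                    // release whatever connect() created
    }

    /** Sleeps briefly; treats an interrupt as a request to shut down. */
    private static void pause(long millis, AtomicBoolean shutdown) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            shutdown.set(true);
        }
    }

    /** Runs until the shutdown flag is set; returns the number of messages handled. */
    public static int run(Link link, AtomicBoolean shutdown, long pauseMillis) {
        int handled = 0;
        while (!shutdown.get()) {
            try {
                link.connect();
                while (!shutdown.get()) {
                    String message = link.poll();
                    if (message != null) {
                        handled++; // process the message here
                    } else {
                        pause(pauseMillis, shutdown); // idle, but still checking the flag
                    }
                }
            } catch (Exception e) {
                // Connection problem: wait a little, then rebuild everything.
                pause(pauseMillis, shutdown);
            } finally {
                link.close();
            }
        }
        return handled;
    }
}
```

Because the thread never sits inside a call that waits on the server, the shutdown flag is checked at least once per pause interval whether or not the server is reachable.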

                        • 9. Re: consumer/producer can't reconnect to queue after hornet restart
                          clebert.suconic