0 Replies Latest reply on Feb 12, 2013 8:41 PM by mgiannini

    ClientConsumer "hanging" frequently after reconnection/failover

    mgiannini

      I am testing the client reconnection/failover capabilities of HornetQ for an application. I am using HornetQ 2.3.0.CR1 and am currently running a single-node HornetQ cluster (no backup) for testing purposes (see attached hornetq-configuration.xml).

       

       

      Summary: With client reconnection configured, a ClientConsumer frequently hangs after reconnecting while it is inside a consumer.receive(timeout) call.

       

      Details:

       

      I configure my ServerLocator as follows:

       

      {code}

            serverLocator.setClientFailureCheckPeriod(10000L);
            serverLocator.setReconnectAttempts(-1);

      {code}

      Then my client code simply does the following:

       

      {code}

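          while (!die)
          {
            try
            {
              // Blocks for up to 5 seconds; returns null if nothing arrived
              ClientMessage msg = consumer.receive(5000L);
              if (msg == null)
                LOGGER.info("no message received, trying again...");
            }
            catch (HornetQException e)
            {
              // exception is silently ignored; the loop just retries
            }
          }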

      {code}
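One thing the loop above cannot show is whether receive() is failing, because the catch block is empty and only HornetQException is caught. The following is a minimal, JDK-only sketch of the same loop shape that counts and logs every failure, including unchecked ones such as the IllegalStateException reported further down. `MessageSource`, `ReceiveLoopSketch`, `Stats`, and the back-off parameter are hypothetical names introduced for illustration; `MessageSource` merely stands in for `ClientConsumer.receive(timeout)` and is not a HornetQ type.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for ClientConsumer.receive(timeout); not a HornetQ type. */
interface MessageSource {
    /** Returns the next message, or null if none arrived within timeoutMillis. */
    String receive(long timeoutMillis) throws Exception;
}

final class ReceiveLoopSketch {
    /** Simple counters so the caller can see what the loop observed. */
    static final class Stats {
        int received;
        int timeouts;
        int errors;
    }

    /** Polls the source until 'die' is set, logging failures instead of swallowing them. */
    static Stats run(MessageSource source, AtomicBoolean die,
                     long timeoutMillis, long backoffMillis) {
        Stats stats = new Stats();
        while (!die.get()) {
            try {
                String msg = source.receive(timeoutMillis);
                if (msg == null) {
                    stats.timeouts++;   // nothing arrived within the timeout
                } else {
                    stats.received++;
                }
            } catch (Exception e) {
                // Catches checked and unchecked exceptions alike,
                // so an IllegalStateException does not kill the thread.
                stats.errors++;
                System.err.println("receive failed, will retry: " + e);
                try {
                    Thread.sleep(backoffMillis);   // brief pause before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return stats;
    }
}
```

This only makes failures visible and keeps the polling thread alive; it is not a fix for the hang itself.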

      When I first start the application, I see the logger info messages indicating that no message was received.

       

      I registered a FailoverEventListener on the session factory to trace failover:

       

      {code}

          sessionFactory.addFailoverListener(new FailoverEventListener()
          {
            @Override
            public void failoverEvent(FailoverEventType eventType)
            {
              if (eventType.equals(FailoverEventType.FAILOVER_FAILED))
                System.out.println("++++++++++++++++++ FAILOVER_FAILED");
              else if (eventType.equals(FailoverEventType.FAILURE_DETECTED))
                System.out.println("**************** FAILURE_DETECTED");
              else if (eventType.equals(FailoverEventType.FAILOVER_COMPLETED))
                System.out.println("%%%%%%%%%%%%% FAILOVER_COMPLETED");
            }
          });

      {code}
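Beyond printing, the same three events could drive a small state holder that the consumer thread consults before calling receive(), so that it does not try to use the session while failover is still in progress. The sketch below is JDK-only and purely illustrative: `FailoverGate`, its `Event` enum (which mirrors the three FailoverEventType constants used above), and `awaitUsable` are hypothetical names, and whether gating receive() this way avoids the hang described here is an untested assumption.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; tracks the last failover event and lets a thread wait for recovery. */
final class FailoverGate {
    /** Mirrors the three event names used in the listener above. */
    enum Event { FAILURE_DETECTED, FAILOVER_COMPLETED, FAILOVER_FAILED }

    private enum State { CONNECTED, FAILING_OVER, FAILED }

    private State state = State.CONNECTED;

    /** Intended to be called from the failover listener callback. */
    synchronized void onEvent(Event event) {
        switch (event) {
            case FAILURE_DETECTED:
                state = State.FAILING_OVER;
                break;
            case FAILOVER_COMPLETED:
                state = State.CONNECTED;
                break;
            case FAILOVER_FAILED:
                state = State.FAILED;
                break;
        }
        notifyAll();   // wake any thread blocked in awaitUsable
    }

    /**
     * Waits while failover is in progress, up to timeoutMillis.
     * Returns true only if the connection is considered usable afterwards.
     */
    synchronized boolean awaitUsable(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (state == State.FAILING_OVER) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;   // still failing over when the timeout expired
            }
            try {
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return state == State.CONNECTED;
    }
}
```

In the listener, each branch would call `gate.onEvent(...)` with the matching constant, and the consumer loop would call `gate.awaitUsable(...)` before each receive.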

      When I kill the HornetQ server, I see the expected FAILURE_DETECTED trace, followed by FAILOVER_COMPLETED once I restart the server.

       

      {quote}

      **************** FAILURE_DETECTED

      20:10:20,011 WARN  [org.hornetq.core.client] HQ212050: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=b81e9285-7554-11e2-a4bd-55da25511ba8

      %%%%%%%%%%%%% FAILOVER_COMPLETED

      {quote}

      However, my ClientConsumer only very rarely starts receiving messages again, and it never resumes when the above warning is logged. A thread dump shows that the HornetQ client appears to be waiting at:

       

      {quote}

      "EchoW:Thread-0" prio=10 tid=0x00007f5ad82a4800 nid=0x139f in Object.wait() [0x00007f5acc102000]

         java.lang.Thread.State: TIMED_WAITING (on object monitor)

                at java.lang.Object.wait(Native Method)

                - waiting on <0x00000000ecb32d78> (a org.hornetq.core.client.impl.ClientConsumerImpl)

                at org.hornetq.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:251)

                - locked <0x00000000ecb32d78> (a org.hornetq.core.client.impl.ClientConsumerImpl)

                at org.hornetq.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:393)

      {quote}

      Sometimes the receive() call throws an IllegalStateException instead:

       

      {quote}

      Exception in thread "EchoW:Thread-0" java.lang.IllegalStateException: Cannot send a packet while channel is doing failover

      at org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:247)

      at org.hornetq.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:195)

      at org.hornetq.core.client.impl.ClientSessionImpl.forceDelivery(ClientSessionImpl.java:430)

      at org.hornetq.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:294)

      at org.hornetq.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:393)

      {quote}

      Could someone confirm whether this is a bug, or provide some insight into what I might be doing wrong?

       

      Thanks and Regards,

      Matthew