0 Replies Latest reply on Nov 19, 2010 4:43 AM by Masao Kato

    Message lost after recovering from a network error during cluster message transfer

    Masao Kato Newbie

      Hi, I am testing clustering with HornetQ on JBoss 5 (HornetQ 2.1.1.Final with JBoss EAP 5.1).

       

      When a network error occurs while a message is being transferred between cluster nodes, the message disappears after the network recovers.

       

      I reproduced the problem with the following scenario:

       

      1. Set up a cluster with two nodes (node#0, node#1).
      2. Send a message from a client to node#0.
      3. When node#0 forwards the message to node#1, suspend node#0 at a breakpoint with a remote debugger (*1).
      4. Pull out the network cable of node#1.
      5. Resume node#0 from the breakpoint.
      6. After a while, plug the network cable of node#1 back in.

      In step 6, I wait until the following warning appears in the server.log of node#0: "WARN  [org.hornetq.core.protocol.core.impl.RemotingConnectionImpl] (hornetq-failure-check-thread) Connection failure has been detected: Did not receive ping from /***.***.***.***:****. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed. [code=3]"

       

      *1 Breakpoint:
      BridgeImpl.java
          handle(final MessageReference ref)
              producer.send(dest, message);

       

      After the network is reconnected, the message is in neither node#0 nor node#1,
      and the following warning is logged:
      "WARN  [ChannelImpl] Can't find packet to clear:  last received command id 10 first stored command id 0"

       


      This warning is logged by clearUpTo() in org.hornetq.core.protocol.core.impl.ChannelImpl:

      ==

         private void clearUpTo(final int lastReceivedCommandID)
         {
            final int numberToClear = 1 + lastReceivedCommandID - firstStoredCommandID;
      
            if (numberToClear == -1)
            {
               throw new IllegalArgumentException("Invalid lastReceivedCommandID: " + lastReceivedCommandID);
            }
      
            int sizeToFree = 0;
      
            for (int i = 0; i < numberToClear; i++)
            {
               final Packet packet = resendCache.poll();
      
               if (packet == null)
               {
                  ChannelImpl.log.warn("Can't find packet to clear: " + " last received command id " +
                                       lastReceivedCommandID +
                                       " first stored command id " +
                                       firstStoredCommandID);
                  return;
               }
      
               if (packet.getType() != PacketImpl.PACKETS_CONFIRMED)
               {
                  sizeToFree += packet.getPacketSize();
               }
      
               if (commandConfirmationHandler != null)
               {
                  commandConfirmationHandler.commandConfirmed(packet);
               }
            }
      
            firstStoredCommandID += numberToClear;
         }
      
      
      

      ==

       

      This means that lastReceivedCommandID does not match resendCache.size().
      In addition, packets are removed from resendCache and confirmed whatever the value of lastReceivedCommandID is; nothing checks that the polled packets really correspond to the acknowledged command IDs.
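      To make the arithmetic of the warning concrete: with last received command id 10 and first stored command id 0, clearUpTo() tries to clear 1 + 10 - 0 = 11 packets. The following is a simplified, hypothetical model (not the HornetQ class; ResendCacheModel and its fields are made-up names) that copies only the counting logic of the quoted method, assuming just three packets happen to be cached when the confirmation arrives.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the resend-cache bookkeeping in
// ChannelImpl.clearUpTo(); it is NOT the HornetQ class.
class ResendCacheModel {

    final ArrayDeque<String> resendCache = new ArrayDeque<>();
    final List<String> confirmed = new ArrayList<>();
    final List<String> warnings = new ArrayList<>();
    int firstStoredCommandID = 0;

    // Every sent command is kept until the peer confirms it.
    void send(String packetLabel) {
        resendCache.add(packetLabel);
    }

    // Same counting logic as the quoted clearUpTo(): packets are removed by
    // position only; nothing checks which command each packet belonged to.
    void clearUpTo(int lastReceivedCommandID) {
        int numberToClear = 1 + lastReceivedCommandID - firstStoredCommandID;
        for (int i = 0; i < numberToClear; i++) {
            String packet = resendCache.poll();
            if (packet == null) {
                warnings.add("Can't find packet to clear: last received command id "
                        + lastReceivedCommandID + " first stored command id "
                        + firstStoredCommandID);
                return; // firstStoredCommandID is not advanced on this path
            }
            confirmed.add(packet); // stands in for commandConfirmed(packet)
        }
        firstStoredCommandID += numberToClear;
    }

    public static void main(String[] args) {
        ResendCacheModel channel = new ResendCacheModel();
        // Assumption: only 3 packets are cached, the last being the forwarded message.
        channel.send("cmd-0");
        channel.send("cmd-1");
        channel.send("forwarded-message");
        // Values from the node#0 warning: numberToClear = 1 + 10 - 0 = 11.
        channel.clearUpTo(10);
        System.out.println("confirmed = " + channel.confirmed);
        System.out.println("warnings  = " + channel.warnings);
    }
}
```

      Running main prints all three labels as confirmed, followed by the warning: in this model, as in the quoted loop, every packet polled before the cache runs empty is still passed on as confirmed, and firstStoredCommandID stays at 0 because the method returns early.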

       

      My guesses are:

      • When a network error happens, lastReceivedCommandID gets out of step with the resend cache. Perhaps the commands exchanged while reconnecting (failover) increase the command ID.

      • As a result, the ACK processing for the forwarded message is triggered by a response that is not for the forwarded message.
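      The two guesses can be illustrated with a hypothetical sketch. It assumes (this is not observed HornetQ behaviour) that node#1 receives cmd-0 and cmd-1, loses the forwarded message while the cable is unplugged, and then counts one extra reconnection command that node#0 never stored in this resend cache. CommandIdDriftSketch and its members are made-up names; only the count-based clearing mirrors the quoted clearUpTo().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the suspected failure mode; names are illustrative,
// not HornetQ API. The sender confirms cached packets purely by count.
class CommandIdDriftSketch {

    final ArrayDeque<String> resendCache = new ArrayDeque<>();
    final List<String> confirmed = new ArrayList<>();
    int firstStoredCommandID = 0;

    // Sender side (node#0): every command sent on the channel is cached.
    void send(String packetLabel) {
        resendCache.add(packetLabel);
    }

    // Count-based clearing, as in the quoted clearUpTo().
    void clearUpTo(int lastReceivedCommandID) {
        int numberToClear = 1 + lastReceivedCommandID - firstStoredCommandID;
        for (int i = 0; i < numberToClear; i++) {
            String packet = resendCache.poll();
            if (packet == null) {
                return; // the real code logs "Can't find packet to clear" here
            }
            confirmed.add(packet); // treated as acknowledged by the peer
        }
        firstStoredCommandID += numberToClear;
    }

    public static void main(String[] args) {
        CommandIdDriftSketch sender = new CommandIdDriftSketch();
        sender.send("cmd-0");
        sender.send("cmd-1");
        sender.send("forwarded-message"); // assumed lost on the wire

        // Assumed receiver state: ids 0 and 1 received, plus one extra
        // reconnection command counted as id 2.
        int receiverLastReceivedID = 2;

        // numberToClear = 1 + 2 - 0 = 3, so the forwarded message is
        // confirmed even though node#1 never received it.
        sender.clearUpTo(receiverLastReceivedID);
        System.out.println("confirmed    = " + sender.confirmed);
        System.out.println("still cached = " + sender.resendCache);
    }
}
```

      Under these assumptions nothing is left in the resend cache, so there is nothing to resend after reconnection, which would match the symptom that the message exists on neither node.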

       

      Is my analysis correct?
      I hope this problem can be fixed.

       

      Thanks.