0 Replies Latest reply on Nov 18, 2010 9:40 PM by ganso

    Cluster message transfer locks after node hard kill

    ganso

      Hi, I am testing a HornetQ cluster on JBoss 5 (HornetQ 2.1.2.Final / JBoss EAP 5.1).

       

      In some cases, cluster message forwarding hangs when I kill the server process of the destination node.

       

      When I send a message from a client after killing a server node, the message appears to be stored in the forwarding queue.

       

      However, the JMX console of the forwarding (source) node does not respond when I try to inspect the forwarding queue.

       


      I was able to reproduce the problem with the following scenario, and took a thread dump.

       

      1. Set up a cluster of two nodes (node#0, node#1).
      2. Send a message to node#0 from a client.
      3. When node#0 is about to forward the message to node#1, suspend node#0 at a breakpoint with a remote debugger. (*1)
      4. Stop node#1 (kill -9).
      5. Resume node#0 from the breakpoint.
      6. Send a message from a client and try to view the forwarding queue in the JMX console.
      7. Take a thread dump of node#0 (kill -3).

       

      *1 Breakpoint location:
      BridgeImpl.java
         handle(final MessageReference ref)
           producer.send(dest, message);

       


      Findings from the thread dump:

      • Thread-21 (group:HornetQ-server-threads463989851-1539190483)
        The forwarding thread.
        Waiting on a Semaphore while holding two locks (0x00002aaacd48ac70, 0x00002aaacd48ae10)
      • Thread-1 (group:HornetQ-client-global-threads-64314422)
        FailoverManager?
        Waiting for a lock held by Thread-21 (0x00002aaacd48ac70)
      • http-192.168.10.40-8080-1
        JMX console access.
        Waiting for a lock held by Thread-21 (0x00002aaacd48ae10)
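
      For anyone who wants to check the same waiting relationships without kill -3, here is a minimal sketch that uses the standard java.lang.management API to list threads blocked on a monitor and the thread that owns it. The class name BlockedThreadReport is hypothetical and not part of HornetQ. It only reports threads in the BLOCKED state (like Thread-1 and the http thread above); a thread parked on a Semaphore, like Thread-21, shows up as WAITING instead and is not listed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HornetQ: lists monitor-blocked threads in this JVM.
public class BlockedThreadReport {

    // Returns one line per thread that is BLOCKED on a monitor,
    // naming the lock and the thread that currently owns it.
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<String>();
        // true, true: include locked monitors and locked ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info == null) {
                continue; // thread terminated while the dump was taken
            }
            if (info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

      The method has to run inside the server JVM (for example from a small MBean or a test harness) to see the server's threads; running main on its own only inspects its own JVM.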

       


      Until Thread-21 proceeds, FailoverManager cannot run and JMX console access hangs.

       

      Thread-21 seems to be waiting for credits from node#1, but node#1 has already been killed.

       

      I think this is very bad. However, the source code contains the following comment:
      ==
      ClientProducerCreditsImpl.acquireCredits
      // This will block if credits are not available
      ==
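
      To illustrate the pattern I mean, here is a minimal sketch, not the actual HornetQ code. It assumes the credit pool behaves like a java.util.concurrent.Semaphore and that the sender waits for credits while holding a lock. The names CreditWaitSketch, sessionLock, send and receiveCredits are hypothetical. With a plain acquire() the sender would wait forever once the peer is dead; with a timed tryAcquire() it gives up and releases the lock, so other threads waiting on that lock can proceed.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a producer waiting for flow-control credits; not HornetQ code.
public class CreditWaitSketch {

    // Zero permits models a remote node that will never send credits again.
    private final Semaphore credits = new Semaphore(0);

    // Stand-in for a lock that other threads (failover, management) also need.
    private final Object sessionLock = new Object();

    // Called when the remote node grants credits.
    public void receiveCredits(int amount) {
        credits.release(amount);
    }

    // Tries to send 'size' bytes. Returns false if credits did not arrive in time,
    // instead of blocking forever while holding sessionLock.
    public boolean send(int size, long timeoutMillis) throws InterruptedException {
        synchronized (sessionLock) {
            return credits.tryAcquire(size, timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        CreditWaitSketch sketch = new CreditWaitSketch();
        System.out.println(sketch.send(100, 200)); // prints "false": no credits arrive
        sketch.receiveCredits(100);
        System.out.println(sketch.send(100, 200)); // prints "true": credits are available
    }
}
```

      Whether a bounded wait like this is appropriate inside the bridge is a question for the HornetQ developers; the sketch only shows why an unbounded wait under a lock stalls every other thread that needs that lock.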

       

      This blocking seems to be intended, but is there any workaround?
      Also, it is not good that FailoverManager is blocked by this lock.

       

      Thanks