0 Replies Latest reply on Nov 18, 2010 9:40 PM by Masao Kato

    Cluster message transfer locks after node hard kill

    Masao Kato Newbie

      Hi, I am testing a HornetQ cluster on JBoss 5 (HornetQ 2.1.2.Final / JBoss EAP 5.1).


      In some cases, message forwarding between cluster nodes locks up when I kill the server process of the destination node during a transfer.


      When I send a message from a client after killing the server node, the message appears to be stored in the forwarding queue.


      However, when I try to inspect the forwarding queue through the JMX console of the forwarding (source) node, there is no response.


      I was able to reproduce the problem with the following scenario and took a thread dump.


      1. Set up a cluster of two nodes (node#0, node#1).
      2. Send a message to node#0 from a client.
      3. When node#0 is about to forward the message to node#1, suspend node#0 at a breakpoint with a remote debugger. (*1)
      4. Kill node#1 (kill -9).
      5. Resume node#0 from the breakpoint.
      6. Send a message from a client and try to view the forwarding queue from the JMX console.
      7. Take a thread dump of node#0 (kill -3).


      *1 Breakpoint location (at the send call inside handle):
         handle(final MessageReference ref)
           producer.send(dest, message);


      Findings from the thread dump:

      • Thread-21 (group:HornetQ-server-threads463989851-1539190483)
        Forwarding thread.
        Waiting on a Semaphore while holding two locks (0x00002aaacd48ac70, 0x00002aaacd48ae10).
      • Thread-1 (group:HornetQ-client-global-threads-64314422)
        Waiting for a lock held by Thread-21 (0x00002aaacd48ac70).
      • http-
        JMX console access.
        Waiting for a lock held by Thread-21 (0x00002aaacd48ae10).
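
      The pattern in the dump can be reproduced outside HornetQ with plain java.util.concurrent classes. The following is only a minimal sketch of that pattern, not HornetQ code: the class name, lock objects, and thread names are hypothetical stand-ins, and the Semaphore with zero permits stands in for credits that never arrive because the peer node is dead.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one thread waits on a semaphore while holding two
// monitors, so any other thread that needs one of those monitors is stuck too.
final class CreditBlockSketch {

    // Returns the observed states of {forwarder, waiter} while the semaphore is empty.
    static Thread.State[] run() {
        final Semaphore credits = new Semaphore(0); // no credits: peer never answers
        final Object lockA = new Object();          // stand-in for the first lock in the dump
        final Object lockB = new Object();          // stand-in for the second lock in the dump

        Thread forwarder = new Thread(() -> {
            synchronized (lockA) {
                synchronized (lockB) {
                    try {
                        credits.acquire();          // blocks while both monitors are held
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "forwarder");

        // Stand-in for any thread that needs lockA (for example failover handling).
        Thread waiter = new Thread(() -> {
            synchronized (lockA) {
                // would do its work here once the monitor is free
            }
        }, "waiter");

        try {
            forwarder.start();
            awaitState(forwarder, Thread.State.WAITING); // parked inside acquire()
            waiter.start();
            awaitState(waiter, Thread.State.BLOCKED);    // stuck on the forwarder's monitor
            Thread.State[] seen = { forwarder.getState(), waiter.getState() };

            credits.release();                           // simulate credits finally arriving
            forwarder.join(5000);
            waiter.join(5000);
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Polls until the thread reaches the given state or five seconds pass.
    private static void awaitState(Thread t, Thread.State s) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != s && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) {
        Thread.State[] seen = run();
        System.out.println("forwarder=" + seen[0] + ", waiter=" + seen[1]);
    }
}
```

      In this sketch the second thread only proceeds after the semaphore is released, which mirrors how Thread-1 and the JMX console thread depend on Thread-21 in the dump.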


      Until Thread-21 makes progress, FailoverManager cannot run and JMX console access does not respond.


      Thread-21 appears to be waiting for credits from node#1, but node#1 has already been killed.


      I think this is a serious problem. However, the source code contains the following comment:
      // This will block if credits are not available


      This blocking seems to be expected, but is there any workaround?
      Also, it is not good that FailoverManager is blocked by this lock.
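
      To illustrate the kind of behavior that would avoid an indefinite wait, here is a generic sketch of a bounded wait for credits using Semaphore.tryAcquire with a timeout. This is an assumption about what a fix could look like in principle; it is not HornetQ's implementation and not an existing HornetQ option, and the class and method names (TimedCreditSketch, trySend, receiveCredits) are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of credit-based sending with a bounded wait.
final class TimedCreditSketch {
    private final Semaphore credits;

    TimedCreditSketch(int initialCredits) {
        this.credits = new Semaphore(initialCredits);
    }

    // Tries to take `size` credits; gives up after the timeout instead of
    // blocking forever when the peer can no longer grant credits.
    boolean trySend(int size, long timeoutMillis) {
        try {
            return credits.tryAcquire(size, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    // Called when the peer grants more credits.
    void receiveCredits(int n) {
        credits.release(n);
    }

    public static void main(String[] args) {
        TimedCreditSketch sender = new TimedCreditSketch(10);
        System.out.println(sender.trySend(10, 100)); // credits are available
        System.out.println(sender.trySend(1, 100));  // no credits left, times out
    }
}
```

      With a bounded wait, the caller gets a chance to release its locks and let failure handling run when the timeout expires, at the cost of having to decide what to do with the unsent message.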