2 Replies Latest reply on Aug 19, 2005 5:01 PM by javajedi

    HA destination ends up on the wrong box

      I'm running into what looks like a bug in HA JMS. At some point while the cluster is running normally, it appears to start shuffling things around. I'm not sure whether the singleton JMS destination moves during this period, but I suspect it does. If I invoke the showHistory operation on the DefaultPartition in the JMX console, I see the following entries, which occur right around the time things stop working:

      8/19/05 3:37 AM : Node suspected: liven:38967 (additional data: 17 bytes)
      8/19/05 3:37 AM : Node suspected: liven:38967 (additional data: 17 bytes)
      8/19/05 3:37 AM : New view: [10.67.89.133:1099, 10.67.89.132:1099] with viewId: 3 (old view: [10.67.89.132:1099, 10.67.89.133:1099] )
      8/19/05 3:37 AM : setState called on partition
      8/19/05 4:22 AM : Node suspected: liven:39030 (additional data: 17 bytes)
      8/19/05 4:22 AM : New view: [10.67.89.132:1099] with viewId: 0 (old view: [10.67.89.133:1099, 10.67.89.132:1099] )
      8/19/05 4:22 AM : setState called on partition
      8/19/05 4:22 AM : New view: [10.67.89.132:1099, 10.67.89.133:1099] with viewId: 5 (old view: [10.67.89.132:1099] )

      In the first case (3:37 AM), it looks like the members of the cluster view were reordered. Does that move the HA JMS destination? Later, at 4:22 AM, one of the nodes left the view entirely, but it rejoined less than a minute later.

      When I came in this morning, every node in the cluster was connected to HA JMS without errors, yet the HA queue, hosted on 10.67.89.132, showed 0 messages: its QueueDepth in that box's JMX console is 0. However, 10.67.89.133's JMX console reports a QueueDepth of 227 for the same HA queue. So messages are still sitting in the queue, but nobody receives them: every client connects to HA JMS, which is served by 10.67.89.132, and the copy of the HA queue on that box doesn't see any of them. I confirmed this just now by connecting to HA JMS from an independent command-line client, which also sees 0 messages on the HA queue.
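      To check whether the 3:37 AM reordering could have moved the destination, the view-change entries from showHistory can be scanned mechanically. The following is a minimal Java sketch that flags every view change in which the first listed member changed. It assumes, without confirmation, that the HA JMS singleton runs on the first member of the view; the class name ViewHistory and its methods are hypothetical, and the sample lines are copied from the history above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Scans showHistory lines and reports each view change whose first member differs
 * from the old view's first member.
 * ASSUMPTION (hypothetical, not confirmed): the HA JMS singleton lives on the first
 * member of the view, so a new first member would mean the destination moved.
 */
public class ViewHistory {

    // Matches e.g. "New view: [a, b] with viewId: 3 (old view: [c, d] )"
    private static final Pattern VIEW = Pattern.compile(
        "New view: \\[([^\\]]*)\\] with viewId: (\\d+) \\(old view: \\[([^\\]]*)\\]");

    // Returns the first entry of a comma-separated member list, trimmed.
    static String firstMember(String members) {
        return members.split(",")[0].trim();
    }

    // Collects one "viewId N: old -> new" entry per change of first member.
    static List<String> masterChanges(List<String> lines) {
        List<String> changes = new ArrayList<>();
        for (String line : lines) {
            Matcher m = VIEW.matcher(line);
            if (!m.find()) {
                continue; // not a view-change entry (e.g. "Node suspected", "setState")
            }
            String newHead = firstMember(m.group(1));
            String oldHead = firstMember(m.group(3));
            if (!newHead.equals(oldHead)) {
                changes.add("viewId " + m.group(2) + ": " + oldHead + " -> " + newHead);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<String> history = Arrays.asList(
            "8/19/05 3:37 AM : New view: [10.67.89.133:1099, 10.67.89.132:1099] with viewId: 3 (old view: [10.67.89.132:1099, 10.67.89.133:1099] )",
            "8/19/05 4:22 AM : New view: [10.67.89.132:1099] with viewId: 0 (old view: [10.67.89.133:1099, 10.67.89.132:1099] )",
            "8/19/05 4:22 AM : New view: [10.67.89.132:1099, 10.67.89.133:1099] with viewId: 5 (old view: [10.67.89.132:1099] )");
        masterChanges(history).forEach(System.out::println);
    }
}
```

      Run on these three view changes, it flags viewId 3 (10.67.89.132 -> 10.67.89.133) and viewId 0 (10.67.89.133 -> 10.67.89.132), while viewId 5 keeps 10.67.89.132 first. Only if the first-member assumption holds would this mean the destination moved twice during the night.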

      Does anyone have suggestions for working around this problem, or for enabling logging that might help track down what's going on?