Nov 9, 2012

    Rejoin the cluster

    Yan Wang

      Hi all,


      We have an issue after the network is blocked. We have three nodes called "XDataGrid Container 200040", "XDataGrid Container 200041" and "XDataGrid Controller Node 8249". When we block network communication on one node, say one of the "XDataGrid Container" nodes, that node is removed from the cluster and the cache view changes accordingly. So far everything is OK. Then we unblock the network. However, communication with that node still does not seem to be re-established. We get:


      2012-11-07 15:18:46,526 [OOB-17,XDataGrid Container 200040-25123] WARN    -  - XDataGrid Container 200040-25123: dropped message 74 from XDataGrid Container 200041-40206 (sender not in table [XDataGrid Container 200040-25123, XDataGrid Controller Node 8249-37773]), view=[XDataGrid Container 200040-25123|9] [XDataGrid Container 200040-25123, XDataGrid Controller Node 8249-37773]
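
      To check that we read this warning correctly, here is a toy model of what we think is happening. It is plain Java, not JGroups code, and every name in it (ViewFilterSketch, installView, receive) is made up for illustration. The assumption is that a node only accepts messages from senders in its currently installed view, so messages from the removed node keep being dropped until a view that contains it again is installed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: this is NOT JGroups code. All names are hypothetical.
class ViewFilterSketch {
    // Members of the currently installed view, as this node sees them.
    private final Set<String> members = new LinkedHashSet<>();

    ViewFilterSketch(List<String> initialView) {
        members.addAll(initialView);
    }

    // Installing a new view replaces the membership wholesale.
    void installView(List<String> newView) {
        members.clear();
        members.addAll(newView);
    }

    // Returns true if the message is accepted, false if it is dropped
    // because the sender is not in the current view.
    boolean receive(String sender, long seqno) {
        if (!members.contains(sender)) {
            System.out.println("dropped message " + seqno + " from " + sender
                    + " (sender not in table " + members + ")");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // View after 200041 was removed while its network was blocked.
        ViewFilterSketch node = new ViewFilterSketch(
                List.of("XDataGrid Container 200040", "XDataGrid Controller Node 8249"));

        // The network is unblocked, but the view is unchanged: the message is dropped.
        System.out.println(node.receive("XDataGrid Container 200041", 74));

        // Only once a view that includes 200041 is installed is the message accepted.
        node.installView(List.of("XDataGrid Container 200040",
                "XDataGrid Container 200041", "XDataGrid Controller Node 8249"));
        System.out.println(node.receive("XDataGrid Container 200041", 75));
    }
}
```

      If this model is right, unblocking the network alone changes nothing on the receiving side; something still has to install a view that contains all three nodes.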




      We use TCPPING. We also noticed that with TCPPING the cluster membership is established at the very beginning (when the Infinispan cache is started). If you make the TCPPING timeout very short, say 1 ms, then no node has the right membership at the beginning. In that case we get the same kind of errors, and each node just keeps rejecting messages from the other nodes.


      Is there anything we need to do in this case? Should we restart the cache when we try to rejoin the cluster?


      Thanks for any help!