3 Replies Latest reply on Aug 16, 2011 1:27 PM by gernot.bauer

    Problems with DIST-SYNC Cluster when restarting nodes

    gernot.bauer

      Hi,

       

      I am running a DIST-SYNC Infinispan cluster with a Hot Rod server, a BDBJE cache store and Infinispan 4.2.1-FINAL on EC2. After the cluster crashes or is shut down, the nodes do not rejoin correctly when I restart them.


      The steps I perform are (IP addresses are masked - I use the machine's public IP address for startup):

      • Start node1 (./startServer.sh -r hotrod -p 11222 -l 0.0.0.1 -c ec2InfinispanConfig.xml)
      • Wait until this node is up and running (= all caches have been loaded and the server is bound to port 11222)
      • Start node2 (./startServer.sh -r hotrod -p 11222 -l 0.0.0.2 -c ec2InfinispanConfig.xml)
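
      For reference, the "wait until node1 is up" step above could be scripted roughly as follows. This is only a minimal sketch, assuming bash (it relies on bash's /dev/tcp pseudo-device); wait_for_port is a hypothetical helper and not part of Infinispan. It only checks that the port accepts TCP connections, so it does not prove that all caches have finished loading.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Infinispan): poll until host:port accepts
# a TCP connection, or give up after timeout_s seconds (default 120).
wait_for_port() {
  local host=$1 port=$2 timeout_s=${3:-120}
  local deadline=$(( $(date +%s) + timeout_s ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # bash-only /dev/tcp redirect; the subshell closes the socket when it exits
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example use on node2's machine (addresses masked as in the steps above):
#   wait_for_port 0.0.0.1 11222 300 &&
#     ./startServer.sh -r hotrod -p 11222 -l 0.0.0.2 -c ec2InfinispanConfig.xml
```

      The function returns 0 as soon as a connection succeeds and 1 if the timeout expires first, so node2 is only started once node1's Hot Rod port is reachable.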

       

      The log shows that the second node tries to join the cluster, but soon after it starts I see log messages like the following on node1:

      ERROR [org.jgroups.protocols.TCP] (OOB-2,infinispan-cluster-set,ip-0-0-0-1-493) failed sending message to ip-0-0-0-2-60013 (60108 bytes): java.lang.IllegalStateException: Queue full, cause: null

       

      Does anyone have any idea what is going wrong? Right now, my only "workaround" is to clear the cache store. This is acceptable for dev, but unfortunately not for production.