4 Replies Latest reply on Jul 7, 2016 2:15 PM by wdfink

    Incorrect cluster re-balance by reconnecting node after network outage

    sjaiswal

      Hi All,

       

      I have been working on a complex system built on embedded caches, and we are seeing multiple issues with incorrect rebalancing and split-brain scenarios.

       

      Below are the details of how the application currently works.

       

      Infinispan version 8.1

      WildFly version 10

      Java version 1.8

      JGroups transport 3.6

       

      Let's assume there are n nodes.

      1. Only one node can write data to the caches through the application.

      2. numOwners for the caches is left at the default of 2.

      3. The caches are distributed in order to serve more clients (clients listen to the caches all the time).

      4. Data in the caches can be written or updated at any point in time.

      5. Some of the caches are created as tree caches programmatically, while others are not.

      6. Tree caches use pessimistic transactions with auto-commit and batch mode.

      7. Locking uses the SERIALIZABLE isolation level with otherwise default settings.

      8. Versioning is set to SIMPLE for all the caches.
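
      As a rough illustration, the settings listed above might map onto Infinispan's programmatic ConfigurationBuilder roughly as follows. This is only a sketch under assumptions, not the actual configuration: the class and method names are hypothetical, and DIST_SYNC is assumed because the post does not say whether the distributed caches are synchronous or asynchronous.

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.cache.VersioningScheme;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;
import org.infinispan.util.concurrent.IsolationLevel;

// Hypothetical helper class; the name is illustrative only.
class TreeCacheConfigSketch {

    // Builds a configuration resembling points 2, 3 and 6-8 of the list above.
    static Configuration treeCacheConfiguration() {
        ConfigurationBuilder builder = new ConfigurationBuilder();

        // Distributed cache with two owners per entry (DIST_SYNC is an assumption).
        builder.clustering().cacheMode(CacheMode.DIST_SYNC).hash().numOwners(2);

        // Pessimistic transactions with auto-commit, as described for the tree caches.
        builder.transaction()
               .transactionMode(TransactionMode.TRANSACTIONAL)
               .lockingMode(LockingMode.PESSIMISTIC)
               .autoCommit(true);

        // Batch mode (invocation batching).
        builder.invocationBatching().enable();

        // SERIALIZABLE isolation, other locking settings left at their defaults.
        builder.locking().isolationLevel(IsolationLevel.SERIALIZABLE);

        // SIMPLE versioning.
        builder.versioning().enable().scheme(VersioningScheme.SIMPLE);

        return builder.build();
    }
}
```

      The non-tree caches would presumably share the clustering, locking and versioning parts; the post does not state their transaction settings, so they are left out here.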

       

      Description of issue

      Example 1:

      Conditions: one of the nodes had a network outage for some time. Meanwhile, the caches were updated by the desired node and were rebalanced successfully across the remaining nodes.

      Issue: the node that had the network outage comes back online.

       

      Result 1: the cluster view ends up split, with incorrect data on more than one node; or

      Result 2: the rejoining node is able to rejoin the original cluster but ends up corrupting data on one or more nodes.

       

      Expected behavior: the rejoining node should be able to join, update itself to the latest data on the original cluster, and serve the latest cache entries to the clients.
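
      To make the expected behavior concrete, here is a minimal toy sketch in plain Java. It is not Infinispan or JGroups code, and every name in it (RejoinSketch, Versioned, reconcile) is hypothetical. It assumes each entry carries a version assigned by the single writer, and shows a merge in which a stale copy from the rejoining node never replaces a newer copy held by the cluster.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: plain Java collections, no Infinispan or JGroups classes.
class RejoinSketch {

    // A value tagged with the version the single writer assigned to it (hypothetical type).
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    // Merges the rejoining node's state into a copy of the cluster's state.
    // Per key, the copy with the higher version wins; on a tie the cluster's copy is kept.
    // Deletions are not modelled.
    static Map<String, Versioned> reconcile(Map<String, Versioned> cluster,
                                            Map<String, Versioned> rejoining) {
        Map<String, Versioned> merged = new HashMap<>(cluster);
        for (Map.Entry<String, Versioned> e : rejoining.entrySet()) {
            merged.merge(e.getKey(), e.getValue(),
                    (current, incoming) -> incoming.version > current.version ? incoming : current);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Versioned> cluster = new HashMap<>();
        cluster.put("price", new Versioned("105", 3));   // updated during the outage

        Map<String, Versioned> rejoining = new HashMap<>();
        rejoining.put("price", new Versioned("100", 1)); // stale copy from before the outage

        Map<String, Versioned> merged = reconcile(cluster, rejoining);
        System.out.println(merged.get("price").value);
    }
}
```

      In this model the stale value 100 does not replace 105 after the rejoin, which is the outcome we expect from the real cluster.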

       

      Can you please suggest how we can control this situation? We have tried almost all of the configurations. From what we have observed, a restart resolves the cluster split, but restarting is not an option for us.

       

      I can provide the configurations if needed. Please let me know.

       

      Any suggestions are welcome.