
    Partition-Handling in 2 node cluster

    rz1911

      Hi all,

       

      I'm new to Infinispan and I'm trying to set up a replicated cache in a 2-node cluster. To favor consistency over availability (the CP side of the CAP theorem) I added <partition-handling enabled="true"/> like this:

       

      <replicated-cache name="the-default-cache" statistics="true">

         <locking isolation="REPEATABLE_READ"
         acquire-timeout="20000"
         write-skew="true"
         concurrency-level="5000"
         striping="false"
         />

       

        <transaction
         transaction-manager-lookup="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
         complete-timeout="600000"
         stop-timeout="30000"
         auto-commit="false"
         locking="OPTIMISTIC"
         mode="NON_DURABLE_XA"
         />

       

         <versioning scheme="SIMPLE"/>

         <partition-handling enabled="true"/>

         <state-transfer enabled="false" await-initial-transfer="false" chunk-size="100" />

       

      </replicated-cache>


      <replicated-cache name="myCache" statistics="true" mode="SYNC" remote-timeout="20000">


      Cluster startup and normal operations work fine. My problem is that when I shut down one node with EmbeddedCacheManager.stop(), the other node enters DEGRADED_MODE and the cluster is no longer available.

      So my question is: how can I stop one cluster node and later start it again without losing availability?


      Thanks for your feedback!

      Ralf



        • 1. Re: Partition-Handling in 2 node cluster
          dan.berindei

          I'm afraid our partition handling doesn't work very well with just 2 nodes: there is an implicit assumption that you have at least 3 nodes, so that 50% + 1 of the nodes are always running (except when you really have a partition).

           

          From an implementation perspective, the running node enters DEGRADED_MODE because we don't differentiate enough between clean shutdown and a crash.

           

          But even from a theoretical perspective, preserving consistency requires a majority of the nodes to be available. If you have only 2 nodes, you can stop one node, isolate it from the first node, then start it again, and both nodes will be available for writing. Even bigger clusters are vulnerable here, because a starting node doesn't know what the majority should be. (Only since 8.2.Beta2 can you prevent this with the new <transport initial-cluster-size="2"/> configuration.)
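
          For reference, here is a rough sketch of where that attribute goes in the embedded XML configuration (a sketch only: the container and cluster names are illustrative, and initial-cluster-size requires 8.2.Beta2 or later):

             <cache-container name="default">
                <transport cluster="my-cluster" initial-cluster-size="2"/>
                <!-- cache definitions go here -->
             </cache-container>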

           

          Please create an issue in JIRA if my theory hasn't convinced you, and maybe you'll convince us to ignore the majority rule for a clean shutdown.

          • 2. Re: Partition-Handling in 2 node cluster
            rvansa

            Dan, I think it makes perfect sense that when a node leaves voluntarily (as opposed to an abrupt termination of the network connection), the cluster should not enter the degraded state.

             

            I think the troublesome part could even be the current 'clean shutdown'. IIRC, when a node shuts down in stop(), it just says 'Bye, I am leaving', but it does not wait for the rebalance to finish completely (and if another node crashes during the rebalance, data can be lost instead of being served from the leaving-but-not-left node). So a proper clean shutdown should install a topology with a writeCH that excludes the leaving node but a readCH that still includes it, and the leaving node should not leave until the next topology whose readCH no longer contains it. Then the fact that the node actually left should not make the cluster enter the degraded state. Maybe I was technically incorrect somewhere above (I can't recall exactly how the available-nodes set in the topology is used), but I think the gist is clear.

            • 3. Re: Partition-Handling in 2 node cluster
              vdzhuvinov

              That's good to know. So the takeaway is to keep partition handling disabled for 2-node clusters, unless DEGRADED_MODE is desired or can be tolerated.
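
              In other words, for a 2-node cluster the simplest option is to leave partition handling at its default (disabled). A minimal sketch, reusing the cache name and attributes from the original post:

                 <replicated-cache name="myCache" statistics="true" mode="SYNC" remote-timeout="20000">
                    <!-- disabled is the default, so the element could also be omitted entirely -->
                    <partition-handling enabled="false"/>
                 </replicated-cache>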