
    Partitioning in JBoss Cache - JBCACHE-60

    manik

      This is in reference to JBCACHE-60 and a wiki page I put up discussing some initial designs.

      Partitioning is basically a concept that takes Buddy Replication a few steps further, primarily by removing the need for session affinity.

      Please have a look at my initial thoughts on the wiki - I will elaborate with more diagrams, etc. on the wiki soon.

      Cheers,
      Manik


        • 1. Re: Partitioning in JBoss Cache - JBCACHE-60
          brian.stansberry

          I like. This can be a more effective solution for the session replication use case as well.

          Particularly like the hard and soft limits. I think that's an important concept to allow a tradeoff between redundancy and the need to move data around the cluster following a failure. If I configure 2 buddies with a minimum of 1 and one of my buddies fails, there is no need to immediately move any data around the cluster; any movement will just be for requested data, as the requests come in.
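
          A minimal sketch of what such hard/soft limits might look like; the PartitionConfig class and its property names are purely hypothetical, since this is still only a design discussion:

          // Hypothetical sketch of hard/soft backup limits; not part of
          // the JBoss Cache API.
          public class PartitionConfig {
             private int numBackups = 2; // "hard" limit: number of copies we aim to keep
             private int minBackups = 1; // "soft" limit: minimum before forced redistribution

             public boolean rebalanceNeeded(int currentBackups) {
                // With numBackups=2 and minBackups=1, losing a single buddy does
                // not trigger an immediate redistribution of state; data only
                // moves lazily, as requests for it come in.
                return currentBackups < minBackups;
             }
          }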

          • 2. Re: Partitioning in JBoss Cache - JBCACHE-60
            brian.stansberry

            Using regions as the unit of granularity for a PartitionGroup can be very helpful as well.

            For example, one thing that could be useful would be picking a separate PartitionGroup per session. The effect of this is that session copies for an "owner" are distributed around the cluster, rather than all sitting on one or two nodes. With that, the session management layer can work with the load balancer to provide information about who the preferred failover target is, without having to worry that, upon failure, that target will have to take over 100% of the failed node's session workload.

            (Note there's no need to have a "PartitionGroup per session". You just need enough regions/PartitionGroups to adequately divide the failover load and then organize how the sessions are cached accordingly.)

            • 3. Re: Partitioning in JBoss Cache - JBCACHE-60
              manik

               

              "bstansberry@jboss.com" wrote:
              (Note there's no need to have a "PartitionGroup per session". You just need enough regions/PartitionGroups to adequately divide the failover load and then organize how the sessions are cached accordingly.)


              Yes, you could use a fixed number of partition groups and spread the sessions evenly across them.
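
              As a rough illustration (the bucket count, the Fqn layout and the helper class are my own assumptions, not an agreed design), sessions could be hashed into a fixed set of buckets, with one region/PartitionGroup per bucket:

              import org.jboss.cache.Fqn;

              // Illustrative only: hash session ids into a fixed number of buckets,
              // one region/PartitionGroup per bucket, so failover load spreads evenly.
              public class SessionBuckets {
                 private static final int NUM_PARTITION_GROUPS = 16;

                 // Region (and hence PartitionGroup) a session belongs to, e.g. /JSESSION/bucket7
                 public static Fqn regionFor(String sessionId) {
                    int bucket = (sessionId.hashCode() & 0x7fffffff) % NUM_PARTITION_GROUPS;
                    return Fqn.fromString("/JSESSION/bucket" + bucket);
                 }

                 // The session itself is then cached under its bucket, e.g. /JSESSION/bucket7/<sessionId>
                 public static Fqn fqnFor(String sessionId) {
                    int bucket = (sessionId.hashCode() & 0x7fffffff) % NUM_PARTITION_GROUPS;
                    return Fqn.fromString("/JSESSION/bucket" + bucket + "/" + sessionId);
                 }
              }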

              • 4. Re: Partitioning in JBoss Cache - JBCACHE-60
                galder.zamarreno

                As Manik said, this links very nicely to my suggestion to be able to define regions based on regular expressions

                http://jira.jboss.com/jira/browse/JBCACHE-1122

                where all nodes matching a regular expression are part of a region and belong to a specific PartitionGroup.

                You could then differentiate between structural nodes and data nodes and define how many copies of each region should be in the cluster, with the backup requirements for data nodes being stricter than those for structural ones?
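
                Purely as a sketch of that idea (the RegionDef type, its numCopies field and the resolver below are hypothetical, not part of any existing API), regex-defined regions might be resolved roughly like this:

                import java.util.LinkedHashMap;
                import java.util.Map;
                import java.util.regex.Pattern;

                // Illustrative only: resolve a node path to a regex-defined region.
                public class RegexRegionResolver {

                   public static class RegionDef {
                      final Pattern pattern;
                      final int numCopies; // copies of this region the cluster should hold

                      RegionDef(String regex, int numCopies) {
                         this.pattern = Pattern.compile(regex);
                         this.numCopies = numCopies;
                      }
                   }

                   // Ordered definitions: the first matching pattern wins.
                   private final Map<String, RegionDef> regions = new LinkedHashMap<String, RegionDef>();

                   // e.g. define("sessions", "/JSESSION/.*", 2); define("structure", "/.*", 1);
                   public void define(String name, String regex, int numCopies) {
                      regions.put(name, new RegionDef(regex, numCopies));
                   }

                   public RegionDef resolve(String fqn) {
                      for (RegionDef def : regions.values()) {
                         if (def.pattern.matcher(fqn).matches()) return def;
                      }
                      return null; // no partitioned region; fall back to default behaviour
                   }
                }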

                Regardless, I'm gonna link JBCACHE-1122 to JBCACHE-60 so that we have it in mind.

                • 5. Re: Partitioning in JBoss Cache - JBCACHE-60
                  manik

                   

                  "galder.zamarreno@jboss.com" wrote:
                  You could then differentiate between structural nodes and data nodes and define how many copies of each region should be in the cluster?


                  JBCACHE-1153

                  • 6. Re: Partitioning in JBoss Cache - JBCACHE-60
                    mircea.markus

                    Some thoughts on data gravitation.
                    Since session affinity is not enforced (meaning affinity can range from total randomness to some degree of stickiness), the following optimization might be performed:
                    - if a call is made on a node that is not part of the partition, tunnel the call to one of the nodes in the partition (determined from the replicated metadata) rather than adding the node to the partition. This would be more efficient. Only when a certain threshold of hits on that out-of-partition node is reached (which indicates some degree of affinity) should the node be added to the partition (see the sketch at the end of this post).

                    The above optimization relies on tunnelled calls being (much?) more efficient than replicating the entire state of a subtree onto a node (which also involves operations like FLUSH to block changes to the cache, and can be quite expensive).

                    The join-partition logic could also be made smarter, taking into account things like subtree size, hits from other nodes, etc., for more flexibility.
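
                    A rough sketch of the tunnel-vs-join decision described above; the class, the metadata lookup and the threshold handling are all hypothetical, and none of this exists in JBoss Cache:

                    // Hypothetical sketch only: decide whether to tunnel a call or join the partition.
                    public class GravitationPolicy {
                       private final int joinThreshold;          // hits before this node joins the partition
                       private final PartitionMetadata metadata; // replicated partition metadata (assumed)
                       private int localHits;

                       public GravitationPolicy(int joinThreshold, PartitionMetadata metadata) {
                          this.joinThreshold = joinThreshold;
                          this.metadata = metadata;
                       }

                       // Called when this node receives a request for data it does not own.
                       public Object handleMiss(String fqn) {
                          localHits++;
                          if (localHits >= joinThreshold) {
                             // enough affinity: pull the subtree in and join the partition
                             return metadata.joinPartitionAndFetch(fqn);
                          }
                          // otherwise just tunnel the call to a node that already owns the data
                          return metadata.tunnelTo(metadata.ownerOf(fqn), fqn);
                       }
                    }

                    // Assumed view over the replicated metadata; purely illustrative.
                    interface PartitionMetadata {
                       Object ownerOf(String fqn);
                       Object tunnelTo(Object owner, String fqn);
                       Object joinPartitionAndFetch(String fqn);
                    }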

                    • 7. Re: Partitioning in JBoss Cache - JBCACHE-60
                      manik

                      Well, this threshold would be configurable, naturally. It would probably be a function of the number of hits over a given period of time rather than a fixed hit count, as it needs to decay as well. Perhaps it could use the same half-life parameters as the degree of participation.
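
                      For example (a sketch only; the class and its half-life handling are illustrative, not an actual design), a counter whose effective value decays exponentially would let old hits stop counting toward the join threshold:

                      // Illustrative only: the effective count halves every halfLifeMillis,
                      // so stale hits eventually stop pushing a node toward joining.
                      public class DecayingHitCounter {
                         private final long halfLifeMillis;
                         private double count;
                         private long lastUpdate = System.currentTimeMillis();

                         public DecayingHitCounter(long halfLifeMillis) {
                            this.halfLifeMillis = halfLifeMillis;
                         }

                         public synchronized double hit() {
                            decay();
                            return ++count;
                         }

                         public synchronized double current() {
                            decay();
                            return count;
                         }

                         private void decay() {
                            long now = System.currentTimeMillis();
                            double elapsedHalfLives = (now - lastUpdate) / (double) halfLifeMillis;
                            count *= Math.pow(0.5, elapsedHalfLives);
                            lastUpdate = now;
                         }
                      }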

                      Tunneling (ClusteredCacheLoader?) is pretty efficient compared to a state transfer, as it only pulls back the contents of an individual node as opposed to an entire region.

                      FLUSH blocking the entire cluster for a state transfer within a specific Partition would be a problem, though Bela did mention some improvements in JGroups 2.6 that may be able to deal with this. Bela?

                      • 8. Re: Partitioning in JBoss Cache - JBCACHE-60
                        belaban

                        We can definitely look into a 'partial flush', so that we block only the services on the Multiplexer that are involved in the JOIN and/or state transfer.

                        Vladimir will look into this once he's back from California.

                        • 9. Re: Partitioning in JBoss Cache - JBCACHE-60
                          manik

                          FYI I've updated the wiki with changes to how metadata is replicated. Please have a look and comment.