1 Reply - Latest reply on Jul 18, 2011 9:04 AM by dan.berindei

    Can I specify (or even find out) which nodes own particular caches?

    alex.heneveld

      For performance reasons I'd like to control which nodes hold particular caches (or keys within a cache), and constrain certain nodes from hosting certain data--in DIST mode of course.

       

      Is this supported and are there examples or instructions I can look at?

       

      I'd also like to confirm I'm looking at the right things for seeing where data is held, and for knowing when it is moved on rehash.  The closest I've found are AdvancedCache.getAffectedNodes(...), which looks promising (though does it include the L1 caches which will be invalidated?), and CacheEntry{Passivated,Activated}Event; but I suspect from the javadoc that the latter is local to the cache in the same JVM (since it says it gives the cache which generated the event), which would mean a listener at every location to detect the changes (and some careful dodging of race conditions when sewing those results together).

       

      There are two reasons this control would be nice:  one is that in a wide-area deployment we know most data will be local to a particular datacenter; the other is that most pieces of data will be updated by only one location, but read by multiple locations.  If we can co-locate the updater with one of the owners, and put the replicas in the same datacenter, then we get some obvious performance wins.  Neither is essential (the fact that putAsync returns a Future is great!) but if we can do it, we'd of course like to.

       

      Thanks,

      Alex

        • 1. Re: Can I specify (or even find out) which nodes own particular caches?
          dan.berindei

          Alex Heneveld wrote:

          For performance reasons I'd like to control which nodes hold particular caches (or keys within a cache), and constrain certain nodes from hosting certain data--in DIST mode of course.

           

          We don't support asymmetric caches (yet), so every cache has to exist on all the cluster nodes. The closest thing we have to specifying where a key should be stored is the grouping API.
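
          For example, a key class with a @Group-annotated method looks roughly like the sketch below (untested, and the RegionKey name and its fields are just made up for illustration). Grouping also has to be enabled under the cache's hash configuration (<groups enabled="true"/> inside <hash>). Keep in mind that grouping only co-locates keys that share a group; it doesn't let you choose which nodes end up owning that group.

          import java.io.Serializable;

          import org.infinispan.distribution.group.Group;

          // Keys that report the same group value hash together, so they all end up
          // on the same set of owners (whichever nodes the consistent hash picks).
          public class RegionKey implements Serializable {

             private final String region;   // the grouping value, e.g. a datacenter name
             private final String id;

             public RegionKey(String region, String id) {
                this.region = region;
                this.id = id;
             }

             @Group
             public String getRegion() {
                return region;
             }

             // equals() and hashCode() over both fields omitted for brevity
          }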

           

          You could use KeyAffinityService to generate a random key for a specific server and group on that key, but nodes joining and leaving the cluster will sooner or later change the ownership of that key, and the Grouper interface shouldn't change its result after a topology change.
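
          For example, something along these lines (untested sketch, the class and method names are mine) gives you a key that is currently mapped to the local node, which you could then use as the group value for the data you want kept on this node:

          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;

          import org.infinispan.Cache;
          import org.infinispan.affinity.KeyAffinityService;
          import org.infinispan.affinity.KeyAffinityServiceFactory;
          import org.infinispan.affinity.RndKeyGenerator;
          import org.infinispan.manager.EmbeddedCacheManager;

          public class LocalAffinityKey {

             // Returns a random key that is owned by this node under the *current* topology;
             // a later rehash can move it to other owners.
             public static Object localKey(Cache<Object, Object> cache) {
                EmbeddedCacheManager cacheManager = cache.getCacheManager();
                ExecutorService executor = Executors.newSingleThreadExecutor();
                KeyAffinityService<Object> affinity = KeyAffinityServiceFactory
                      .newLocalKeyAffinityService(cache, new RndKeyGenerator(), executor, 100); // buffer of 100 pre-generated keys
                try {
                   return affinity.getKeyForAddress(cacheManager.getAddress());
                } finally {
                   affinity.stop();
                   executor.shutdown();
                }
             }
          }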

           

          I'd also like to confirm I'm looking at the right things for seeing where data is held, and for knowing when it is moved on rehash.  The closest I've found are AdvancedCache.getAffectedNodes(...), which looks promising (though does it include the L1 caches which will be invalidated?), and CacheEntry{Passivated,Activated}Event; but I suspect from the javadoc that the latter is local to the cache in the same JVM (since it says it gives the cache which generated the event), which would mean a listener at every location to detect the changes (and some careful dodging of race conditions when sewing those results together).

           

          cache.getAdvancedCache().getDistributionManager().getAffectedNodes(k1) won't include L1 caches in its results. Rehashing also won't trigger CacheEntry{Passivated,Activated}Events; those are triggered only when a key is passivated to/activated from a cache store.
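
          If you just want to see where a key currently lives, something like this works (untested sketch, the helper is mine); locate() only reflects the consistent hash at the moment you call it:

          import java.util.List;

          import org.infinispan.Cache;
          import org.infinispan.distribution.DistributionManager;
          import org.infinispan.remoting.transport.Address;

          public class OwnershipCheck {

             // Prints the nodes that own the key right now, and whether this node is one of them.
             public static void printOwners(Cache<?, ?> cache, Object key) {
                DistributionManager dm = cache.getAdvancedCache().getDistributionManager();
                List<Address> owners = dm.locate(key);   // owners only, L1 copies are not listed
                Address self = cache.getCacheManager().getAddress();
                System.out.println(key + " is owned by " + owners
                      + (owners.contains(self) ? " (including this node)" : ""));
             }
          }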

           

          We don't have a way of listening for individual keys being rehashed. As a workaround you can listen for TopologyChangedEvent and iterate over all the local keys, checking whether event.getConsistentHashAtStart().locate(key, numOwners).equals(event.getConsistentHashAtEnd().locate(key, numOwners)) still holds. This is a bit clunky though, so please open a JIRA issue for an individual-key rehash event.
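
          The workaround would look roughly like this (untested sketch; the RehashDetector name is mine, and numOwners has to match the cache's numOwners setting). Note that keySet() only returns the locally stored keys on a DIST cache:

          import java.util.List;

          import org.infinispan.Cache;
          import org.infinispan.notifications.Listener;
          import org.infinispan.notifications.cachelistener.annotation.TopologyChanged;
          import org.infinispan.notifications.cachelistener.event.TopologyChangedEvent;
          import org.infinispan.remoting.transport.Address;

          @Listener
          public class RehashDetector {

             private final Cache<Object, Object> cache;
             private final int numOwners;

             public RehashDetector(Cache<Object, Object> cache, int numOwners) {
                this.cache = cache;
                this.numOwners = numOwners;
                cache.addListener(this);
             }

             @TopologyChanged
             public void onTopologyChange(TopologyChangedEvent<Object, Object> event) {
                if (event.isPre())
                   return;   // compare the hashes only after the new topology is installed
                for (Object key : cache.keySet()) {   // local keys only in DIST mode
                   List<Address> before = event.getConsistentHashAtStart().locate(key, numOwners);
                   List<Address> after = event.getConsistentHashAtEnd().locate(key, numOwners);
                   if (!before.equals(after)) {
                      System.out.println(key + " moved from " + before + " to " + after);
                   }
                }
             }
          }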

           

          There are two reasons this control would be nice:  one is that in a wide-area deployment we know most data will be local to a particular datacenter; the other is that most pieces of data will be updated by only one location, but read by multiple locations.  If we can co-locate the updater with one of the owners, and put the replicas in the same datacenter, then we get some obvious performance wins.  Neither is essential (the fact that putAsync returns a Future is great!) but if we can do it, we'd of course like to.

          The current topology-aware consistent hash (TACH) implementation picks owners on as many different sites/racks/machines as possible, but you could extend it to pick owners in a single site.

           

          It would be interesting to extend the grouping API so that you can specify a siteId as the grouping key and always have the primary owner in that site, and I think you could also extend TACH to do that.
