2 Replies Latest reply on Sep 19, 2016 9:06 AM by illia.khokholkov

    [ModeShape 5.x] Caching strategy

    illia.khokholkov

      I am struggling to find a description of the caching mechanism used by ModeShape 5.x. The documentation [1] is great, but unfortunately it did not help me get the gist of the caching policies. I was able to find only the notes presented below, which are certainly helpful but not complete.

       

      Clustering [2]:

      A cluster in this model can have any number of members, each with its own in-memory cache, but all using a shared database for persisting and reading the content.

      Persistence [3]:

      ModeShape 3 and 4 used, in addition to the main Infinispan cache which stored the repository data, a second, local, in-memory cache for each repository workspace in order to provide fast read access to frequently used nodes. This cache exists solely for performance reasons and ModeShape 5 preserves the concept, using an LRU ConcurrentMap implementation.

      Repository and Session [4]:

      ModeShape uses the copy-on-write behavior. Note that this is different than ModeShape 2.x, which used copy-on-read.

       

      Knowing that my understanding of caching behavior is practically non-existent, please consider the scenario presented below.

       

      Cluster members: M1, M2.

      Application consumers: C1, C2.

      Node: N.

       

      1. C1 creates N in M1. N gets persisted in the DB, the cache in M1 is now aware of N, and a notification about the new node is sent to M2 via JGroups.
      2. M2 sees the message from M1, but its cache does not contain N, so nothing has to be refreshed.
      3. C2 gets N, but does so using M2. Since this is a read-only operation, no notifications are sent to other members.
      4. M2 loads N into its cache.
      5. C1 updates N in M1 by changing some of its properties. A change notification is about to be sent to M2.
      6. M1 loses network connectivity before the JGroups message about the node update is sent to M2.
      7. C2 gets N from M2, expecting to see the changes made by C1. Nothing has changed, however, because N was already in the cache of M2 and no update notification was received.
      8. M2 now has stale data, i.e. N is no longer current. Is the M2 cache entry for N ever going to be updated, assuming M1 no longer updates N (so that no update notification gets sent, even if network connectivity on M1 is re-established) and M2 only reads N?

       

      What are the caching policies regarding the following?

       

      1. Adding a new node to the cache (i.e. when exactly a new entry gets added).
      2. Removing a node from the cache (i.e. under what conditions an entry gets removed).
      3. Refreshing an already cached node (i.e. when an entry gets refreshed, e.g., periodically, on write, etc.).
      4. There is a property called "cacheSize" under the "workspaces" entry in the repository JSON configuration file. Is it related to the caching of JCR nodes a consumer directly works with, or is it related to internal caching done by ModeShape under the hood for some kind of system nodes? Furthermore, if that value is set to 0, does it effectively disable caching, i.e. force every read to go to the underlying DB?

       

      Many thanks in advance, any help is greatly appreciated. My apologies if I missed the part of the official documentation that explains exactly what I want to know about caching.

       

      [1] Home - ModeShape 5 - Project Documentation Editor

      [2] Clustering - ModeShape 5 - Project Documentation Editor

      [3] Persistence - ModeShape 5 - Project Documentation Editor

      [4] Repository and Session - ModeShape 5 - Project Documentation Editor

        • 1. Re: [ModeShape 5.x] Caching strategy
          hchiorean

          The answer is that there are no "caching policies" in ModeShape 5. There is only a simple in-memory cache which uses a built-in LRU eviction policy for each repository workspace. This is what the "cacheSize" entry controls - how many nodes can be kept in memory for each workspace.
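
          To illustrate what a size-bounded LRU cache of this kind does, here is a minimal sketch. It is not ModeShape's actual implementation (the documentation only says ModeShape 5 uses an LRU ConcurrentMap); the class name LruNodeCache and its methods are hypothetical, and a synchronized LinkedHashMap is swapped in for simplicity. The constructor argument plays the role of "cacheSize".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-workspace node cache; NOT ModeShape's own class. */
final class LruNodeCache<K, V> {
    private final Map<K, V> entries;

    /** @param cacheSize maximum number of entries kept in memory (assumed to mirror "cacheSize") */
    LruNodeCache(final int cacheSize) {
        // accessOrder = true keeps the least recently accessed entry first in iteration order.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after every put; evicts the eldest entry once the bound is exceeded.
                return size() > cacheSize;
            }
        };
    }

    synchronized V get(K key) {
        return entries.get(key); // a hit also marks the entry as most recently used
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized void evict(K key) {
        entries.remove(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

          With a bound of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "b" is the least recently used entry. With a bound of 0 every entry is evicted immediately after it is put, so in this sketch nothing is ever served from memory.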

           

          Your scenario describes a network partition. If that happens, not only will you get stale data on M2, but prior to ModeShape 5.2 (which should hopefully be released soon) you could also get data corruption, since global locking relies entirely on JGroups. This is described in the ModeShape clustering documentation: Clustering - ModeShape 5 - Project Documentation Editor.

           

          Once we release 5.2, you'll have the option of using DB locking as opposed to JGroups locking, which should ensure that at least data doesn't get corrupted, since DB locks "should be" globally exclusive. Split-brain scenarios will still be subject to stale reads (because we still rely on JGroups to send messages in order to clear the caches), but write data corruption should be prevented, since prior to writing a node any cluster node will read the latest persisted data. You could prevent stale reads by setting "cacheSize" to 0, which means that each read will always load data from the DB, but this may impact performance.
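
          The stale-read behaviour described above can be modelled with a toy sketch. Everything here is an assumption for illustration only: ClusterMember, read, write and onChangeNotification are hypothetical names, a plain Map stands in for the shared database, a direct method call stands in for a JGroups message, and evicting the writer's own cache entry on write is a simplification. It does not use any ModeShape or JGroups API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one cluster member: a local cache in front of a shared store. */
final class ClusterMember {
    private final Map<String, String> sharedDb;                 // stands in for the shared database
    private final Map<String, String> localCache = new HashMap<>();
    private final boolean cachingEnabled;                        // false models "cacheSize" set to 0

    ClusterMember(Map<String, String> sharedDb, boolean cachingEnabled) {
        this.sharedDb = sharedDb;
        this.cachingEnabled = cachingEnabled;
    }

    /** Serves the node from the local cache if present, otherwise loads it from the shared store. */
    String read(String nodeId) {
        if (cachingEnabled) {
            String cached = localCache.get(nodeId);
            if (cached != null) {
                return cached; // may be stale if a change notification was lost
            }
        }
        String persisted = sharedDb.get(nodeId);
        if (cachingEnabled && persisted != null) {
            localCache.put(nodeId, persisted);
        }
        return persisted;
    }

    /** Persists a change and returns the node id that the other members should be notified about. */
    String write(String nodeId, String value) {
        sharedDb.put(nodeId, value);
        localCache.remove(nodeId); // assumption: the writer drops its own cached copy
        return nodeId;
    }

    /** Models the arrival of a change notification: the cached entry is cleared. */
    void onChangeNotification(String nodeId) {
        localCache.remove(nodeId);
    }
}
```

          In this model, if M1 writes N and the notification never reaches M2, M2 keeps returning its cached copy of N until some later notification for N arrives or the entry is dropped from the cache. A member created with caching disabled reads the shared store on every call, so it always sees the latest persisted value at the cost of a store access per read.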

           

          The bottom line here is that unfortunately ModeShape is not a highly scalable, partition tolerant storage system - it's simply not possible to have this and still cover all the JCR rules and constraints. So ideally before considering horizontal scaling, you should consider vertical scaling first.

          • 2. Re: [ModeShape 5.x] Caching strategy
            illia.khokholkov

            Thank you, this answers all questions I had regarding caching in ModeShape 5.x.