1 Reply Latest reply on Apr 10, 2014 4:56 AM by rvansa

    CacheStore vs. transactional updates

    mmechell

      Hi all,

       

      I have implemented a custom cache store that uses JPA to persist values to a dedicated database table.

       

      From a functional point of view everything works as expected; however, I still have some doubts about performance and consistency.

       

      As stated in the User Guide [1], the store won't be part of the transaction in which data is put in the cache.

      As far as I understand, the store will be updated once the transaction has been committed.

      Now, I guess that the store will not be updated within a single database transaction but, most probably, without any transactional context ... is this right?

      If so, we cannot do any JDBC-level batching to the database, and we would pay a round trip for every value to update!

      Needless to say, we could also end up with partial flushes to the database ...

       

      Are there any workarounds for these problems? Would these problems be there with different cache stores (e.g. LevelDB, etc.)?

       

      BTW, I'm now relying on the following approach as a workaround:

       

      0. Configure Cache with no store (in memory only)

      1. Put all the data in cache without any transactions

      2. Keep track of the number of values put

      3. Do whatever needs to be done with the data (Map/Reduce, DistExec, etc.)

      4. Run a DistTask that flushes the values in the nodes to the database in a single transaction and returns the number of flushed values

      5. Centrally check that the total number of flushed values equals the number of values in 2.

       

      This still leaves me with the problem of dealing with partial updates (when a node fails its flush task execution), but at least it is a better approach performance-wise.
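The central check in steps 4 and 5 could be sketched roughly as below. This is only an illustration under stated assumptions: it uses plain `java.util.concurrent` as a stand-in for the distributed task execution, and `FlushCountCheck`, `Result` and `runFlushTasks` are hypothetical names, not Infinispan API. Each node's flush is modelled as a `Callable<Integer>` that returns the number of values it wrote in its own database transaction; a task that throws is counted as a failed node, which is exactly the partial-update case mentioned above.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for steps 4 and 5; plain JDK executors, not the Infinispan API.
class FlushCountCheck {

    /** Outcome of the central check: values flushed and number of node tasks that failed. */
    static final class Result {
        final int flushed;
        final int failedNodes;

        Result(int flushed, int failedNodes) {
            this.flushed = flushed;
            this.failedNodes = failedNodes;
        }

        /** True only if every node succeeded and the totals match the count from step 2. */
        boolean complete(int expected) {
            return failedNodes == 0 && flushed == expected;
        }
    }

    /** Runs one flush task per node and sums the counts they report. */
    static Result runFlushTasks(List<Callable<Integer>> nodeTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodeTasks.size()));
        try {
            int flushed = 0;
            int failed = 0;
            for (Future<Integer> f : pool.invokeAll(nodeTasks)) {
                try {
                    flushed += f.get();
                } catch (ExecutionException e) {
                    // The node's flush threw, so its transaction is assumed rolled back.
                    failed++;
                }
            }
            return new Result(flushed, failed);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flush tasks", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A result with `failedNodes > 0` only detects the partial update; deciding whether to retry the failed nodes or compensate for the ones that already committed is still up to the caller.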

       

      Any suggestions?

       

      Thanks,

      marco

       

      ----

      [1] http://infinispan.org/docs/6.0.x/user_guide/user_guide.html#_cache_loaders_and_transactional_caches

        • 1. Re: CacheStore vs. transactional updates
          rvansa

          Hi Marco,

           

          just to be sure - do you know there's already a JPA store implementation? See https://github.com/infinispan/infinispan/tree/master/persistence/jpa

           

          Regarding TXs: the JPA store write occurs in the second phase of the Infinispan transaction commit. There's really no room for failures: when the DB transaction fails, you can try again or leave the DB out of sync with Infinispan and throw an exception (however, this would not roll back the other writes in the Infinispan TX).

           

          The JPA store's design follows the expected use case: a cache or in-memory data grid with transactions involving few keys (say, tens at most), not flushing an entire cache into JPA. You're absolutely right that the performance is not optimal even for this use case, since every additional write costs another round trip of latency. I am not 100% sure, but an async store could deal with this by executing the writes in parallel; however, the level of concurrency would be limited, and we still could not speak of any batching.

           

          I don't think that huge transactions involving updates of millions of entries are something that could be recommended. The whole transaction is sometimes transmitted over the network as a single message; therefore, it has to be buffered.

           

          However, some decent batching could be handy. You can add a feature request for AdvancedCacheWriter.update(Set<MarshalledEntry<? extends K, ? extends V>> entries)
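To make the idea concrete, here is a minimal sketch of what such a batch-capable writer might look like. Everything in it is an assumption for illustration: `BatchWriterSketch`, `PerEntryStore` and `BatchedStore` are hypothetical names, a plain `Map` replaces `MarshalledEntry`, and an in-memory map with a `roundTrips` counter stands in for the database. It is not the Infinispan persistence SPI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface for illustration; NOT part of Infinispan's persistence SPI.
interface BatchWriterSketch<K, V> {
    /** Writes a single entry; models one round trip to the database. */
    void write(K key, V value);

    /** Default behaviour: fall back to one write (one round trip) per entry. */
    default void update(Map<? extends K, ? extends V> entries) {
        for (Map.Entry<? extends K, ? extends V> e : entries.entrySet()) {
            write(e.getKey(), e.getValue());
        }
    }
}

// A store that only knows per-entry writes and inherits the default update().
class PerEntryStore implements BatchWriterSketch<String, String> {
    final Map<String, String> rows = new LinkedHashMap<>();
    int roundTrips = 0; // counts simulated database round trips

    @Override
    public void write(String key, String value) {
        rows.put(key, value);
        roundTrips++;
    }
}

// A store that overrides update() to push the whole set in one simulated round trip.
class BatchedStore extends PerEntryStore {
    @Override
    public void update(Map<? extends String, ? extends String> entries) {
        rows.putAll(entries);
        roundTrips++;
    }
}
```

With three entries, the default path costs three simulated round trips while the overriding store costs one; a real store would put its JDBC or JPA batch, inside a single database transaction, where `putAll` is.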