5 Replies Latest reply on Jun 3, 2008 12:04 PM by brian.stansberry

    state transfer optimizations - persisting state with JDBCCac

    mircea.markus

      Background: When integrating persistent state, individual insert operations are triggered on JDBCCacheLoader. This could be optimized by batching the insert calls, thus gaining significant performance.

      That is the short story. The long one is that inserts are not the only operations taking place when transferring persistent state. The general algorithm for transferring the state of a node '/a/b/c' is:
      a) if '/a/b/c' does not exist, add whichever of the nodes /a, /a/b and /a/b/c do not exist yet
      b) otherwise (it exists), replace the existing attributes of '/a/b/c' with the new ones.
      In case a) there are up to 2*Fqn.size DB interactions (an exists check and an insert for each new node), plus an initial query to check whether the node exists.
      In case b) there are only 2 queries (the initial one for existence and one for the update).
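To make the counting concrete, here is a minimal sketch of the per-node algorithm above. It is not the JDBCCacheLoader code: the class `PerNodeIntegrationSketch`, its methods, and the in-memory map standing in for the database table are all hypothetical, and Fqns are simplified to plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the JDBC store; every simulated round trip bumps dbCalls.
class PerNodeIntegrationSketch {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    int dbCalls = 0;

    private boolean exists(String fqn) {
        dbCalls++; // simulated SELECT
        return rows.containsKey(fqn);
    }

    private void insert(String fqn, Map<String, String> attrs) {
        dbCalls++; // simulated INSERT
        rows.put(fqn, new HashMap<>(attrs));
    }

    private void update(String fqn, Map<String, String> attrs) {
        dbCalls++; // simulated UPDATE
        rows.put(fqn, new HashMap<>(attrs));
    }

    // "/a/b/c" -> ["/a", "/a/b", "/a/b/c"]
    static List<String> pathPrefixes(String fqn) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String part : fqn.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append('/').append(part);
            out.add(sb.toString());
        }
        return out;
    }

    void integrate(String fqn, Map<String, String> attrs) {
        if (exists(fqn)) {          // initial existence query
            update(fqn, attrs);     // case b): 2 interactions in total
            return;
        }
        // case a): one exists check per path element, one insert per missing node
        List<String> prefixes = pathPrefixes(fqn);
        for (int i = 0; i < prefixes.size(); i++) {
            String p = prefixes.get(i);
            boolean isTarget = (i == prefixes.size() - 1);
            if (!exists(p)) {
                insert(p, isTarget ? attrs : new HashMap<String, String>());
            }
        }
    }
}
```

Under these assumptions, integrating '/a/b/c' into an empty store costs 2*3 + 1 = 7 simulated interactions, and integrating it a second time costs 2.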

      Considering that this is done for each transferred node, performance is low.

      Another approach would be to:
      - read all the pre-existing state in memory (1 query)
      - do a merge in memory
      - persist all merged state in one batch (2 batch operations are actually necessary, as some updates also need to be performed)
      At a glance this should be much more efficient, as it reduces the number of DB interactions to 3.
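A rough sketch of the in-memory merge step, to show how the incoming state would split into one insert batch and one update batch. The names (`MergePlanSketch`, `plan`) are hypothetical, Fqns are plain strings, and the sketch assumes that only the set of pre-existing Fqns is needed to decide between insert and update, since existing attributes are replaced rather than combined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical merge plan: which rows go into the insert batch, which into the update batch.
class MergePlanSketch {
    final Map<String, Map<String, String>> inserts = new LinkedHashMap<>();
    final Map<String, Map<String, String>> updates = new LinkedHashMap<>();

    // "/a/b/c" -> ["/a", "/a/b", "/a/b/c"]
    static List<String> pathPrefixes(String fqn) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String part : fqn.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append('/').append(part);
            out.add(sb.toString());
        }
        return out;
    }

    static MergePlanSketch plan(Set<String> existingFqns,
                                Map<String, Map<String, String>> incoming) {
        MergePlanSketch plan = new MergePlanSketch();
        for (Map.Entry<String, Map<String, String>> e : incoming.entrySet()) {
            String fqn = e.getKey();
            List<String> prefixes = pathPrefixes(fqn);
            // Missing ancestors get an empty placeholder row, parents before children.
            for (int i = 0; i < prefixes.size() - 1; i++) {
                String ancestor = prefixes.get(i);
                if (!existingFqns.contains(ancestor)) {
                    plan.inserts.putIfAbsent(ancestor, new HashMap<String, String>());
                }
            }
            if (existingFqns.contains(fqn)) {
                plan.updates.put(fqn, new HashMap<>(e.getValue()));  // replace attributes
            } else {
                plan.inserts.put(fqn, new HashMap<>(e.getValue()));  // new node
            }
        }
        return plan;
    }
}
```

With this shape, the three DB interactions would be: one query to load the existing Fqns, one batch for `inserts`, and one batch for `updates`.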

      Possible issues with the new approach:
      - all pre-existing state would be loaded into memory for the merge: high memory consumption, possible OOM
      - if there are too many statements in a batch, there might be a failure at commit time: I remember having problems on Oracle with 10k statements in a batch. This could be fixed, though, by splitting the work into batches of 1k statements (configurable)
      - others?
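For the batch-size issue, a minimal sketch of how the statements could be flushed in fixed-size chunks. It uses the standard JDBC calls `PreparedStatement.setObject`, `addBatch` and `executeBatch`; the class and method names (`ChunkedBatchSketch`, `executeInChunks`, `batchCount`) and the row representation as `Object[]` are assumptions made for illustration only.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: flushes a JDBC batch every batchSize statements.
class ChunkedBatchSketch {

    // Number of executeBatch() calls needed for the given number of statements.
    static int batchCount(int statements, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (statements + batchSize - 1) / batchSize;
    }

    // Binds each row to the prepared statement and flushes in chunks.
    // Returns how many times executeBatch() was called.
    static int executeInChunks(PreparedStatement ps, List<Object[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushed = 0;
        int pending = 0;
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
            }
            ps.addBatch();
            pending++;
            if (pending == batchSize) {
                ps.executeBatch();
                pending = 0;
                flushed++;
            }
        }
        if (pending > 0) { // flush the last, partially filled chunk
            ps.executeBatch();
            flushed++;
        }
        return flushed;
    }
}
```

With a batch size of 1000, the 10k statements mentioned above would be sent as 10 batches instead of one.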



        • 1. Re: state transfer optimizations - persisting state with JDB
          mircea.markus
          • 2. Re: state transfer optimizations - persisting state with JDB
            manik

            I think the biggest problem will be the in-memory merge and OOMs. One of the primary reasons to use a cache loader is that you have more state in the cache than fits in memory. :-)

            Let's think about why we bother with the !exists test in memory first. If this is just an optimisation so we don't have to write the state to the DB when the state already exists, then in this case the optimisation doesn't help but hinders. We should just write *everything* to the CL.

            The other reason why you may not want to write everything to the CL is if you are using passivation. Then, stuff in-memory should not be in the CL.

            So, perhaps this is what we need to do (if we are using a JDBC cache loader only):

            1. If !using passivation, write all state to the DB, regardless of whether it exists in memory or not.
            2. If using passivation, when attempting to deserialize state to put into your batch, ignore statements which pertain to Fqns that are in memory.
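The two rules above could be sketched roughly as follows. This is only an illustration: `PassivationFilterSketch` and `selectForStore` are made-up names, Fqns are plain strings, and the set of in-memory Fqns is assumed to be available to whoever builds the batch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical filter deciding which transferred nodes are written to the cache loader.
class PassivationFilterSketch {

    static Map<String, Map<String, String>> selectForStore(
            Map<String, Map<String, String>> persistentState,
            Set<String> inMemoryFqns,
            boolean passivation) {
        if (!passivation) {
            // Rule 1: write everything, whether or not it is also in memory.
            return new LinkedHashMap<>(persistentState);
        }
        // Rule 2: with passivation, skip anything that is already in memory.
        Map<String, Map<String, String>> toStore = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : persistentState.entrySet()) {
            if (!inMemoryFqns.contains(e.getKey())) {
                toStore.put(e.getKey(), e.getValue());
            }
        }
        return toStore;
    }
}
```

The result of `selectForStore` would then be what gets added to the batch, with no per-node exists query against the DB.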

            I agree about having configurable batch size limits, with perhaps a 1k batch size default.

            • 3. Re: state transfer optimizations - persisting state with JDB
              brian.stansberry

              Isn't the idea of state transfer that the recipient has no state in the region being transferred; i.e. it's a complete replace?

              If so, then for sure there's no need for an exists check.

              And, if so, then in the passivation case it's the responsibility of the sender to properly segregate the in-memory nodes from the passivated nodes (i.e. it's a bug if it isn't that way already). So, no need to check if a node in the persistent state is in memory before writing it.

              • 4. Re: state transfer optimizations - persisting state with JDB
                manik

                There could be state on the recipient, since in-memory state is integrated before persistent state. This is why I suspected that the exists() check is an optimisation, since integrating in-memory state will result in a cache loader put() as well.

                And as for passivation, if the state is in-memory (after integrating in-memory state), it won't (and shouldn't) be in the cacheloader.

                • 5. Re: state transfer optimizations - persisting state with JDB
                  brian.stansberry

                  Ah, now I understand. Yeah, that makes sense.