9 Replies Latest reply on Jan 4, 2006 9:17 AM by manik

    Proposal for performance changes

    belaban

      [posted by Bela on behalf of Brian Dueck]
      Optimize performance of Fqn.toString()
      ======================================
      org.jboss.cache.Fqn
      - added optimization to set stringRepresentation during construction for
      some constructors (avoids unnecessarily looping through all elements in toString() for deeply nested Fqn's)
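The constructor-time caching can be sketched roughly as follows. The field name stringRepresentation comes from the post; the rest of this class is illustrative, not the actual org.jboss.cache.Fqn:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: build the string form once at construction,
// so toString() does not re-walk every element for deeply nested Fqns.
public class Fqn {
    private final List elements;
    private final String stringRepresentation; // precomputed at construction

    public Fqn(List names) {
        this.elements = new ArrayList(names);
        StringBuffer sb = new StringBuffer();
        for (Iterator it = elements.iterator(); it.hasNext();) {
            sb.append('/').append(it.next());
        }
        this.stringRepresentation = sb.length() == 0 ? "/" : sb.toString();
    }

    public int size() {
        return elements.size();
    }

    public String toString() {
        return stringRepresentation; // O(1) regardless of nesting depth
    }
}
```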

      General performance and concurrency improvements for TreeNode
      =============================================================
      org.jboss.cache.TreeNode
      - changed dataLock to Object() since none of the ReentrantLock methods were
      being used - we're just doing synchronized() on it.
      - improved concurrency of initialization and access of children
      - children()
      - improved concurrency by avoiding unnecessary synchronized() calls
      - initialize children of the root node (Fqn.isRoot()==true) to
      ConcurrentHashMap instead of ConcurrentReaderHashMap
      to allow for greater concurrency of modification of nodes just off
      the root
      - getOrCreateChild() & createChild() methods
      - reworked to avoid making unnecessary synchronized(childrenMutex) calls
      - the strategy is to only lock when we don't find the child
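The lock-only-when-missing strategy might look something like this sketch, which uses the JDK's ConcurrentHashMap in place of the oswego classes the codebase used at the time. Here putIfAbsent performs the re-check atomically, so two threads racing to create the same child converge on a single instance:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only; not the real TreeNode implementation.
public class Node {
    private final ConcurrentMap children = new ConcurrentHashMap();

    public Node getOrCreateChild(Object name) {
        // Fast path: no locking when the child already exists,
        // which is the common case.
        Node child = (Node) children.get(name);
        if (child != null) {
            return child;
        }
        // Slow path: create a candidate and let putIfAbsent arbitrate
        // the race, so concurrent creators observe one shared child.
        Node candidate = new Node();
        Node prior = (Node) children.putIfAbsent(name, candidate);
        return prior != null ? prior : candidate;
    }

    public int numChildren() {
        return children.size();
    }
}
```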

      General performance and concurrency improvements for TransactionEntry
      =====================================================================
      org.jboss.cache.TransactionEntry
      - changed TransactionEntry.locks from List based implementation to
      ConcurrentReaderHashMap. This avoids expensive linear scanning calls
      to locks.contains() in addLock() and also improves concurrency by
      avoiding blocking synchronized() calls where most of the access to
      locks is read-only. I'm using ConcurrentReaderHashMap here more as
      a Set than a Map since there is no value to store. We're really just
      interested in efficient searches in locks and highly concurrent reads.
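A sketch of the map-as-set approach (the JDK ConcurrentHashMap stands in for the oswego ConcurrentReaderHashMap; class and method names are illustrative). Since there is no value to store, a shared marker object is used:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a concurrent map used as a set of lock objects.
public class LockSet {
    private static final Object PRESENT = new Object();
    private final Map locks = new ConcurrentHashMap();

    public void addLock(Object lock) {
        // put() is idempotent on the key, so the old
        // "if (!locks.contains(lock)) locks.add(lock)" linear scan
        // collapses into a single hashed write.
        locks.put(lock, PRESENT);
    }

    public boolean hasLock(Object lock) {
        return locks.containsKey(lock); // hashed, non-blocking read
    }

    public int size() {
        return locks.size();
    }
}
```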

      - I'm not sure of the impact of this change on
      org.jboss.cache.interceptors.OptimisticLockingInterceptor and
      org.jboss.cache.interceptors.PessimisticLockInterceptor
      which might be relying on some implementation specifics of the order of
      insertion of entries into what getLocks() returns. Seems to me though
      that if they do have some sort of dependency on ordering that there
      should be a better way to deal with this...

      - this change impacts one public method - getLocks() which previously
      returned a java.util.List and now returns a java.util.Set. Only one test
      is affected by this (org.jboss.cache.cache.optimistic.OpLockingInterceptorTest)
      which depends on List.get(int) method that isn't available in Set.

      General performance and concurrency improvements for TransactionTable
      =====================================================================
      org.jboss.cache.TransactionTable
      - changed TransactionTable.tx_map and TransactionTable.txs to
      ConcurrentHashMap to allow for concurrent writers. Since there is only
      one TransactionTable within the TreeCache any thread participating in
      a transaction will need to compete for read and write access.

      General performance and concurrency improvements for TreeCache
      ==============================================================
      org.jboss.cache.TreeCache
      - getCurrentTransaction()
      - removed unnecessary synchronized(tx_table) call as TransactionTable
      handles synchronization by virtue of its ConcurrentHashMap implementation


      Improve performance and concurrency of handling of TreeCacheListeners
      =====================================================================
      org.jboss.cache.TreeCache

      - added concurrency optimizations to differentiate how TreeCache handles
      TreeCacheListeners
      - the root problem is that calling ConcurrentHashMap.keySet().iterator()
      causes synchronization across the ConcurrentHashMap instance. From then on
      using the iterator is highly concurrent and efficient, but obtaining the
      iterator involves expensive locking. Look at ConcurrentHashMap.HashIterator.HashIterator()
      to see the underlying issue.
      - when you couple the above with the fact that the number of listeners
      in the cache is typically very low (normally just 1 for the eviction listener)
      this adds up to a *lot* of overhead and locking each time the TreeCache
      must notify a listener, and for the most part that listener is only the
      EvictionListener
      - here are the changes I made to solve the problem:
      - added TreeCache.setEvictionListener() and modified
      org.jboss.cache.eviction.RegionManager to call
      this vs. TreeCache.addTreeCacheListener. setEvictionListener stores the
      TreeCacheListener in TreeCache.evictionPolicyListener, not the listeners
      Map
      - added TreeCache.hasListeners private boolean variable that's used to see
      if listeners exist before trying to iterate
      - modified the TreeCache.notifyXXX() methods to use
      TreeCache.evictionPolicyListener and hasListeners
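A sketch of the resulting fast path (listener type modeled as Runnable for brevity; the field names evictionPolicyListener and hasListeners follow the post, everything else is illustrative):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the notification fast path.
public class Notifier {
    private final Map listeners = new ConcurrentHashMap();
    private volatile Runnable evictionPolicyListener;
    private volatile boolean hasListeners = false;

    public void setEvictionListener(Runnable l) {
        // Kept in a dedicated field, never in the listeners map, so the
        // common single-listener case avoids building an iterator at all.
        evictionPolicyListener = l;
    }

    public void addTreeCacheListener(Runnable l) {
        listeners.put(l, Boolean.TRUE);
        hasListeners = true;
    }

    public void notifyNodeEvent() {
        Runnable evictionListener = evictionPolicyListener;
        if (evictionListener != null) {
            evictionListener.run();
        }
        if (hasListeners) { // cheap flag check guards the expensive iterator
            for (Iterator it = listeners.keySet().iterator(); it.hasNext();) {
                ((Runnable) it.next()).run();
            }
        }
    }
}
```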

      Re-added ability to filter eviction events (performance & functionality)
      ========================================================================
      - In my scenario using JBossCache, I have certain subtrees in the cache that must be treated
      atomically from an eviction perspective. i.e. I can't allow just one branch of the subtree
      to be evicted. The whole subtree needs to be evicted at once, not just piece by piece.

      In JBoss 1.2.3 and prior, it was possible to support this scenario by implementing my own
      eviction policy (CascadingLRUPolicy) by subclassing LRUPolicy and overriding the evict() method.
      Since my "atomic subtrees" always started just off the cache root, I just
      ignored any evict() calls except where Fqn.size() == 1.

      I then made a further optimization by overriding the TreeCacheListener methods implemented on
      LRUPolicy to filter out any eviction events prior to them getting inserted into the bounded
      buffer. The rationale was that since I wasn't interested in evicting certain nodes,
      I could save a bunch of time by simply causing JBossCache to ignore those events in the
      first place.

      The result was a big jump in performance and scalability in my tests.

      While it is still possible to implement the evict() portion of my CascadingLRUPolicy, it looks
      like some refactoring and redesign of how eviction policies and the region manager
      interact makes it impossible to implement the filtering optimization.

      So here's what I've done to re-add the functionality based on the current JBoss 1.2.4 (head):

      org.jboss.cache.eviction.EvictionPolicy
      - added a new method called EvictionPolicy.canIgnoreEvent(Fqn)
      - canIgnoreEvent is called by RegionManager.EvictionTreeCacheListener prior to
      adding the event to the region event buffer.
      - if canIgnoreEvent returns true, the event is not added to the region event buffer
      - if canIgnoreEvent returns false (default), the event is added to the region event
      buffer as normal

      org.jboss.cache.eviction.RegionManager
      - added calls to EvictionPolicy.canIgnoreEvent in EvictionTreeCacheListener and
      AOPEvictionTreeCacheListener

      org.jboss.cache.eviction.BaseEvictionPolicy
      - provide default implementation of canIgnoreEvent that always returns false
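The filtering hook can be sketched as follows, with Fqn modeled as a String[] of path elements. Only canIgnoreEvent comes from the proposal; the class names and the String[] modeling are illustrative:

```java
// Illustrative sketch of the eviction-event filter.
public class EvictionFilterSketch {

    // Mirrors BaseEvictionPolicy's default: never ignore an event.
    public static class BasePolicy {
        public boolean canIgnoreEvent(String[] fqn) {
            return false;
        }
    }

    // Mirrors the CascadingLRUPolicy idea: atomic subtrees hang directly
    // off the root, so only depth-1 events matter and anything deeper is
    // dropped before it reaches the region event buffer.
    public static class CascadingPolicy extends BasePolicy {
        public boolean canIgnoreEvent(String[] fqn) {
            return fqn.length != 1;
        }
    }

    // What RegionManager's listener would check before appending an
    // event to the bounded buffer.
    public static boolean shouldBuffer(BasePolicy policy, String[] fqn) {
        return !policy.canIgnoreEvent(fqn);
    }
}
```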

        • 1. Re: Proposal for performance changes
          belaban

          [reply by Bela]
          > Bela;
          > As promised I've signed the contributor's agreement and have written a summary below of the initial changes that I would like to contribute. I've also attached a patch diff of these changes.
          > Send me an email after you've had a chance to review this information, and then I'll give you a call so we can discuss these items further and so I can pick your brain regarding clustering deadlock.
          >
          > Optimize performance of Fqn.toString()
          > ======================================
          > org.jboss.cache.Fqn
          > - added optimization to set stringRepresentation during construction for
          > some constructors (avoids unnecessarily looping through all elements in toString() for deeply nested Fqn's)



          I'm not sure I like the idea of keeping a string rep around for each Fqn. I made a comment on the forum today.

          > General performance and concurrency improvements for TreeNode
          > =============================================================
          > org.jboss.cache.TreeNode
          > - changed dataLock to Object() since none of the ReentrantLock methods were
          > being used - we're just doing synchronized() on it.



          OK

          > - improved concurrency of initialization and access of children
          > - children()
          > - improved concurrency by avoiding unnecessary synchronized() calls



          OK

          > - initialize children of the root node (Fqn.isRoot()==true) to
          > ConcurrentHashMap instead of ConcurrentReaderHashMap
          > to allow for greater concurrency of modification of nodes just off
          > the root



          OK

          > - getOrCreateChild() & createChild() methods
          > - reworked to avoid making unnecessary synchronized(childrenMutex) calls
          > - the strategy is to only lock when we don't find the child



          Sounds like double-checked locking, which usually doesn't work. How do you avoid some other
          thread creating a child (or deleting it) concurrently ?

          > General performance and concurrency improvements for TransactionEntry
          > =====================================================================
          > org.jboss.cache.TransactionEntry
          > - changed TransactionEntry.locks from List based implementation to
          > ConcurrentReaderHashMap. This avoids expensive linear scanning calls
          > to locks.contains() in addLock() and also improves concurrency by
          > avoiding blocking synchronized() calls where most of the access to
          > locks is read-only. I'm using ConcurrentReaderHashMap here more as
          > a Set than a Map since there is no value to store. We're really just
          > interested in efficient searches in locks and highly concurrent reads.



          OK - too bad ConcurrentReaderList is missing... CCRHM is quite expensive to create, but since we're using only 1 lock table per cache this should be fine.



          > - I'm not sure of the impact of this change on
          > org.jboss.cache.interceptors.OptimisticLockingInterceptor and
          > org.jboss.cache.interceptors.PessimisticLockInterceptor
          > which might be relying on some implementation specifics of the order of
          > insertion of entries into what getLocks() returns. Seems to me though
          > that if they do have some sort of dependency on ordering that there
          > should be a better way to deal with this...



          I see - so you want to discuss this with Manik on the forum, in some code the locks will need to be released in reverse order of acquisition...


          > - this change impacts one public method - getLocks() which previously
          > returned a java.util.List and now returns a java.util.Set. Only one test
          > is affected by this (org.jboss.cache.cache.optimistic.OpLockingInterceptorTest)
          > which depends on List.get(int) method that isn't available in Set.



          Maybe we could downgrade the List to a Collection then ?

          > General performance and concurrency improvements for TransactionTable
          > =====================================================================
          > org.jboss.cache.TransactionTable
          > - changed TransactionTable.tx_map and TransactionTable.txs to
          > ConcurrentHashMap to allow for concurrent writers. Since there is only
          > one TransactionTable within the TreeCache any thread participating in
          > a transaction will need to compete for read and write access.



          Hmm, I recall I had a reason for using ConcurrentReaderHashMap rather than ConcurrentHashMap...
          Did you measure any performance gains with this change ? My gut feeling is that we should be
          able to quantify performance gains before changing critical code.


          > General performance and concurrency improvements for TreeCache
          > ==============================================================
          > org.jboss.cache.TreeCache
          > - getCurrentTransaction()
          > - removed unnecessary synchronized(tx_table) call as TransactionTable
          > handles synchronization by virtue of its ConcurrentHashMap implementation



          OK. We want to change this anyways by getting the TX only once from the TxManager, and then shipping it with the
          MethodCall (feature of JGroups 2.2.9). At a later point, we want to switch from MethodCall to Invocation, which has payloads.


          > Improve performance and concurrency of handling of TreeCacheListeners
          > =====================================================================
          > org.jboss.cache.TreeCache
          > - added concurrency optimizations to differentiate how TreeCache handles
          > TreeCacheListeners
          > - the root problem is that calling ConcurrentHashMap.keySet().iterator()
          > causes synchronization across the ConcurrentHashMap instance. From then on
          > using the iterator is highly concurrent and efficient, but obtaining the
          > iterator involves expensive locking. Look at ConcurrentHashMap.HashIterator.HashIterator()
          > to see the underlying issue.
          > - when you couple the above with the fact that the number of listeners
          > in the cache is typically very low (normally just 1 for the eviction listener)
          > this adds up to a *lot* of overhead and locking each time the TreeCache
          > must notify a listener, and for the most part that listener is only the
          > EvictionListener
          > - here are the changes I made to solve the problem:
          > - added TreeCache.setEvictionListener() and modified
          > org.jboss.cache.eviction.RegionManager to call
          > this vs. TreeCache.addTreeCacheListener. setEvictionListener stores the
          > TreeCacheListener in TreeCache.evictionPolicyListener, not the listeners
          > Map
          > - added TreeCache.hasListeners private boolean variable that's used to see
          > if listeners exist before trying to iterate
          > - modified the TreeCache.notifyXXX() methods to use
          > TreeCache.evictionPolicyListener and hasListeners



          OK



          > Re-added ability to filter eviction events (performance & functionality)
          > ========================================================================
          > - In my scenario using JBossCache, I have certain subtrees in the cache that must be treated
          > atomically from an eviction perspective. i.e. I can't allow just one branch of the subtree
          > to be evicted. The whole subtree needs to be evicted at once, not just piece by piece.
          >
          > In JBoss 1.2.3 and prior, it was possible to support this scenario by implementing my own
          > eviction policy (CascadingLRUPolicy) by subclassing LRUPolicy and overriding the evict() method. Since my "atomic subtrees" always started just off the cache root, I just
          > ignored any evict() calls except where Fqn.size() == 1.
          > I then made a further optimization by overriding the TreeCacheListener methods implemented on
          > LRUPolicy to filter out any eviction events prior to them getting inserted into the bounded
          > buffer. The rationale was that since I wasn't interested in evicting certain nodes,
          > I could save a bunch of time by simply causing JBossCache to ignore those events in the
          > first place.
          >
          > The result was a big jump in performance and scalability in my tests.
          >
          > While it is still possible to implement the evict() portion of my CascadingLRUPolicy, it looks
          > like some refactoring and redesign of how eviction policies and the region manager
          > interact makes it impossible to implement the filtering optimization.
          >
          > So here's what I've done to re-add the functionality based on the current JBoss 1.2.4 (head):
          >
          > org.jboss.cache.eviction.EvictionPolicy
          > - added a new method called EvictionPolicy.canIgnoreEvent(Fqn)
          > - canIgnoreEvent is called by RegionManager.EvictionTreeCacheListener prior to
          > adding the event to the region event buffer.
          > - if canIgnoreEvent returns true, the event is not added to the region event buffer
          > - if canIgnoreEvent returns false (default), the event is added to the region event
          > buffer as normal
          > org.jboss.cache.eviction.RegionManager
          > - added calls to EvictionPolicy.canIgnoreEvent in EvictionTreeCacheListener and
          > AOPEvictionTreeCacheListener
          > org.jboss.cache.eviction.BaseEvictionPolicy
          > - provide default implementation of canIgnoreEvent that always returns false
          >



          Ben ?

          I think we recently added eviction of entire subtrees as a feature, or is it on the roadmap ?

          • 2. Re: Proposal for performance changes
            manik


            > General performance and concurrency improvements for TransactionEntry
            > =====================================================================
            > org.jboss.cache.TransactionEntry
            > - changed TransactionEntry.locks from List based implementation to
            > ConcurrentReaderHashMap. This avoids expensive linear scanning calls
            > to locks.contains() in addLock() and also improves concurrency by
            > avoiding blocking synchronized() calls where most of the access to
            > locks is read-only. I'm using ConcurrentReaderHashMap here more as
            > a Set than a Map since there is no value to store. We're really just
            > interested in efficient searches in locks and highly concurrent reads.


            OK - too bad ConcurrentReaderList is missing... CCRHM is quite expensive to create, but since we're using only 1 lock table per cache this should be fine.


            CCRHM and CCHM are just convenience classes around the SyncMap class. Similar to SyncMap, there is also a SyncList class which lets you create a concurrent reader list by passing in a standard list instance and a read/write lock.

            // classes from EDU.oswego.cs.dl.util.concurrent
            List list = new SyncList(new ArrayList(), new FIFOReadWriteLock());




            • 3. Re: Proposal for performance changes
              manik


              > - I'm not sure of the impact of this change on
              > org.jboss.cache.interceptors.OptimisticLockingInterceptor and
              > org.jboss.cache.interceptors.PessimisticLockInterceptor
              > which might be relying on some implementation specifics of the order of
              > insertion of entries into what getLocks() returns. Seems to me though
              > that if they do have some sort of dependency on ordering that there
              > should be a better way to deal with this...



              I see - so you want to discuss this with Manik on the forum, in some code the locks will need to be released in reverse order of acquisition...


              At the moment both interceptors expect to receive a list of locks in the order in which they are created. I'll look into whether this is a necessity or whether locks can be removed in any order...



              • 4. Re: Proposal for performance changes
                manik


                > - this change impacts one public method - getLocks() which previously
                > returned a java.util.List and now returns a java.util.Set. Only one test
                > is affected by this (org.jboss.cache.cache.optimistic.OpLockingInterceptorTest)
                > which depends on List.get(int) method that isn't available in Set.



                Maybe we could downgrade the List to a Collection then ?


                The only place I see us using getLocks().get(i) is here:

                 int i = 0; // counter declared earlier in the actual test
                 for (Iterator it = workspace.getNodes().values().iterator(); it.hasNext();)
                 {
                     DataNode node = ((WorkspaceNode) it.next()).getNode();
                     assertEquals(node.getLock(), entry.getLocks().get(i));
                     i++;
                 }


                This could easily be replaced with a block like the following, to achieve the same effect:

                 for (Iterator it = workspace.getNodes().values().iterator(); it.hasNext();)
                 {
                     DataNode node = ((WorkspaceNode) it.next()).getNode();
                     assertTrue(entry.getLocks().contains(node.getLock()));
                 }


                Not a lot of sense in downgrading this to a collection, IMO.

                • 5. Re: Proposal for performance changes
                  manik


                  > General performance and concurrency improvements for TreeCache
                  > ==============================================================
                  > org.jboss.cache.TreeCache
                  > - getCurrentTransaction()
                  > - removed unnecessary synchronized(tx_table) call as TransactionTable
                  > handles synchronization by virtue of its ConcurrentHashMap implementation



                  OK. We want to change this anyways by getting the TX only once from the TxManager, and then shipping it with the
                  MethodCall (feature of JGroups 2.2.9). At a later point, we want to switch from MethodCall to Invocation, which has payloads.


                  With 1.3, we will be doing something similar anyway - using an InvocationContext object, that sits in ThreadLocal. The first interceptor in the chain looks up the TX and GTX and places these in the InvocationContext for other interceptors to use.

                  This will, as Bela said, eventually be upgraded to a part of the MethodCall payload so no lookup will happen at all, and eventually grow to an Invocation object that is pushed out on the wire.
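The ThreadLocal-based InvocationContext idea might be sketched like this (all names here are illustrative, not the actual 1.3 API):

```java
// Illustrative sketch: a per-thread context that the first interceptor
// populates once per invocation, so downstream interceptors read the
// TX and GTX without touching the TxManager again.
public class ContextSketch {

    public static class InvocationContext {
        public Object transaction;       // TX, set by the first interceptor
        public Object globalTransaction; // GTX
    }

    private static final ThreadLocal CONTEXT = new ThreadLocal() {
        protected Object initialValue() {
            return new InvocationContext();
        }
    };

    public static InvocationContext get() {
        return (InvocationContext) CONTEXT.get();
    }
}
```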

                  • 6. Re: Proposal for performance changes
                    manik

                    Re: my comment about getLocks() returning a List above, the unit test checks the order of the locks in the lock table. We could only switch to contains() in the assertion once we're sure we do not require lock ordering in the interceptors.

                    I think we should move this to a separate discussion thread as this one will inevitably get cluttered! :)

                    • 7. Re: Proposal for performance changes
                      belaban

                      The SyncMap is afaik not the same as ConcurrentHashMap, which has a number of different buckets, each with its own lock. So CCHM is much more efficient than SyncMap.
                      However, if we use CCHM in place of a list here, we should check (i.e. measure) new SyncList(list) against CCHM before we switch to it.

                      • 8. Re: Proposal for performance changes
                        belaban

                        Wrt locks: so the order is important, and I think we need to release the locks in reverse order of acquisition.
                        I would like to see performance numbers that warrant a switch in the first place.

                        • 9. Re: Proposal for performance changes
                          manik