9 Replies Latest reply on Jan 4, 2006 9:17 AM by manik

    Proposal for performance changes

    belaban

      [posted by Bela on behalf of Brian Dueck]
      Optimize performance of Fqn.toString()
      ======================================
      org.jboss.cache.Fqn
      - added optimization to set stringRepresentation during construction for
      some constructors (avoids unnecessarily looping through all elements in toString() for deeply nested Fqn's)
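The constructor-time caching can be sketched roughly as follows. The field name stringRepresentation comes from the post; the rest of this class is illustrative, not the actual org.jboss.cache.Fqn:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: build the string form once at construction,
// so toString() does not re-walk every element for deeply nested Fqns.
public class Fqn {
    private final List elements;
    private final String stringRepresentation; // precomputed at construction

    public Fqn(List names) {
        this.elements = new ArrayList(names);
        StringBuffer sb = new StringBuffer();
        for (Iterator it = elements.iterator(); it.hasNext();) {
            sb.append('/').append(it.next());
        }
        this.stringRepresentation = sb.length() == 0 ? "/" : sb.toString();
    }

    public int size() {
        return elements.size();
    }

    public String toString() {
        return stringRepresentation; // O(1) regardless of nesting depth
    }
}
```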

      General performance and concurrency improvements for TreeNode
      =============================================================
      org.jboss.cache.TreeNode
      - changed dataLock to Object() since none of the ReentrantLock methods were
      being used - we're just doing synchronized() on it.
      - improved concurrency of initialization and access of children
      - children()
      - improved concurrency by avoiding unnecessary synchronized() calls
      - initialize children of the root node (Fqn.isRoot()==true) to
      ConcurrentHashMap instead of ConcurrentReaderHashMap
      to allow for greater concurrency of modification of nodes just off
      the root
      - getOrCreateChild() & createChild() methods
      - reworked to avoid making unnecessary synchronized(childrenMutex) calls
      - the strategy is to only lock when we don't find the child
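The lock-only-when-missing strategy might look something like this sketch, which uses the JDK's ConcurrentHashMap in place of the oswego classes the codebase used at the time. Here putIfAbsent performs the re-check atomically, so two threads racing to create the same child converge on a single instance:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only; not the real TreeNode implementation.
public class Node {
    private final ConcurrentMap children = new ConcurrentHashMap();

    public Node getOrCreateChild(Object name) {
        // Fast path: no locking when the child already exists,
        // which is the common case.
        Node child = (Node) children.get(name);
        if (child != null) {
            return child;
        }
        // Slow path: create a candidate and let putIfAbsent arbitrate
        // the race, so concurrent creators observe one shared child.
        Node candidate = new Node();
        Node prior = (Node) children.putIfAbsent(name, candidate);
        return prior != null ? prior : candidate;
    }

    public int numChildren() {
        return children.size();
    }
}
```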

      General performance and concurrency improvements for TransactionEntry
      =====================================================================
      org.jboss.cache.TransactionEntry
      - changed TransactionEntry.locks from List based implementation to
      ConcurrentReaderHashMap. This avoids expensive linear scanning calls
      to locks.contains() in addLock() and also improves concurrency by
      avoiding blocking synchronized() calls where most of the access to
      locks is read-only. I'm using ConcurrentReaderHashMap here more as
      a Set than a Map since there is no value to store. We're really just
      interested in efficient searches in locks and highly concurrent reads.
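A sketch of the map-as-set approach (the JDK ConcurrentHashMap stands in for the oswego ConcurrentReaderHashMap; class and method names are illustrative). Since there is no value to store, a shared marker object is used:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a concurrent map used as a set of lock objects.
public class LockSet {
    private static final Object PRESENT = new Object();
    private final Map locks = new ConcurrentHashMap();

    public void addLock(Object lock) {
        // put() is idempotent on the key, so the old
        // "if (!locks.contains(lock)) locks.add(lock)" linear scan
        // collapses into a single hashed write.
        locks.put(lock, PRESENT);
    }

    public boolean hasLock(Object lock) {
        return locks.containsKey(lock); // hashed, non-blocking read
    }

    public int size() {
        return locks.size();
    }
}
```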

      - I'm not sure of the impact of this change on
      org.jboss.cache.interceptors.OptimisticLockingInterceptor and
      org.jboss.cache.interceptors.PessimisticLockInterceptor
      which might be relying on some implementation specifics of the order of
      insertion of entries into what getLocks() returns. Seems to me though
      that if they do have some sort of dependency on ordering that there
      should be a better way to deal with this...

      - this change impacts one public method - getLocks() which previously
      returned a java.util.List and now returns a java.util.Set. Only one test
      is affected by this (org.jboss.cache.cache.optimistic.OpLockingInterceptorTest)
      which depends on List.get(int) method that isn't available in Set.

      General performance and concurrency improvements for TransactionTable
      =====================================================================
      org.jboss.cache.TransactionTable
      - changed TransactionTable.tx_map and TransactionTable.txs to
      ConcurrentHashMap to allow for concurrent writers. Since there is only
      one TransactionTable within the TreeCache any thread participating in
      a transaction will need to compete for read and write access.

      General performance and concurrency improvements for TreeCache
      ==============================================================
      org.jboss.cache.TreeCache
      - getCurrentTransaction()
      - removed unnecessary synchronized(tx_table) call as TransactionTable
      handles synchronization by virtue of its ConcurrentHashMap implementation


      Improve performance and concurrency of handling of TreeCacheListeners
      =====================================================================
      org.jboss.cache.TreeCache

      - added concurrency optimizations to differentiate how TreeCache handles
      TreeCacheListeners
      - the root problem is that calling ConcurrentHashMap.keySet().iterator()
      causes synchronization across the ConcurrentHashMap instance. From then on
      using the iterator is highly concurrent and efficient, but obtaining the
      iterator involves expensive locking. Look at ConcurrentHashMap.HashIterator.HashIterator()
      to see the underlying issue.
      - when you couple the above with the fact that the number of listeners
      in the cache is typically very low (normally just 1 for the eviction listener)
      this adds up to a *lot* of overhead and locking each time the TreeCache
      must notify a listener, and for the most part that listener is only the
      EvictionListener
      - here are the changes I made to solve the problem:
      - added TreeCache.setEvictionListener() and modified
      org.jboss.cache.eviction.RegionManager to call
      this vs. TreeCache.addTreeCacheListener. setEvictionListener stores the
      TreeCacheListener in TreeCache.evictionPolicyListener, not the listeners
      Map
      - added TreeCache.hasListeners private boolean variable that's used to see
      if listeners exist before trying to iterate
      - modified the TreeCache.notifyXXX() methods to use
      TreeCache.evictionPolicyListener and hasListeners
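A sketch of the resulting fast path (listener type modeled as Runnable for brevity; the field names evictionPolicyListener and hasListeners follow the post, everything else is illustrative):

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the notification fast path.
public class Notifier {
    private final Map listeners = new ConcurrentHashMap();
    private volatile Runnable evictionPolicyListener;
    private volatile boolean hasListeners = false;

    public void setEvictionListener(Runnable l) {
        // Kept in a dedicated field, never in the listeners map, so the
        // common single-listener case avoids building an iterator at all.
        evictionPolicyListener = l;
    }

    public void addTreeCacheListener(Runnable l) {
        listeners.put(l, Boolean.TRUE);
        hasListeners = true;
    }

    public void notifyNodeEvent() {
        Runnable evictionListener = evictionPolicyListener;
        if (evictionListener != null) {
            evictionListener.run();
        }
        if (hasListeners) { // cheap flag check guards the expensive iterator
            for (Iterator it = listeners.keySet().iterator(); it.hasNext();) {
                ((Runnable) it.next()).run();
            }
        }
    }
}
```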

      Re-added ability to filter eviction events (performance & functionality)
      ========================================================================
      - In my scenario using JBossCache, I have certain subtrees in the cache that must be treated
      atomically from an eviction perspective. i.e. I can't allow just one branch of the subtree
      to be evicted. The whole subtree needs to be evicted at once, not just piece by piece.

      In JBoss 1.2.3 and prior, it was possible to support this scenario by implementing my own
      eviction policy (CascadingLRUPolicy) by subclassing LRUPolicy and overriding the evict() method.
      Since my "atomic subtrees" always started just off the cache root, I just
      ignored any evict() calls except where Fqn.size() == 1.

      I then made a further optimization by overriding the TreeCacheListener methods implemented on
      LRUPolicy to filter out any eviction events prior to them getting inserted into the bounded
      buffer. The rationale was that since I wasn't interested in evicting certain nodes,
      I could save a bunch of time by simply causing JBossCache to ignore those events in the
      first place.

      The result was a big jump in performance and scalability in my tests.

      While it is still possible to implement the evict() portion of my CascadingLRUPolicy, it looks
      like some refactoring and redesign of how eviction policies and the region manager
      interact makes it impossible to implement the filtering optimization.

      So here's what I've done to re-add the functionality based on the current JBoss 1.2.4 (head):

      org.jboss.cache.eviction.EvictionPolicy
      - added a new method called EvictionPolicy.canIgnoreEvent(Fqn)
      - canIgnoreEvent is called by RegionManager.EvictionTreeCacheListener prior to
      adding the event to the region event buffer.
      - if canIgnoreEvent returns true, the event is not added to the region event buffer
      - if canIgnoreEvent returns false (default), the event is added to the region event
      buffer as normal

      org.jboss.cache.eviction.RegionManager
      - added calls to EvictionPolicy.canIgnoreEvent in EvictionTreeCacheListener and
      AOPEvictionTreeCacheListener

      org.jboss.cache.eviction.BaseEvictionPolicy
      - provide default implementation of canIgnoreEvent that always returns false
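The filtering hook can be sketched as follows, with Fqn modeled as a String[] of path elements. Only canIgnoreEvent comes from the proposal; the class names and the String[] modeling are illustrative:

```java
// Illustrative sketch of the eviction-event filter.
public class EvictionFilterSketch {

    // Mirrors BaseEvictionPolicy's default: never ignore an event.
    public static class BasePolicy {
        public boolean canIgnoreEvent(String[] fqn) {
            return false;
        }
    }

    // Mirrors the CascadingLRUPolicy idea: atomic subtrees hang directly
    // off the root, so only depth-1 events matter and anything deeper is
    // dropped before it reaches the region event buffer.
    public static class CascadingPolicy extends BasePolicy {
        public boolean canIgnoreEvent(String[] fqn) {
            return fqn.length != 1;
        }
    }

    // What RegionManager's listener would check before appending an
    // event to the bounded buffer.
    public static boolean shouldBuffer(BasePolicy policy, String[] fqn) {
        return !policy.canIgnoreEvent(fqn);
    }
}
```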

        • 1. Re: Proposal for performance changes
          belaban

          [reply by Bela]
          > Bela;
          > As promised I've signed the contributor's agreement and have written a summary below of the initial changes that I would like to contribute. I've also attached a patch diff of these changes.
          > Send me an email after you've had a chance to review this information, and then I'll give you a call so we can discuss these items further and so I can pick your brain regarding clustering deadlock.
          >
          > Optimize performance of Fqn.toString()
          > ======================================
          > org.jboss.cache.Fqn
          > - added optimization to set stringRepresentation during construction for
          > some constructors (avoids unnecessarily looping through all elements in toString() for deeply nested Fqn's)



          I'm not sure I like the idea of keeping a string rep around for each Fqn. I made a comment on the forum today.

          > General performance and concurrency improvements for TreeNode
          > =============================================================
          > org.jboss.cache.TreeNode
          > - changed dataLock to Object() since none of the ReentrantLock methods were
          > being used - we're just doing synchronized() on it.



          OK

          > - improved concurrency of initialization and access of children
          > - children()
          > - improved concurrency by avoiding unnecessary synchronized() calls



          OK

          > - initialize children of the root node (Fqn.isRoot()==true) to
          > ConcurrentHashMap instead of ConcurrentReaderHashMap
          > to allow for greater concurrency of modification of nodes just off
          > the root



          OK

          > - getOrCreateChild() & createChild() methods
          > - reworked to avoid making unnecessary synchronized(childrenMutex) calls
          > - the strategy is to only lock when we don't find the child



          Sounds like double-checked locking, which usually doesn't work. How do you avoid some other
          thread creating a child (or deleting it) concurrently ?

          > General performance and concurrency improvements for TransactionEntry
          > =====================================================================
          > org.jboss.cache.TransactionEntry
          > - changed TransactionEntry.locks from List based implementation to
          > ConcurrentReaderHashMap. This avoids expensive linear scanning calls
          > to locks.contains() in addLock() and also improves concurrency by
          > avoiding blocking synchronized() calls where most of the access to
          > locks is read-only. I'm using ConcurrentReaderHashMap here more as
          > a Set than a Map since there is no value to store. We're really just
          > interested in efficient searches in locks and highly concurrent reads.



          OK - too bad ConcurrentReaderList is missing... CCRHM is quite expensive to create, but since we're using only 1 lock table per cache this should be fine.



          > - I'm not sure of the impact of this change on
          > org.jboss.cache.interceptors.OptimisticLockingInterceptor and
          > org.jboss.cache.interceptors.PessimisticLockInterceptor
          > which might be relying on some implementation specifics of the order of
          > insertion of entries into what getLocks() returns. Seems to me though
          > that if they do have some sort of dependency on ordering that there
          > should be a better way to deal with this...



          I see - so you want to discuss this with Manik on the forum, in some code the locks will need to be released in reverse order of acquisition...


          > - this change impacts one public method - getLocks() which previously
          > returned a java.util.List and now returns a java.util.Set. Only one test
          > is affected by this (org.jboss.cache.cache.optimistic.OpLockingInterceptorTest)
          > which depends on List.get(int) method that isn't available in Set.



          Maybe we could downgrade the List to a Collection then ?

          > General performance and concurrency improvements for TransactionTable
          > =====================================================================
          > org.jboss.cache.TransactionTable
          > - changed TransactionTable.tx_map and TransactionTable.txs to
          > ConcurrentHashMap to allow for concurrent writers. Since there is only
          > one TransactionTable within the TreeCache any thread participating in
          > a transaction will need to compete for read and write access.



          Hmm, I recall I had a reason for using ConcurrentReaderHashMap rather than ConcurrentHashMap...
          Did you measure any performance gains with this change ? My gut feeling is that we should be
          able to quantify performance gains before changing critical code.


          > General performance and concurrency improvements for TreeCache
          > ==============================================================
          > org.jboss.cache.TreeCache
          > - getCurrentTransaction()
          > - removed unnecessary synchronized(tx_table) call as TransactionTable
          > handles synchronization by virtue of its ConcurrentHashMap implementation



          OK. We want to change this anyways by getting the TX only once from the TxManager, and then shipping it with the
          MethodCall (feature of JGroups 2.2.9). At a later point, we want to switch from MethodCall to Invocation, which has payloads.


          > Improve performance and concurrency of handling of TreeCacheListeners
          > =====================================================================
          > org.jboss.cache.TreeCache
          > - added concurrency optimizations to differentiate how TreeCache handles
          > TreeCacheListeners
          > - the root problem is that calling ConcurrentHashMap.keySet().iterator()
          > causes synchronization across the ConcurrentHashMap instance. From then on
          > using the iterator is highly concurrent and efficient, but obtaining the
          > iterator involves expensive locking. Look at ConcurrentHashMap.HashIterator.HashIterator()
          > to see the underlying issue.
          > - when you couple the above with the fact that the number of listeners
          > in the cache is typically very low (normally just 1 for the eviction listener)
          > this adds up to a *lot* of overhead and locking each time the TreeCache
          > must notify a listener, and for the most part that listener is only the
          > EvictionListener
          > - here are the changes I made to solve the problem:
          > - added TreeCache.setEvictionListener() and modified
          > org.jboss.cache.eviction.RegionManager to call
          > this vs. TreeCache.addTreeCacheListener. setEvictionListener stores the
          > TreeCacheListener in TreeCache.evictionPolicyListener, not the listeners
          > Map
          > - added TreeCache.hasListeners private boolean variable that's used to see
          > if listeners exist before trying to iterate
          > - modified the TreeCache.notifyXXX() methods to use
          > TreeCache.evictionPolicyListener and hasListeners



          OK



          > Re-added ability to filter eviction events (performance & functionality)
          > ========================================================================
          > - In my scenario using JBossCache, I have certain subtrees in the cache that must be treated
          > atomically from an eviction perspective. i.e. I can't allow just one branch of the subtree
          > to be evicted. The whole subtree needs to be evicted at once, not just piece by piece.
          >
          > In JBoss 1.2.3 and prior, it was possible to support this scenario by implementing my own
          > eviction policy (CascadingLRUPolicy) by subclassing LRUPolicy and overriding the evict() method. Since my "atomic subtrees" always started just off the cache root, I just
          > ignored any evict() calls except where Fqn.size() == 1.
          > I then made a further optimization by overriding the TreeCacheListener methods implemented on
          > LRUPolicy to filter out any eviction events prior to them getting inserted into the bounded
          > buffer. The rationale was that since I wasn't interested in evicting certain nodes,
          > I could save a bunch of time by simply causing JBossCache to ignore those events in the
          > first place.
          >
          > The result was a big jump in performance and scalability in my tests.
          >
          > While it is still possible to implement the evict() portion of my CascadingLRUPolicy, it looks
          > like some refactoring and redesign of how eviction policies and the region manager
          > interact makes it impossible to implement the filtering optimization.
          >
          > So here's what I've done to re-add the functionality based on the current JBoss 1.2.4 (head):
          >
          > org.jboss.cache.eviction.EvictionPolicy
          > - added a new method called EvictionPolicy.canIgnoreEvent(Fqn)
          > - canIgnoreEvent is called by RegionManager.EvictionTreeCacheListener prior to
          > adding the event to the region event buffer.
          > - if canIgnoreEvent returns true, the event is not added to the region event buffer
          > - if canIgnoreEvent returns false (default), the event is added to the region event
          > buffer as normal
          > org.jboss.cache.eviction.RegionManager
          > - added calls to EvictionPolicy.canIgnoreEvent in EvictionTreeCacheListener and
          > AOPEvictionTreeCacheListener
          > org.jboss.cache.eviction.BaseEvictionPolicy
          > - provide default implementation of canIgnoreEvent that always returns false
          >



          Ben ?

          I think we recently added eviction of entire subtrees as a feature, or is it on the roadmap ?

          • 2. Re: Proposal for performance changes
            manik


            > General performance and concurrency improvements for TransactionEntry
            > =====================================================================
            > org.jboss.cache.TransactionEntry
            > - changed TransactionEntry.locks from List based implementation to
            > ConcurrentReaderHashMap. This avoids expensive linear scanning calls
            > to locks.contains() in addLock() and also improves concurrency by
            > avoiding blocking synchronized() calls where most of the access to
            > locks is read-only. I'm using ConcurrentReaderHashMap here more as
            > a Set than a Map since there is no value to store. We're really just
            > interested in efficient searches in locks and highly concurrent reads.


            OK - too bad ConcurrentReaderList is missing... CCRHM is quite expensive to create, but since we're using only 1 lock table per cache this should be fine.


            CCRHM and CCHM are just convenience classes around the SyncMap class. Similar to SyncMap, there is also a SyncList class which lets you create a concurrent reader list by passing in a standard list instance and a read/write lock.

            // classes from EDU.oswego.cs.dl.util.concurrent
            List list = new SyncList(new ArrayList(), new FIFOReadWriteLock());




            • 3. Re: Proposal for performance changes
              manik


              > - I'm not sure of the impact of this change on
              > org.jboss.cache.interceptors.OptimisticLockingInterceptor and
              > org.jboss.cache.interceptors.PessimisticLockInterceptor
              > which might be relying on some implementation specifics of the order of
              > insertion of entries into what getLocks() returns. Seems to me though
              > that if they do have some sort of dependency on ordering that there
              > should be a better way to deal with this...



              I see - so you want to discuss this with Manik on the forum, in some code the locks will need to be released in reverse order of acquisition...


              At the moment both interceptors expect to receive a list of locks in the order in which they are created. I'll look into whether this is a necessity or whether locks can be removed in any order...



              • 4. Re: Proposal for performance changes
                manik


                > - this change impacts one public method - getLocks() which previously
                > returned a java.util.List and now returns a java.util.Set. Only one test
                > is affected by this (org.jboss.cache.cache.optimistic.OpLockingInterceptorTest)
                > which depends on List.get(int) method that isn't available in Set.



                Maybe we could downgrade the List to a Collection then ?


                The only place I see us using getLocks().get(i) is here:

                 int i = 0; // counter declared earlier in the actual test
                 for (Iterator it = workspace.getNodes().values().iterator(); it.hasNext();)
                 {
                     DataNode node = ((WorkspaceNode) it.next()).getNode();
                     assertEquals(node.getLock(), entry.getLocks().get(i));
                     i++;
                 }


                This could easily be replaced with a block like the following, to achieve the same effect:

                 for (Iterator it = workspace.getNodes().values().iterator(); it.hasNext();)
                 {
                     DataNode node = ((WorkspaceNode) it.next()).getNode();
                     assertTrue(entry.getLocks().contains(node.getLock()));
                 }


                Not a lot of sense in downgrading this to a collection, IMO.

                • 5. Re: Proposal for performance changes
                  manik


                  > General performance and concurrency improvements for TreeCache
                  > ==============================================================
                  > org.jboss.cache.TreeCache
                  > - getCurrentTransaction()
                  > - removed unnecessary synchronized(tx_table) call as TransactionTable
                  > handles synchronization by virtue of its ConcurrentHashMap implementation



                  OK. We want to change this anyways by getting the TX only once from the TxManager, and then shipping it with the
                  MethodCall (feature of JGroups 2.2.9). At a later point, we want to switch from MethodCall to Invocation, which has payloads.


                  With 1.3, we will be doing something similar anyway - using an InvocationContext object, that sits in ThreadLocal. The first interceptor in the chain looks up the TX and GTX and places these in the InvocationContext for other interceptors to use.

                  This will, as Bela said, eventually be upgraded to a part of the MethodCall payload so no lookup will happen at all, and eventually grow to an Invocation object that is pushed out on the wire.
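The ThreadLocal-based InvocationContext idea might be sketched like this (all names here are illustrative, not the actual 1.3 API):

```java
// Illustrative sketch: a per-thread context that the first interceptor
// populates once per invocation, so downstream interceptors read the
// TX and GTX without touching the TxManager again.
public class ContextSketch {

    public static class InvocationContext {
        public Object transaction;       // TX, set by the first interceptor
        public Object globalTransaction; // GTX
    }

    private static final ThreadLocal CONTEXT = new ThreadLocal() {
        protected Object initialValue() {
            return new InvocationContext();
        }
    };

    public static InvocationContext get() {
        return (InvocationContext) CONTEXT.get();
    }
}
```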

                  • 6. Re: Proposal for performance changes
                    manik

                    Re: my comment about getLocks() returning a List above, the unit test checks the order of the locks in the lock table. We could only switch to contains() in the assertion once we're sure we do not require lock ordering in the interceptors.

                    I think we should move this to a separate discussion thread as this one will inevitably get cluttered! :)

                    • 7. Re: Proposal for performance changes
                      belaban

                      The SyncMap is afaik not the same as ConcurrentHashMap, which has a number of different buckets, each with its own lock. So CCHM is much more efficient than SyncMap.
                      However, if we use CCHM in place of a list here, we should check (i.e. measure) new SyncList(list) against CCHM before we switch to it.

                      • 8. Re: Proposal for performance changes
                        belaban

                        Wrt locks: so the order is important, and I think we need to release the locks in reverse order of acquisition.
                        I would like to see performance numbers that warrant a switch in the first place.

                        • 9. Re: Proposal for performance changes
                          manik