I should add as well that my tests are running on HP-UX, which has the particular "feature" of being extremely inefficient when multiple threads contend for the same lock.
On other platforms (Solaris, Win32) the lock contention is there, but not nearly as noticeable.
I am not sure how much we can do about this. On one hand, you can avoid it by creating multiple roots, e.g. /root1, /root2, etc., to parallelize access.
But if you can't, that's the price you pay compared to a single hashmap, I guess.
Any suggestion is welcome though.
Unfortunately the problem isn't solved by having multiple root nodes (e.g. /root1, /root2) as you suggest because the problem is with TreeCache.root.
If I create /root1 and /root2, these are still children of TreeCache.root. So, whenever I try to do anything to the cache, it means a mandatory touch of the "/" root node with associated locking.
I'm experimenting with several changes right now that appear to be safe.
The first change I'm trying involves switching Node.children to a ConcurrentHashMap and removing the use of synchronized(childrenMutex).
There might be a more clever way to do this though that just treats TreeCache.root as a special case with respect to locking. I'm imagining employing slightly different locking semantics favouring greater concurrency being applied to TreeCache.root as compared to all other nodes.
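In rough terms, the change I have in mind looks like the following minimal sketch. The Node class here is a stripped-down stand-in for illustration, not the real JBossCache Node; the point is that with a ConcurrentHashMap, getOrCreateChild() can rely on putIfAbsent() instead of a synchronized(childrenMutex) block.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stripped-down stand-in for Node (NOT the real JBossCache class):
// children is a ConcurrentHashMap, so getOrCreateChild() needs no
// synchronized(childrenMutex) block.
class Node {
    private final String name;
    private final ConcurrentMap<String, Node> children =
            new ConcurrentHashMap<String, Node>();

    Node(String name) { this.name = name; }

    String getName() { return name; }

    // putIfAbsent() makes create-if-missing atomic without an explicit
    // lock: concurrent callers racing on the same name all end up with
    // the one child that won the race.
    Node getOrCreateChild(String childName) {
        Node child = children.get(childName);
        if (child == null) {
            Node fresh = new Node(childName);
            Node prev = children.putIfAbsent(childName, fresh);
            child = (prev != null) ? prev : fresh;
        }
        return child;
    }
}
```

Readers never block each other, and writers only contend briefly inside the map itself rather than on a node-wide mutex.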
I'll look into this. Can you create a JIRA issue ?
Yes, we could treat 'root' as a special case. However, we have to take into account the case where someone does a remove("/"); then we do need a write-lock.
In any case, acquiring read-locks should now be blindingly fast, even on root, so I don't see why root should be a hotspot.
Node.children is a ConcurrentReaderHashMap; you want to switch it to ConcurrentHashMap? Do you envisage many concurrent writes?
I don't know why Node.data is not a ConcurrentHashMap, but rather a simple HashMap. I think I looked at this and had a reason. But please mention all of this in the JIRA issue, and I'll revisit it.
One thing I thought about is having multiple roots, so that
/a/b/c and /d/e/f would *not* share the same root ("/"); instead, "/a" and "/d" would be their own separate roots. Access to /a and /d would then be concurrent, not governed by the same lock.
However, this entails more complexity: e.g. remove("/") would have to write-lock all root nodes, and get(X) might also become a bit more costly.
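To make that trade-off concrete, here is a rough sketch of the multiple-roots idea. The class and method names are made up for illustration and are not JBossCache API: each top-level root gets its own read-write lock, so writers under different roots never contend, while remove("/") pays the cost of locking everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only; not JBossCache API. Each top-level root
// ("/a", "/d", ...) gets its own read-write lock, so writers under
// different roots never contend.
class MultiRootLocks {
    private final ConcurrentMap<String, ReentrantReadWriteLock> rootLocks =
            new ConcurrentHashMap<String, ReentrantReadWriteLock>();

    // All paths under the same top-level root share one lock.
    ReentrantReadWriteLock lockFor(String fqn) {
        String root = firstSegment(fqn);
        ReentrantReadWriteLock lock = rootLocks.get(root);
        if (lock == null) {
            ReentrantReadWriteLock fresh = new ReentrantReadWriteLock();
            ReentrantReadWriteLock prev = rootLocks.putIfAbsent(root, fresh);
            lock = (prev != null) ? prev : fresh;
        }
        return lock;
    }

    // remove("/") is the expensive case: it must write-lock every root.
    // (A real implementation would need a fixed acquisition order to
    // avoid deadlock; this sketch glosses over that.)
    void writeLockAllRoots() {
        List<ReentrantReadWriteLock> locks =
                new ArrayList<ReentrantReadWriteLock>(rootLocks.values());
        for (ReentrantReadWriteLock lock : locks) {
            lock.writeLock().lock();
        }
        // ... clear all sub-trees here ...
        for (ReentrantReadWriteLock lock : locks) {
            lock.writeLock().unlock();
        }
    }

    // "/a/b/c" -> "/a"; "/d" -> "/d"
    static String firstSegment(String fqn) {
        int next = fqn.indexOf('/', 1);
        return (next == -1) ? fqn : fqn.substring(0, next);
    }
}
```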
I think we should try the ConcurrentHashMap route first and see where it gets us.
Do you have perf numbers that show "/" is a *real* problem ?
I'll open a Jira issue and mention all the details there.
It is definitely a real issue and shows up pretty clearly in our performance bottleneck analysis. We're seeing many threads waiting in getOrCreateChild() on the same object (root). As a result, we can't scale past more than 4 CPUs or so, with sustained performance stuck in the 150-175 TPS range (peaks are ~200 TPS).
As for remove("/") - should that even be supported? I would have thought that attempting to remove "/" should throw an exception.
The rationale for switching to a ConcurrentHashMap is that in my scenarios I do have many concurrent writers creating children off of root. The grandchildren of root will rarely have concurrent writers but will have concurrent readers.
Yes, please create a JIRA issue and I'll look into it.
I have created a JIRA issue where I'll look at whether we can replace HashMap with ConcurrentHashMap for Node.data and Node.children. I recall there is a reason why I haven't done it yet, so this needs some thought.
It would be nice if you could add that stress test into the JBoss CVS, under CVS module "JBossCache": org.jboss.cache.tests, under the "perf" or "stress" package.
I would like to see some numbers first, before I start changing the critical class Node.
I've been working on just such a test case now that simulates how our app is using/stressing JBossCache. I agree that concrete numbers based on a clear test case would be needed to justify any changes to Node and quantify the benefits.
The test case I have now shows it pretty clearly within a single VM (LOCAL cache mode). I'm just tweaking things now to get similar results across a cluster.
Stay tuned, and in the coming days I'll provide that test case and suggested changes to Node.
I'm presently evaluating JBoss Cache as a large system-state replication mechanism. However, we're looking at a cache where certain sub-trees may change frequently, whereas others may stay fairly static for a long time but mutate rapidly in certain periods. As such, this problem seemed interesting.
We're looking at a multi-CPU setup. If all sub-trees are contending for a write lock on the root node, we may have to split the cache into several instances, which is certainly doable, but also less appealing.
I found the JIRA issue but could not figure out what the resolution was. Can you advise me whether this is still a problem?
There have been a number of changes made since my post that dramatically improve scalability and performance of JBossCache, and a few of those optimizations alleviate some of root node contention.
However, the fundamental problem that is described in this thread still exists. The right approach is most likely to allow for multiple root nodes within the cache.
Thank you for your quick reply. It was my impression, though, that the "root node contention" was an issue whether you had multiple roots in a tree or not. Or am I missing something?
Just for safety, let me expand a bit: consider three domains which keep state in the cache as "/a", "/b" and "/c". Domain "a" has a consistently high mutation rate in its sub-trees, whereas "b" and "c" have burst-like activity in parts of, or all of, their sub-trees. However, it is critical for the system that the load differences between the domains do not cross the domain boundaries. For example, the high activity in "a" must not degrade performance for the "c" domain should it enter a prolonged burst of activity and mutations.
Should I understand that the different roots in this scenario are now separated with regard to write-locks? Or are there still dependencies on the main root ("/")? In short, would you recommend one or multiple cache instances in the scenario above?
Whenever an attempt is made to retrieve a node from the cache, the cache is traversed from the root node (has a special Fqn of "/").
There is a single instance of the root node within the cache, which like all nodes, keeps a reference to its children in a hash map that supports concurrent reads, but not concurrent writes. The concurrent reader hash map means that multiple threads can concurrently ask the node for its children without contention.
This means that as long as you are not adding and removing direct children of the root node, there should be little to no contention.
However, if you are continually adding or removing nodes directly underneath root, concurrency problems can surface, because each such structural change must obtain an exclusive write lock on the root node.
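To picture this, here is a simplified stand-in (not the actual JBossCache code) for the traversal: every lookup starts at root and walks one child map per path segment, so "/" is touched on every access, but reads never need an exclusive lock; only structural writes (creating a missing child) mutate a parent's child map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified stand-in, NOT the actual JBossCache implementation:
// illustrates why every access touches root, and why only structural
// writes (adding a missing child) contend there.
class TreeSketch {
    static class Node {
        final ConcurrentMap<String, Node> children =
                new ConcurrentHashMap<String, Node>();
    }

    // Reads traverse without exclusive locks.
    static Node find(Node root, String... path) {
        Node current = root;                          // always touches "/"
        for (String segment : path) {
            current = current.children.get(segment);  // concurrent read
            if (current == null) return null;
        }
        return current;
    }

    // Structural writes mutate the parent's child map; in the real
    // cache this is where the parent (possibly root) gets write-locked.
    static Node create(Node root, String... path) {
        Node current = root;
        for (String segment : path) {
            Node fresh = new Node();
            Node prev = current.children.putIfAbsent(segment, fresh);
            current = (prev != null) ? prev : fresh;
        }
        return current;
    }
}
```

In this model, a workload that only reads (or that writes deep in the tree) passes through root cheaply, while a workload that constantly creates or removes direct children of root funnels all its structural writes through that single node.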
I have experimented with using a hashed table of TreeCaches to resolve this second problem, and that does indeed seem to work, but it introduces a number of undesirable trade-offs, most notably that clustering performance suffers dramatically.
Hope this helps.