I dug further into the code for your question regarding query caching. Although the flag is enabled in the Hibernate configuration for all modules, the query cache is only enabled programmatically (query.setCacheable(true)) for CMS.
Following your previous response and as per http://wiki.jboss.org/auth/wiki/JBossCacheHibernate, does it mean that INVALIDATION_SYNC would be worth trying when CMS is not used?
Yes, I think it's worth trying. I'd thought there might have been a problem in JBC 1.4.x that would have made invalidation not work properly, which is why I said don't bother. But I just had a look at the JBC code and it looks OK, so go for it.
INVALIDATION_SYNC does not help in my case. :-(
Interesting. You mentioned that only the CMS cache actually caches queries. Is most of the actual caching activity in the CMS cache?
Also, if you aren't actually caching queries, you should set use_query_cache to false in your SessionFactory config. Otherwise, every time you create a new entity or update one, Hibernate will send *two* messages around the cluster to invalidate the query cache. Those messages should go async, but they're still a source of overhead.
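For reference, here's a sketch of what that change would look like if the flag lives in a hibernate.cfg.xml (the exact file and surrounding config in the portal build are assumptions on my part):

```xml
<!-- hibernate.cfg.xml (location assumed): with the query cache off,
     entity creates/updates no longer trigger the extra cluster-wide
     invalidation messages for query-cache regions -->
<session-factory>
  <property name="hibernate.cache.use_query_cache">false</property>
</session-factory>
```

The same property can equally be set as hibernate.cache.use_query_cache=false in a properties file, depending on how the SessionFactory is configured.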
Later on, Sohil pointed out that even CMS does not cache queries. That code is no longer used and is going to be removed.
When I said INVALIDATION_SYNC does not help, I meant that the numbers show neither better nor worse results. Just to make sure I am on the same page, my configuration is pessimistic locking + read committed + INVALIDATION_SYNC. I will, however, disable the use_query_cache flag in the configuration as well.
If you are not caching queries, that's good. We're on the same page on your config, which can now be applied to the CMS cache.
If that and use_query_cache=false don't resolve your issue, then either there's something I'm missing in how JBC handles invalidation in the Hibernate "put" use case (db read + cache write), or your testing is doing a really high number of entity creations/updates (which isn't going to scale).
We can leave CMS out completely as my tests don't use any CMS component. CMS classes are loaded on start up and that's about it.
A very simple portlet that I am using to test is at http://anonsvn.jboss.org/repos/qa/portal/failover-test/failover-test/src/main/org/jboss/portlet/failoverTest/FailoverTestPortlet.java . This is very similar to what Dominik used to test scalability in EAP.
I don't see anything there that involves entities; looks like a pure session replication test. Is your session replication cache configured for buddy replication?
The portal world is a bit different. Although there is no explicit entity work, the portal stores in the database which portal page contains which portlet. A request to such a page usually results in a hit to the database, along with a check of whether the anonymous user has access to that page and portlet. I already worked with the team on optimizing DB calls in the non-clustered as well as the clustered (one-node) setup. What we are trying to achieve is to take the numbers from one node as a baseline and see how they change as we add nodes to the cluster. I hope this clarifies a few things.
Sounds like activity on the entity cache should be almost entirely reads, which should scale. Doesn't sound like there would even be that many cases of db read + cache write after an initial warmup period.
Is the web session clustered? If so, be sure to use buddy replication for the web session cache or the session replication will inhibit scalability.
Do the use_query_cache=false bit first though so we know whether that has an effect. :-)
I was working on a new release of the portal last week, hence there was no activity on this issue. I am going to proceed on this now. Just an FYI.
I ran tests with use_query_cache=false and the following configuration (just to recap):
<attribute name="NodeLockingScheme">PESSIMISTIC</attribute>
<attribute name="IsolationLevel">READ_COMMITTED</attribute>
<attribute name="CacheMode">INVALIDATION_SYNC</attribute>
Results are better. Previously, I used to get very good scalability up to 3 nodes, but with 4 nodes, perf was equal to that of 2 nodes. Now, up to 3 nodes, I get close to linear scalability, and with 4 nodes perf is equal to that of 3 nodes. With 5 nodes, results are as before.
OK; when you get a chance to test w/ buddy replication on the web session cache, let me know.
Regarding buddy replication, the portal uses the same TreeCache definition for both entity and portlet session replication. Since buddy replication cannot be used for entities, do you mean to use a separate cache configuration for the session replication stuff and use the existing one for everything else?
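To make the split concrete, a rough sketch of what two separate definitions might look like (the MBean service names here are made up for illustration; they are not the portal's actual names):

```xml
<!-- Hypothetical sketch: two TreeCache MBeans instead of one shared cache -->

<!-- Entity (second-level) cache: synchronous invalidation -->
<mbean code="org.jboss.cache.TreeCache"
       name="portal:service=EntityTreeCache">   <!-- name invented -->
  <attribute name="NodeLockingScheme">PESSIMISTIC</attribute>
  <attribute name="IsolationLevel">READ_COMMITTED</attribute>
  <attribute name="CacheMode">INVALIDATION_SYNC</attribute>
</mbean>

<!-- Session replication cache: async replication, buddy-replication candidate -->
<mbean code="org.jboss.cache.TreeCache"
       name="portal:service=SessionTreeCache">  <!-- name invented -->
  <attribute name="CacheMode">REPL_ASYNC</attribute>
</mbean>
```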
My ignorance of portal internals is becoming apparent.
I'd thought portlet sessions were managed as data stored inside the overall javax.servlet.http.HttpSession. So, either:
1) You guys are overriding the standard HttpSession replication config and substituting your own cache for the one defined in deploy/jboss-web-cluster.sar
2) I'm clueless and portlet sessions are using a completely different replication mechanism.
I suspect #2. :-)
What to do depends on which it is.
If #1, the answer depends on why you replaced the standard HttpSession replication cache. If there's not a good reason, go back to using the standard cache, but enable buddy replication.
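If it helps, in JBC 1.4 buddy replication is switched on via the BuddyReplicationConfig attribute on the TreeCache MBean in deploy/jboss-web-cluster.sar; a sketch with the usual default values (adjust to your environment):

```xml
<!-- Sketch: enable buddy replication on the standard HttpSession
     replication cache (TreeCache MBean in jboss-web-cluster.sar) -->
<attribute name="BuddyReplicationConfig">
  <config>
    <buddyReplicationEnabled>true</buddyReplicationEnabled>
    <buddyLocatorClass>org.jboss.cache.buddyreplication.NextMemberBuddyLocator</buddyLocatorClass>
    <!-- one backup copy of each session instead of full replication -->
    <buddyLocatorProperties>numBuddies = 1</buddyLocatorProperties>
    <buddyPoolName>default</buddyPoolName>
    <buddyCommunicationTimeout>2000</buddyCommunicationTimeout>
    <autoDataGravitation>false</autoDataGravitation>
    <dataGravitationRemoveOnFind>true</dataGravitationRemoveOnFind>
    <dataGravitationSearchBackupTrees>true</dataGravitationSearchBackupTrees>
  </config>
</attribute>
```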
If #2, you guys have developed your own independent clustering technology, and I can only make general comments:
a) Generally "sessions" of whatever type should be "owned" by a particular node in the cluster. The only reason to replicate them is to provide HA in the unexpected case of failover. Generally REPL_ASYNC is used for this, as it scales much better. The downside is the potential loss of state if failover happens before the latest replication message arrives. In most cases that risk is acceptable in order to get better performance.
INVALIDATION can't be used for a session cache. The only place the session exists is in the cache. If you invalidate it instead of replicating, you have no HA.
Entities are shared by all nodes, and an entity cache should use SYNC communication for the reasons already discussed.
Bottom line, using the same cache for both entities and sessions is not a good approach.
b) Generally, turning on buddy replication for a "session" cache should only be done if the cache integration layer has been written with BR in mind. It's not a simple matter of changing a config flag and voila, it works. The standard HttpSession cache integration layer has been written with BR in mind.