You can roughly assume one node per entity. There are some overhead nodes as well, but that's unlikely to be significant in the overall node count once you start talking thousands of entities.
Yes, increasing maxNodes to 10,000 will increase the number of hibernate objects. Whether you have enough memory to hold that many depends on how much memory you give your VM and how much data is in your entities. You have to try and find out. People certainly cache much larger numbers of entities than that; there's nothing magic about 5,000, it's just an example value.
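For reference, limits like maxNodes are set per region in the eviction policy section of treecache.xml. A sketch along these lines (the region names and values here are illustrative, not a recommendation -- adjust them to your own entity names and memory budget):

```xml
<attribute name="EvictionPolicyConfig">
  <config>
    <!-- how often the eviction thread runs -->
    <attribute name="wakeUpIntervalSeconds">5</attribute>
    <!-- default region: applies to cached data with no region of its own -->
    <region name="/_default_">
      <attribute name="maxNodes">5000</attribute>
      <attribute name="timeToLiveSeconds">1000</attribute>
    </region>
    <!-- hypothetical per-entity region with a larger limit -->
    <region name="/com/example/MyEntity">
      <attribute name="maxNodes">10000</attribute>
      <attribute name="timeToLiveSeconds">1000</attribute>
    </region>
  </config>
</attribute>
```

With roughly one node per entity, maxNodes is effectively the cap on cached entities for that region.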
Passivation is not recommended. Hibernate second level caching has never been tested with passivation because generally it makes no sense to use something else as a persistent store for something that's already in a production level RDBMS.
Well, sometimes we see in production that the total number of objects cached is far more than the max node number we defined in treecache.xml. We deploy the Hibernate session factory as an MBean and turn on the session factory statistics so we can watch all the caching behavior. In one particular case, maxNodes in treecache.xml was 30,000, but we saw the total number of cached objects reach 97,000. The JVM eventually got an OutOfMemoryError and we had to restart JBoss. Although this is a rare case, we constantly observe that the total number of cached objects exceeds the max node number by a small amount.
Is this expected behavior? Why doesn't TreeCache respect the max node limit we defined?
Perhaps the eviction thread is not running frequently enough?
The maxNodes config is not a hard limit, i.e. the cache doesn't use the thread trying to insert node 30,001 to evict a node first in order to stay under the limit. There's a separate eviction thread (runs every 5 secs by default) that does it.
That said, 97,000 versus a limit of 30,000 is a big difference.
Just before the JVM got the OutOfMemoryError, we noticed many log entries like this:
putNodeEvent(): eviction node event queue size is at 98% threshold value of capacity: 200000 You will need to reduce the wakeUpIntervalSeconds parameter.
Our wakeUpIntervalSeconds is set to 3 (seconds). I don't understand what the eviction node event queue is and why it can reach almost 200,000 entries. Does it take memory, or does it just write to the file system? Is it likely that this eviction queue caused the OutOfMemoryError? Is there anything we can configure in TreeCache or JBoss to avoid it?
Thank you very much for your help.
P.S. We do have a JBoss support license, but since JBoss and Red Hat merged, our login no longer works. We are working with Red Hat people to sort this out. In the meantime, I have to post our questions here.
The eviction event queue holds event objects that describe reads and writes to the cache. Those are used by the eviction handling code to determine which nodes to evict (e.g. what nodes are LRU).
The queue itself takes memory, but it's a bounded queue so it won't grow beyond 200,000 entries. But these WARN messages are a sign that eviction is overloaded, which can result in excess nodes being left in the cache.
I would reduce your wakeUpIntervalSeconds to 1.
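In treecache.xml that is a one-attribute change inside the eviction policy config (shown here with illustrative surrounding elements; your region definitions stay as they are):

```xml
<attribute name="EvictionPolicyConfig">
  <config>
    <!-- run the eviction thread every second instead of every 3 -->
    <attribute name="wakeUpIntervalSeconds">1</attribute>
    <!-- existing <region> definitions unchanged -->
  </config>
</attribute>
```

A shorter interval means each eviction pass drains a smaller backlog of events, which should keep the queue well below its 200,000 cap.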
200,000 seems pretty big to me. But my question is: how big is each object in the eviction queue? What is the content of the objects in the queue? Is each one a wrapper around the cached object, or just a pointer to it? If it is a copy of each cached object, then 200,000 such objects can take significant memory. Could you give us more info on this eviction queue so that we can better understand our OutOfMemoryError?
The queue contains instances of EvictedEventNode:
As you can see it contains ref to an Fqn, two ints, a boolean and a long. Every time you read or write to the cache one of those is created and added to the queue. When the eviction thread runs they are pulled off the queue, data is used in eviction decisions, and the EvictedEventNode is discarded.
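To put rough numbers on the memory question above: a minimal back-of-envelope sketch, assuming the field layout just described (the class and field names here are illustrative, not the actual JBoss Cache source, and the per-object byte counts are typical 32/64-bit JVM estimates, not exact figures):

```java
public class EvictionEventSketch {

    // Illustrative stand-in for the event object described above:
    // a reference to an Fqn, two ints, a boolean and a long.
    static class EvictedEventNode {
        Object fqn;                // reference to the node's Fqn
        int type;                  // event type (e.g. read vs. write)
        int elementDifference;
        boolean resetElementCount;
        long timestamp;
    }

    public static void main(String[] args) {
        // Rough estimate: ~16B object header + 8B reference + 2*4B ints
        // + 8B long + 1B boolean, padded to 8B alignment ~= 48 bytes.
        long bytesPerEvent = 48;
        long queueCapacity = 200_000;   // the bounded queue's cap
        long totalMb = bytesPerEvent * queueCapacity / (1024 * 1024);
        System.out.println("Approx. full-queue footprint: " + totalMb + " MB");
    }
}
```

So even a completely full queue is on the order of single-digit megabytes, not counting the Fqn objects it references; by itself it is an unlikely cause of an OutOfMemoryError.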
Thanks. So the eviction queue should not be the cause of our problem, as the objects in the queue are pretty small.
I think we have found the problem we had in production: it is hot deployment. I know JBoss does not recommend redeployment in production, but we use it once in a while to add new datasources. What happened in our case is this:
We used hot deployment to add a new datasource. But we had also changed treecache.xml to reduce the max node number, so the hot deployment redeployed TreeCache as well. Our Hibernate session factory service (an MBean) depends on the TreeCache MBean, so all the session factories got redeployed too. At that moment, JBoss was hosed and refused to take in requests.
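For anyone hitting the same thing: the wiring that makes the session factories cycle along with the cache is the <depends> element in the MBean descriptor. A sketch with hypothetical MBean names (check your own jboss-service.xml / .har descriptor for the real ones):

```xml
<mbean code="org.jboss.hibernate.jmx.Hibernate"
       name="jboss.har:service=HibernateFactory">
  <!-- Redeploying the TreeCache MBean forces this MBean,
       and hence the session factory, to restart as well -->
  <depends>jboss.cache:service=TreeCache</depends>
  <!-- remaining attributes unchanged -->
</mbean>
```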
What we do not know is what happens when TreeCache is hot-deployed while it is actively being used by Hibernate. Does this explain why our eviction queue got so big?
I don't see an obvious path of causation from the redeploy to the big eviction queue, not if your dependencies were as described. The cache would be fully started before the session factories were started; i.e. the cache should not have been exposed to requests while it was starting.
A possibility is the way your app behaved when it came back online -- it may have cached a lot of data in a big spurt rather than at the more normal steady-state rate. That would produce a temporary spike in the number of items in the eviction queue.