11 Replies Latest reply on Oct 29, 2006 11:55 AM by ben.wang

Eviction thread behaviour

manik Oct 20, 2006 1:01 PM

As reported in

http://jira.jboss.com/jira/browse/JBCACHE-794

there are a few issues with the eviction thread's behaviour in concept.

The bits I'm specifically concerned about are:

1) Should the eviction thread always use a lock acquisition timeout of 0, simply because if anyone else has a lock on the node being evicted then it should not be evicted? Since this is algorithm-specific, the BaseEvictionPolicy's evict() method should return an appropriate value if the eviction failed because of a timeout (rather than throw an exception, as this may be quite common with a timeout of 0)

2) BaseEvictionAlgorithm.evictCacheNode(), which calls BaseEvictionPolicy.evict(), would have to decide whether a failure to evict because of a timeout should result in the eviction call being put back on the queue. In the case of an LRU-type policy, this probably should *not* happen: the fact that the node is locked should be treated as a nodeVisited event, and this eviction call should be removed from the eviction queue.
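To make points 1) and 2) concrete, here is a minimal sketch of what is being proposed. It is not the JBoss Cache code: the class, the Outcome and OnTimeout enums, and the method signatures are all hypothetical, and java.util.concurrent's StampedLock stands in for the node lock. The policy side reports a lock timeout as a return value instead of throwing, and the algorithm side decides whether to requeue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.StampedLock;

// Hypothetical sketch; none of these types are the real JBoss Cache classes.
class EvictionOutcomeSketch {

    /** Result of one eviction attempt, returned instead of throwing on a lock timeout. */
    enum Outcome { EVICTED, LOCK_TIMEOUT }

    /** What the algorithm does with a node whose eviction timed out (assumed names). */
    enum OnTimeout { REQUEUE, TREAT_AS_VISITED }

    /** Policy side: try the node lock with a zero wait and report what happened. */
    static Outcome evict(StampedLock nodeLock) {
        long stamp = nodeLock.tryWriteLock(); // returns 0 at once if the lock is held
        if (stamp == 0L) {
            return Outcome.LOCK_TIMEOUT;
        }
        try {
            // The node's data would be removed here.
            return Outcome.EVICTED;
        } finally {
            nodeLock.unlockWrite(stamp);
        }
    }

    /** Algorithm side: decide whether a timed-out node goes back on the queue. */
    static boolean evictCacheNode(String fqn, StampedLock nodeLock,
                                  OnTimeout onTimeout, Deque<String> queue) {
        Outcome outcome = evict(nodeLock);
        if (outcome == Outcome.LOCK_TIMEOUT && onTimeout == OnTimeout.REQUEUE) {
            queue.addLast(fqn); // retry in a later pass
        }
        // With TREAT_AS_VISITED the eviction call is simply dropped.
        return outcome == Outcome.EVICTED;
    }

    public static void main(String[] args) {
        StampedLock lock = new StampedLock();
        Deque<String> queue = new ArrayDeque<>();
        long readStamp = lock.readLock(); // someone else is using the node
        evictCacheNode("/a/b", lock, OnTimeout.TREAT_AS_VISITED, queue); // LRU-like choice
        evictCacheNode("/a/b", lock, OnTimeout.REQUEUE, queue);          // retry-later choice
        lock.unlockRead(readStamp);
        System.out.println("requeued: " + queue);
    }
}
```

Which OnTimeout value a given policy should use is exactly the open question in point 2); the sketch only shows that the decision can live in the algorithm once evict() reports the timeout.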

What do people think about this?

Cheers,
Manik

1. Re: Eviction thread behaviour

ben.wang Oct 25, 2006 6:43 AM (in response to manik)

"manik.surtani@jboss.com" wrote:
1) Should the eviction thread always use a lock acquisition timeout of 0, simply because if anyone else has a lock on the node being evicted then it should not be evicted? Since this is algorithm-specific, the BaseEvictionPolicy's evict() method should return an appropriate value if the eviction failed because of a timeout (rather than throw an exception, as this may be quite common with a timeout of 0)

I don't think it is necessarily true that a node holding a read/write lock should not be evicted. Take, for example, a FIFO eviction policy or a policy with a maximum node age. Both will evict the corresponding node even when it has just been accessed.

Unfortunately, I still don't see how the eviction timeout will cause a deadlock or slow the system down. This can happen if the event queue has filled up, but Manik has made the queue size configurable now. So theoretically we should not run into this problem anymore, and a timeout of 0 may no longer be necessary.

"manik.surtani@jboss.com" wrote:
2) BaseEvictionAlgorithm.evictCacheNode(), which calls BaseEvictionPolicy.evict(), would have to decide whether a failure to evict because of a timeout should result in the eviction call being put back on the queue. In the case of an LRU-type policy, this probably should *not* happen: the fact that the node is locked should be treated as a nodeVisited event, and this eviction call should be removed from the eviction queue.

If we fail to evict the node, it is put into a special recycle queue to be processed again in the next cycle. When the nodeVisited event comes in, it is supposed to remove the node event from both queues. If it does not, then it is a bug.
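The recycle-queue behaviour described here could look roughly like the following. This is a toy model under stated assumptions, not the actual implementation: the class and method names are invented, nodes are identified by plain strings, and the eviction attempt is passed in as a predicate that returns false when the node could not be evicted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical sketch of a main eviction queue plus a recycle queue.
class RecycleQueueSketch {
    private final Deque<String> evictionQueue = new ArrayDeque<>();
    private final Deque<String> recycleQueue = new ArrayDeque<>();

    void queueForEviction(String fqn) {
        evictionQueue.addLast(fqn);
    }

    /** A visit cancels any pending eviction, wherever the node is waiting. */
    void nodeVisited(String fqn) {
        evictionQueue.remove(fqn);
        recycleQueue.remove(fqn);
    }

    /**
     * One eviction cycle: retry recycled nodes first, then the regular queue.
     * Nodes that still cannot be evicted go back on the recycle queue.
     * Returns the number of nodes evicted in this cycle.
     */
    int processCycle(Predicate<String> tryEvict) {
        Deque<String> batch = new ArrayDeque<>(recycleQueue);
        batch.addAll(evictionQueue);
        recycleQueue.clear();
        evictionQueue.clear();
        int evicted = 0;
        for (String fqn : batch) {
            if (tryEvict.test(fqn)) {
                evicted++;
            } else {
                recycleQueue.addLast(fqn);
            }
        }
        return evicted;
    }

    int pending() {
        return evictionQueue.size() + recycleQueue.size();
    }
}
```

If nodeVisited() only cleared the main queue, a recycled node would be retried and evicted even though it had just been used, which is the bug case mentioned above.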
2. Re: Eviction thread behaviour

manik Oct 25, 2006 7:17 AM (in response to manik)

A configurable queue size does not get rid of the deadlock scenario - just pushes the problem out a bit more. Under unpredictable load, this queue could still fill up.
3. Re: Eviction thread behaviour

brian.stansberry Oct 25, 2006 11:08 AM (in response to manik)

"ben.wang@jboss.com" wrote:

I don't think it is necessarily true that a node holding a read/write lock should not be evicted. Take, for example, a FIFO eviction policy or a policy with a maximum node age. Both will evict the corresponding node even when it has just been accessed.

I understand what you mean about FIFO or max age, but if eviction doesn't respect locks that can lead to serious problems; i.e. user thread doing a get acquires a lock in PessimisticLockInterceptor, then eviction thread evicts node, then user thread reaches TreeCache._get, which does a findNode, gets nothing and returns null. No chance to reload from the cache loader as the user thread is already past the interceptor.

Writing the above sparked a tangential thought; may be terrible, haven't thought about implications, etc. When an interceptor finds a node, why not throw it in the InvocationContext or something similar and pass it through the stack that way? Subsequent calls to find the node can check the context first before walking the cache tree. Saves redundant walking of the tree.
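A rough illustration of the idea, with everything invented for the example: Context is only a stand-in for an InvocationContext-like object, the cache tree is reduced to a map, and treeWalks is a counter added to make the saving visible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remember a node in a per-invocation context once found.
class NodeLookupSketch {

    /** Per-invocation scratch space, a stand-in for an InvocationContext. */
    static class Context {
        final Map<String, Object> foundNodes = new HashMap<>();
    }

    private final Map<String, Object> tree = new HashMap<>(); // stand-in for the cache tree
    int treeWalks = 0; // counts how often the tree itself was consulted

    void put(String fqn, Object node) {
        tree.put(fqn, node);
    }

    /** Check the context first; only walk the tree on a miss. */
    Object findNode(Context ctx, String fqn) {
        Object cached = ctx.foundNodes.get(fqn);
        if (cached != null) {
            return cached;
        }
        treeWalks++;
        Object node = tree.get(fqn);
        if (node != null) {
            ctx.foundNodes.put(fqn, node); // later interceptors reuse this
        }
        return node;
    }
}
```

Within one invocation, the first interceptor pays for the walk and every later lookup of the same node is served from the context. A new invocation starts with an empty context and walks the tree again.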
4. Re: Eviction thread behaviour

manik Oct 25, 2006 11:15 AM (in response to manik)

Writing the above sparked a tangential thought; may be terrible, haven't thought about implications, etc. When an interceptor finds a node, why not throw it in the InvocationContext or something similar and pass it through the stack that way? Subsequent calls to find the node can check the context first before walking the cache tree. Saves redundant walking of the tree.

Hmm, I like. Want to throw this into a JIRA task and I'll play around with it when I get a moment ...
5. Re: Eviction thread behaviour

brian.stansberry Oct 25, 2006 11:52 AM (in response to manik)

http://jira.jboss.com/jira/browse/JBCACHE-811
6. Re: Eviction thread behaviour

ben.wang Oct 26, 2006 12:42 AM (in response to manik)

"manik.surtani@jboss.com" wrote:
A configurable queue size does not get rid of the deadlock scenario - just pushes the problem out a bit more. Under unpredictable load, this queue could still fill up.

The queue filling up during a surge is perfectly OK; otherwise we would need an unbounded queue. :-)

What I still don't understand is the cause of the problem. From reading the JIRA, I understand you can't reproduce it, right?
7. Re: Eviction thread behaviour

ben.wang Oct 26, 2006 12:49 AM (in response to manik)

"bstansberry@jboss.com" wrote:
I understand what you mean about FIFO or max age, but if eviction doesn't respect locks that can lead to serious problems; i.e. user thread doing a get acquires a lock in PessimisticLockInterceptor, then eviction thread evicts node, then user thread reaches TreeCache._get, which does a findNode, gets nothing and returns null. No chance to reload from the cache loader as the user thread is already past the interceptor.

No, eviction should still respect the locks, which is what it is doing now, if I am correct. The question is the lock wait time: whether it is 0 or the regular setting.
8. Re: Eviction thread behaviour

manik Oct 26, 2006 9:13 AM (in response to manik)

What I still don't understand is the cause of the problem. From reading the JIRA, I understand you can't reproduce it, right?

I couldn't reproduce it because of a timing problem, but I do completely understand the cause of the problem. Consider (LIFO):

1) Eviction queue is close to full, cache region is full.
2) Start a tx
3) Add stuff to the cache
4) Causes older items in the region to be queued for eviction
5) tx reads item in cache, which was queued for eviction
6) Node visited event in 5) not yet received, Eviction Thread attempts to process queue. Waiting on RL in 5)
7) tx attempts to write more stuff, but blocks because this triggers more evictions and the eviction queue is now full.

The tx doesn't get a chance to commit and release the RL in 5), because 7) blocks. The eviction thread cannot empty the queue because it is waiting on 5). Deadlock, until lock timeout!
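The cycle in steps 4) to 7) can be checked in a small single-threaded model. The sketch below is only an illustration under assumptions: the queue capacity of 1, the node names, and the class itself are invented, and StampedLock stands in for the node lock. It deliberately uses the non-blocking calls tryWriteLock() and offer() as probes so that the example cannot hang; in the real scenario both calls would be blocking ones.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.locks.StampedLock;

// Hypothetical model of the scenario; not taken from the JBoss Cache sources.
class EvictionDeadlockSketch {

    /** Returns {evictionThreadCanProceed, txCanProceed} for the state after step 5). */
    static boolean[] probe() {
        BlockingQueue<String> evictionQueue = new ArrayBlockingQueue<>(1); // tiny bounded queue
        StampedLock nodeLock = new StampedLock(); // lock on the node "/old"

        evictionQueue.offer("/old");       // 4) older item queued for eviction; queue is now full
        long txRead = nodeLock.readLock(); // 5) tx reads "/old" and keeps the RL until commit

        // 6) The eviction thread needs exclusive access to "/old" before it can
        //    take the event off the queue.
        long write = nodeLock.tryWriteLock();
        boolean evictorCanProceed = write != 0L;
        if (evictorCanProceed) {
            nodeLock.unlockWrite(write);
            evictionQueue.poll();
        }

        // 7) The tx writes more data, which needs room for another eviction event.
        boolean txCanProceed = evictionQueue.offer("/older");

        nodeLock.unlockRead(txRead); // clean up the model
        return new boolean[] {evictorCanProceed, txCanProceed};
    }

    public static void main(String[] args) {
        boolean[] result = probe();
        System.out.println("eviction thread can proceed: " + result[0]);
        System.out.println("tx can proceed: " + result[1]);
    }
}
```

Neither side can make progress in this state: the eviction thread needs the lock the tx holds, and the tx needs the queue slot only the eviction thread can free. With blocking calls, that is the wait that lasts until the lock timeout.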
9. Re: Eviction thread behaviour

manik Oct 26, 2006 9:15 AM (in response to manik)

Setting the eviction thread lock acquisition timeout to 0 will allow the Eviction Thread to process the rest of the queue without blocking on trying to evict stuff that is locked.
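As a sketch of what such a pass might look like (again with invented names, a StampedLock per node as a stand-in for the real node locks, and the assumption that every queued node has an entry in the lock map):

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

// Hypothetical sketch of one eviction pass that never waits for a lock.
class ZeroTimeoutPassSketch {

    /**
     * Drains the queue once. Nodes whose lock is held are skipped immediately;
     * all others are recorded in 'evicted'. Returns the skipped nodes so the
     * caller can decide whether to retry them or drop them.
     */
    static List<String> drain(Deque<String> queue,
                              Map<String, StampedLock> locks,
                              List<String> evicted) {
        List<String> skipped = new ArrayList<>();
        String fqn;
        while ((fqn = queue.pollFirst()) != null) {
            StampedLock lock = locks.get(fqn); // assumed to exist for every queued node
            long stamp = lock.tryWriteLock();  // zero wait: 0 means the lock is busy
            if (stamp == 0L) {
                skipped.add(fqn);
                continue;
            }
            try {
                evicted.add(fqn); // the node's data would be removed here
            } finally {
                lock.unlockWrite(stamp);
            }
        }
        return skipped;
    }
}
```

Because the pass always reaches the end of the queue, the queue is emptied even when the node at its head is locked, so writers waiting for queue space are released.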
10. Re: Eviction thread behaviour

genman Oct 27, 2006 4:29 PM (in response to manik)

Do you make sure that nodes that could not be locked (raise an exception) are processed later, by perhaps putting them on the back of the queue again?
11. Re: Eviction thread behaviour

ben.wang Oct 29, 2006 11:55 AM (in response to manik)

The logic should be there already. If we can't evict (due to exception), we put it in a special queue to be processed later.
