11 Replies Latest reply on Oct 29, 2006 11:55 AM by ben.wang

    Eviction thread behaviour

    manik

      As reported in

      http://jira.jboss.com/jira/browse/JBCACHE-794

      there are a few conceptual issues with the eviction thread's behaviour.

      The bits I'm specifically concerned about are:

      1) Should the eviction thread always use a lock acquisition timeout of 0, simply because if anyone else has a lock on the node being evicted then it should not be evicted? Since this is algorithm-specific, the BaseEvictionPolicy's evict() method should return an appropriate value if the eviction failed because of a timeout (rather than throw an exception, as this may be quite common with a timeout of 0)

      2) BaseEvictionAlgorithm.evictCacheNode(), which calls BaseEvictionPolicy.evict(), would have to decide whether a failure to evict because of a timeout should result in the eviction call being put back on the queue. In the case of an LRU-type policy, this probably should *not* happen: the fact that the node is locked means it should be treated as a nodeVisited event, and this eviction call should be removed from the eviction queue.
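A minimal Java sketch of points 1) and 2), assuming each node is guarded by a `java.util.concurrent.locks.ReentrantReadWriteLock`. `EvictResult`, `Node`, `evict()` and `shouldRequeue()` are hypothetical names for illustration only, not the actual `BaseEvictionPolicy`/`BaseEvictionAlgorithm` API. The policy reports a lock timeout as a return value instead of throwing, and the algorithm decides whether to re-queue, dropping the event for an LRU-like policy.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class EvictResultSketch {

    /** Hypothetical outcome of an eviction attempt (not a JBoss Cache type). */
    enum EvictResult { EVICTED, LOCK_TIMEOUT, INTERRUPTED }

    /** Hypothetical stand-in for a cache node guarded by a read/write lock. */
    static final class Node {
        final String fqn;
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        volatile boolean present = true;

        Node(String fqn) { this.fqn = fqn; }
    }

    /**
     * Policy-level evict(): a lock timeout is reported as a return value,
     * not an exception. With timeoutMs == 0 the attempt gives up immediately
     * if the write lock cannot be acquired right away.
     */
    static EvictResult evict(Node node, long timeoutMs) {
        boolean locked;
        try {
            locked = node.lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return EvictResult.INTERRUPTED;
        }
        if (!locked) {
            return EvictResult.LOCK_TIMEOUT;
        }
        try {
            node.present = false; // stand-in for removing the node from the cache
            return EvictResult.EVICTED;
        } finally {
            node.lock.writeLock().unlock();
        }
    }

    /**
     * Algorithm-level decision, in the spirit of evictCacheNode(): should a
     * timed-out eviction go back on the queue? For an LRU-like policy the lock
     * is treated as a visit, so the event is dropped instead of re-queued.
     */
    static boolean shouldRequeue(EvictResult result, boolean lruLike) {
        return result == EvictResult.LOCK_TIMEOUT && !lruLike;
    }

    public static void main(String[] args) {
        Node node = new Node("/a/b");
        node.lock.readLock().lock(); // a user thread / transaction holding a read lock
        EvictResult whileLocked = evict(node, 0);
        node.lock.readLock().unlock();
        EvictResult afterRelease = evict(node, 0);
        System.out.println(whileLocked + ", requeue(LRU)=" + shouldRequeue(whileLocked, true)
                + ", requeue(non-LRU)=" + shouldRequeue(whileLocked, false));
        System.out.println(afterRelease + ", present=" + node.present);
    }
}
```

With a timeout of 0, `tryLock(0, TimeUnit.MILLISECONDS)` returns `false` straight away when the write lock is not free, so a locked node costs the eviction thread no waiting time.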

      What do people think about this?

      Cheers,
      Manik

        • 1. Re: Eviction thread behaviour

          ben.wang

          "manik.surtani@jboss.com" wrote:
          1) Should the eviction thread always use a lock acquisition timeout of 0, simply because if anyone else has a lock on the node being evicted then it should not be evicted? Since this is algorithm-specific, the BaseEvictionPolicy's evict() method should return an appropriate value if the eviction failed because of a timeout (rather than throw an exception, as this may be quite common with a timeout of 0)


          I don't think it is necessarily true that if a node has a read/write lock, it should not be evicted. Take, for example, a FIFO eviction policy or a policy with a maximum node age. Both will evict the corresponding node even if it has just been accessed.

          Unfortunately, I still don't see how the eviction lock timeout could cause a deadlock or slow the system down. That could happen if the event queue filled up, but Manik has now made the queue size configurable, so theoretically we should not run into this problem anymore. Therefore, a timeout of 0 may no longer be necessary.

          "manik.surtani@jboss.com" wrote:
          2) BaseEvictionAlgorithm.evictCacheNode(), which calls BaseEvictionPolicy.evict(), would have to decide whether a failure to evict because of a timeout should result in the eviction call being put back on the queue. In the case of an LRU-type policy, this probably should *not* happen: the fact that the node is locked means it should be treated as a nodeVisited event, and this eviction call should be removed from the eviction queue.


          If we fail to evict the node, it is put into a special recycle queue to be processed again in the next cycle. When the nodeVisited event comes in, it is supposed to remove the node's event from both queues. If it doesn't, that is a bug.
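A rough model of the recycle-queue behaviour described above, assuming string FQNs and a set of nodes whose eviction fails in a given cycle. The class and method names are hypothetical; the point it illustrates is that `nodeVisited()` removes the node from both queues, so a node that is used again is not retried on the next cycle.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model: a main eviction queue plus a recycle queue for failed evictions. */
public class RecycleQueueSketch {
    final Deque<String> evictionQueue = new ArrayDeque<>();
    final Deque<String> recycleQueue = new ArrayDeque<>();
    final Set<String> lockedNodes = new HashSet<>(); // evictions of these nodes fail this cycle
    final Set<String> evicted = new HashSet<>();

    /** A node has been selected for eviction. */
    void enqueue(String fqn) {
        evictionQueue.addLast(fqn);
    }

    /** A nodeVisited event cancels any pending eviction of that node in BOTH queues. */
    void nodeVisited(String fqn) {
        evictionQueue.remove(fqn);
        recycleQueue.remove(fqn);
    }

    /** One eviction cycle: retry recycled nodes first, then drain the main queue. */
    void runCycle() {
        Deque<String> work = new ArrayDeque<>(recycleQueue);
        recycleQueue.clear();
        work.addAll(evictionQueue);
        evictionQueue.clear();
        for (String fqn : work) {
            if (lockedNodes.contains(fqn)) {
                recycleQueue.addLast(fqn); // eviction failed: try again next cycle
            } else {
                evicted.add(fqn);
            }
        }
    }

    public static void main(String[] args) {
        RecycleQueueSketch q = new RecycleQueueSketch();
        q.enqueue("/a");
        q.enqueue("/b");
        q.lockedNodes.add("/b");
        q.runCycle();
        System.out.println("evicted=" + q.evicted + " recycle=" + q.recycleQueue);
        q.nodeVisited("/b"); // the node was used, so it should no longer be evicted
        q.lockedNodes.remove("/b");
        q.runCycle();
        System.out.println("evicted=" + q.evicted + " recycle=" + q.recycleQueue);
    }
}
```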


          • 2. Re: Eviction thread behaviour
            manik

            A configurable queue size does not get rid of the deadlock scenario; it just pushes the problem out a bit further. Under unpredictable load, this queue could still fill up.

            • 3. Re: Eviction thread behaviour
              brian.stansberry

               

              "ben.wang@jboss.com" wrote:

              I don't think it is necessarily true that if a node has a read/write lock, it should not be evicted. Take, for example, a FIFO eviction policy or a policy with a maximum node age. Both will evict the corresponding node even if it has just been accessed.


              I understand what you mean about FIFO or max age, but if eviction doesn't respect locks, that can lead to serious problems. For example, a user thread doing a get acquires a lock in PessimisticLockInterceptor, then the eviction thread evicts the node, then the user thread reaches TreeCache._get, which does a findNode, finds nothing and returns null. There is no chance to reload from the cache loader, as the user thread is already past the interceptor.

              Writing the above sparked a tangential thought; it may be terrible, and I haven't thought through the implications. When an interceptor finds a node, why not put it in the InvocationContext (or something similar) and pass it through the stack that way? Subsequent calls to find the node could check the context first before walking the cache tree, saving redundant walks of the tree.
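A sketch of the idea, assuming a hypothetical per-invocation `InvocationContext` that carries nodes found earlier in the same call; the real JBoss Cache `InvocationContext` may look quite different, and `Tree`, `Node` and `findNode()` here are illustrative stand-ins. The `walks` counter shows that the second lookup is served from the context instead of walking the tree again.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextNodeLookupSketch {

    /** Hypothetical stand-in for a cache node. */
    static final class Node {
        final String fqn;

        Node(String fqn) { this.fqn = fqn; }
    }

    /** Stand-in for the cache tree; counts how often it is walked. */
    static final class Tree {
        final Map<String, Node> nodes = new HashMap<>();
        int walks = 0;

        Node findNode(String fqn) {
            walks++;
            return nodes.get(fqn);
        }
    }

    /** Hypothetical per-invocation context holding nodes already found during this call. */
    static final class InvocationContext {
        private final Map<String, Node> foundNodes = new HashMap<>();

        Node lookup(String fqn) { return foundNodes.get(fqn); }

        void remember(Node node) { foundNodes.put(node.fqn, node); }
    }

    /** Check the context first; walk the tree only on a miss and remember the result. */
    static Node findNode(Tree tree, InvocationContext ctx, String fqn) {
        Node cached = ctx.lookup(fqn);
        if (cached != null) {
            return cached;
        }
        Node found = tree.findNode(fqn);
        if (found != null) {
            ctx.remember(found);
        }
        return found;
    }

    public static void main(String[] args) {
        Tree tree = new Tree();
        tree.nodes.put("/a/b", new Node("/a/b"));
        InvocationContext ctx = new InvocationContext();

        Node seenByLockInterceptor = findNode(tree, ctx, "/a/b"); // first lookup walks the tree
        Node seenByGet = findNode(tree, ctx, "/a/b");              // later lookup hits the context
        System.out.println("same node: " + (seenByLockInterceptor == seenByGet)
                + ", tree walks: " + tree.walks);
    }
}
```

In this sketch a new context is created per invocation, so cached references do not outlive the call that found them.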

              • 4. Re: Eviction thread behaviour
                manik

                 



                Writing the above sparked a tangential thought; it may be terrible, and I haven't thought through the implications. When an interceptor finds a node, why not put it in the InvocationContext (or something similar) and pass it through the stack that way? Subsequent calls to find the node could check the context first before walking the cache tree, saving redundant walks of the tree.



                Hmm, I like it. Do you want to put this into a JIRA task? I'll play around with it when I get a moment...

                • 5. Re: Eviction thread behaviour
                  brian.stansberry
                  • 6. Re: Eviction thread behaviour

                     

                    "manik.surtani@jboss.com" wrote:
                    A configurable queue size does not get rid of the deadlock scenario; it just pushes the problem out a bit further. Under unpredictable load, this queue could still fill up.


                    The queue filling up during a surge is perfectly OK; otherwise we would need an unbounded queue. :-)

                    What I still don't understand is the cause of the problem. From reading the JIRA, I understand you can't reproduce it, right?

                    • 7. Re: Eviction thread behaviour

                       

                      "bstansberry@jboss.com" wrote:
                      I understand what you mean about FIFO or max age, but if eviction doesn't respect locks, that can lead to serious problems. For example, a user thread doing a get acquires a lock in PessimisticLockInterceptor, then the eviction thread evicts the node, then the user thread reaches TreeCache._get, which does a findNode, finds nothing and returns null. There is no chance to reload from the cache loader, as the user thread is already past the interceptor.


                      No, eviction should still respect the locks, which, if I am correct, is what it does now. The question is the lock wait time: whether it should be 0 or the regular setting.

                      • 8. Re: Eviction thread behaviour
                        manik

                         



                        What I still don't understand is the cause of the problem. From reading the JIRA, I understand you can't reproduce it, right?



                        I couldn't reproduce it because of a timing problem, but I do completely understand the cause of the problem. Consider (LIFO):

                        1) Eviction queue is close to full, cache region is full.
                        2) Start a tx
                        3) Add stuff to the cache
                        4) Causes older items in the region to be queued for eviction
                        5) tx reads item in cache, which was queued for eviction
                        6) The node visited event from 5) has not yet been received; the eviction thread attempts to process the queue and waits on the read lock (RL) held since 5)
                        7) tx attempts to write more stuff, but blocks because this triggers more evictions and the eviction queue is now full.

                        The tx never gets a chance to commit and release the RL from 5), because 7) blocks. The eviction thread cannot empty the queue because it is waiting on 5). Deadlock, until the lock times out!

                        • 9. Re: Eviction thread behaviour
                          manik

                          Setting the eviction thread's lock acquisition timeout to 0 will allow it to process the rest of the queue instead of blocking while trying to evict nodes that are locked.
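A minimal sketch of a zero-timeout eviction pass replaying steps 5) to 7) above, assuming each node is represented only by a `ReentrantReadWriteLock` and the eviction queue is a bounded `ArrayBlockingQueue`; `ZeroTimeoutEvictionSketch`, `addAndQueue()` and `evictionPass()` are hypothetical names. Because `writeLock().tryLock()` returns immediately while the transaction holds its read lock on `/b`, the pass skips `/b`, drains the rest of the queue, and leaves room for the transaction's next eviction event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ZeroTimeoutEvictionSketch {
    // Hypothetical stand-ins: each node is represented only by its lock.
    final Map<String, ReentrantReadWriteLock> nodes = new ConcurrentHashMap<>();
    final BlockingQueue<String> evictionQueue;

    ZeroTimeoutEvictionSketch(int queueCapacity) {
        evictionQueue = new ArrayBlockingQueue<>(queueCapacity);
    }

    /** Add a node and queue it for eviction; returns false if the bounded queue is full. */
    boolean addAndQueue(String fqn) {
        nodes.putIfAbsent(fqn, new ReentrantReadWriteLock());
        return evictionQueue.offer(fqn);
    }

    /** One pass over the queue that never waits for a lock; returns the skipped nodes. */
    List<String> evictionPass() {
        List<String> skipped = new ArrayList<>();
        String fqn;
        while ((fqn = evictionQueue.poll()) != null) {
            ReentrantReadWriteLock lock = nodes.get(fqn);
            if (lock == null) {
                continue; // already removed
            }
            if (lock.writeLock().tryLock()) { // zero timeout: fail fast if the node is locked
                try {
                    nodes.remove(fqn);
                } finally {
                    lock.writeLock().unlock();
                }
            } else {
                skipped.add(fqn); // locked: recycle or drop, depending on the policy
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws InterruptedException {
        ZeroTimeoutEvictionSketch cache = new ZeroTimeoutEvictionSketch(3);
        for (String fqn : new String[] {"/a", "/b", "/c"}) {
            cache.addAndQueue(fqn);
        }
        // Step 5: the transaction reads /b and keeps holding its read lock.
        cache.nodes.get("/b").readLock().lock();

        List<String> skipped = new ArrayList<>();
        Thread evictor = new Thread(() -> skipped.addAll(cache.evictionPass()));
        evictor.start();
        evictor.join(); // returns promptly: the pass never blocks on /b

        // Step 7: the queue has room again, so queuing a further eviction does not block.
        boolean accepted = cache.addAndQueue("/d");
        cache.nodes.get("/b").readLock().unlock();
        System.out.println("skipped=" + skipped + ", accepted=" + accepted
                + ", nodes left=" + cache.nodes.size());
    }
}
```

Note that the no-argument `tryLock()` ignores the lock's fairness setting; `tryLock(0, TimeUnit.MILLISECONDS)` is the fairness-respecting variant. What to do with the skipped node (recycle it or drop it) is the policy question raised in point 2) at the top of the thread.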

                          • 10. Re: Eviction thread behaviour
                            genman

                            Do you make sure that nodes that could not be locked (i.e. that raised an exception) are processed later, perhaps by putting them at the back of the queue again?

                            • 11. Re: Eviction thread behaviour
                              ben.wang

                              The logic should already be there. If we can't evict a node (due to an exception), we put it in a special queue to be processed later.