4 Replies Latest reply on Apr 4, 2005 9:36 PM by kkalmbach

    exception in LRUAlgorithm

    kkalmbach

      Under heavy load I get the following exception from the LRUAlgorithm:

      2005-03-30 13:36:12,990 ERROR [org.jboss.cache.eviction.EvictionTimerTask] run(): error processing eviction with exception: org.jboss.cache.eviction.EvictionException: LRUAlgorithm.removeFromQueue(): internal error. Can't find fqn in NodeMap. fqn: /UserContext/user1 will reset the eviction queue list.
      
      2005-03-30 13:36:12,990 INFO [org.jboss.cache.eviction.Region] reseteEvictionQueues(): node queue size: 439 region name: /_default_/
      2005-03-30 13:36:12,990 ERROR [STDERR] org.jboss.cache.eviction.EvictionException: LRUAlgorithm.removeFromQueue(): internal error. Can't find fqn in nodeMap. fqn: /UserContext/user1
      
      2005-03-30 13:36:12,990 ERROR [STDERR] at org.jboss.cache.eviction.LRUAlgorithm.removeFromQueue(LRUAlgorithm.java:216)
      
      2005-03-30 13:36:12,990 ERROR [STDERR] at org.jboss.cache.eviction.LRUAlgorithm.processRemovedNodes(LRUAlgorithm.java:108)
      
      2005-03-30 13:36:12,990 ERROR [STDERR] at org.jboss.cache.eviction.LRUAlgorithm.processQueues(LRUAlgorithm.java:81)
      
      2005-03-30 13:36:12,990 ERROR [STDERR] at org.jboss.cache.eviction.LRUAlgorithm.process(LRUAlgorithm.java:51)
      
      2005-03-30 13:36:12,990 ERROR [STDERR] at org.jboss.cache.eviction.EvictionTimerTask.run(EvictionTimerTask.java:35)
      
      2005-03-30 13:36:12,990 ERROR [STDERR] at java.util.TimerThread.mainLoop(Timer.java:432)
      
      2005-03-30 13:36:12,990 ERROR [STDERR] at java.util.TimerThread.run(Timer.java:382)



      I saw that in the LRUAlgorithm itself, you catch and ignore this exception if it is thrown from evict(). If this exception comes from a call to processRemovedNodes, it is bubbled up to the EvictionTimerTask and the nodeList is reset, so nothing ever times out.

      Should this exception be ignored from processRemovedNodes? If this exception should not be ignored, I can work on getting a better unit test to show this.

        • 1. Re: exception in LRUAlgorithm

          Please contribute a unit test to reproduce this problem. I don't think this is an ignorable error.

          Thanks a lot,

          -Ben

          • 2. Re: exception in LRUAlgorithm
            kkalmbach

            After much searching I think I found my problem.

            Does this make sense to you?
            I do not specify an isolation level in my service.xml, so (I think) I use the "LockStrategyNone".

            I have 2 threads doing repeated adds/removes on the same node. One does not wait on the other, so the adds and deletes are intermingled, and in the nodeEvent queue I get an add/add/delete/delete. When the eviction algorithm starts to process the queue, it processes an add, then another add (which does nothing, because the node is already in the map), then a delete, and then the second delete fails.
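
To make the add/add/delete/delete sequence concrete, here is a minimal sketch that replays such a queue against a plain map. The class and method names (EvictionQueueReplay, firstFailingRemove) are hypothetical stand-ins for illustration, not the JBoss Cache classes, and the map only approximates what the real nodeMap does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the eviction bookkeeping; NOT the real LRUAlgorithm.
class EvictionQueueReplay {
    enum EventType { ADD, REMOVE }

    /**
     * Replays the events for a single fqn against a map.
     * Returns the index of the first REMOVE that finds no entry, or -1 if all succeed.
     */
    static int firstFailingRemove(String fqn, List<EventType> events) {
        Map<String, Integer> nodeMap = new HashMap<>();
        for (int i = 0; i < events.size(); i++) {
            if (events.get(i) == EventType.ADD) {
                // A second ADD for a node already in the map changes nothing.
                nodeMap.putIfAbsent(fqn, i);
            } else if (nodeMap.remove(fqn) == null) {
                // The entry was already removed by an earlier REMOVE.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<EventType> interleaved = Arrays.asList(
                EventType.ADD, EventType.ADD, EventType.REMOVE, EventType.REMOVE);
        System.out.println("first failing remove at index "
                + firstFailingRemove("/UserContext/user1", interleaved));
    }
}
```

With the interleaved order the second remove (index 3) finds nothing, while a strictly alternating add/remove/add/remove order returns -1.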

            A few questions:
            1) Is this a reasonable explanation of what is going on?
            2) Does the default LockStrategy make sense? Should it ever be used? Should the default change?
            3) Does the eviction policy need changing to accommodate this (allow multiple entries in the map, ignore the error, or something else)?
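
For question 3, one way to picture "allow multiple entries in the map" is a count per fqn, so that each add is matched by exactly one remove. This is only a sketch of the idea; CountingNodeMap and its methods are hypothetical and do not exist in JBoss Cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks how many adds are outstanding for each fqn.
class CountingNodeMap {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one more add event for the fqn. */
    void add(String fqn) {
        counts.merge(fqn, 1, Integer::sum);
    }

    /** Matches one outstanding add; returns false if there was none. */
    boolean remove(String fqn) {
        Integer n = counts.get(fqn);
        if (n == null) {
            return false;
        }
        if (n == 1) {
            counts.remove(fqn);      // last outstanding add, drop the entry
        } else {
            counts.put(fqn, n - 1);  // still at least one add outstanding
        }
        return true;
    }

    boolean contains(String fqn) {
        return counts.containsKey(fqn);
    }
}
```

With this bookkeeping, add/add/delete/delete leaves the map empty and neither delete fails. Whether that matches the state of the cache itself is a separate question.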


            I can send trace level logs if you want, but they are too large to post here.
            Thanks
            -Kevin

            • 3. Re: exception in LRUAlgorithm

              Kevin,

              The default lock strategy should be REPEATABLE_READ if you don't specify one, so that still does not explain why the node gets deleted twice.

              There is one scenario that can happen, though. While the eviction policy is trying to evict a node, another thread may swoop in and delete the node first, which can result in a node-not-found error. Since our eviction policy is not synchronous, we should allow this scenario to go through without an error.
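
A rough sketch of what "go through without an error" could look like: a remove that finds no entry is counted and logged, and processing carries on instead of throwing. TolerantEvictionQueue and its methods are hypothetical names for illustration and do not reflect the actual LRUAlgorithm code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch of tolerant bookkeeping; NOT the real eviction code.
class TolerantEvictionQueue {
    private static final Logger LOG =
            Logger.getLogger(TolerantEvictionQueue.class.getName());

    private final Map<String, Long> nodeMap = new HashMap<>();
    private int skippedRemovals;

    /** Records an added node; a repeated add for the same fqn is a no-op. */
    void processAdded(String fqn, long timestamp) {
        nodeMap.putIfAbsent(fqn, timestamp);
    }

    /** Removes the node if present; a missing node is logged and skipped, not thrown. */
    void processRemoved(String fqn) {
        if (nodeMap.remove(fqn) == null) {
            skippedRemovals++;
            LOG.fine("fqn already gone, skipping: " + fqn);
        }
    }

    int size() {
        return nodeMap.size();
    }

    int skippedRemovals() {
        return skippedRemovals;
    }
}
```

The rest of the queue state stays intact after a skipped removal, so other nodes still time out as usual.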

              -Ben

              • 4. Re: exception in LRUAlgorithm
                kkalmbach

                Looking at the logs, I do not think this is related to the eviction policy running.

                I definitely see one thread getting a write lock, then another thread getting a write lock (for the same node) before the first one is finished. Here is what I see happening:
                T1 is Thread1, T2 is Thread2. (Please excuse the ASCII sequence diagram.)

                T1          T2          Cache       NodeEventq

                put ------------------> add
                nodeEvent ------------------------> add
                remove ---------------> remove
                            put ------> add
                            nodeEvent ------------> add
                nodeEvent ------------------------> remove
                            remove ---> remove
                            nodeEvent ------------> remove
                


                Then, when the eviction timer wakes up and runs through the nodeEventQueue, it hits 2 adds and then 2 deletes.