4 Replies Latest reply on Jan 18, 2007 2:26 PM by Manik Surtani

    markNodeCurrentlyInUse() - race condition

    Elias Ross Master

      It seems to me that Region#markNodeCurrentlyInUse() does not really do what it is supposed to do.

      Basically, it adds an event to the queue. And what can happen is that 4 seconds later (say, if event queue processing is every 4 seconds), is that the node can disappear after this method is called.

      Doesn't this make this method fairly useless?

      Shouldn't there just be a method on a Node to prevent eviction?

        • 1. Re: markNodeCurrentlyInUse() - race condition
          Manik Surtani Master

          I agree that it is ugly and placing a marker directly on the node makes more sense.

          I also agree that using the event queue for this kind of data is pretty hacky, but let's revisit the purpose of this method (and Brian will probably have stuff to add here)

          markNodeCurrentlyInUse() does (should) not prevent node removal by calling one of the remove methods. All it should do is prevent the node from being evicted. And if this is the case, then a marked in the event queue should be sufficient since the event processor thread would find the in-use marker before any eviction markers.

          I guess where this can be a problem is if a node is marked for eviction first, and then is marked as in-use, and then the event processor thread starts.

          • 2. Re: markNodeCurrentlyInUse() - race condition
            Elias Ross Master

            Example situation which demonstrates a race condition. The following situation happens, say if 2 nodes are allowed, 3 nodes are created, and LRU policy is applied.

            Visit A - B - C
            Mark Node A in use
            -- event processing occurs --
            Node A evicted ...

            • 3. Re: markNodeCurrentlyInUse() - race condition
              Brian Stansberry Master

              From an earlier jbosscache-dev conversation:

              "Manik Surtani" wrote:

              On 15 Nov 2006, at 14:51, Brian Stansberry wrote:

              > The purpose of this method is to allow an application to signal the
              > eviction policy to not evict a particular node who's data it's using.
              > Needed in situations where the app doesn't hold a lock on the node
              > *and*
              > where evicting the node will cause a problem.
              > AFAICT, this use case is really just for EJB3 SFSBs, where the problem
              > is that evicting an "in-use" node forces the container to call
              > prePassivate() on the in-use EJB. This is done from a CacheListener.
              > This is all a bit funky. Adding methods to the API to deal with a
              > very
              > specific use case. It also has a small race condition, which I won't
              > get into here.
              > Here's a different (but still funky) approach:
              > Add a new exception class o.j.c.eviction.NodeInUseException extends
              > RuntimeException.
              > The application implements CacheListener. When it gets a
              > nodePassivated(pre=true) event, if it doesn't want the passivation, it
              > throws the NodeInUseException. This will abort the eviction and
              > propagate back to the eviction policy. Eviction policies are already
              > written such that they catch all exceptions, and then throw the node
              > into a retry queue (see BaseEvictionAlgorithm.evictCacheNode()). If we
              > wanted to get fancy, we could specifically catch NodeInUseException
              > and
              > decide from there whether to add it to the retry queue.
              > I don't think we should try to change this this week; more of a short
              > term future thing.
              > Thoughts?

              Your second (and still funky) approach does seem a lot cleaner than
              hacking in stuff into the API for a very specific use case. So
              certainly my preferred option. But from a performance standpoint, is
              this really an "exceptional" circumstance?

              The above is hacky. Concern I have about adding a field to the node is it forces a read of each node as part of the eviction. IIRC that's not needed now, except in CacheImpl itself as part of the remove. Don't want to add overhead. (I'm in a class now so can't look at code, so please forgive if that's an easily addressable concern.)

              I believe the race condition is because eviction is a two step process with 2 queues -- policy reads off all the eviction events off the event queue and uses that to build its eviction queue. Then it goes through the eviction queue and decides whether to evict the nodes. Race can happen if the markNodeCurrentlyInUse() call happens after event queue has been read but before the policy has dealt with the relevant node. This race is really a general problem that could throw off accurate working off any eviction algorithm (e.g. a get() that occurs after the event queue is read doesn't prevent eviction by the LRUPolicy).

              • 4. Re: markNodeCurrentlyInUse() - race condition
                Manik Surtani Master

                This is where placing markers (both "marked for eviction" as well as "mark as in use") on the node could help.

                The eviction interceptor places events in the queue as well as marks the node for eviction; on subsequent gets (before the eviction happens) the eviction interceptor could *remove* the "marked for eviction" marker (according to policy) on the node.

                When the eviction thread kicks in, it processes the eviction queue and before attempting to evict a node, tests the "marked for eviction" flag on the node as well as the "mark as in use" flag.

                Since these flags reside in the node, dealing with concurrency and race conditions become much easier.