4 Replies Latest reply on Sep 11, 2014 8:12 AM by rvansa

    Existing cache entry not found (distributed cache, async store, IGNORE_RETURN_VALUES)

    jugglingcats

      We are using 6.0.0-Final. We have a 3-node cluster, using a distributed cache (1 owner) and a custom MongoDB store which performs writes asynchronously via a basic LinkedBlockingQueue. We stopped using

       

      Occasionally (about 0.02% of the time) in tests we are seeing a get for a previously put key return null.

       

      Our test harness is multi-threaded, but it cannot issue a get for a key until the put has completed and the key (a session id) has been returned. So there is no chance the get happens before the put returns; we should never see a cache miss.

       

      I'm trying to understand if the following assumptions are correct...

      - Gets for a key will always be retrieved from the owner node for that key (assuming the node is up)

      - The owner node will always serve from memory, assuming the cache size doesn't exceed max entries (i.e. the entry has not been evicted)

       

      If these are right, it's hard to understand the behaviour we're seeing!

       

      I have just noticed at least one occurrence of this problem where the put and the get are on the same node, which is very strange to me.

       

      Note that we also use IGNORE_RETURN_VALUES on cache puts, because we're not interested in the previous session state. In my tests, it *seems* as though using this flag aggravates the problem; I've run several tests without the flag and don't see the issue at all. Can anyone explain why this might be? Using this flag significantly improves performance for us.
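      For reference, this is roughly how we apply the flag. This is only a minimal sketch: the cache name, key and value types are illustrative, and cacheManager is assumed to be our existing EmbeddedCacheManager, not our real code.

      import org.infinispan.AdvancedCache;
      import org.infinispan.context.Flag;

      // Put that skips fetching/returning the previous value from the owner node.
      AdvancedCache<String, byte[]> sessions =
            cacheManager.<String, byte[]>getCache("sessions").getAdvancedCache();
      sessions.withFlags(Flag.IGNORE_RETURN_VALUES).put(sessionId, serializedSession);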

       

      As usual any advice and guidance much appreciated.

        • 1. Re: Existing cache entry not found (distributed cache, async store, IGNORE_RETURN_VALUES)
          jugglingcats

          Having done some more digging, I think the reason for this issue is pretty obvious... we are using DIST_ASYNC!
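          For reference, a minimal configuration sketch of the mode we are running (the numOwners value is just our setup; exact builder names may differ slightly between versions):

          import org.infinispan.configuration.cache.CacheMode;
          import org.infinispan.configuration.cache.Configuration;
          import org.infinispan.configuration.cache.ConfigurationBuilder;

          // Asynchronous distribution: put() returns before the write reaches the owner(s).
          Configuration cfg = new ConfigurationBuilder()
                .clustering()
                   .cacheMode(CacheMode.DIST_ASYNC)
                   .hash().numOwners(1)
                .build();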

           

          It was quite a while ago that we made this decision on performance grounds, and I think at the time we acknowledged that it would result in occasional dirty reads...

           

          What happens is that when a put is done on node A, it may not reach owner node B before node B is hit with a get for the same key. I'm guessing that even when the put is done on owner node B, it still sends the write to itself and applies it in a separate thread, and therefore another thread on node B can also get a dirty read. Does that sound correct?


          Thanks

          • 2. Re: Existing cache entry not found (distributed cache, async store, IGNORE_RETURN_VALUES)
            rvansa

             With asynchronous replication you can indeed see such an issue: the put() method does not wait for the replication (sending the write to another node), and messages sent over the network are not ordered. Therefore, a read could reach the remote node before the write is applied.

             On the other hand, I believe that if a write is local, recording it in the data container is executed by the thread calling put(), and therefore another read on the same node should see the value. If you can use arbitrary keys, you can use the KeyAffinityService to make inserts (the first write under a given key) happen locally.
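             A rough sketch of how that could look, following the pattern from the Infinispan documentation; the key generator, executor, and buffer size are placeholders to adapt, and cache is assumed to be the existing distributed Cache instance:

             import java.util.concurrent.Executors;
             import org.infinispan.affinity.KeyAffinityService;
             import org.infinispan.affinity.KeyAffinityServiceFactory;
             import org.infinispan.affinity.RndKeyGenerator;

             // Produces keys that map to the local node, so the first write for a key stays local.
             KeyAffinityService<Object> affinity = KeyAffinityServiceFactory.newLocalKeyAffinityService(
                   cache,                               // the distributed Cache instance
                   new RndKeyGenerator(),               // candidate key generator; plug in your own
                   Executors.newSingleThreadExecutor(), // background thread filling the key buffer
                   100);                                // buffered keys per node

             Object localKey = affinity.getKeyForAddress(cache.getCacheManager().getAddress());
             cache.put(localKey, value);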

            • 3. Re: Existing cache entry not found (distributed cache, async store, IGNORE_RETURN_VALUES)
              jugglingcats

              Hi Radim, thanks for the reply. I will do some more testing when I get some time, to see if I can reproduce the dirty read on the same node. I suspect the put and the get were on the same node, but not the owner node.

               

              We don't control the keys as we use Mongo ObjectId, but I am interested in determining the owner node for a given key so we can implement affinity at the HTTP level. My idea is to set a cookie identifying the owner node for a given key, so that the next request goes to the owner node. If the topology changes, it's no big deal: we'd just do the remote read and then reset the cookie so the user goes to the new owner node next time. I think this could really help our performance by keeping most gets/puts local... does that make sense? What I don't know is whether it's possible to determine the owner node(s) for a given key.

              • 4. Re: Existing cache entry not found (distributed cache, async store, IGNORE_RETURN_VALUES)
                rvansa

                cache.getAdvancedCache().getDistributionManager().getPrimaryLocation(key)
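                For example, comparing that to the local address tells you whether the current node is the primary owner, which is what the cookie idea above needs. A minimal sketch; the helper name is made up, and in 6.0 getPrimaryLocation returns the owner's Address:

                import org.infinispan.Cache;
                import org.infinispan.distribution.DistributionManager;
                import org.infinispan.remoting.transport.Address;

                // Hypothetical helper: true when this node is the primary owner of the key,
                // e.g. to decide whether the affinity cookie needs updating.
                static boolean isLocalPrimary(Cache<?, ?> cache, Object key) {
                    DistributionManager dm = cache.getAdvancedCache().getDistributionManager();
                    Address primary = dm.getPrimaryLocation(key);
                    return primary.equals(cache.getCacheManager().getAddress());
                }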
