
    Keys stored in a replicated cache are not kept in the accessed cache node until the owner node sends them back

    john32768

      We're migrating from Infinispan 5 to Infinispan 7.  We're using a replicated cache with two nodes, of which only one is accessed directly through a REST interface.  We're using async replication with a queue flush time of 10 seconds.
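      Roughly, the cache is configured like this (we actually use declarative XML; this is a simplified sketch of what I believe is the programmatic equivalent, so the exact builder calls may differ slightly):

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class OurCacheConfig {
   // Replicated cache, asynchronous replication, replication queue
   // flushed every 10 seconds (sketch of our declarative XML setup).
   static Configuration asyncReplicated() {
      return new ConfigurationBuilder()
            .clustering()
               .cacheMode(CacheMode.REPL_ASYNC)
               .async()
                  .useReplQueue(true)
                  .replQueueInterval(10000)   // flush interval in milliseconds
            .build();
   }
}
```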

       

      With Infinispan 5, any value stored in the cache would be immediately retrievable from the same node, in other words a PUT /key followed immediately by a GET /key would not result in a 404.

       

      With Infinispan 7, this behaviour has changed, and now only certain keys are immediately available, while other keys are not.  This happens about 50% of the time.  After 10 seconds have elapsed, the keys will always be available in both nodes.

       

      I strongly suspect this is because this replicated cache is now somehow seen as a subset of a distributed cache (with a minimum of 2 copies), and that some keys are "owned" by the "other" node, not the one that was accessed (something to do with key affinity).

       

      What we're seeing is that for some keys we store in Node A, the key is actually first stored in Node B (and is immediately available there).  Node B then puts this in its async replication queue, and after 10 seconds the key will also be available in Node A.  This is quite counter-intuitive for us.  Basically, unless we're using HotRod (which is not an option for us), we can't be sure which node will hold the key and so we cannot access it easily again.
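      (As an aside, one way to see this ownership would be to attach a throw-away embedded node to the cluster for diagnostics; a rough sketch, assuming the DistributionManager API, with the config file and cache name as placeholders:)

```java
import org.infinispan.AdvancedCache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.remoting.transport.Address;

public class WhoOwnsMyKey {
   public static void main(String[] args) throws Exception {
      // Start an embedded node that joins the same cluster; the config
      // file and cache name below are placeholders for the real ones.
      DefaultCacheManager manager = new DefaultCacheManager("infinispan.xml");
      AdvancedCache<String, String> cache =
            manager.<String, String>getCache("ourCache").getAdvancedCache();

      // Ask the distribution manager which node is the primary owner of a key.
      Address primary = cache.getDistributionManager().getPrimaryLocation("someKey");
      System.out.println("Primary owner of 'someKey': " + primary);

      manager.stop();
   }
}
```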

       

      Also, the performance of this kind of setup will be worse, because for keys that are owned by Node B, some kind of synchronous communication will be required between the receiving node (A) and the owning node (B) before the transaction completes.  With Infinispan 5, this would never be needed and the communication would always flow through the async replication queue, no matter what the key is.

       

      Is it possible to somehow get the old behaviour, where any key that is sent to Node A is stored on Node A and then communicated asynchronously to Node B?

       

      --John

        • 1. Re: Keys stored in a replicated cache are not kept in the accessed cache node until the owner node sends them back
          rvansa

          Let me first explain this change on synchronous caches. In Infinispan 5 (I think up to 5.2), the algorithm was very prone to deadlocks. This is because the lock was always acquired first on the originator and then, while holding the lock, the modification was replicated to the other owners, where the lock had to be acquired, too. If the key was modified concurrently on two nodes, each node could acquire the lock locally and the acquisition on the other node would then fail with a timeout. That's why it was changed to 'lock first on the primary owner, no locking needed on backup owners, unlock on the primary'.
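          To make the deadlock concrete, here is a toy sketch in plain Java (not Infinispan code, just an illustration of the originator-first pattern): each 'node' locks its own copy of the key first and only then tries to lock the other node's copy, so two concurrent writers can block each other until the lock acquisition times out.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy illustration (not Infinispan code): each "node" locks its own copy
// of the key first and then, while still holding it, tries to lock the
// other node's copy - mimicking the old originator-first locking. Run
// concurrently, both threads typically time out waiting for the lock the
// other one already holds.
public class OriginatorFirstDeadlock {
   static final ReentrantLock lockOnA = new ReentrantLock();
   static final ReentrantLock lockOnB = new ReentrantLock();

   public static void main(String[] args) {
      new Thread(() -> write(lockOnA, lockOnB, "originator A")).start();
      new Thread(() -> write(lockOnB, lockOnA, "originator B")).start();
   }

   static void write(ReentrantLock local, ReentrantLock remote, String who) {
      local.lock();                          // lock on the originator first
      try {
         Thread.sleep(100);                  // do some work before replicating
         // replicate while still holding the local lock
         if (remote.tryLock(2, TimeUnit.SECONDS)) {
            try {
               System.out.println(who + ": replicated successfully");
            } finally {
               remote.unlock();
            }
         } else {
            System.out.println(who + ": lock acquisition timed out");
         }
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
      } finally {
         local.unlock();
      }
   }
}
```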

           

          Asynchronous caches work a bit differently. The lock is not held while replicating the modification to the other node, but JGroups makes sure that messages from the same source are delivered in the order they were sent. Therefore, you use the lock on the primary owner to decide the order of updates and then let the updates propagate to the backups in that same order. So here there's no risk of deadlocks, but with the old originator-first approach the cache would not even be eventually consistent - if each entry is applied first on its originator and the update is only then sent to the remote node, two concurrent writes can be applied in different orders on the two nodes, so you could end up with two different values that would never sync.

           

          I believe that the replication from the originator to the primary owner is asynchronous, too, so you may just be lucky to see the update on A being 'immediately' applied on B.

           

          I don't think there's any option to use the old behaviour, because we consider it broken (Infinispan is aiming at at least eventual consistency, even in async mode). And even HotRod wouldn't give you a guaranteed result - it may miss the primary owner after a cluster rebalance. Async caches just don't guarantee when the update happens.

          • 2. Re: Keys stored in a replicated cache are not kept in the accessed cache node until the owner node sends them back
            john32768

            Thanks for the help.  I already suspected there was a fundamental change somewhere that was underlying the results we were seeing.  I'm just wondering now how to proceed.  Doing sync replication instead of async has helped for us, but we're worried that this will have a significant impact on performance.
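            (The switch we made boils down to this, in programmatic form; a simplified sketch of what our XML amounts to, as far as I understand the builder API:)

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class OurCacheConfigSync {
   // Same replicated cache, but with synchronous replication: a put
   // returns only after the other node has acknowledged the update.
   static Configuration syncReplicated() {
      return new ConfigurationBuilder()
            .clustering()
               .cacheMode(CacheMode.REPL_SYNC)
            .build();
   }
}
```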

            Let me first explain this change on synchronous caches. In Infinispan 5 (I think up to 5.2), the algorithm was very prone to deadlocks. This is because the lock was always acquired first on the originator and then, while holding the lock, the modification was replicated to the other owners, where the lock had to be acquired, too. If the key was modified concurrently on two nodes, each node could acquire the lock locally and the acquisition on the other node would then fail with a timeout. That's why it was changed to 'lock first on the primary owner, no locking needed on backup owners, unlock on the primary'.

            Concurrent access would never occur for us, as only one node is accessed at all times.

            I believe that the replication from the originator to the primary owner is asynchronous, too, so you may just be lucky to see the update on A being 'immediately' applied on B.

            I don't think we were lucky here; I tested this with a small program which only accesses one of the nodes but checks for the presence of the key on both.  50% of the time it was found on node A, and the other times on node B.  Out of tens of thousands of inserts, we never saw the case where it was not yet present on either node.
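            (The test is essentially the following, simplified; the host names, port and cache name are placeholders, and the real run did tens of thousands of iterations:)

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Simplified version of the test: PUT a key via node A's REST endpoint,
// then immediately GET it from both nodes and record where it is visible.
// Host names, port and cache name are placeholders.
public class ReplicationCheck {
   static final String NODE_A = "http://node-a:8080/rest/ourCache/";
   static final String NODE_B = "http://node-b:8080/rest/ourCache/";

   public static void main(String[] args) throws Exception {
      int onlyOnA = 0, onlyOnB = 0, onBoth = 0, onNeither = 0;
      for (int i = 0; i < 1000; i++) {
         String key = "key-" + System.nanoTime();
         put(NODE_A + key, "value-" + i);
         boolean a = exists(NODE_A + key);
         boolean b = exists(NODE_B + key);
         if (a && b) onBoth++;
         else if (a) onlyOnA++;
         else if (b) onlyOnB++;
         else onNeither++;
      }
      System.out.printf("both=%d onlyA=%d onlyB=%d neither=%d%n",
            onBoth, onlyOnA, onlyOnB, onNeither);
   }

   static void put(String url, String value) throws Exception {
      HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
      c.setRequestMethod("PUT");
      c.setDoOutput(true);
      c.setRequestProperty("Content-Type", "text/plain");
      try (OutputStream out = c.getOutputStream()) {
         out.write(value.getBytes(StandardCharsets.UTF_8));
      }
      c.getResponseCode();   // force the request to complete
      c.disconnect();
   }

   static boolean exists(String url) throws Exception {
      HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
      c.setRequestMethod("GET");
      int code = c.getResponseCode();
      c.disconnect();
      return code == 200;    // 404 means not (yet) present on that node
   }
}
```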

            I don't think there's any option to use the old behaviour, because we consider it broken (Infinispan is aiming at at least eventual consistency, even in async mode). And even HotRod wouldn't give you a guaranteed result - it may miss the primary owner after a cluster rebalance. Async caches just don't guarantee when the update happens.

            Why would it not give a correct result with HotRod?  It seems to me that in that case the primary owner of the key can be determined by HotRod and it would contact the correct server, which, as I've already tested, always has the entry.  So using HotRod, which dynamically selects which node to communicate with, would give a much more consistent result, except maybe in some rare edge case.  Unfortunately, using HotRod is not an option for us at the moment as we're stuck with REST.

             

            So, what we're dealing with now is that:

             

            1) I have a single cache node with which we communicate using REST -> everything works fine 100% of the time

            2) I replace the above with two async replicated nodes, and suddenly the same server we have been communicating with seems to think that half of the keys are "missing" for a short while... how can that ever be a desired situation?  I suppose this is due to the lack of a read-uncommitted mode?

             

            All we really want is to have a single node scenario, with a backup that can be available when the first node goes down (it's fine if that backup is slightly outdated or is missing the latest changes).  This used to be the async replicated setup of 5.2.  Now apparently this "backup" is interfering with how the primary node we're using is working.  As soon as we turn off the backup server (or the primary one for that matter), everything starts working perfectly.

            • 3. Re: Keys stored in a replicated cache are not kept in the accessed cache node until the owner node sends them back
              rvansa

              John Hendrikx wrote:

              I believe that the replication from the originator to the primary owner is asynchronous, too, so you may just be lucky to see the update on A being 'immediately' applied on B.

              I don't think we were lucky here; I tested this with a small program which only accesses one of the nodes but checks for the presence of the key on both.  50% of the time it was found on node A, and the other times on node B.  Out of tens of thousands of inserts, we never saw the case where it was not yet present on either node.

              Okay, I might be interpreting the code wrong. Anyway, if you want to know for sure, you can set up trace logging on org.infinispan, run a single put and see the timing.

               

              John Hendrikx wrote:

              I don't think there's any option to use the old behaviour, because we consider it broken (Infinispan is aiming at at least eventual consistency, even in async mode). And even HotRod wouldn't give you a guaranteed result - it may miss the primary owner after a cluster rebalance. Async caches just don't guarantee when the update happens.

              Why would it not give a correct result with HotRod?  It seems to me that in that case the primary owner of the key can be determined by HotRod and it would contact the correct server, which, as I've already tested, always has the entry.  So using HotRod, which dynamically selects which node to communicate with, would give a much more consistent result, except maybe in some rare edge case.  Unfortunately, using HotRod is not an option for us at the moment as we're stuck with REST.

               

              Yes, I meant those edge cases: a) cluster rebalances (but it seems you don't really care about the short window when one of the nodes crashes); b) if using the HotRod protocol with version < 2.0, some segments may not be mapped correctly, due to backwards compatibility with HotRod clients for Infinispan 5.1 and older (at that time the routing was done using 'virtual nodes' instead of 'segments').

               

              John Hendrikx wrote:

               

              All we really want is to have a single node scenario, with a backup that can be available when the first node goes down (it's fine if that backup is slightly outdated or is missing the latest changes).  This used to be the async replicated setup of 5.2.  Now apparently this "backup" is interfering with how the primary node we're using is working.  As soon as we turn off the backup server (or the primary one for that matter), everything starts working perfectly.

               

              In that case there might be a solution: you can implement your own org.infinispan.distribution.ch.ConsistentHashFactory (this is configurable via configurationBuilder.clustering().hash().consistentHashFactory()) that makes one node the primary owner of all segments and the other node only a backup owner. Then you can route all requests just to the primary node, as long as it is alive. See ReplicatedConsistentHashFactory; it's only 150 lines of code, so it's not too much work to hack it.
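              Wiring it in would look roughly like this - a minimal sketch where FixedPrimaryOwnerFactory stands for the hypothetical factory class you would write along the lines of ReplicatedConsistentHashFactory; only the consistentHashFactory() hook is the real configuration point:

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.distribution.ch.ConsistentHashFactory;

public class FixedPrimaryConfig {
   // Pass in your own factory (e.g. a hypothetical FixedPrimaryOwnerFactory,
   // written along the lines of ReplicatedConsistentHashFactory) that always
   // picks the same node as primary owner of every segment.
   static Configuration withFixedPrimary(ConsistentHashFactory<?> fixedPrimaryOwnerFactory) {
      return new ConfigurationBuilder()
            .clustering()
               .cacheMode(CacheMode.REPL_ASYNC)
               .hash()
                  .consistentHashFactory(fixedPrimaryOwnerFactory)
            .build();
   }
}
```

              With all segments owned by a single primary node, every put routed to that node is applied there first and only propagated to the backup asynchronously, which should give you the behaviour you're after.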