We have an application that currently runs on WildFly 8 and uses the bundled Infinispan 6. Unsurprisingly, this application has some problems dealing with network partitions because Infinispan 6's approach to handling partitions was basically to stick its fingers in its ears and yell "LALALALA I CAN'T HEAR YOU!"
Infinispan 7 added proper handling of partitions. In terms of the CAP theorem, we want an AP cluster because a temporary lack of consistency doesn't matter much, but we do need the cluster to continue operating even if most of it abruptly dies. So I've been investigating what Infinispan will do when configured for availability. The best information I've been able to find is this page on the GitHub wiki, which states:
When the partitions merge, Infinispan does not attempt to merge the different values that each partition might have. The largest partition (i.e. the one with the most nodes) is assumed to be the correct one, and its topology becomes the merge topology. Data from nodes not in the merge CH is wiped, and they will receive the latest data from nodes in the merge CH.
That's from section 3.1.2, talking about replicated caches, but the section for distributed caches says pretty much the same thing. As detailed as that document is, it doesn't cover what happens with event listeners, and we use listeners quite heavily.
So my question is: when two partitions merge and the nodes in one partition wipe out all their data, will the nodes in that partition call event listeners on the caches? As an example scenario, suppose node A in partition 1 has a cache containing the key-value pair (X, Y). Then the partition heals. Node A detects that partition 2 has more nodes, so it wipes its cache and reacquires the state from one of the nodes in partition 2. But in partition 2, the key X had no mapping, so after reacquiring the state, node A still doesn't have a mapping for X. From the application's standpoint, this is equivalent to the mapping being removed from the cache. A normal remove() operation would trigger a CacheEntryRemovedEvent to be delivered to the relevant listeners. Will that still happen when a partition heals as in this scenario?
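To make the scenario concrete, here is a toy model in plain Java of the "largest partition wins" rule as I read it from the quoted wiki text. This is only a sketch of my understanding, not Infinispan's implementation: PartitionMergeSketch, Partition, merge and keysLost are all names I made up, ties between equally sized partitions are not modeled, and the sketch deliberately says nothing about whether any listener fires, because that is exactly what I'm asking.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these names are Infinispan classes.
public class PartitionMergeSketch {

    // A partition: its member nodes plus the replicated cache contents they agree on.
    static final class Partition {
        final Set<String> nodes;
        final Map<String, String> data;

        Partition(Set<String> nodes, Map<String, String> data) {
            this.nodes = nodes;
            this.data = data;
        }
    }

    // The partition with the most nodes is assumed to be the correct one.
    static Partition largest(List<Partition> partitions) {
        return Collections.max(partitions,
                Comparator.comparingInt((Partition p) -> p.nodes.size()));
    }

    // After the merge, every node holds a copy of the winner's data;
    // whatever the other partitions held is discarded.
    static Partition merge(List<Partition> partitions) {
        Partition winner = largest(partitions);
        Set<String> allNodes = new TreeSet<>();
        for (Partition p : partitions) {
            allNodes.addAll(p.nodes);
        }
        return new Partition(allNodes, new HashMap<>(winner.data));
    }

    // Keys a partition held before the merge that have no mapping afterwards.
    // From the application's standpoint these look like removals.
    static Set<String> keysLost(Partition before, Partition merged) {
        Set<String> lost = new TreeSet<>(before.data.keySet());
        lost.removeAll(merged.data.keySet());
        return lost;
    }

    public static void main(String[] args) {
        // Partition 1: node A alone, holding (X, Y).
        Partition p1 = new Partition(
                Collections.singleton("A"),
                Collections.singletonMap("X", "Y"));
        // Partition 2: nodes B and C, with no mapping for X.
        Partition p2 = new Partition(
                new TreeSet<>(Arrays.asList("B", "C")),
                Collections.<String, String>emptyMap());

        Partition merged = merge(Arrays.asList(p1, p2));

        System.out.println(merged.data.containsKey("X")); // prints "false"
        System.out.println(keysLost(p1, merged));         // prints "[X]"
    }
}
```

In this model, X is exactly the kind of key I'm worried about: node A had a mapping for it before the merge and has none afterwards, yet nobody ever called remove().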
And now for a side rant: why is the partition-handling configuration option a boolean? The quote above describes Infinispan's behavior with partition handling "disabled" (i.e. partition-handling = false), but as far as I'm concerned, "biggest partition wins" is a perfectly valid strategy for handling partitions. To me, partition handling being disabled implies Infinispan 6's behavior: the cluster literally does nothing about it. The partition-handling option isn't really about whether partitions are handled but rather how they are handled: false gives you an AP system, whereas true gives you a CP system. This confused me greatly when I first started reading about the partition handling changes.