Our application uses multiple caches, each of which has an "owner" node. The owner is the authority on cache data and also the only node that can provide state for that particular cache.
Years ago we did this by ensuring that owner == JGroups coordinator, but we are now looking for a solution that works at a higher level.
Cache participants know whether they are the owner, but non-owners don't know which node is. Our first solution was to try fetching state from every node that joined the cluster until state was retrieved successfully. However, this resulted in a total mess of flush messages, retries and timeouts. To improve the situation, we've added an ownership flag to the additional data field of IpAddress. Caches now fetch state only when the owner joins the cluster.
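To make the "fetch only when the owner joins" check concrete, here is a rough sketch of the view comparison. It is only an illustration: OwnerJoinDetector, Member and newlyJoinedOwner are made-up names, plain strings stand in for JGroups addresses, and the boolean stands in for the ownership flag carried in the additional data. It does not use any JGroups API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper; none of these names come from JGroups.
final class OwnerJoinDetector {

    // Stand-in for a cluster member. "owner" mirrors the ownership flag
    // that we store in the address's additional data.
    static final class Member {
        final String address;
        final boolean owner;

        Member(String address, boolean owner) {
            this.address = address;
            this.owner = owner;
        }
    }

    // Returns the address of an owner that appears in newView but was not
    // present in oldView, or an empty Optional if no owner has just joined.
    static Optional<String> newlyJoinedOwner(List<Member> oldView, List<Member> newView) {
        Set<String> known = new HashSet<>();
        for (Member m : oldView) {
            known.add(m.address);
        }
        for (Member m : newView) {
            if (m.owner && !known.contains(m.address)) {
                return Optional.of(m.address);
            }
        }
        return Optional.empty();
    }
}
```

A cache would call something like this from its view-change callback and request state only when the result is present. Note that this only decides when to fetch; every non-owner still reacts to the same view change at the same moment.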
However, the problem remains: when the owner joins the cluster, multiple cache instances try to flush at the same time, and they all fail. State transfer then proceeds without a flush and without throwing an error. Doesn't this mean that flush in state transfer is essentially useless, because it isn't guaranteed to either work or fail hard? If flush isn't used, does state transfer block the state provider's sending queues, or the receiver's queues/cache? Are there other means to ensure data consistency?
For this kind of strategy, I wonder if you've looked at the TcpCacheLoader. You would have all the "child nodes" load their data from the main node.
As to flushing, someone else should be able to answer your concern.