Should Hotrod topo updates remove initial statically-configured servers? (5.1b4)
inetdevboy Nov 15, 2011 2:15 PM
We have a scenario in which Hotrod failover is breaking due to the automatic topology updates (using 5.1b4 at time of writing).
Imagine a topology-aware Hotrod client connecting to a 2-node server cluster (nodes A and B) in either "distributed" or "replication" mode:
Example 1 (successful failover):
- Nodes A/B both started
- Init Hotrod client with static server list [A,B].
- client.put("key1", "value1")
- stop node "A"
- client.get("key1")
  - succeeds because client knows about "A" and "B", gets value from "B"
  - also receives new topology view of [B], "A" dropped from pool
- start node "A" again, eventually syncs up with "B"
- client.get("key1")
  - Note: This is the "magic call" I'll refer to in example 2
  - get() call succeeds (value from A or B)
  - client receives topo view update of [A,B]
- stop node "B"
- client.get("key1")
  - succeeds, value pulled from "A"
  - client receives topo update of [A], "B" dropped from pool
The problem stems from the fact that topology updates are only received in reply to a client request. If one removes the "magic call" from Example 1, the failover mechanism breaks.
Example 2 (failover breakage):
- Nodes A/B both started
- Init Hotrod client with static server list [A,B].
- client.put("key1", "value1")
- stop node "A"
- client.get("key1")
  - succeeds because client knew about "A" and "B", gets value from "B"
  - client also receives new topology view of [B], "A" dropped from pool
- start node "A" again, eventually syncs up with "B"
- stop node "B"
  - now, since there have been no intervening client calls, the client has a topo view of [B] only.
- client.get("key1")
  - this fails, because the client doesn't know that "A" is available, and "B" is stopped.
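To make the difference between the two examples concrete, here is a minimal, self-contained Java model of the behaviour. It does not use the real Hotrod client at all: `Cluster`, `Client` and `runScenario` are hypothetical names invented for this illustration, and the model assumes the client's pool only changes when a successful response carries the current cluster view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TopologyStalenessDemo {

    /** Hypothetical stand-in for the server cluster: just the set of running nodes. */
    static final class Cluster {
        private final Set<String> running = new TreeSet<>();
        void start(String node) { running.add(node); }
        void stop(String node) { running.remove(node); }
        boolean isUp(String node) { return running.contains(node); }
        List<String> view() { return new ArrayList<>(running); }
    }

    /** Hypothetical client whose pool is only refreshed by a successful call. */
    static final class Client {
        private final Cluster cluster;
        private List<String> pool;

        Client(Cluster cluster, List<String> staticServers) {
            this.cluster = cluster;
            this.pool = new ArrayList<>(staticServers);
        }

        /** Tries each pooled server in order; returns true if any of them answered. */
        boolean get() {
            for (String node : pool) {
                if (cluster.isUp(node)) {
                    // Assumption: the response carries the current view,
                    // and the client replaces its pool with it.
                    pool = cluster.view();
                    return true;
                }
            }
            return false; // nobody in the (possibly stale) pool is reachable
        }
    }

    /** Replays the steps of the examples; returns the result of the final get(). */
    static boolean runScenario(boolean withMagicCall) {
        Cluster cluster = new Cluster();
        cluster.start("A");
        cluster.start("B");
        Client client = new Client(cluster, Arrays.asList("A", "B"));

        cluster.stop("A");
        client.get();              // served by B, pool becomes [B]
        cluster.start("A");
        if (withMagicCall) {
            client.get();          // served by B, pool becomes [A, B]
        }
        cluster.stop("B");
        return client.get();       // final call after B is stopped
    }

    public static void main(String[] args) {
        System.out.println("Example 1 (with magic call):    " + runScenario(true));
        System.out.println("Example 2 (without magic call): " + runScenario(false));
    }
}
```

In this model the final call succeeds only when the intervening get() has refreshed the pool, which is the same dependency on the "magic call" described in the examples.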
So what is the best solution to this problem? I've thought of a few imperfect approaches (enumerated below), but I'd love to hear the team's recommendations.
- In this simple 2-node view, one could simply disable topology awareness in the client, and stick with the static list indefinitely. Has obvious drawbacks.
- Perhaps the client's topology view should be reverted to the originally-defined static list in the event that all the nodes in the dynamically-updated topology view become unavailable.
- If the node stop/starts were intentional operations (rolling restarts for maintenance, or whatever) one could wire up some mechanism by which all of the HotRod clients are told to ping for a topology update each time the topology changes. This could be done via JMS, a JGroups-aware client or probably a hundred other ways, but it'd be nice to have a relatively standard implementation available out-of-the-box rather than everybody having to devise a custom solution.
- Couldn't the HotRod protocol be augmented to allow a "push" of updated topology views from the server? Just a response-only message with a new opcode indicating a topology push and ignore the message id?
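As a rough illustration of the second approach above (reverting to the static list), here is a minimal Java sketch. It is not based on the actual Hotrod client code: `FallbackServerPool`, `onTopologyUpdate` and `pickServer` are hypothetical names, and the reachability check is passed in as a plain predicate so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FallbackServerPool {
    private final List<String> staticServers; // the originally configured list, never modified
    private List<String> dynamicView;         // the latest topology view received from a server

    FallbackServerPool(List<String> staticServers) {
        this.staticServers = new ArrayList<>(staticServers);
        this.dynamicView = new ArrayList<>(staticServers);
    }

    /** Called whenever a response carries a new topology view. */
    void onTopologyUpdate(List<String> newView) {
        dynamicView = new ArrayList<>(newView);
    }

    List<String> currentView() {
        return new ArrayList<>(dynamicView);
    }

    /**
     * Returns the first reachable server, or null if none is reachable.
     * If every server in the dynamic view is unreachable, the view is
     * reset to the static list and that list is tried once.
     */
    String pickServer(Predicate<String> isReachable) {
        for (String server : dynamicView) {
            if (isReachable.test(server)) {
                return server;
            }
        }
        // The whole dynamic view is dead: revert to the static configuration.
        dynamicView = new ArrayList<>(staticServers);
        for (String server : dynamicView) {
            if (isReachable.test(server)) {
                return server;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FallbackServerPool pool = new FallbackServerPool(Arrays.asList("A", "B"));
        pool.onTopologyUpdate(Arrays.asList("B"));               // "A" was dropped earlier
        String picked = pool.pickServer(s -> s.equals("A"));     // now only "A" is running
        System.out.println("picked = " + picked + ", view = " + pool.currentView());
    }
}
```

The fallback only helps if at least one of the statically configured servers is running again, so it covers the 2-node restart case in Example 2 but not a cluster whose membership has completely changed since the client was configured.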
I look forward to hearing your thoughts.