1 Reply Latest reply on Nov 16, 2011 5:22 AM by galder.zamarreno

Should Hotrod topo updates remove initial statically-configured servers? (5.1b4)

inetdevboy Nov 15, 2011 2:15 PM

We have a scenario in which Hotrod failover is breaking due to the automatic topology updates (using 5.1b4 at time of writing).

Imagine a topology-aware Hotrod client connecting to a 2-node server cluster (nodes A and B) in either "distributed" or "replication" mode:

Example 1 (successful failover):

Nodes A/B both started
Init Hotrod client with static server list [A,B].
client.put("key1", "value1")
stop node "A"
client.get("key1")
- succeeds because client knows about "A" and "B", gets value from "B"
- also receives new topology view of [B], "A" dropped from pool
start node "A" again, eventually syncs up with "B"
client.get("key1")
- Note: This is the "magic call" I'll refer to in example 2
- get() call succeeds (value from A or B)
- client receives topo view update of [A,B]
stop node "B"
client.get("key1")
- succeeds, value pulled from "A"
- client receives topo update of [A], "B" dropped from pool

The problem stems from the fact that the topology updates are only received in reply to a client request. In the scenario described above, if one removes the "magic call" noted above, the failover mechanism breaks.

Example 2 (failover breakage):

Nodes A/B both started
Init Hotrod client with static server list [A,B].
client.put("key1", "value1")
stop node "A"
client.get("key1")
- succeeds because client knew about "A" and "B", gets value from "B"
- client also receives new topology view of [B], "A" dropped from pool
start node "A" again, eventually syncs up with "B"
stop node "B"
- now, since there have been no intervening client calls, the client has a topo view of [B] only.
client.get("key1")
- this fails, because the client doesn't know that "A" is available, and "B" is stopped.
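
To make the two examples concrete, here is a minimal toy simulation in Python. It is not the Hot Rod client or protocol: `Cluster`, `TopologyAwareClient` and `run` are hypothetical names, the data is held in one shared dict to stand in for a fully synced cluster, and the only behaviour modelled is that the client's view changes when (and only when) a request gets a response.

```python
class Cluster:
    """Toy stand-in for the server cluster: tracks which nodes are running."""

    def __init__(self, nodes):
        self.running = set(nodes)
        self.data = {}  # assumption: all running nodes are fully synced

    def start(self, node):
        self.running.add(node)

    def stop(self, node):
        self.running.discard(node)


class TopologyAwareClient:
    """Toy client whose topology view is refreshed only by responses."""

    def __init__(self, cluster, static_servers):
        self.cluster = cluster
        self.view = list(static_servers)

    def _request(self):
        # Try each server in the current view. The first live one answers
        # and piggybacks the cluster's current topology on its response.
        for server in self.view:
            if server in self.cluster.running:
                self.view = sorted(self.cluster.running)
                return server
        raise ConnectionError("no reachable server in view %s" % self.view)

    def put(self, key, value):
        self._request()
        self.cluster.data[key] = value

    def get(self, key):
        self._request()
        return self.cluster.data[key]


def run(with_magic_call):
    """Replay the scenario; return the final get() result, or None on failure."""
    cluster = Cluster(["A", "B"])
    client = TopologyAwareClient(cluster, ["A", "B"])
    client.put("key1", "value1")
    cluster.stop("A")
    client.get("key1")        # served by B, view becomes [B]
    cluster.start("A")
    if with_magic_call:
        client.get("key1")    # the "magic call": view becomes [A, B]
    cluster.stop("B")
    try:
        return client.get("key1")
    except ConnectionError:
        return None
```

Calling `run(True)` replays Example 1 and `run(False)` replays Example 2. In the second case the final get() has only "B" in its view, so it cannot reach the restarted "A".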

So what is the best solution to this problem? I've thought of a few imperfect approaches (enumerated below), but I'd love to hear the team's recommendations.

1. In this simple 2-node view, one could simply disable topology awareness in the client, and stick with the static list indefinitely. Has obvious drawbacks.
2. Perhaps the client's topology view should be reverted to the originally-defined static list in the event that all the nodes in the dynamically-updated topology view become unavailable.
3. If the node stop/starts were intentional operations (rolling restarts for maintenance, or whatever) one could wire up some mechanism by which all of the HotRod clients are told to ping for a topology update each time the topology changes. This could be done via JMS, a JGroups-aware client or probably a hundred other ways, but it'd be nice to have a relatively standard implementation available out-of-the-box rather than everybody having to devise a custom solution.
4. Couldn't the HotRod protocol be augmented to allow a "push" of updated topology views from the server? Just a response-only message with a new opcode indicating a topology push and ignore the message id?

I look forward to hearing your thoughts.

1. Re: Should Hotrod topo updates remove initial statically-configured servers? (5.1b4)

galder.zamarreno Nov 16, 2011 5:22 AM (in response to inetdevboy)
@inetdevboy, very interesting post, thanks!! Onto your comments:
inetdevboy wrote:

The problem stems from the fact that the topology updates are only received in reply to a client request. In the scenario described above, if one removes the "magic call" noted above, the failover mechanism breaks.
That's the way it's currently designed.
inetdevboy wrote:
Perhaps the client's topology view should be reverted to the originally-defined static list in the event that all the nodes in the dynamically-updated topology view become unavailable.
I think this is a good idea; the client could easily fall back on this. Would you mind creating a JIRA for it (https://issues.jboss.org/browse/ISPN)? You could even submit a patch for it! It should be very simple to implement.
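
For illustration only, the fallback could look roughly like the Python sketch below. This is not the actual Hot Rod client code: `FallbackClient`, `request` and `demo` are hypothetical names, and the cluster is reduced to a shared set of running node names. The point is just the order of attempts: the dynamic view first, then the originally configured static list.

```python
class FallbackClient:
    """Toy client that falls back to its static server list."""

    def __init__(self, live_nodes, static_servers):
        self.live_nodes = live_nodes                # shared set standing in for the cluster
        self.static_servers = list(static_servers)  # never modified after construction
        self.view = list(static_servers)            # dynamic view, updated by responses

    def _try(self, servers):
        # Return the first reachable server and adopt the topology it reports.
        for server in servers:
            if server in self.live_nodes:
                self.view = sorted(self.live_nodes)
                return server
        return None

    def request(self):
        server = self._try(self.view)
        if server is None:
            # Every node in the dynamic view is down: retry the static list.
            server = self._try(self.static_servers)
        if server is None:
            raise ConnectionError("no reachable server")
        return server


def demo():
    """Replay the failing scenario; return (serving node, resulting view)."""
    live = {"A", "B"}
    client = FallbackClient(live, ["A", "B"])
    client.request()          # initial request, view stays [A, B]
    live.discard("A")
    client.request()          # served by B, view becomes [B]
    live.add("A")             # A restarts, client is not told
    live.discard("B")
    served_by = client.request()
    return served_by, client.view
```

With this change the last request in the failing scenario is served by "A", because the static list still contains it even though the dynamic view had shrunk to [B].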

inetdevboy wrote:

Couldn't the HotRod protocol be augmented to allow a "push" of updated topology views from the server? Just a response-only message with a new opcode indicating a topology push and ignore the message id?
We might consider this in the future because we're planning to do remote notifications from the server back to clients, for example when some data is updated. With this, clients can easily build L1 caches, which can be very powerful in near-cache patterns.