I am working on the HA branch, on the case where discovery is used to get the initial connection used to form the cluster.
Let’s take the example of a symmetric cluster of 3 nodes #0, #1, #2. Let’s assume that the initial connection for both nodes #1 and #2 will be to node #0, which broadcast first.
From this single initial connection, nodes #1 and #2 must connect to each other to form the cluster. Node notifications will only be propagated through regular connections.
I’ve created a drawing to give an idea of the sequences of events and notifications:
The diagram is not purely sequential: some events happen in parallel, but I hope it clarifies what I am doing.
The important parts are:
- when a node connects to another one (regardless of the connection type, regular or cluster), the target node will send back its topology to the source node
- when a node connects to another one with a cluster connection, it announces its presence to the target node which adds it to its topology and notifies its own listeners
- when a cluster connection is notified that a node is UP, it creates a bridge to the node (more on that later) and propagates the node notification.
It is possible for a node notification to go through many hops before reaching another node to update its topology.
- When a node is notified of its own status, it discards the notification and stops its propagation
- When a node notification has a distance greater than the topology size, it is discarded. This covers the case where node #0 notifies node #1 that node #2 is UP and, in turn, node #1 notifies node #0 (since it’s a symmetric cluster), and so on: without this limit the notification would keep bouncing between the nodes.
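To make the two discard rules concrete, here is a minimal sketch of the check a node could run before propagating a notification. The class name, method name and string node IDs are made up for illustration; they are not the code on the HA branch.

```java
/** Hypothetical helper; the class and method names are not from the HA branch. */
final class NodeNotificationFilter {

    /**
     * Decides whether the local node should keep propagating a node-UP
     * notification, following the two discard rules described above.
     */
    static boolean shouldPropagate(String localNodeId, String notifiedNodeId,
                                   int distance, int topologySize) {
        // Rule 1: a node notified of its own status drops the notification.
        if (localNodeId.equals(notifiedNodeId)) {
            return false;
        }
        // Rule 2: a distance greater than the topology size means the
        // notification is looping between nodes, so it is dropped too.
        return distance <= topologySize;
    }

    public static void main(String[] args) {
        int topologySize = 3; // nodes #0, #1, #2
        System.out.println(shouldPropagate("node-1", "node-1", 1, topologySize));
        System.out.println(shouldPropagate("node-1", "node-2", 2, topologySize));
        System.out.println(shouldPropagate("node-1", "node-2", 4, topologySize));
    }
}
```

Note that a distance equal to the topology size is still propagated in this sketch, since the rule only discards notifications whose distance is strictly greater.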
Andy modified the code to handle chain clusters. In that case, we do not create a bridge on a cluster connection if the distance is > 1. When discovery is used to create a symmetric cluster, however, a notification may need a greater distance to reach all nodes. We need to add an attribute when a node is UP to distinguish between creating connections only to direct neighbors (if distance == 1) or to all nodes in the cluster:
- if “directConnections” is true, we create a bridge only if distance == 1
- if “directConnections” is false, we create a bridge in any case
- regardless of the “directConnections” attribute, we do not propagate a notification whose distance > topology size
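As a sketch of the bridge rule only (the propagation limit is a separate check), the decision could look like the following. "directConnections" is the attribute proposed above; the class and method names are hypothetical.

```java
/** Hypothetical sketch of the proposed rule; not the actual cluster connection code. */
final class BridgePolicy {

    /**
     * Returns true if a bridge should be created to a node that was
     * reported UP at the given distance.
     */
    static boolean shouldCreateBridge(boolean directConnections, int distance) {
        if (directConnections) {
            // Chain cluster case: only bridge to direct neighbors.
            return distance == 1;
        }
        // Symmetric cluster formed through discovery: bridge to every
        // node we hear about, whatever the distance.
        return true;
    }

    public static void main(String[] args) {
        for (int distance = 1; distance <= 3; distance++) {
            System.out.println("distance=" + distance
                + " direct=" + shouldCreateBridge(true, distance)
                + " all=" + shouldCreateBridge(false, distance));
        }
    }
}
```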
Please note that the only place that triggers a sequence of node notifications is in CoreProtocolManager, when a NODE_ANNOUNCE packet is received. In that case, we start notifying that a node is UP with a distance == 1. Each time we propagate the notification to the node’s listeners, we increment the distance.
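The distance bookkeeping can be sketched as below, assuming a notification is a small immutable object. NodeUpNotification and its methods are invented for the example and do not exist in CoreProtocolManager; only the starting distance of 1 and the increment per propagation come from the description above.

```java
/** Hypothetical value object; the real notification type may look different. */
final class NodeUpNotification {
    final String nodeId;
    final int distance;

    private NodeUpNotification(String nodeId, int distance) {
        this.nodeId = nodeId;
        this.distance = distance;
    }

    /** Created when a NODE_ANNOUNCE packet is received: the distance starts at 1. */
    static NodeUpNotification announced(String nodeId) {
        return new NodeUpNotification(nodeId, 1);
    }

    /** Each propagation to the listeners adds one hop. */
    NodeUpNotification propagated() {
        return new NodeUpNotification(nodeId, distance + 1);
    }

    /** Counts the propagations that happen before distance > topologySize. */
    static int hopsBeforeDiscard(String nodeId, int topologySize) {
        NodeUpNotification notification = announced(nodeId);
        int hops = 0;
        while (notification.distance <= topologySize) {
            notification = notification.propagated();
            hops++;
        }
        return hops;
    }

    public static void main(String[] args) {
        NodeUpNotification first = announced("node-2");
        System.out.println(first.nodeId + " announced at distance " + first.distance);
        System.out.println("hops before discard: " + hopsBeforeDiscard("node-2", 3));
    }
}
```

Because the distance only ever grows, the chain of notifications started by one NODE_ANNOUNCE is bounded by the topology size.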