I've been working on cluster node connection / reconnection today.
When a node uses discovery, it usually waits to receive a UDP broadcast before connecting to another node and joining the cluster.
However, the first server may never receive a UDP broadcast if no other server is started before its discoveryInitialTimeoutWait expires.
Likewise, when a node becomes the last node of a cluster, it must again listen for UDP broadcasts to be notified when another node is UP.
In either case, we detect when the server locator of the node's cluster connection is notified of a UDP broadcast (ServerLocatorImpl.connectorsChanged()).
If the node has not received any topology, we then call the locator's connect() method to connect to the broadcasting node and trigger the cluster formation.
We need to make sure that the server locator knows whether it is part of a cluster (it has received a topology) or alone (it is the only member of the topology). If it is already part of a cluster, we do *not* connect to other broadcasting nodes (as we are informed of the cluster topology through the connection).
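To make the rule above concrete, here is a minimal sketch of the decision. It is not the real ServerLocatorImpl: the class DiscoveryLocatorSketch, its topology set, and the nodeDown() and connectAttempts() helpers are hypothetical names I made up for illustration, and connect() only records the attempt instead of opening a connection.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a server locator; NOT the real ServerLocatorImpl. */
class DiscoveryLocatorSketch {
    private final String selfId;
    // Assumption: the topology is modelled as a plain set of node ids that always contains this node.
    private final Set<String> topology = new HashSet<>();
    private int connectAttempts = 0;

    DiscoveryLocatorSketch(String selfId) {
        this.selfId = selfId;
        topology.add(selfId);
    }

    /** True when this node is the only member of the topology (first node or last node). */
    synchronized boolean isAlone() {
        return topology.size() == 1 && topology.contains(selfId);
    }

    /** Called when a UDP broadcast from another node is received. */
    synchronized void connectorsChanged(String broadcastingNodeId) {
        // Only connect when alone; otherwise the cluster connection already feeds us the topology.
        if (isAlone()) {
            connect(broadcastingNodeId);
        }
    }

    /** Stand-in for the real connect(): records the attempt and pretends the topology arrived. */
    private void connect(String nodeId) {
        connectAttempts++;
        topology.add(nodeId);
    }

    /** Called when another node leaves the topology. */
    synchronized void nodeDown(String nodeId) {
        if (!nodeId.equals(selfId)) {
            topology.remove(nodeId);
        }
    }

    synchronized int connectAttempts() {
        return connectAttempts;
    }
}
```

With this sketch, a broadcast received while alone triggers one connect, later broadcasts are ignored, and once the node is the last one left it reacts to broadcasts again.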
Static connector case:
If a node has a static list of connectors and is notified that a node is down, it creates a runnable that tries to reconnect to the node that went down.
When that node is UP again, the successful reconnection triggers the node notification so that the node becomes part of the cluster again.
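A rough sketch of what such a reconnection runnable could look like is below. Everything in it is an assumption for illustration: the class StaticReconnectTask, the tryConnect predicate (standing in for creating a session factory to the connector), the nodeUpListener callback (standing in for the node notification), and the fixed retry interval are not taken from the actual code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical retry task for one static connector; NOT the actual implementation. */
class StaticReconnectTask implements Runnable {
    private final String connector;
    // Assumption: returns true once a connection to the connector succeeds.
    private final Predicate<String> tryConnect;
    // Assumption: stands in for the "node UP" notification.
    private final Consumer<String> nodeUpListener;
    private final long retryIntervalMillis;
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    StaticReconnectTask(String connector,
                        Predicate<String> tryConnect,
                        Consumer<String> nodeUpListener,
                        long retryIntervalMillis) {
        this.connector = connector;
        this.tryConnect = tryConnect;
        this.nodeUpListener = nodeUpListener;
        this.retryIntervalMillis = retryIntervalMillis;
    }

    /** Asks the task to give up; the loop exits at its next check. */
    void stop() {
        stopped.set(true);
    }

    @Override
    public void run() {
        // Retry until the node is reachable, the task is stopped, or the thread is interrupted.
        while (!stopped.get() && !Thread.currentThread().isInterrupted()) {
            if (tryConnect.test(connector)) {
                nodeUpListener.accept(connector);
                return;
            }
            try {
                Thread.sleep(retryIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

One possible design choice, which I have not settled on, would be to call stop() on every pending task when the server locator is closed, so that the retry loop cannot outlive the locator.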
It kind of works, but the code is ugly and needs to be cleaned up. I create session factories to connect to other nodes and trigger the node notifications, but the factories are not properly handled (when should they be closed?).
I also create a Runnable inside the ServerLocator in the static connector case that retries indefinitely to reconnect to the static connector list. The runnable should not interfere with the created resources (esp. when the server locator is on the client side).
The use cases are not 100% clear in my mind. I'll clarify them to be sure we correctly handle all cases (discovery/static, first node, last node, etc.).