13 Replies Latest reply on Aug 18, 2010 9:41 AM by timfox

    Node notification propagation in a cluster

    jmesnil

      On the HA branch, I am working on the case where discovery is used to obtain the initial connection from which the cluster is formed.

      Let’s take the example of a symmetric cluster of 3 nodes, #0, #1 and #2, and assume that the initial connection for both nodes #1 and #2 is node #0, which broadcast first.

      From this single initial connection, nodes #1 and #2 must connect to each other to form the cluster. Node notifications are only propagated through regular connections.

      I’ve created a drawing to give an idea of the sequences of events and notifications:

      http://bit.ly/sym_cluster_notifs

      The diagram is not purely sequential (some events happen in parallel), but I hope it clarifies what I am doing.

      The important parts are:

      • when a node connects to another one (regardless of the connection type, regular or cluster), the target node sends its topology back to the source node
      • when a node connects to another one with a cluster connection, it announces its presence to the target node, which adds it to its topology and notifies its own listeners
      • when a cluster connection is notified that a node is UP, it creates a bridge to that node (more on that later) and propagates the node notification
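The three rules above can be modelled with a tiny in-memory sketch. It is not the HA branch code: `Node`, `connect`, `on_node_up` and the set-based topology are hypothetical, a bridge is reduced to recording the target id, and propagation beyond the first listener is left out.

```python
class Node:
    """Hypothetical stand-in for a cluster node (not the real server class)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.topology = {node_id}  # ids of the nodes known to be UP
        self.bridges = set()       # ids of the nodes this node bridges to

    def connect(self, target, cluster=False):
        # Any connection (regular or cluster): the target sends its
        # topology back to the source.
        self.topology |= target.topology
        if cluster:
            # Cluster connection: the source announces itself; the target
            # adds it to its topology and notifies its listeners (here,
            # only its own cluster connection).
            target.topology.add(self.node_id)
            target.on_node_up(self.node_id)

    def on_node_up(self, node_id):
        # A cluster connection notified that a node is UP creates a bridge
        # to it (propagating the notification further is omitted here).
        if node_id != self.node_id:
            self.bridges.add(node_id)


# Node #1 forms its cluster connection to node #0.
n0, n1 = Node(0), Node(1)
n1.connect(n0, cluster=True)
print(sorted(n1.topology), sorted(n0.topology), sorted(n0.bridges))  # [0, 1] [0, 1] [1]
```

Note that the bridge is created on the target (#0 -> #1), which is what the announcement is for.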

      A node notification may go through many hops before it reaches another node and updates that node’s topology. Two rules stop the propagation:

      • When a node is notified of its own status, it discards the notification and stops its propagation.
      • When a node notification has a distance greater than the topology size, it is discarded. This covers the case where node #0 notifies node #1 that node #2 is UP; in turn, node #1 notifies node #0 (since it’s a symmetric cluster), and so on. Without this bound, the notification would keep bouncing between the nodes.
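Both discard rules fit in a single predicate. A sketch, assuming a notification carries the id of the node that is UP and its current distance; `should_discard` is a hypothetical name.

```python
def should_discard(local_id, notified_id, distance, topology_size):
    """Return True if a 'node UP' notification must be dropped on local_id.

    Hypothetical helper; it encodes the two rules above.
    """
    if notified_id == local_id:
        # A node notified of its own status stops the propagation.
        return True
    # A notification that has travelled further than the cluster is large
    # is treated as looping between nodes, so it is dropped.
    return distance > topology_size
```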

      Andy modified the code to handle chain clusters: in that case, we do not create a bridge on a cluster connection if the distance is > 1. When discovery is used to create a symmetric cluster, however, a notification may need a greater distance to reach all nodes. We need to add an attribute, used when a node is UP, to distinguish between creating connections only to direct neighbors (distance == 1) and creating them to all nodes of the cluster.

      • if “directConnections” is true, we create a bridge only if distance == 1
      • if “directConnections” is false, we create a bridge in any case
      • regardless of the directConnections attribute, we do not propagate a notification whose distance > topology size
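These three rules translate directly into two predicates. A minimal sketch: the function names are hypothetical, and `direct_connections` stands for the proposed “directConnections” attribute.

```python
def should_create_bridge(direct_connections, distance):
    # directConnections=true: bridge to direct neighbours only.
    # directConnections=false: bridge whatever the distance.
    return not direct_connections or distance == 1


def should_propagate(distance, topology_size):
    # Independent of directConnections: stop once the distance
    # exceeds the topology size.
    return distance <= topology_size
```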

      Please note that the only place that triggers a sequence of node notifications is in CoreProtocolManager, when a NODE_ANNOUNCE packet is received. In that case, we start notifying that a node is UP with distance == 1. When we propagate the notification to the node’s listeners, we increment the distance.
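The sketch below simulates one such sequence, combining the starting distance with the discard rules. It is a model, not CoreProtocolManager itself: `propagate_node_up` and the `listeners` map (node id to the ids of the nodes registered as listeners on it) are hypothetical.

```python
from collections import deque


def propagate_node_up(announced_id, receiver_id, listeners, topology_size):
    """Simulate the notifications triggered by one NODE_ANNOUNCE packet.

    Returns, for every node that accepted the notification, the first
    distance at which it received it.
    """
    accepted = {}
    # The node receiving NODE_ANNOUNCE starts the sequence at distance 1.
    queue = deque([(receiver_id, 1)])
    while queue:
        node, distance = queue.popleft()
        # Discard rules: a node's own status, or a distance that is too large.
        if node == announced_id or distance > topology_size:
            continue
        # Breadth-first order means the first distance recorded is the smallest.
        accepted.setdefault(node, distance)
        for listener in listeners.get(node, ()):
            # Propagating to a listener increments the distance.
            queue.append((listener, distance + 1))
    return accepted


# Nodes #1 and #2 listen on #0, and #0 listens on both (symmetric cluster).
# Node #2 announces itself to node #0.
listeners = {0: [1, 2], 1: [0], 2: [0]}
print(propagate_node_up(2, 0, listeners, topology_size=3))  # {0: 1, 1: 2}
```

Because the distance grows by one per hop and anything beyond the topology size is dropped, the loop terminates even though #0 and #1 keep notifying each other.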

        • 1. Re: Node notification propagation in a cluster
          timfox

          Sounds about right.

           

          One question, what's the use case for setting directConnections=false? Is there any value in creating cluster connections to nodes more than one hop away?

          • 2. Re: Node notification propagation in a cluster
            jmesnil

            Tim Fox wrote:

             

            Sounds about right.

             

            One question, what's the use case for setting directConnections=false? Is there any value in creating cluster connections to nodes more than one hop away?

            Yes, for example when a node using a discovery group joins a cluster.

            Let's suppose we have a cluster with nodes #0, #1, #2, and #3.
            We start node #4, which receives a broadcast from node #2.
            Node #2 will propagate its topology to node #4 (i.e. [#0, #1, #2, #3]).
            Node #4 should create cluster connections to nodes #0, #1 and #3 (in addition to #2), even though it has not been notified of their existence directly by them (their distance is > 1).
            "directConnections" (more accurately, it should be "allowDirectConnectionsOnly") should be false by default, and I see only a few cases where it should be true (chain clusters only).
            • 3. Re: Node notification propagation in a cluster
              timfox

              Jeff Mesnil wrote:


              Yes, for example when a node using discovery group will join a cluster.

              Let's suppose we have a cluster with nodes #0, #1, #2, and #3.

              Is this a symmetric cluster, a chain cluster, or something else?

               

              Jeff Mesnil wrote:

               

              Yes, for example when a node using discovery group will join a cluster.

              Let's suppose we have a cluster with nodes #0, #1, #2, and #3.
              We start node #4 which receives a broadcast from node #2.
              Node #2 will propagate its topology to node 4 (i.e. [#0, #1, #2, #3])

              I don't understand this: if node 4 receives a UDP broadcast from node 2, then node 2 will form part of the list of initial connections for node 4, so node 4 will create a direct connection to it; it won't go through other nodes.

               

              Perhaps you could clarify the topology here?

              • 4. Re: Node notification propagation in a cluster
                jmesnil

                Tim Fox wrote:

                I don't understand this, if node 4 receives a UDP broadcast from node 2, then that means node 2 will form part of the list of initial connections for node 4, so node 4 will create a direct connection to it, it won't go through other nodes.

                 

                Perhaps you could clarify the topology here?

                We have a symmetric cluster with nodes #0, #1, #2 and #3 using discovery groups.

                 

                Node #4 is started, and so is its cluster manager.

                Since it uses discovery, it waits to receive a UDP broadcast to learn its initial connector.

                It receives a UDP broadcast from #2, so node #4's initial connector points to #2.

                Node #4 connects to node #2 and adds itself as a cluster topology listener on node #2.

                In response, node #2 notifies node #4 of its topology (i.e. [#0, #1, #2, #3]).

                Node #4 also announces its presence to node #2, so that the bridge #2 -> #4 is created on node #2.

                 

                So the question is: when are we supposed to create the bridges from #4 to the other nodes, and vice versa?

                 

                When node #2 notifies node #4 of its topology, node #4 can then create the bridges from node #4 to nodes #0, #1, #3.

                In turn, when node #4 connects to nodes #0, #1 and #3 and they receive its node announcement, they create bridges from themselves to node #4.

                At that point, node #4 has joined the cluster and all bridges have been formed.

                 

                To make this work, node #4 must be able to create bridges to nodes #0, #1 and #3 when it is notified of their presence by node #2.
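The join sequence above can be replayed as a sketch that only tracks which bridges exist at the end. `join_via_discovery` is a hypothetical name, and the bridge from #4 to its initial connector #2 is an assumption: the thread only states which bridges the topology download and the announcements create.

```python
def join_via_discovery(cluster, joiner, initial):
    """Return the (source, target) bridges created when `joiner` joins.

    cluster: ids of the nodes already in the symmetric cluster.
    initial: id of the node whose UDP broadcast the joiner received.
    """
    bridges = set()
    # The joiner connects to its initial connector and announces itself,
    # so the initial node creates the bridge initial -> joiner.
    bridges.add((initial, joiner))
    # Assumption: the joiner also ends up with a bridge joiner -> initial.
    bridges.add((joiner, initial))
    # The initial node sends its whole topology; the joiner bridges to every
    # other node in it, although it only heard of them through `initial`.
    for node in cluster:
        if node != initial:
            bridges.add((joiner, node))
            # Each of them receives the joiner's announcement and
            # creates a bridge back to it.
            bridges.add((node, joiner))
    return bridges


bridges = join_via_discovery([0, 1, 2, 3], joiner=4, initial=2)
print(len(bridges))  # 8
```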

                 

                The "allowDirectConnectionsOnly" attribute is simpler than what I explained above:

                A cluster connection using a static list of initial connectors must create bridges only to nodes that are its direct neighbors (distance == 1).

                A cluster connection using a discovery group must create bridges to all nodes, regardless of distance. It always has a single initial connector, downloads the whole topology from it, and connects to the other nodes based on that topology.
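Under that rule the attribute follows from how the initial connectors are configured rather than being set by hand. A sketch: `InitialConnectors`, `allow_direct_connections_only` and `bridge_targets` are hypothetical names, and the distances are those of the node #4 example.

```python
from enum import Enum


class InitialConnectors(Enum):
    STATIC = "static"        # static list of initial connectors
    DISCOVERY = "discovery"  # initial connector learnt from a UDP broadcast


def allow_direct_connections_only(mode):
    # Only static configurations restrict bridges to direct neighbours.
    return mode is InitialConnectors.STATIC


def bridge_targets(mode, distances):
    """distances maps node id -> distance at which this node learnt of it."""
    direct_only = allow_direct_connections_only(mode)
    return sorted(n for n, d in distances.items() if not direct_only or d == 1)


# Node #4 learnt of #2 directly and of #0, #1 and #3 through #2.
distances = {0: 2, 1: 2, 2: 1, 3: 2}
print(bridge_targets(InitialConnectors.STATIC, distances))     # [2]
print(bridge_targets(InitialConnectors.DISCOVERY, distances))  # [0, 1, 2, 3]
```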

                 

                Is that clearer?

                • 5. Re: Node notification propagation in a cluster
                  timfox

                  Jeff, I don't follow what you're getting at here.

                   

                  A node in a cluster will only ever create cluster connections to its immediate neighbours, irrespective of the topology.

                   

                  I don't see the use case for nodes to create cluster connections to other nodes more than one hop away.

                  • 6. Re: Node notification propagation in a cluster
                    ataylor

                    As far as I can see, this is what Jeff is saying.

                     

                    Discovery is only used to connect to the first node; after that, the topology is downloaded via the cluster connection. If this is the case, then the initial node connected to will always be the only node 1 hop away.

                     

                    So if you have nodes 1, 2, 3 and 4, and 1 connects to 2 and downloads the topology, node 1 will see nodes 3 and 4 as 2 hops away and will not make cluster connections to these nodes. Jeff, is that correct?

                    • 7. Re: Node notification propagation in a cluster
                      timfox

                      Andy Taylor wrote:

                       

                      As far as I can see, this is what Jeff is saying.

                       

                      Discovery is only used to connect to the first node; after that, the topology is downloaded via the cluster connection. If this is the case, then the initial node connected to will always be the only node 1 hop away.

                       

                      So if you have nodes 1, 2, 3 and 4, and 1 connects to 2 and downloads the topology, node 1 will see nodes 3 and 4 as 2 hops away and will not make cluster connections to these nodes. Jeff, is that correct?

                      I don't understand what you mean here. In a symmetric cluster all nodes are one hop away.

                      • 8. Re: Node notification propagation in a cluster
                        jmesnil

                        Tim Fox wrote:

                         

                        Andy Taylor wrote:

                         

                        As far as I can see, this is what Jeff is saying.

                         

                        Discovery is only used to connect to the first node; after that, the topology is downloaded via the cluster connection. If this is the case, then the initial node connected to will always be the only node 1 hop away.

                         

                        So if you have nodes 1, 2, 3 and 4, and 1 connects to 2 and downloads the topology, node 1 will see nodes 3 and 4 as 2 hops away and will not make cluster connections to these nodes. Jeff, is that correct?

                        Yes, Andy, that's what I mean.

                         

                        Tim, what do you envision for the workflow when a node joins a symmetric cluster using discovery?

                        It starts with a single initial connector and needs to know all the nodes in the cluster in order to connect to them. It learns about them through this initial connector.

                        • 9. Re: Node notification propagation in a cluster
                          timfox

                          Jeff Mesnil wrote:

                           

                          Yes, Andy, that's what I mean.

                          You lost me here guys. In a symmetric cluster each node is just one hop away, that's the definition of a symmetric cluster.

                           

                          Why would node 1 see nodes 3 and 4 as two hops away? They're not.

                          • 10. Re: Node notification propagation in a cluster
                            timfox

                            My original question was what is the use case for this "directConnections" flag. I still don't see it.

                            • 11. Re: Node notification propagation in a cluster
                              jmesnil

                              Tim Fox wrote:

                               

                              You lost me here guys. In a symmetric cluster each node is just one hop away, that's the definition of a symmetric cluster.

                               

                              Why would node 1 see nodes 3 and 4 as two hops away? They're not.

                              AIUI, you are conflating two things.

                               

                              Sure, in a symmetric cluster, the nodes will all be connected directly to each other.

                              But the node notifications that a node is UP will not necessarily come directly from the node.

                              In the discovery case, a node joining the cluster will download the topology from its single initial connector and connect *directly* to the other nodes, even though their distance is > 1.

                               

                              Or I have not understood at all how a symmetric cluster is supposed to be formed in the discovery case...

                              • 12. Re: Node notification propagation in a cluster
                                ataylor

                                Node 0 discovers node 1 and creates an initial connection to it.

                                 

                                #0 -----> #1

                                 

                                Node 1 is connected to node 2, so we see this:

                                 

                                #0 -----> #1 ------> #2

                                 

                                At this point, node 0 downloads the topology from node 1: node 1 is 1 hop away, but node 2 is 2 hops away, so no connection to node 2 is created.
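Andy's example amounts to a breadth-first hop count from node 0 over the connections it knows about. A sketch with a hypothetical `hop_distances` helper; the `links` map simply encodes the diagram above.

```python
from collections import deque


def hop_distances(links, start):
    """Breadth-first hop count from `start` over directed connections."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


links = {0: [1], 1: [2]}  # #0 -----> #1 ------> #2
distances = hop_distances(links, 0)
print(distances)  # {0: 0, 1: 1, 2: 2}
# With bridges restricted to distance == 1, node 0 only bridges to node 1.
print([n for n, d in distances.items() if d == 1])  # [1]
```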

                                • 13. Re: Node notification propagation in a cluster
                                  timfox

                                  OK, I understand this use case now.

                                   

                                  So really there are two hop parameters which need to be configured for the cluster:

                                   

                                  1) The maximum number of hops over which topology information is used for routing purposes (currently configured using the max-hops param). For a symmetric cluster this would be 1; for a chain cluster it would be > 1.

                                   

                                  2) The maximum number of hops over which topology information is used for forming cluster connections. For a symmetric cluster this would be 2; for a chain cluster this would be 1.
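The two parameters can live side by side in one value object. A sketch: `HopLimits`, `connection_max_hops` and `chain_cluster` are hypothetical names; according to the thread only max-hops exists as a setting today, and the symmetric values are the ones proposed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HopLimits:
    routing_max_hops: int     # 1) what max-hops configures today
    connection_max_hops: int  # 2) hypothetical second parameter

    def use_for_routing(self, distance):
        return distance <= self.routing_max_hops

    def use_for_connections(self, distance):
        return distance <= self.connection_max_hops


SYMMETRIC = HopLimits(routing_max_hops=1, connection_max_hops=2)


def chain_cluster(max_hops):
    # The thread only says "> 1" for chain routing, so the value is a parameter.
    if max_hops <= 1:
        raise ValueError("a chain cluster routes over more than one hop")
    return HopLimits(routing_max_hops=max_hops, connection_max_hops=1)
```

With these limits, a symmetric node joining via discovery bridges to nodes it learnt of at distance 2 but still routes only to direct neighbours, while a chain node does the opposite.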