7 Replies Latest reply on Jul 26, 2010 9:15 AM by timfox

    nodes notifications in new HA code

    jmesnil

      I am trying to fix the new HA code so that nodes are properly notified when other nodes go up and down, but I don't understand how this is supposed to work.

       

      At the moment, I consider only the case where a node is properly shut down: the other nodes must be notified that it is DOWN. When it is restarted, the other nodes must be notified that it is UP.

       

      From the code, I understand that other nodes are notified of a node status change when they receive a CLUSTER_TOPOLOGY packet (in ClientSessionFactoryImpl).

      These packets are sent by the node when it is notified of nodeUp/nodeDown (in CoreProtocolManager). But these notifications are for other nodes, not for the node itself, aren't they?

       

      There is a packet NodeAnnounceMessage which has been written but is never used (it's commented out in CoreProtocolManager and nobody sends it). Is it supposed to be used or is it a leftover from the prototype?

       

      I don't understand in ClusterManager why we distinguish between clientListeners and clusterConnectionListeners. They are always notified together.

      In fact, in the code, there is no call to SubscribeClusterTopologyUpdatesMessage with clusterConnection set to true. When is a cluster connection supposed to subscribe to topology updates?

       

      Intuitively, I would expect that when a node is about to be shut down, its cluster manager notifies all the cluster connections to send a NODE_DOWN message to the other nodes. When these nodes receive the message, they close their cluster connection. (What about the underlying connection? The node will likely be down by then, so won't the call to close the connection block?)

      When the node is up again, its cluster manager will connect to the other nodes. These other nodes will be notified that the node is up, and their cluster managers will notify their cluster connections that they must create a new bridge to connect to this node.

        • 1. Re: nodes notifications in new HA code
          timfox

          Jeff Mesnil wrote:

           


          These packets are sent by the node when it is notified of nodeUp/nodeDown (in CoreProtocolManager). But these notifications are for other nodes, not for the node itself, aren't they?

          They are for the other nodes AND the node itself.

           


           

          There is a packet NodeAnnounceMessage which has been written but is never used (it's commented out in CoreProtocolManager and nobody sends it). Is it supposed to be used or is it a leftover from the prototype?

          I don't think it's needed any more.

           


           

          I don't understand in ClusterManager why we distinguish between clientListeners and clusterConnectionListeners. They are always notified together.

          In fact, in the code, there is no call to SubscribeClusterTopologyUpdatesMessage with clusterConnection set to true. When is a cluster connection supposed to subscribe to topology updates?

          A cluster connection listens for topology updates; that's how it knows which other nodes to connect to. Without that info it wouldn't know what nodes to connect to.

           

          We need to distinguish between cluster connections and normal client connections to prevent looping. For example, when a notification comes in on a cluster connection, we distribute it only to the local client connections; otherwise it would just loop around the network, back and forth.
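
The rule described above can be sketched as a listener registry that keeps the two kinds of listeners apart. This is only an illustration under stated assumptions: the class `TopologyNotifier`, the `TopologyListener` interface and the `arrivedOnClusterConnection` parameter are hypothetical names, not taken from the HornetQ source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the HornetQ code base.
class TopologyNotifier {

    /** Minimal callback; the real listener interface may look different. */
    interface TopologyListener {
        void nodeUp(String nodeId);
    }

    private final List<TopologyListener> clientListeners = new ArrayList<>();
    private final List<TopologyListener> clusterConnectionListeners = new ArrayList<>();

    /** Registers a listener in the set that matches the kind of subscriber. */
    void addListener(TopologyListener listener, boolean clusterConnection) {
        if (clusterConnection) {
            clusterConnectionListeners.add(listener);
        } else {
            clientListeners.add(listener);
        }
    }

    /**
     * Distributes a node-up event.
     *
     * @param arrivedOnClusterConnection true if another node told us about the event
     */
    void notifyNodeUp(String nodeId, boolean arrivedOnClusterConnection) {
        // Local client connections always need to learn about the change.
        for (TopologyListener listener : clientListeners) {
            listener.nodeUp(nodeId);
        }
        // Forward to other nodes only if the event did not come from one of them;
        // otherwise it would bounce back to where it came from.
        if (!arrivedOnClusterConnection) {
            for (TopologyListener listener : clusterConnectionListeners) {
                listener.nodeUp(nodeId);
            }
        }
    }
}
```

With a single listener list there would be no way to apply the second condition, which is the reason for keeping the two sets separate even though both are notified for locally originated events.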

          • 2. Re: nodes notifications in new HA code
            jmesnil

            Tim Fox wrote:

             

            A cluster connection listens for topology updates; that's how it knows which other nodes to connect to. Without that info it wouldn't know what nodes to connect to.

             

            When a cluster manager is started, its "local" cluster connections are added as cluster topology listeners. Fine...

            What I don't understand is when a cluster connection is supposed to send a SubscribeClusterTopologyUpdatesMessage to the other nodes.

             

            Tim Fox wrote:

             

            We need to distinguish between cluster connections and normal client connections to prevent looping. For example, when a notification comes in on a cluster connection, we distribute it only to the local client connections; otherwise it would just loop around the network, back and forth.

             

            I did not understand your explanation.

            Do you mean that when the notification is emitted by a cluster connection (more accurately by its server locator), it is only distributed to the client listeners and not to the cluster connection listeners?

             

            What about writing some design documentation about how things are supposed to work, the roles and responsibilities of the HA components (cluster manager, cluster connection, server locator), and how subscription and updates are supposed to work? We could also have use cases (a node joins, a node leaves, etc.) to translate into tests later on.

            It'd help tremendously to fill the gap between the current code and the way it is supposed to be.

            • 3. Re: nodes notifications in new HA code
              timfox

              Jeff Mesnil wrote:


              When a cluster manager is started, its "local" cluster connections are added as cluster topology listeners. Fine...

              What I don't understand is when a cluster connection is supposed to send a SubscribeClusterTopologyUpdatesMessage to the other nodes.

              When a cluster connection starts, it uses its initial list of servers to make a connection to one node of the cluster. It then sends a SubscribeClusterTopologyUpdatesMessage (you can see this in the ClientSessionFactoryImpl.getConnection method; the code is already there) and receives the current topology. It then creates a bridge to each node of the cluster.
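
The start-up sequence just described could look roughly like the sketch below. It is an illustration only: `ClusterConnectionSketch`, `RemoteNode` and their methods are hypothetical stand-ins, and skipping the local node when creating bridges is an assumption that the description above does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the start-up sequence; all names are illustrative.
class ClusterConnectionSketch {

    /** Stand-in for a server from the initial list. */
    interface RemoteNode {
        boolean isReachable();

        /** Subscribes to topology updates and returns the ids of the current cluster nodes. */
        List<String> subscribe(boolean clusterConnection);
    }

    private final String localNodeId;
    private final List<RemoteNode> initialServers;
    private final List<String> bridges = new ArrayList<>();

    ClusterConnectionSketch(String localNodeId, List<RemoteNode> initialServers) {
        this.localNodeId = localNodeId;
        this.initialServers = initialServers;
    }

    /** Returns true once one initial server has answered with the topology. */
    boolean start() {
        for (RemoteNode node : initialServers) {
            if (!node.isReachable()) {
                continue; // try the next server in the initial list
            }
            // One live node is enough: subscribe and receive the current topology.
            List<String> topology = node.subscribe(true);
            for (String nodeId : topology) {
                // Assumption: no bridge is needed to the local node itself.
                if (!nodeId.equals(localNodeId)) {
                    bridges.add(nodeId); // a real implementation would open a bridge here
                }
            }
            return true;
        }
        return false; // retry policy is out of scope for this sketch
    }

    List<String> bridges() {
        return bridges;
    }
}
```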

               


              I did not understand your explanation.

              Do you mean that when the notification is emitted by a cluster connection (more accurately by its server locator), it is only distributed to the client listeners and not to the cluster connection listeners?


              Imagine a new node joins the cluster. When it joins, it announces its presence (see the announceNode() method in ClusterManagerImpl). When the other cluster connections receive this notification, they in turn distribute it to their listeners, which are local client connections and cluster connections. In this case the receiving node *does not* want to distribute the notification to its own cluster connection listeners, since it would just end up back at the node which sent it, which would distribute it to its listeners again, i.e. it would end up in an infinite loop.
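
To make the loop concrete, here is a small two-node simulation. It is a sketch with hypothetical names (`LoopDemo`, `Node`, `suppressEcho`), and the hop limit exists only so that the naive variant terminates; it is not part of the design discussed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-node simulation of the notification loop; names are illustrative.
class LoopDemo {

    static class Node {
        final boolean suppressEcho;                 // apply the "do not forward" rule?
        final List<Node> peers = new ArrayList<>(); // stands in for cluster connection listeners
        int deliveries = 0;                         // notifications handed to local clients

        Node(boolean suppressEcho) {
            this.suppressEcho = suppressEcho;
        }

        void receive(String event, boolean fromClusterConnection, int[] hopsLeft) {
            if (hopsLeft[0] <= 0) {
                return; // safety cap so the naive variant stops
            }
            hopsLeft[0]--;
            deliveries++; // local client listeners are always notified
            if (fromClusterConnection && suppressEcho) {
                return; // the event came from another node: do not send it back out
            }
            for (Node peer : peers) {
                peer.receive(event, true, hopsLeft);
            }
        }
    }

    /** Runs the scenario and returns the delivery counts for node A and node B. */
    static int[] run(boolean suppressEcho, int maxHops) {
        Node a = new Node(suppressEcho);
        Node b = new Node(suppressEcho);
        a.peers.add(b);
        b.peers.add(a);
        int[] hopsLeft = { maxHops };
        a.receive("nodeUp:A", false, hopsLeft); // A announces itself locally
        return new int[] { a.deliveries, b.deliveries };
    }

    public static void main(String[] args) {
        int[] guarded = run(true, 100);
        int[] naive = run(false, 100);
        System.out.println("guarded: A=" + guarded[0] + " B=" + guarded[1]);
        System.out.println("naive:   A=" + naive[0] + " B=" + naive[1]);
    }
}
```

With the rule enabled each node delivers the event once; without it the event keeps bouncing between the two nodes until the hop limit is exhausted.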

              • 4. Re: nodes notifications in new HA code
                timfox

                Jeff Mesnil wrote:


                It'd help tremendously to fill the gap between the current code and the way it is supposed to be.

                95% of the code is already there. A few loose ends need joining up.

                • 5. Re: nodes notifications in new HA code
                  jmesnil

                  Tim Fox wrote:

                   

                  Jeff Mesnil wrote:


                  When a cluster manager is started, its "local" cluster connections are added as cluster topology listeners. Fine...

                  What I don't understand is when a cluster connection is supposed to send a SubscribeClusterTopologyUpdatesMessage to the other nodes.

                  When a cluster connection starts, it uses its initial list of servers to make a connection to one node of the cluster. It then sends a SubscribeClusterTopologyUpdatesMessage (you can see this in the ClientSessionFactoryImpl.getConnection method; the code is already there) and receives the current topology. It then creates a bridge to each node of the cluster.

                  Looking at the code, the client session factory sends a SubscribeClusterTopologyUpdatesMessage with clusterConnection set to false.

                  The cluster will register the listener as a "client" cluster topology listener and has no idea whether it's coming from a client or a cluster connection.

                   

                  Now, if you tell me that a cluster connection should set a clusterConnection attribute on its client session factory (or preferably on its server locator) so that it sends a SubscribeClusterTopologyUpdatesMessage with clusterConnection set to true, fine.

                  • 6. Re: nodes notifications in new HA code
                    jmesnil

                    Tim Fox wrote:

                     

                    95% of the code is already there. A few loose ends need joining up.

                    95% of passing tests is the metric I'm looking for

                    • 7. Re: nodes notifications in new HA code
                      timfox

                      Jeff Mesnil wrote:

                       


                       

                      Now, if you tell me that a cluster connection should set a clusterConnection attribute on its client session factory (or preferably on its server locator) so that it sends a SubscribeClusterTopologyUpdatesMessage with clusterConnection set to true, fine.

                      Yes, if it's a cluster connection then the SubscribeClusterTopologyUpdatesMessage should be sent with the clusterConnection attribute set to true.
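
One way to wire this up, following the suggestion of putting the attribute on the server locator, is sketched below. The classes `Locator`, `Factory` and `SubscribeMessage` are hypothetical stand-ins for the real server locator, client session factory and subscribe packet; only the clusterConnection flag is modelled.

```java
// Hypothetical sketch of propagating the flag; not the actual HornetQ classes.
class SubscribeFlagSketch {

    /** Stand-in for the subscribe packet; only the flag from this thread is modelled. */
    static final class SubscribeMessage {
        final boolean clusterConnection;

        SubscribeMessage(boolean clusterConnection) {
            this.clusterConnection = clusterConnection;
        }
    }

    /** Stand-in for a server locator that knows whether a cluster connection owns it. */
    static final class Locator {
        private boolean clusterConnection; // defaults to false for ordinary clients

        void setClusterConnection(boolean clusterConnection) {
            this.clusterConnection = clusterConnection;
        }

        boolean isClusterConnection() {
            return clusterConnection;
        }

        Factory createFactory() {
            return new Factory(this);
        }
    }

    /** Stand-in for the session factory that sends the subscribe packet on connect. */
    static final class Factory {
        private final Locator locator;

        Factory(Locator locator) {
            this.locator = locator;
        }

        SubscribeMessage newSubscribeMessage() {
            // The flag is read from the locator, so every factory it creates stays consistent.
            return new SubscribeMessage(locator.isClusterConnection());
        }
    }
}
```

Keeping the flag on the locator means that a cluster connection sets it once, and ordinary clients get the default of false without any change.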