2 Replies Latest reply on Aug 23, 2010 9:49 AM by jmesnil

    HA test failures and remaining tasks

    jmesnil

      Looking at the code and at the tests that still fail in the distribution packages, it looks like these parts are still missing:

       

      * multiple cluster connections (OnewayTwoNodeClusterTest.testMultipleClusterConnections)

      * related to this, I don't understand the comment in ClusterManagerImpl.announceNode() about working with more than one cluster connection.

        I assume that a node can still have multiple cluster connections and belong to different, separate clusters, i.e. the test is valid and the code does not work

      * some symmetric cluster tests with backups fail because they trigger failover when they should not (I need to investigate)

      * OneWayTwoNodeCluster.testRouteWhenNoConsumersTrueNonBalancedQueues fails with messages out of order

      * in ClusteredGroupingTest, testGroupingSendTo3queuesPinnedNodeGoesDownSendBeforeStop and testGroupingSendTo3queuesPinnedNodeGoesDownSendAfterRestart fail

       

      I do not understand the comment in the code that was handed over in ClusterConnectionImpl.nodeUp():

       

                  //if (!connectorPair.a.equals(record.getBridge().getForwardingConnection().getTransportConnection()))

                  // {

                  //   // New live node - close it and recreate it - TODO - CAN THIS EVER HAPPEN?

                  //}

       

      What's supposed to be compared here?

      I suppose we want to check that the connector of the existing node is the same as the one coming from the notification.

      This could happen in the discovery case, since we will receive UP notifications from all the other nodes in the cluster. In that case, we create the record the first time and do nothing after that.
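
      To make the "create the first time, do nothing after that" behaviour concrete, here is a minimal sketch. It is not the actual ClusterConnectionImpl code: NodeUpSketch, its records map and the String connector are hypothetical stand-ins, and it deliberately leaves open the commented-out question of what to do if a known node shows up with a different connector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the cluster connection; not the HornetQ class.
class NodeUpSketch {
    // nodeID -> connector of the node we already hold a record for
    private final Map<String, String> records = new HashMap<String, String>();

    // Counts how many records were actually created.
    int created = 0;

    // Returns true if a record was created, false if the notification was ignored.
    synchronized boolean nodeUp(String nodeID, String connector) {
        String existing = records.get(nodeID);
        if (existing == null) {
            // First UP notification for this node: create the record.
            records.put(nodeID, connector);
            created++;
            return true;
        }
        // Repeated UP notification for a node we already know (expected with
        // discovery): do nothing. A different connector for the same node is
        // the open "CAN THIS EVER HAPPEN?" case and is not handled here.
        return false;
    }
}
```

      With this shape, receiving the same UP notification from every other node in the cluster only ever creates one record per node ID.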

       

      Other remaining tasks that need to be done:

       

      * proper lifecycle for session factories used by cluster connections. In ServerLocator.connect(), we call createSessionFactory(config) to be able to connect to other nodes in the initial list of connectors but the factory is never closed. Ditto in ClusterConnectionBridge.connectionFailed()

       

      Another task that will take some time: rewrite the HA chapter in the cluster doc + description of the new schema for cluster-connection

        • 1. Re: HA test failures and remaining tasks
          jmesnil

          Jeff Mesnil wrote:

           

          Looking at the code and at the tests that still fail in the distribution packages, it looks like these parts are still missing:

           

          * multiple cluster connections (OnewayTwoNodeClusterTest.testMultipleClusterConnections)

          * related to this, I don't understand the comment in ClusterManagerImpl.announceNode() about working with more than one cluster connection.

            I assume that a node can still have multiple cluster connections and belong to different, separate clusters, i.e. the test is valid and the code does not work

          I have fixed the test setup but I still don't understand the comment in ClusterManagerImpl.announceNode().

          • 2. Re: HA test failures and remaining tasks
            jmesnil

            To get a better estimate on what remains to be done, I'm trying to run the whole integration test suite.

             

            There are failures in the bridges and client packages which are due to changed expectations.

             

            In the new HA code, we can no longer expect the core remoting connection to be closed when all the client sessions are closed. The client session factory must now be explicitly closed (btw, we should make this clear in the release notes, or HornetQ users will have leaks when going from 2.1.2 to 2.2)
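
            A rough illustration of the new expectation, using hypothetical stand-in classes rather than the real client API (SketchFactory, SketchSession and the connectionOpen flag are made up for this sketch): closing the last session leaves the connection open, and only closing the factory releases it.

```java
// Hypothetical stand-in for a client session factory that owns the connection.
class SketchFactory {
    boolean connectionOpen = true;
    int openSessions = 0;

    SketchSession createSession() {
        openSessions++;
        return new SketchSession(this);
    }

    // Only an explicit close of the factory releases the connection.
    void close() {
        connectionOpen = false;
    }
}

// Hypothetical stand-in for a client session.
class SketchSession {
    private final SketchFactory factory;
    private boolean closed = false;

    SketchSession(SketchFactory factory) {
        this.factory = factory;
    }

    // Closing a session no longer closes the underlying connection,
    // even when it is the last open session.
    void close() {
        if (!closed) {
            closed = true;
            factory.openSessions--;
        }
    }
}

class ExplicitCloseSketch {
    // Pattern callers would need: close the session, then the factory.
    static SketchFactory useAndClose() {
        SketchFactory factory = new SketchFactory();
        try {
            SketchSession session = factory.createSession();
            try {
                // ... send or receive messages here ...
            } finally {
                session.close();
            }
        } finally {
            factory.close();
        }
        return factory;
    }
}
```

            Code that only closes its sessions, as was enough before, would leave connectionOpen set to true in this model, which is the leak mentioned above.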

             

            Tests with live and backup servers using a shared store and running in the same VM must be updated to use a FakeLock so that they can run side by side without issues.