5 Replies Latest reply on Mar 20, 2009 7:40 PM by clebert.suconic

    Duplicated Messages during failover and NullPersistence

    clebert.suconic

      I have changed AutomaticFailoverWithDiscoveryTest to use real files for the journal.


      If you make it use NullPersistence and run testFailover in a loop, messages will be duplicated.


      I moved the setupGroup to a super class (FailoverTestBase).

      To replicate this issue, change the setUp to:

      protected void setUp() throws Exception
      {
         super.setUp();
         setupGroupServers(false, "bc1", 5432, groupAddress, groupPort);
      }
      




        • 1. Re: Duplicated Messages during failover and NullPersistence
          clebert.suconic

          Just to complete the thread: the failure I was talking about is related to an intermittent failure that has happened on Hudson.

          It doesn't fail when using real files (*probably* duplicate detection behaves differently when using real files).

          This is the diff to replicate the issue.

          Index: tests/src/org/jboss/messaging/tests/integration/cluster/failover/AutomaticFailoverWithDiscoveryTest.java
          ===================================================================
          --- tests/src/org/jboss/messaging/tests/integration/cluster/failover/AutomaticFailoverWithDiscoveryTest.java (revision 6120)
          +++ tests/src/org/jboss/messaging/tests/integration/cluster/failover/AutomaticFailoverWithDiscoveryTest.java (working copy)
          @@ -63,6 +63,19 @@
              // Constructors --------------------------------------------------
          
              // Public --------------------------------------------------------
          +
          +   public void testRepeat() throws Exception
          +   {
          +      for (int i = 0; i < 100; i++)
          +      {
          +         if (i > 0)
          +         {
          +            tearDown();
          +            setUp();
          +         }
          +         testFailover();
          +      }
          +   }
          
              public void testFailover() throws Exception
              {
          @@ -173,7 +186,7 @@
              protected void setUp() throws Exception
              {
                 super.setUp();
          -      setupGroupServers(true, "bc1", 5432, groupAddress, groupPort);
          +      setupGroupServers(false, "bc1", 5432, groupAddress, groupPort);
              }
          
           @Override
          


          • 3. Re: Duplicated Messages during failover and NullPersistence
            clebert.suconic

            Actually, this doesn't have anything to do with Persistence vs. NullPersistence.


            If I wait 17 ms (as done in MultiThreadfailoverTest), the test never fails.

            • 4. Re: Duplicated Messages during failover and NullPersistence
              clebert.suconic

              I mean...


              If I wait 17 ms between starting the backup and starting the live server, this issue never happens.

              • 5. Re: Duplicated Messages during failover and NullPersistence
                clebert.suconic

                The issue was definitely related to the time component of the IDs.

                I added a test PreserveOrderDuringFailoverTest, which is based on AutomaticFailoverWithDiscoveryTest.

                If you uncomment some code in PreserveOrderDuringFailoverTest, this issue will always happen:

                // This test would fail if both servers have the same time component
                // NullStorageManager storageManagerLive = (NullStorageManager)liveService.getServer().getStorageManager();
                // TimeAndCounterIDGenerator idgeneratorlive = (TimeAndCounterIDGenerator)storageManagerLive.getIDGenerator();
                //
                // NullStorageManager storageManagerBackup = (NullStorageManager)backupService.getServer().getStorageManager();
                // TimeAndCounterIDGenerator idgeneratorBackup = (TimeAndCounterIDGenerator)storageManagerBackup.getIDGenerator();
                //
                // idgeneratorBackup.setInternalDate(0);
                // idgeneratorlive.setInternalDate(0);
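
To illustrate why forcing both generators onto the same time component matters, here is a minimal sketch. It assumes an ID layout with the time component in the high bits and a counter in the low bits; SketchIdGenerator, its 16-bit counter width, and countCollisions are hypothetical names made up for this example and are not the real TimeAndCounterIDGenerator code. It only shows that two generators sharing a time component hand out the same IDs, which is one plausible way for the live and backup servers to end up with clashing IDs.

```java
import java.util.HashSet;
import java.util.Set;

public class IdCollisionSketch {

    /** Hypothetical generator: time component in the high bits, counter in the low bits. */
    static final class SketchIdGenerator {
        private static final int COUNTER_BITS = 16; // assumed layout, for illustration only
        private final long timeComponent;
        private long counter = 0;

        SketchIdGenerator(long timeComponent) {
            this.timeComponent = timeComponent;
        }

        long nextId() {
            // Combine the fixed time component with an incrementing counter.
            return (timeComponent << COUNTER_BITS) | (counter++ & 0xFFFF);
        }
    }

    /** Counts how many of the backup's first n IDs were also handed out by the live generator. */
    static int countCollisions(long liveTime, long backupTime, int n) {
        SketchIdGenerator live = new SketchIdGenerator(liveTime);
        SketchIdGenerator backup = new SketchIdGenerator(backupTime);
        Set<Long> seen = new HashSet<Long>();
        for (int i = 0; i < n; i++) {
            seen.add(live.nextId());
        }
        int collisions = 0;
        for (int i = 0; i < n; i++) {
            if (seen.contains(backup.nextId())) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Same time component on both sides (like setInternalDate(0) on both): every ID clashes.
        System.out.println(countCollisions(0L, 0L, 100));       // prints 100
        // Time components 17 units apart: the ID ranges do not overlap.
        System.out.println(countCollisions(1000L, 1017L, 100)); // prints 0
    }
}
```

Under these assumptions, a short wait between starting the two servers is enough to give them different time components, which would match the observation that the wait hides the problem.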
                



                I would expect the IDs not to affect failover any more, so I would normally debug this, but since this part has already been changed in Tim's workspace, I will leave it alone.

                Tim: Could you please remove the wait in FailoverTestBase when you commit? Since you have changed the ID logic, we won't need the wait any more.


                 backupService.start();
                
                - Thread.sleep(20);
                
                 Configuration liveConf = new ConfigurationImpl();