11 Replies Latest reply on Feb 12, 2009 10:20 AM by jmesnil

    split-brain between live and backup node

    jmesnil

      this is related to https://jira.jboss.org/jira/browse/JBMESSAGING-1194

      Starting with the simplest case, it appears that we can very easily have a split-brain between a live node and its backup node.

      1. normal use case

      C1 & C2 are connected to live node L1
      L1 is replicated on the backup node B1

       C1 - - - B1
         \      |
          \     |
           \    |
            \   |
              L1*

       *: live node



      2. Backup activation is triggered by the 1st client connection to the backup node

      Network cable is unplugged from L1:
      1. C1 will fail over to B1
      2. B1 will be activated and become live
      3. L1 is still alive
      4. L1 will be informed that its connection to B1 is dead
      -> it will stop replicating to B1 but it remains live
      => 2 live nodes: split-brain!

       C1 - - - B1*
         X      X
          X     X
           X    X
            X   X
              L1*



      3. in case the network connection failure was transient (someone plugs the cable back into L1)

      1. another client C2 joins and connects to L1 as its live node
      2. C1 will continue to use B1 as its live node

       C1 - - - B1*
          /
         /
        /
       /
       C2 - - - L1*



      To sum up, we can reach a split-brain after a transient network failure on the live node.

      In the code, I've not seen any heartbeat between the live node and the backup node. L1 will never be informed when it can reach B1 again.
      B1 as the original backup never checks connectivity to the original live node L1
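      To make the gap concrete, here is a minimal sketch of the kind of recovery probe that seems to be missing. None of these classes exist in the codebase; it only illustrates L1 periodically re-checking whether B1 is reachable again after a failure:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not JBoss Messaging code: a periodic probe that lets
// the live node (L1) detect when its backup (B1) becomes reachable again.
public class BackupReachabilityProbe {
    private final BooleanSupplier ping; // returns true if the peer answers

    private boolean peerReachable = true;   // assume the link starts up
    private boolean recoveryDetected = false;

    public BackupReachabilityProbe(BooleanSupplier ping) {
        this.ping = ping;
    }

    /** Run one probe cycle; returns true once the peer has come back after a failure. */
    public boolean probeOnce() {
        boolean up = ping.getAsBoolean();
        if (up && !peerReachable) {
            // this is the missing notification: L1 learns B1 is reachable again
            recoveryDetected = true;
        }
        peerReachable = up;
        return recoveryDetected;
    }
}
```

      In a real server this would run on a scheduled thread; the point is only that some component must observe the down-to-up transition, which nothing does today.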




        • 1. Re: split-brain between live and backup node
          timfox

           

          "jmesnil" wrote:

          In the code, I've not seen any heartbeat between the live node and the backup node. L1 will never be informed when it can reach B1 again.
          B1 as the original backup never checks connectivity to the original live node L1



          There is a heartbeat, like any normal connection.

          Also there is a test where we fail the backup server intermittently and then let it come back up: ReconnectWithBackupTest

          • 2. Re: split-brain between live and backup node
            timfox

             Actually it's FailBackupServerTest

            • 3. Re: split-brain between live and backup node
              jmesnil

               

              "timfox" wrote:
              "jmesnil" wrote:

              In the code, I've not seen any heartbeat between the live node and the backup node. L1 will never be informed when it can reach B1 again.
              B1 as the original backup never checks connectivity to the original live node L1



              There is a heartbeat, like any normal connection


               Yes, bad wording on my part.
               I meant that once L1 has been informed that it is no longer connected to B1, it will not be informed when the connection is up again.

              I'll write a test case to illustrate this case.

              • 4. Re: split-brain between live and backup node
                timfox

                 If the backup server really has gone down, and is brought back up, it can no longer be the backup server, since its state is no longer synchronized with the live.

                 If it's a transitory failure of short enough duration, then we can just add reconnection attempts like any other connection. However, while the server is down, any replications will be blocked, which might cause clients to block and time out.
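                 A rough sketch of what bounded reconnection attempts could look like (hypothetical names, not the actual JBoss Messaging API); since replication blocks during the retries, the retry window has to stay shorter than client call timeouts:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not JBoss Messaging code: bounded reconnection
// attempts toward the backup after a transitory failure.
public class BoundedReconnect {
    /**
     * Try to reconnect up to maxAttempts times.
     *
     * @param connect one reconnection attempt; returns true on success
     * @return true if a reconnect attempt succeeded before giving up
     */
    public static boolean tryReconnect(BooleanSupplier connect, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (connect.getAsBoolean()) {
                return true;  // replication can resume
            }
            // a real implementation would sleep/backoff between attempts,
            // keeping the total window below client blocking timeouts
        }
        return false;         // give up: continue unreplicated
    }
}
```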

                • 5. Re: split-brain between live and backup node
                  jmesnil

                   that's not the use case I described: the backup server never goes down.

                   My use case is that the live server is temporarily offline and a client fails over to the backup server. What happens when the live node is online again and clients connect to it?

                  • 6. Re: split-brain between live and backup node
                    timfox

                    Yes, we know that split brain can occur, that's why the task exists in the first place!

                    The task is to prevent duplicate deliveries in a split brain situation, not try and convince me that it can occur, I already know that...;)

                    • 7. Re: split-brain between live and backup node
                      jmesnil

                       Using FailBackupServerTest as a starting point, I've written a test to reach a split-brain, but it is not working as expected.

                       It's very similar to FailBackupServerTest except that I fail over the session to the backup node before failing all the replication channels of the live node.
                       The failover is successful, but when I create a new session on the live node, the code blocks while creating the session:

                       ClientSessionFactoryInternal sf2 = new ClientSessionFactoryImpl(
                          new TransportConfiguration(InVMConnectorFactory.class.getName()),
                          new TransportConfiguration(InVMConnectorFactory.class.getName(),
                                                     backupParams));
                       sf2.setSendWindowSize(32 * 1024);

                       ClientSession session2 = sf2.createSession(false, true, true);



                      Tim, from what you said today on the logs, you were expecting it to succeed, right?

                       I left the code as is, but while debugging I saw that the replicating connections on the live node are destroyed when the client fails over to the backup node.
                       When the backup is activated, it closes its connection to the live node, which in turn destroys the replication connections at the other side of the wire on the live node.



                       package org.jboss.messaging.tests.integration.cluster.failover;

                       import java.util.HashMap;
                       import java.util.Map;
                       import java.util.Set;
                       import java.util.concurrent.CountDownLatch;
                       import java.util.concurrent.TimeUnit;

                       import junit.framework.TestCase;

                       import org.jboss.messaging.core.client.ClientConsumer;
                       import org.jboss.messaging.core.client.ClientMessage;
                       import org.jboss.messaging.core.client.ClientProducer;
                       import org.jboss.messaging.core.client.ClientSession;
                       import org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl;
                       import org.jboss.messaging.core.client.impl.ClientSessionFactoryInternal;
                       import org.jboss.messaging.core.client.impl.ClientSessionImpl;
                       import org.jboss.messaging.core.config.Configuration;
                       import org.jboss.messaging.core.config.TransportConfiguration;
                       import org.jboss.messaging.core.config.impl.ConfigurationImpl;
                       import org.jboss.messaging.core.exception.MessagingException;
                       import org.jboss.messaging.core.logging.Logger;
                       import org.jboss.messaging.core.remoting.FailureListener;
                       import org.jboss.messaging.core.remoting.RemotingConnection;
                       import org.jboss.messaging.core.remoting.impl.invm.InVMAcceptorFactory;
                       import org.jboss.messaging.core.remoting.impl.invm.InVMConnectorFactory;
                       import org.jboss.messaging.core.remoting.impl.invm.InVMRegistry;
                       import org.jboss.messaging.core.remoting.impl.invm.TransportConstants;
                       import org.jboss.messaging.core.server.Messaging;
                       import org.jboss.messaging.core.server.MessagingService;
                       import org.jboss.messaging.integration.transports.netty.NettyAcceptorFactory;
                       import org.jboss.messaging.integration.transports.netty.NettyConnectorFactory;
                       import org.jboss.messaging.jms.client.JBossTextMessage;
                       import org.jboss.messaging.util.SimpleString;

                       /**
                        * A FailLiveServerTest
                        *
                        * Fail over a client session to the backup node, then fail the
                        * replicating connections of the live node.
                        *
                        * @author <a href="mailto:tim.fox@jboss.com">Tim Fox</a>
                        *
                        * Created 6 Nov 2008 11:27:17
                        */
                       public class FailLiveServerTest extends TestCase
                       {
                          private static final Logger log = Logger.getLogger(FailLiveServerTest.class);

                          // Constants -----------------------------------------------------

                          // Attributes ----------------------------------------------------

                          private static final SimpleString ADDRESS = new SimpleString("FailLiveServerTest");

                          private MessagingService liveService;

                          private MessagingService backupService;

                          private final Map<String, Object> backupParams = new HashMap<String, Object>();

                          // Static --------------------------------------------------------

                          // Constructors --------------------------------------------------

                          // Public --------------------------------------------------------

                          public void testFailBackup() throws Exception
                          {
                             ClientSessionFactoryInternal sf1 = new ClientSessionFactoryImpl(
                                new TransportConfiguration(InVMConnectorFactory.class.getName()),
                                new TransportConfiguration(InVMConnectorFactory.class.getName(),
                                                           backupParams));

                             sf1.setSendWindowSize(32 * 1024);

                             ClientSession session1 = sf1.createSession(false, true, true);

                             session1.createQueue(ADDRESS, ADDRESS, null, false, false);

                             ClientProducer producer = session1.createProducer(ADDRESS);

                             final int numMessages = 1000;

                             for (int i = 0; i < numMessages; i++)
                             {
                                ClientMessage message = session1.createClientMessage(JBossTextMessage.TYPE,
                                                                                     false,
                                                                                     0,
                                                                                     System.currentTimeMillis(),
                                                                                     (byte)1);
                                message.putIntProperty(new SimpleString("count"), i);
                                message.getBody().putString("aardvarks");
                                message.getBody().flip();
                                producer.send(message);
                             }

                             ClientConsumer consumer1 = session1.createConsumer(ADDRESS);

                             session1.start();

                             for (int i = 0; i < numMessages; i++)
                             {
                                ClientMessage message = consumer1.receive(1000);

                                assertNotNull(message);

                                assertEquals("aardvarks", message.getBody().getString());

                                assertEquals(i, message.getProperty(new SimpleString("count")));

                                if (i == 0)
                                {
                                   ///// DIFF WITH FailBackupServerTest /////
                                   log.info("Failing client connection");
                                   assertEquals("invm:0", ((ClientSessionImpl)session1).getConnection().getRemoteAddress());
                                   ((ClientSessionImpl)session1).getConnection().fail(new MessagingException(MessagingException.NOT_CONNECTED, "simulated failure b/w client and live node"));
                                   assertEquals("must have failed over on backup node", "invm:1", ((ClientSessionImpl)session1).getConnection().getRemoteAddress());
                                   ///// END OF DIFF WITH FailBackupServerTest /////

                                   // Fail all the replicating connections - this simulates the backup server crashing

                                   Set<RemotingConnection> conns = liveService.getServer().getRemotingService().getConnections();

                                   for (RemotingConnection conn : conns)
                                   {
                                      log.info("Failing replicating connection");
                                      assertEquals("invm:1", conn.getReplicatingConnection().getRemoteAddress());
                                      System.out.println(conn.getReplicatingConnection().getID());
                                      conn.getReplicatingConnection().fail(new MessagingException(MessagingException.NOT_CONNECTED, "simulated failure b/w live and backup node"));
                                   }
                                }

                                message.acknowledge();
                             }

                             assertEquals("invm:1", ((ClientSessionImpl)session1).getConnection().getRemoteAddress());

                             ClientMessage message = consumer1.receive(1000);

                             assertNull(message);

                             // Send some more

                             for (int i = 0; i < numMessages; i++)
                             {
                                message = session1.createClientMessage(JBossTextMessage.TYPE, false, 0, System.currentTimeMillis(), (byte)1);
                                message.putIntProperty(new SimpleString("count"), i);
                                message.getBody().putString("aardvarks");
                                message.getBody().flip();
                                producer.send(message);
                             }

                             for (int i = 0; i < numMessages; i++)
                             {
                                message = consumer1.receive(1000);

                                assertNotNull(message);

                                assertEquals("aardvarks", message.getBody().getString());

                                assertEquals(i, message.getProperty(new SimpleString("count")));

                                message.acknowledge();
                             }

                             message = consumer1.receive(1000);

                             assertNull(message);

                             // create another client session factory

                             ClientSessionFactoryInternal sf2 = new ClientSessionFactoryImpl(
                                new TransportConfiguration(InVMConnectorFactory.class.getName()),
                                new TransportConfiguration(InVMConnectorFactory.class.getName(),
                                                           backupParams));

                             sf2.setSendWindowSize(32 * 1024);

                             ClientSession session2 = sf2.createSession(false, true, true);

                             assertEquals("invm:0", ((ClientSessionImpl)session2).getConnection().getRemoteAddress());

                             session1.close();
                             session2.close();
                          }

                          // Package protected ---------------------------------------------

                          // Protected -----------------------------------------------------

                          @Override
                          protected void setUp() throws Exception
                          {
                             Configuration backupConf = new ConfigurationImpl();
                             backupConf.setSecurityEnabled(false);
                             backupParams.put(TransportConstants.SERVER_ID_PROP_NAME, 1);
                             backupConf.getAcceptorConfigurations()
                                       .add(new TransportConfiguration(InVMAcceptorFactory.class.getName(),
                                                                       backupParams));
                             backupConf.setBackup(true);
                             backupService = Messaging.newNullStorageMessagingService(backupConf);
                             backupService.start();

                             Configuration liveConf = new ConfigurationImpl();
                             liveConf.setSecurityEnabled(false);
                             liveConf.getAcceptorConfigurations()
                                     .add(new TransportConfiguration(InVMAcceptorFactory.class.getName()));
                             Map<String, TransportConfiguration> connectors = new HashMap<String, TransportConfiguration>();
                             TransportConfiguration backupTC = new TransportConfiguration(InVMConnectorFactory.class.getName(),
                                                                                          backupParams, "backup-connector");
                             connectors.put(backupTC.getName(), backupTC);
                             liveConf.setConnectorConfigurations(connectors);
                             liveConf.setBackupConnectorName(backupTC.getName());
                             liveService = Messaging.newNullStorageMessagingService(liveConf);
                             liveService.start();
                          }

                          @Override
                          protected void tearDown() throws Exception
                          {
                             backupService.stop();

                             liveService.stop();

                             assertEquals(0, InVMRegistry.instance.size());
                          }

                          // Private -------------------------------------------------------

                          // Inner classes -------------------------------------------------
                       }
                      
                      


                      • 8. Re: split-brain between live and backup node
                        timfox

                        When those replicating connections are closed, the handler in RemotingConnectionImpl:

                         private class ReplicatingConnectionFailureListener implements FailureListener
                         {
                            public boolean connectionFailed(final MessagingException me)
                            {
                               synchronized (RemotingConnectionImpl.this)
                               {
                                  for (Channel channel : channels.values())
                                  {
                                     channel.replicatingChannelDead();
                                  }
                               }

                               return true;
                            }
                         }



                         should be called.

                        This tells the live channels there is no more replicating channel and they should continue without replicating.

                        Is this being called?

                        If not, then that's where you need to debug

                        • 9. Re: split-brain between live and backup node
                          timfox

                          In MessagingServerImpl:

                           private static class NoCacheConnectionLifeCycleListener implements ConnectionLifeCycleListener
                           {
                              private RemotingConnection conn;

                              public void connectionCreated(final Connection connection)
                              {
                              }

                              public void connectionDestroyed(final Object connectionID)
                              {
                                 if (conn != null)
                                 {
                                    conn.destroy();
                                 }
                              }

                              public void connectionException(final Object connectionID, final MessagingException me)
                              {
                                 if (conn != null)
                                 {
                                    conn.fail(me);
                                 }
                              }
                           }



                           You can see that when the connection is destroyed (closed, as opposed to failed), the failure handler won't be called, so the channel never knows its replicating channel is dead.
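                           A minimal sketch of the fix this implies (illustrative names only, not the real RemotingConnectionImpl API): route clean closure through the same notification path as failure, so the live channels always learn the replicating channel is gone:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JBoss Messaging code: a listener that notifies
// channels whether the replicating connection was destroyed OR failed.
public class DestroyAwareListener {
    public interface Channel {
        void replicatingChannelDead();
    }

    private final List<Channel> channels = new ArrayList<Channel>();

    public void addChannel(Channel c) {
        channels.add(c);
    }

    // Called on clean close (destroy) *and* on failure, so the channels
    // never miss the fact that the replicating channel is dead.
    public void connectionGone() {
        for (Channel c : channels) {
            c.replicatingChannelDead();
        }
    }
}
```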

                          • 10. Re: split-brain between live and backup node
                            jmesnil

                            from #jbossmessaging channel http://www.antwerkz.com/javabot/javabot/home/3/%23jbossmessaging/2/11/1/02/0/2009/:

                            [14:33] jbossfox: i solved the problem
                            [14:33] jbossfox: jmesnil: it's a combination of a few things
                            [14:33] jbossfox: jmesnil: firstly - the issue i explained on the forums
                            [14:33] jbossfox: jmesnil: when the client fails over
                            [14:33] jbossfox: jmesnil: this results in the backup server closing the replicating connection from the live server
                             [14:34] jbossfox: jmesnil: but this closure forces a connectionDestroyed, not connectionException
                            [14:34] jbossfox: jmesnil: in NoCacheConnectionLifeCycleListener
                            [14:34] jbossfox: jmesnil: so the replicating channel is never set to null
                            [14:34] jbossfox: jmesnil: basically like i explained on the forums
                            [14:34] jbossfox: jmesnil: but on top of that
                            [14:35] jbossfox: jmesnil: there was a bug where the replicatingconnection wasn't being set to null
                            [14:35] jbossfox: jmesnil: only the replicating channel
                            [14:35] jbossfox: jmesnil: so subsequent createsession attempts would attempt to replicate still
                            [14:35] jbossfox: jmesnil: and hang
                            [14:35] jmesnil: jbossfox: ah, ok, that's where the replicating channel was coming from
                            [14:35] jbossfox: jmesnil: yeah it was creating a new connection to the backup
                            [14:35] jbossfox: jmesnil: and then...
                            [14:36] jbossfox: jmesnil: a further bug
                            [14:36] jbossfox: jmesnil: where the server still thinks it has a backup
                            [14:36] jbossfox: jmesnil: even after the replicating connection fails
                            [14:36] jbossfox: jmesnil: so when getreplicatingconnection is called it creates a new one
                            [14:36] jbossfox: jmesnil: so i set backupconnectorFactory to null in this case
                            [14:36] jbossfox: jmesnil: then.....
                            [14:37] jbossfox: jmesnil: there is a further interesting quirk
                            [14:37] jbossfox: jmesnil: if backupconnectorfactory is not null then when getreplicatingconnection is called it will create a new connection
                            [14:37] jbossfox: jmesnil: so it was basically just creating a new backup connection to the backup
                            [14:37] jbossfox: jmesnil: and then....
                             [14:38] jbossfox: jmesnil: when you create a new consumer on the live, you've got to remember the old server side consumer from before the client failover is still there!
                            [14:38] jbossfox: jmesnil: so any messages will be round robin'd between them
                            


                            • 11. Re: split-brain between live and backup node
                              jmesnil

                              moving forward...

                               The test SplitBrainTest.testDemonstrateSplitBrain shows how to reach a split-brain where the same messages are consumed by 2 different consumers.

                               To prevent this split-brain from occurring, where the live node remains active once the backup node has been activated, the strategy would be:


                               when the live node loses its replicating connection
                               - this can be because the backup node has been activated or has crashed, or the network is cut b/w the live and backup nodes
                               - to check whether the live node is isolated or not, it sends a message to the other nodes
                               - if it reaches the quorum, it stays live
                               - else, it has been cut off from both the backup and the other cluster nodes, so it kills itself => the backup is the only active node


                               However, this won't solve the split-brain which may occur when the network is cut between the live & backup nodes but the live node remains connected to the other cluster nodes.
                               In that case, the live node will reach the quorum and remain active while the backup node has also been activated.

                              What is the required quorum?

                              The simplest solution is to have a majority of members; the members being:
                              - the live node
                              - the backup node
                              - the other live nodes of the cluster

                               Given the special relationship between the backup and the live node, the live node should pay special attention to the response from the backup:

                               - if the backup does not reply => the network is still cut between the live and backup nodes, or the backup node has crashed
                               - if it replied => the network failure was transient. In that case, the backup response should include an "active" boolean:
                                 - if the backup node is active, the live node should kill itself
                                 - else, the live node can remain live (and perhaps also reopen its replicating connection to the backup)
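                               The decision procedure above can be sketched as follows (hypothetical names; a sketch of the proposed strategy, not existing code):

```java
// Hypothetical sketch, not JBoss Messaging code: the decision the live node
// could take when its replicating connection drops.
public class SplitBrainDecision {
    public enum Action { STAY_LIVE, SHUT_DOWN }

    /**
     * @param totalMembers  live node + backup node + other live cluster nodes
     * @param reachable     how many members answered the probe (incl. self)
     * @param backupReplied whether the backup answered at all
     * @param backupActive  the "active" flag in the backup's reply (only
     *                      meaningful when backupReplied is true)
     */
    public static Action decide(int totalMembers, int reachable,
                                boolean backupReplied, boolean backupActive) {
        if (backupReplied && backupActive) {
            return Action.SHUT_DOWN;     // the backup took over: the live must yield
        }
        if (reachable * 2 > totalMembers) {
            return Action.STAY_LIVE;     // majority quorum reached
        }
        return Action.SHUT_DOWN;         // isolated: let the backup be the only active node
    }
}
```

                               Note this sketch inherits the limitation described above: with no other cluster nodes, totalMembers is too small for a meaningful majority.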


                              If there are no other nodes in the cluster, we can't apply this strategy.

                              Another thing worth mentioning: the live & backup should be on the same LAN while the other cluster nodes may be on a WAN.

                              To sum up, I need to think about it more...