23 Replies. Latest reply on Nov 24, 2010 10:10 PM by jombo

    Bridge reconnection stopping node rejoining cluster

    grantlittle

      Hi all,

       

      I am having an issue which I believe is caused by the bridge attempting to reconnect after an outage of a cluster node (using a live/backup pair).

       

      Scenario is:

      1. Start up a cluster with 2 live/backup pairs, i.e. 4 servers.

      2. Kill the first primary server.

      3. Kill the first backup server.

      4. Sync data from the killed backup to the killed live server.

      5. Start the first backup server.

      6. Start the first live server.
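      Step 4 is the only step that is more than a start or a kill. A minimal sketch of it in Java, assuming each server keeps its journal and related files under a single data directory; the paths data/server0 (backup) and data/server1 (live) are hypothetical placeholders, not the layout used by the attached test case. It copies and overwrites but never deletes, so files that exist only on the live side are left in place.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class SyncServerData {
    // Recursively copies every file under 'source' into 'target', overwriting
    // files that already exist. Both servers must be stopped while this runs.
    static void syncDirectory(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk visits a directory before its contents, so parent
            // directories are always created before files are copied into them.
            for (Path src : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(src).toString());
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: the backup's (server0) data is copied over the live's (server1).
        syncDirectory(Paths.get("data/server0"), Paths.get("data/server1"));
    }
}
```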

       

      The scenario fails at step 6 with the following exception:

       

      [java] HornetQServer_1 err:java.lang.IllegalStateException: Incompletely deployed:
           [java] HornetQServer_1 err:
           [java] HornetQServer_1 err:DEPLOYMENTS IN ERROR:
           [java] HornetQServer_1 err:  Deployment "JMSServerManager" is in error due to: HornetQException[errorCode=104 message=Connected server is not a backup server]
           [java] HornetQServer_1 err:
           [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.deployment.AbstractKernelDeployer.internalValidate(AbstractKernelDeployer.java:278)
           [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.deployment.AbstractKernelDeployer.validate(AbstractKernelDeployer.java:174)
           [java] HornetQServer_1 err:    at org.hornetq.integration.bootstrap.HornetQBootstrapServer.bootstrap(HornetQBootstrapServer.java:158)
           [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.bootstrap.AbstractBootstrap.run(AbstractBootstrap.java:83)
           [java] HornetQServer_1 err:    at org.hornetq.integration.bootstrap.HornetQBootstrapServer.run(HornetQBootstrapServer.java:116)
           [java] HornetQServer_1 err:    at org.hornetq.common.example.SpawnedHornetQServer.main(SpawnedHornetQServer.java:35)
           [java] HornetQServer_1 out:FAILED::Incompletely deployed:
           [java] java.lang.RuntimeException: server failed to start
           [java]     at org.hornetq.common.example.SpawnedVMSupport.spawnVM(SpawnedVMSupport.java:154)
           [java]     at org.hornetq.common.example.HornetQExample.startServer(HornetQExample.java:144)
           [java] HornetQServer_1 out:
           [java] HornetQServer_1 out:DEPLOYMENTS IN ERROR:
           [java] HornetQServer_1 out:  Deployment "JMSServerManager" is in error due to: HornetQException[errorCode=104 message=Connected server is not a backup server]
           [java] HornetQServer_1 out:
           [java]     at org.hornetq.jms.example.ClusteredStandaloneNodeRejoinExample.runExample(ClusteredStandaloneNodeRejoinExample.java:49)
           [java]     at org.hornetq.common.example.HornetQExample.run(HornetQExample.java:71)
           [java]     at org.hornetq.jms.example.ClusteredStandaloneNodeRejoinExample.main(ClusteredStandaloneNodeRejoinExample.java:24)

       

       

      This is with the retry interval set to 200 ms within the cluster-connections section of the hornetq-configuration.xml file.

       

      If I change the retry interval to a large value, say 60000 ms, in all 4 of the hornetq-configuration.xml files (one for each of the 4 servers), the same scenario works.
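      A rough way to see why a 60000 ms retry interval hides the problem is to count how many reconnection attempts fit into the window between the backup finishing its start-up and the live server starting. The 6621 ms window used here is only illustrative: it is the gap between server0 reporting "started" (11:36:03,161) and server1 beginning to start (11:36:09,782) in the run log posted later in this thread. Treating every retry as a chance to reach the restarted backup before its live partner does is an assumption, not confirmed HornetQ behaviour.

```java
public class RetryWindow {
    // Number of retries something polling every retryIntervalMs would make
    // during a window of windowMs, assuming the first retry fires one full
    // interval after the window opens.
    static long attemptsInWindow(long windowMs, long retryIntervalMs) {
        if (retryIntervalMs <= 0) {
            throw new IllegalArgumentException("retry interval must be positive");
        }
        return windowMs / retryIntervalMs;
    }

    public static void main(String[] args) {
        // 11:36:09,782 - 11:36:03,161 = 6621 ms (backup started -> live starting)
        long windowMs = 6621;
        for (long interval : new long[] {200, 60000}) {
            System.out.println(interval + " ms -> " + attemptsInWindow(windowMs, interval) + " attempts");
        }
    }
}
```

      With 200 ms that is 33 chances to connect before the live server is up; with 60000 ms there are none, which would be consistent with the scenario passing only under the larger value.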

       

      Has anybody else come across this? I believe this is a bug, in that the attempted bridge connection should not prevent the live server from restarting.

       

      Attached is the test case I am using. If you unzip the file into the examples/jms folder and then run ./build.sh from within the clustered-failover-rejoin folder, you should be able to re-create the scenario.

       

      Any help would be appreciated.

       

      Grant

        • 1. Re: Bridge reconnection stopping node rejoining cluster
          gaohoward

          I think you have to start the live server first and then the back up.

          • 2. Re: Bridge reconnection stopping node rejoining cluster
            grantlittle

            Hi Yong,

             

            From my understanding, the backup must always be started first, otherwise you receive the following error:

             

            [java] HornetQServer_1 out:DEPLOYMENTS IN ERROR:
                 [java] HornetQServer_1 out:  Deployment "JMSServerManager" is in error due to: HornetQException[errorCode=104 message=Backup server MUST be started before live server. Initialisation will not proceed.]
                 [java] HornetQServer_1 err:DEPLOYMENTS IN ERROR:
                 [java] HornetQServer_1 err:  Deployment "JMSServerManager" is in error due to: HornetQException[errorCode=104 message=Backup server MUST be started before live server. Initialisation will not proceed.]
                 [java] HornetQServer_1 err:
                 [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.deployment.AbstractKernelDeployer.internalValidate(AbstractKernelDeployer.java:278)
                 [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.deployment.AbstractKernelDeployer.validate(AbstractKernelDeployer.java:174)
                 [java] HornetQServer_1 err:    at org.hornetq.integration.bootstrap.HornetQBootstrapServer.bootstrap(HornetQBootstrapServer.java:158)
                 [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.bootstrap.AbstractBootstrap.run(AbstractBootstrap.java:83)
                 [java] HornetQServer_1 err:    at org.hornetq.integration.bootstrap.HornetQBootstrapServer.run(HornetQBootstrapServer.java:116)
                 [java] HornetQServer_1 err:    at org.hornetq.common.example.SpawnedHornetQServer.main(SpawnedHornetQServer.java:35)
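            When scripting restarts outside the example framework, one way to respect this ordering rule is to start the backup, block until it reports that it has started, and only then start the live server. A minimal sketch, assuming the server process writes a line beginning with STARTED:: on success and FAILED:: on failure to its standard output, as the spawned example servers appear to do in the logs in this thread (the "HornetQServer_N out:" prefix is added by the example harness). The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StartupWatcher {
    // Reads server output line by line until a start-up marker appears.
    // Returns true on "STARTED::", false on "FAILED::" or end of stream.
    static boolean waitForStartup(Reader serverOutput) throws IOException {
        BufferedReader reader = new BufferedReader(serverOutput);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("STARTED::")) return true;
            if (line.startsWith("FAILED::")) return false;
        }
        return false; // the process exited without reporting either marker
    }

    public static void main(String[] args) throws IOException {
        // Feed the watcher captured output instead of a live process.
        System.out.println(waitForStartup(new StringReader("Backup server initialised\nSTARTED::\n")));
    }
}
```

            With a real process started through ProcessBuilder, the reader would be new InputStreamReader(process.getInputStream()), and the live server would be launched only if the call returns true.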

            • 3. Re: Bridge reconnection stopping node rejoining cluster
              gaohoward

              Sorry, my bad. What version of HornetQ are you using?

              • 4. Re: Bridge reconnection stopping node rejoining cluster
                grantlittle

                I've tried this on 2.1.2-Final and I also ran my test case against the trunk version.

                 

                Also, it appears we have a similar situation (although I don't currently have a simple test case for it) in a non-clustered environment:

                 

                1. Start a live/backup pair.

                2. Start a client which, on connection failure, attempts to re-establish a connection every 500 ms.

                3. Kill the primary server.

                4. Kill the backup server.

                5. Sync data from the killed backup to the killed live server.

                6. Start the backup server.

                7. Start the live server.

                 

                The scenario fails at step 7. I believe in this situation the client is attempting to connect before the connection between the live/backup pair has been completely established.

                • 5. Re: Bridge reconnection stopping node rejoining cluster
                  grantlittle

                  I have managed to re-create the second scenario I was talking about in my previous post.

                   

                  I have attached another zip file which replicates the scenario. To use the provided example unzip the live-backup-rejoin.zip file to the examples/jms folder of a JMS distro and then run the ./build.sh file from within the extracted live-backup-rejoin folder.

                   

                  In this situation, rather than writing reconnection logic, I have a client that makes a totally clean connection on every attempt (without caching connection factories, connections, sessions, etc.) and tries to send a message every 500 ms. To some degree this mimics the reconnection logic.
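                  The client behaviour described above boils down to a fixed-interval loop in which every attempt starts from scratch. A minimal sketch, with the JMS work hidden behind a hypothetical SendAttempt interface (in the real test case each call would create a fresh connection factory, connection, session and producer, send one message and close everything); only the 500 ms interval comes from the post.

```java
import java.util.concurrent.TimeUnit;

public class FreshConnectionPoller {
    // Hypothetical stand-in for "connect from scratch, send one message, close".
    interface SendAttempt {
        void run() throws Exception;
    }

    // Fires 'attempts' send attempts, intervalMs apart, whether or not earlier
    // ones succeeded. Returns {succeeded, failed}.
    static int[] pollFixedRate(SendAttempt attempt, long intervalMs, int attempts)
            throws InterruptedException {
        int ok = 0;
        int failed = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                attempt.run();
                ok++;
            } catch (Exception e) {
                // Nothing is cached, so the next attempt starts from a clean slate.
                failed++;
            }
            if (i < attempts - 1) {
                TimeUnit.MILLISECONDS.sleep(intervalMs);
            }
        }
        return new int[] {ok, failed};
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated servers that are unreachable for the first three attempts.
        int[] calls = {0};
        int[] result = pollFixedRate(() -> {
            if (++calls[0] <= 3) throw new IllegalStateException("server not available yet");
        }, 500, 5);
        System.out.println(result[0] + " sent, " + result[1] + " failed");
    }
}
```

                  The point of the fixed rate is that the client keeps knocking on the servers throughout the restart, which is exactly the window in which the backup comes back before its live partner.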

                   

                  The result is the following:

                  [java] HornetQServer_1 err:[main] 11:33:01,653 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  live server is starting..
                       [java] HornetQServer_1 out:FAILED::Incompletely deployed:
                       [java] HornetQServer_1 out:
                       [java] HornetQServer_1 err:[main] 11:33:01,834 SEVERE [org.hornetq.integration.bootstrap.HornetQBootstrapServer]  Failed to start server
                       [java] HornetQServer_1 err:java.lang.IllegalStateException: Incompletely deployed:
                       [java] HornetQServer_1 err:
                       [java] HornetQServer_1 err:DEPLOYMENTS IN ERROR:
                       [java] HornetQServer_1 err:  Deployment "JMSServerManager" is in error due to: HornetQException[errorCode=104 message=Connected server is not a backup server]
                       [java] HornetQServer_1 err:
                       [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.deployment.AbstractKernelDeployer.internalValidate(AbstractKernelDeployer.java:278)
                       [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.deployment.AbstractKernelDeployer.validate(AbstractKernelDeployer.java:174)
                       [java] HornetQServer_1 err:    at org.hornetq.integration.bootstrap.HornetQBootstrapServer.bootstrap(HornetQBootstrapServer.java:158)
                       [java] HornetQServer_1 err:    at org.jboss.kernel.plugins.bootstrap.AbstractBootstrap.run(AbstractBootstrap.java:83)
                       [java] HornetQServer_1 err:    at org.hornetq.integration.bootstrap.HornetQBootstrapServer.run(HornetQBootstrapServer.java:116)
                       [java] HornetQServer_1 err:    at org.hornetq.common.example.SpawnedHornetQServer.main(SpawnedHornetQServer.java:35)
                       [java] HornetQServer_1 out:DEPLOYMENTS IN ERROR:
                       [java] HornetQServer_1 out:  Deployment "JMSServerManager" is in error due to: HornetQException[errorCode=104 message=Connected server is not a backup server]
                       [java] HornetQServer_1 out:
                       [java] java.lang.RuntimeException: server failed to start
                       [java]     at org.hornetq.common.example.SpawnedVMSupport.spawnVM(SpawnedVMSupport.java:154)
                       [java]     at org.hornetq.common.example.HornetQExample.startServer(HornetQExample.java:144)
                       [java]     at org.hornetq.jms.example.LiveBackupNodeReconnectExample.runExample(LiveBackupNodeReconnectExample.java:48)
                       [java]     at org.hornetq.common.example.HornetQExample.run(HornetQExample.java:71)
                       [java]     at org.hornetq.jms.example.LiveBackupNodeReconnectExample.main(LiveBackupNodeReconnectExample.java:24)

                   

                  I'm guessing this is pretty much identical to the error shown in my first post in this discussion.

                   

                  This issue is stopping us from using HornetQ in a production environment: if we do have an outage, we want to be able to bring the live/backup pair back up without requiring an application-layer outage (which is currently needed because the application keeps attempting to reconnect to the HornetQ servers).

                  • 6. Re: Bridge reconnection stopping node rejoining cluster
                    gaohoward

                    Hello,

                     

                    I just ran your first example and it seems to work against the latest trunk. Here is the output:

                     

                    ANT_HOME is ../../../tools/ant
                    Found javac
                    Using the following ant version from ../../../tools/ant:
                    Apache Ant version 1.7.1 compiled on June 27 2008
                    Buildfile: build.xml

                     

                    delete-files:

                     

                    run:

                     

                    init:
                        [mkdir] Created dir: /home/howard/tests/hornetq-2.2.0.CR1/examples/jms/clustered-standalone-rejoin/build
                        [mkdir] Created dir: /home/howard/tests/hornetq-2.2.0.CR1/examples/jms/clustered-standalone-rejoin/build/classes

                     

                    compile:
                         [echo] src.example.dir=/home/howard/tests/hornetq-2.2.0.CR1/examples/jms/clustered-standalone-rejoin/src
                        [javac] Compiling 8 source files to /home/howard/tests/hornetq-2.2.0.CR1/examples/jms/clustered-standalone-rejoin/build/classes

                     

                    runExample:
                         [java] serverProps = -XX:+UseParallelGC -Xms256M -Xmx256M -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -Dcom.sun.management.jmxremote -Djava.util.logging.config.file=/home/howard/tests/hornetq-2.2.0.CR1/examples/common/config/logging.properties -Djava.naming.factory.initial=org.jnp.interfaces.NamingContextFactory -Djava.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
                         [java] Nov 15, 2010 11:34:54 AM org.hornetq.common.example.HornetQExample run
                         [java] INFO: hornetq.example.runServer is true
                         [java] Nov 15, 2010 11:34:54 AM org.hornetq.common.example.HornetQExample startServer
                         [java] INFO: starting server with config 'server0' logServerOutput true
                         [java] HornetQServer_0 err:[main] 11:34:57,053 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  backup server is starting..
                         [java] HornetQServer_0 err:[main] 11:34:57,110 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                         [java] HornetQServer_0 err:[main] 11:34:57,977 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Backup server initialised
                         [java] HornetQServer_0 err:[main] 11:34:58,067 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5445 for CORE protocol
                         [java] HornetQServer_0 err:[main] 11:34:58,069 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.0.CR1 (Colmeia, 120) started
                         [java] HornetQServer_0 out:STARTED::
                         [java] Nov 15, 2010 11:34:58 AM org.hornetq.common.example.HornetQExample startServer
                         [java] INFO: starting server with config 'server1' logServerOutput true
                         [java] HornetQServer_1 err:[main] 11:34:59,449 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  live server is starting..
                         [java] HornetQServer_1 err:[main] 11:34:59,660 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                         [java] HornetQServer_1 err:[main] 11:34:59,663 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
                         [java] HornetQServer_1 err:[main] 11:35:01,108 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5446 for CORE protocol
                         [java] HornetQServer_1 err:[main] 11:35:01,110 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.0.CR1 (Colmeia, 120) started
                         [java] HornetQServer_1 out:STARTED::
                         [java] Nov 15, 2010 11:35:01 AM org.hornetq.common.example.HornetQExample startServer
                         [java] INFO: starting server with config 'server2' logServerOutput true
                         [java] HornetQServer_2 err:[main] 11:35:02,521 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  backup server is starting..
                         [java] HornetQServer_2 err:[main] 11:35:02,586 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                         [java] HornetQServer_2 err:[main] 11:35:03,539 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Backup server initialised
                         [java] HornetQServer_2 err:[main] 11:35:03,632 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5447 for CORE protocol
                         [java] HornetQServer_2 err:[main] 11:35:03,633 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.0.CR1 (Colmeia, 120) started
                         [java] HornetQServer_2 out:STARTED::
                         [java] Nov 15, 2010 11:35:03 AM org.hornetq.common.example.HornetQExample startServer
                         [java] INFO: starting server with config 'server3' logServerOutput true
                         [java] HornetQServer_3 err:[main] 11:35:05,089 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  live server is starting..
                         [java] HornetQServer_3 err:[main] 11:35:05,291 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                         [java] HornetQServer_3 err:[main] 11:35:05,294 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
                         [java] HornetQServer_3 err:[main] 11:35:06,588 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5448 for CORE protocol
                         [java] HornetQServer_3 out:STARTED::
                         [java] HornetQServer_3 err:[main] 11:35:06,590 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.0.CR1 (Colmeia, 120) started
                         [java] HornetQServer_3 err:[Thread-1 (group:HornetQ-server-threads12437939-10915800)] 11:35:11,108 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Connecting bridge sf.hornetq-cluster.530b0ef3-f069-11df-bb12-00215c450d29 to its destination
                         [java] HornetQServer_3 err:[Thread-1 (group:HornetQ-server-threads12437939-10915800)] 11:35:11,252 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Bridge sf.hornetq-cluster.530b0ef3-f069-11df-bb12-00215c450d29 is connected to its destination
                         [java] HornetQServer_1 err:[Thread-8 (group:HornetQ-server-threads29948747-3735543)] 11:35:11,552 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Connecting bridge sf.hornetq-cluster.56629da5-f069-11df-8a60-00215c450d29 to its destination
                         [java] HornetQServer_1 err:[Thread-8 (group:HornetQ-server-threads29948747-3735543)] 11:35:11,621 INFO [org.hornetq.core.server.cluster.impl.BridgeImpl]  Bridge sf.hornetq-cluster.56629da5-f069-11df-8a60-00215c450d29 is connected to its destination
                         [java] Killing server 1
                         [java] HornetQServer_0 err:[Old I/O server worker (parentId: 22922109, channelId: 30725267, null => localhost/127.0.0.1:5445)] 11:35:56,940 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Activating backup server
                         [java] HornetQServer_0 err:[Old I/O server worker (parentId: 22922109, channelId: 30725267, null => localhost/127.0.0.1:5445)] 11:35:56,941 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                         [java] HornetQServer_0 err:[Old I/O server worker (parentId: 22922109, channelId: 30725267, null => localhost/127.0.0.1:5445)] 11:35:56,942 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
                         [java] Killing server 0
                         [java] Syncing data from server 0 to server 1
                         [java] Nov 15, 2010 11:36:00 AM org.hornetq.common.example.HornetQExample startServer
                         [java] INFO: starting server with config 'server0' logServerOutput true
                         [java] HornetQServer_0 err:[main] 11:36:02,473 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  backup server is starting..
                         [java] HornetQServer_0 err:[main] 11:36:02,556 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                         [java] HornetQServer_0 err:[main] 11:36:03,049 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  Backup server initialised
                         [java] HornetQServer_0 err:[main] 11:36:03,158 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5445 for CORE protocol
                         [java] HornetQServer_0 err:[main] 11:36:03,161 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.0.CR1 (Colmeia, 120) started
                         [java] HornetQServer_0 out:STARTED::
                         [java] Nov 15, 2010 11:36:08 AM org.hornetq.common.example.HornetQExample startServer
                         [java] INFO: starting server with config 'server1' logServerOutput true
                         [java] HornetQServer_1 err:[main] 11:36:09,782 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  live server is starting..
                         [java] HornetQServer_1 err:[main] 11:36:09,999 INFO [org.hornetq.core.persistence.impl.journal.JournalStorageManager]  Using NIO Journal
                         [java] HornetQServer_1 err:[main] 11:36:10,002 WARNING [org.hornetq.core.server.impl.HornetQServerImpl]  Security risk! It has been detected that the cluster admin user and password have not been changed from the installation default. Please see the HornetQ user guide, cluster chapter, for instructions on how to do this.
                         [java] HornetQServer_1 err:[main] 11:36:11,006 INFO [org.hornetq.core.remoting.impl.netty.NettyAcceptor]  Started Netty Acceptor version 3.2.1.Final-r2319 localhost:5446 for CORE protocol
                         [java] HornetQServer_1 err:[main] 11:36:11,007 INFO [org.hornetq.core.server.impl.HornetQServerImpl]  HornetQ Server version 2.2.0.CR1 (Colmeia, 120) started
                         [java] HornetQServer_1 out:STARTED::
                         [java] example complete
                         [java]
                         [java] #####################
                         [java] ###    SUCCESS!   ###
                         [java] #####################
                         [java] HornetQServer_3 err:[hornetq-shutdown-thread] 11:36:11,024 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer]  Stopping HornetQ Server...
                         [java] HornetQServer_0 err:[hornetq-shutdown-thread] 11:36:11,029 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer]  Stopping HornetQ Server...
                         [java] HornetQServer_2 err:[hornetq-shutdown-thread] 11:36:11,044 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer]  Stopping HornetQ Server...
                         [java] HornetQServer_1 err:[hornetq-shutdown-thread] 11:36:11,025 INFO [org.hornetq.integration.bootstrap.HornetQBootstrapServer]  Stopping HornetQ Server...

                     

                    BUILD SUCCESSFUL
                    Total time: 1 minute 18 seconds

                    • 7. Re: Bridge reconnection stopping node rejoining cluster
                      grantlittle

                      Hi Yong,

                       

                      Thanks for all of your help. This is turning out to be a strange one.

                       

                      Could I ask a favour and ask you to run it a few times? It may pass if the timing of the connections happens to work out correctly.

                       

                      I have tried this 10 times on 3 different boxes on 2 different networks and it fails around 95% of the time.
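                      Because the failure depends on timing, a single pass or fail says little; repeating the run and reporting a failure rate, as is being done by hand here, is the more useful signal. A minimal sketch of such a loop, where the BooleanSupplier stands in for one full run of the example (for instance running ./build.sh and treating a non-zero exit status as a failure, assuming the script reports failures that way).

```java
import java.util.function.BooleanSupplier;

public class FlakyRunner {
    // Runs the scenario 'runs' times and returns the fraction of runs that failed.
    static double failureRate(BooleanSupplier scenarioPasses, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            if (!scenarioPasses.getAsBoolean()) {
                failures++;
            }
        }
        return (double) failures / runs;
    }

    public static void main(String[] args) {
        // Hypothetical scenario that passes only on its 7th run.
        int[] run = {0};
        double rate = failureRate(() -> ++run[0] == 7, 10);
        System.out.printf("failure rate: %.0f%%%n", rate * 100);
    }
}
```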

                       

                      The machines are

                       

                      I am currently attempting to get the trunk version onto the Red Hat box to confirm the outcome there.

                       

                      Admittedly, I did ask a colleague to try it on their machine and it worked. He is on the same network as the Ubuntu box, using Ubuntu 10.04.

                       

                      Can I ask what OS and what version of Java you are using?

                       

                      Thanks for your help.

                       

                      Grant

                      • 8. Re: Bridge reconnection stopping node rejoining cluster
                        gaohoward

                        OK, I got the failure on the second run. My environment:

                        Ubuntu 10.04

                        Java 1.6.0_17

                        • 9. Re: Bridge reconnection stopping node rejoining cluster
                          grantlittle

                          Thanks Yong,

                           

                          Well that is at least a bit more consistent.

                           

                          I have also upgraded the 3rd environment (Red Hat) to hornetq-trunk and re-ran the test, and it fails there around 95% of the time.

                           

                          My colleague is still getting the test to pass every time. However, I thought he was on Ubuntu 10.04, but he is actually on 8.10.

                           

                          The inconsistent results make it difficult to identify it as a definite defect in HornetQ but I am still suspicious.

                          • 10. Re: Bridge reconnection stopping node rejoining cluster
                            gaohoward

                            I am starting to investigate it. Thanks.

                            • 11. Re: Bridge reconnection stopping node rejoining cluster
                              grantlittle

                              Added another box (Ubuntu 8.04 running Java 1.6.0_20) and it fails there as it does in the other environments.

                               

                              That's 5 environments where it has been seen to fail, and 1 where it works! Not good odds.

                              • 12. Re: Bridge reconnection stopping node rejoining cluster
                                grantlittle

                                Thanks Yong

                                • 13. Re: Bridge reconnection stopping node rejoining cluster
                                  grantlittle

                                  A bit of extra information.

                                   

                                  Not sure if this is directly relevant or not.

                                   

                                  My colleague was looking into a different issue he was having with webservices and noticed that his localhost entry in his /etc/hosts file was set to the actual assigned IP address of his box.

                                   

                                  When he changed it to 127.0.0.1, the test failed as expected (or at least as we are seeing in our other environments). All of those other environments also have the localhost entry resolving to 127.0.0.1.

                                   

                                  We did try reversing the scenario on the other environments so that the localhost entry resolves to the physical IP address of the machine. Unfortunately the test continued to fail.
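                                  To compare machines quickly, the JVM can report what localhost resolves to, which on these Linux boxes is normally governed by the /etc/hosts entry. A minimal sketch using only java.net.InetAddress; whether this resolution actually changes the outcome of the test is still unconfirmed in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalhostCheck {
    // Formats an address and flags whether it is a loopback address (127.x.x.x or ::1).
    static String describe(InetAddress address) {
        return address.getHostAddress()
                + (address.isLoopbackAddress() ? " (loopback)" : " (not loopback)");
    }

    public static void main(String[] args) {
        try {
            // Resolved through the OS resolver, which normally consults /etc/hosts on Linux.
            System.out.println("localhost -> " + describe(InetAddress.getByName("localhost")));
            InetAddress self = InetAddress.getLocalHost();
            System.out.println(self.getHostName() + " -> " + describe(self));
        } catch (UnknownHostException e) {
            System.out.println("could not resolve: " + e.getMessage());
        }
    }
}
```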

                                  • 14. Re: Bridge reconnection stopping node rejoining cluster
                                    gaohoward

                                    Update:

                                     

                                    What I have found so far is that when backup server 0 is restarted, an attach session request comes in immediately, which activates the backup server (as if it were taking over from the live node; internally it is no longer marked as a backup). When server 1 is later started and tries to establish replication with server 0, server 0 throws an exception complaining 'not a backup server'.

                                     

                                    I'll take a further look at why server 0 is activated on restart.
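                                    To make the sequence in this update concrete, here is a toy model of it, not HornetQ code: a backup node that activates on the first session attach it receives, and a live node whose start-up fails if its partner is no longer a backup. All class and method names are hypothetical; only the ordering and the "not a backup server" outcome come from the description above.

```java
public class BackupActivationRace {
    // Toy model of a backup node; the real server state is far richer.
    static class BackupNode {
        private boolean backup = true;

        // An incoming "attach session" request activates the backup,
        // i.e. it behaves as if the live node had failed and it took over.
        void onAttachSession() {
            backup = false;
        }

        boolean isBackup() {
            return backup;
        }
    }

    // Live start-up: replication can only be set up against a node still in backup mode.
    static void startLive(BackupNode partner) {
        if (!partner.isBackup()) {
            throw new IllegalStateException("Connected server is not a backup server");
        }
    }

    public static void main(String[] args) {
        BackupNode server0 = new BackupNode();
        server0.onAttachSession(); // a stray bridge/client connection arrives first
        try {
            startLive(server0);
            System.out.println("live server started");
        } catch (IllegalStateException e) {
            System.out.println("live server failed: " + e.getMessage());
        }
    }
}
```

                                    In this model the outcome depends only on which call happens first: if startLive runs before onAttachSession, the live server starts, which matches the observation that a long retry interval (fewer early connections) lets the scenario pass.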
