1 Reply Latest reply on Nov 12, 2013 11:04 AM by manu_1185

    Error during failback on adding some configurations in hornetq-configuration.xml

    manu_1185

      Hi,

       

      I have a HornetQ failover/failback setup (using the replication feature available in the 2.3.0.Final release) that normally works fine. However, once I add some journal-related settings, only failover works. During failback, the error below keeps appearing in the live server's logs. (Note that a few times I got only the Connection Failure log and failback then worked, but I did not see 'backup announced' in the logs.)

       

      11:41:42,199 INFO  [org.hornetq.integration.bootstrap] HQ101000: Starting HornetQ Server

      11:41:43,272 WARN  [org.hornetq.core.server] HQ222018: AIO was not located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal

      11:41:43,364 INFO  [org.hornetq.core.server] HQ221000: live server is starting with configuration HornetQ Configuration (clustered=true,backup=false,sharedStore=false,journalDirectory=/mnt/jms/mqtt/journal,bindingsDirectory=/mnt/jms/mqtt/bindings,largeMessagesDirectory=/mnt/jms/mqtt/large-messages,pagingDirectory=/mnt/jms/mqtt/paging)

      11:41:43,584 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/bindings to /mnt/jms/mqtt/bindings1

      11:41:43,585 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/journal to /mnt/jms/mqtt/journal1

      11:41:43,586 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/paging to /mnt/jms/mqtt/paging1

      11:41:43,587 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/large-messages to /mnt/jms/mqtt/large-messages1

      11:41:43,606 INFO  [org.hornetq.core.server] HQ221013: Using NIO Journal

      11:41:43,609 WARN  [org.hornetq.core.server] HQ222007: Security risk! HornetQ is running with the default cluster admin user and default password. Please see the HornetQ user guide, cluster chapter, for instructions on how to change this.

      11:41:43,772 INFO  [org.hornetq.core.server] HQ221109: HornetQ Backup Server version 2.3.0.SNAPSHOT (colonizer, 123) [null] started, waiting live to fail before it gets active

      11:41:49,045 WARN  [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]

      11:41:49,643 ERROR [org.hornetq.core.server] HQ224000: Failure in initialisation: HornetQException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]

              at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2430) [hornetq-server.jar:]

              at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_15]

       

       

      HornetQException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]

              at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2430)

              at java.lang.Thread.run(Thread.java:722)

      ^C11:41:56,067 INFO  [org.hornetq.integration.bootstrap] HQ101001: Stopping HornetQ Server

      11:41:56,104 INFO  [org.hornetq.core.server] HQ221002: HornetQ Server version 2.3.0.SNAPSHOT (colonizer, 123) [ddbe4401-4b8e-11e3-add7-3d7449fe1f06] stopped

       

      Below is the list of settings I am adding to hornetq-configuration.xml on both the live and backup servers:

       

         <id-cache-size>10000</id-cache-size>

         <connection-ttl-override>-1</connection-ttl-override>

         <scheduled-thread-pool-max-size>50</scheduled-thread-pool-max-size>

         <thread-pool-max-size>-1</thread-pool-max-size>

         <journal-file-size>50485760</journal-file-size>

         <journal-sync-non-transactional>false</journal-sync-non-transactional>

         <journal-sync-transactional>false</journal-sync-transactional>

         <journal-buffer-timeout>100000000</journal-buffer-timeout>

         <journal-max-io>50000</journal-max-io>

       

      Any idea what could be causing this?

      FYI: We are running HornetQ on an 8-core, 32 GB machine.
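For readers comparing these values against the defaults, here is a minimal sketch that converts two of the journal settings above into readable units. It assumes journal-file-size is in bytes and journal-buffer-timeout is in nanoseconds; the helper names are made up for illustration and are not part of HornetQ.

```python
# Values copied from the configuration list in the post.
JOURNAL_FILE_SIZE = 50485760        # assumed unit: bytes
JOURNAL_BUFFER_TIMEOUT = 100000000  # assumed unit: nanoseconds


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


def ns_to_ms(nanoseconds):
    """Convert nanoseconds to milliseconds."""
    return nanoseconds / 1_000_000


if __name__ == "__main__":
    print(f"journal-file-size      = {bytes_to_mib(JOURNAL_FILE_SIZE):.2f} MiB")
    print(f"journal-buffer-timeout = {ns_to_ms(JOURNAL_BUFFER_TIMEOUT):.0f} ms")
```

If the nanosecond assumption holds, the buffer timeout configured here is a tenth of a second, which may be worth double-checking against the intended value.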

        • 1. Re: Error during failback on adding some configurations in hornetq-configuration.xml
          manu_1185

          When I additionally add the following two settings to the list above in hornetq-configuration.xml, I get the errors listed below during startup of both servers (possibly while replication is happening), and this happens every time. Does this mean I will have to increase connection-ttl and client-failure-check-period?

           

          <journal-min-files>100</journal-min-files>

          <journal-compact-min-files>1000</journal-compact-min-files>
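As a rough size estimate, the sketch below multiplies journal-min-files by the journal-file-size set earlier. It assumes the journal pre-creates journal-min-files files of journal-file-size bytes each; whether the initial synchronization has to copy all of them to the backup is an assumption and is not confirmed by the logs. The function name is hypothetical.

```python
# Values copied from the configuration fragments in this thread.
JOURNAL_FILE_SIZE_BYTES = 50485760
JOURNAL_MIN_FILES = 100


def preallocated_journal_bytes(file_size_bytes, min_files):
    """Total bytes occupied if min_files journal files are pre-created."""
    return file_size_bytes * min_files


total = preallocated_journal_bytes(JOURNAL_FILE_SIZE_BYTES, JOURNAL_MIN_FILES)

if __name__ == "__main__":
    # Print the total in bytes and in GiB (1 GiB = 1024**3 bytes).
    print(f"{total:,} bytes (~{total / 1024**3:.2f} GiB)")
```

Under these assumptions the message journal alone is several gigabytes, which gives a sense of how much data a full replication pass might need to move.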

           

          Error on Backup server:

           

          15:36:19,629 INFO  [org.hornetq.integration.bootstrap] HQ101000: Starting HornetQ Server

          15:36:21,315 WARN  [org.hornetq.core.server] HQ222018: AIO was not located on this platform, it will fall back to using pure Java NIO. If your platform is Linux, install LibAIO to enable the AIO journal

          15:36:21,458 INFO  [org.hornetq.core.server] HQ221000: backup server is starting with configuration HornetQ Configuration (clustered=true,backup=true,sharedStore=false,journalDirectory=/mnt/jms/mqtt/journal,bindingsDirectory=/mnt/jms/mqtt/bindings,largeMessagesDirectory=/mnt/jms/mqtt/large-messages,pagingDirectory=/mnt/jms/mqtt/paging)

          15:36:21,480 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/bindings to /mnt/jms/mqtt/bindings2

          15:36:21,485 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/journal to /mnt/jms/mqtt/journal2

          15:36:21,487 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/paging to /mnt/jms/mqtt/paging2

          15:36:21,488 WARN  [org.hornetq.core.server] HQ222162: Moving data directory /mnt/jms/mqtt/large-messages to /mnt/jms/mqtt/large-messages2

          15:36:21,545 INFO  [org.hornetq.core.server] HQ221013: Using NIO Journal

          15:36:21,568 WARN  [org.hornetq.core.server] HQ222007: Security risk! HornetQ is running with the default cluster admin user and default password. Please see the HornetQ user guide, cluster chapter, for instructions on how to change this.

          15:36:50,175 INFO  [org.hornetq.core.server] HQ221109: HornetQ Backup Server version 2.3.0.SNAPSHOT (colonizer, 123) [null] started, waiting live to fail before it gets active

          15:37:59,802 ERROR [org.hornetq.core.server] HQ224000: Failure in initialisation: HornetQException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]

                  at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2430) [hornetq-server.jar:]

                  at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_15]

           

           

          HornetQException[errorType=ILLEGAL_STATE message=HQ119026: Backup Server was not yet in sync with live]

                  at org.hornetq.core.server.impl.HornetQServerImpl$SharedNothingBackupActivation.run(HornetQServerImpl.java:2430)

                  at java.lang.Thread.run(Thread.java:722)

           

          Error on Live Server:

           

          15:36:48,707 INFO  [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 3.6.2.Final-c0d783c 10.0.0.225:5445 for CORE protocol

          15:36:48,722 INFO  [org.hornetq.core.server] HQ221020: Started Netty Acceptor version 3.6.2.Final-c0d783c 10.0.0.225:5455 for CORE protocol

          15:36:48,726 INFO  [org.hornetq.core.server] HQ221007: Server is now live

          15:36:48,727 INFO  [org.hornetq.core.server] HQ221001: HornetQ Server version 2.3.0.SNAPSHOT (colonizer, 123) [1b1110e5-4baf-11e3-89da-95970defdfac]

          15:36:50,670 INFO  [org.hornetq.core.server] HQ221025: Replication: sending JournalFileImpl: (hornetq-data-3.hq id = 103, recordID = 103) (size=50,485,760) to backup. NIOSequentialFile /mnt/jms/mqtt/journal/hornetq-data-3.hq

          15:36:50,726 INFO  [org.hornetq.core.server] HQ221025: Replication: sending JournalFileImpl: (hornetq-bindings-8.bindings id = 1, recordID = 1) (size=1,048,576) to backup. NIOSequentialFile /mnt/jms/mqtt/bindings/hornetq-bindings-8.bindings

          15:36:50,728 INFO  [org.hornetq.core.server] HQ221025: Replication: sending JournalFileImpl: (hornetq-bindings-2.bindings id = 6, recordID = 6) (size=1,048,576) to backup. NIOSequentialFile /mnt/jms/mqtt/bindings/hornetq-bindings-2.bindings

          15:37:50,730 WARN  [org.hornetq.core.client] HQ212037: Connection failure has been detected: HQ119014: Did not receive data from /10.0.0.226:53554. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly. Please check user manual for more information. The connection will now be closed. [code=CONNECTION_TIMEDOUT]

          15:37:50,733 WARN  [org.hornetq.core.server] HQ222092: Connection to the backup node failed, removing replication now: HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Did not receive data from /10.0.0.226:53554. It is likely the client has exited or crashed without closing its connection, or the network between the server and client has failed. You also might have configured connection-ttl and client-failure-check-period incorrectly. Please check user manual for more information. The connection will now be closed.]

                  at org.hornetq.core.remoting.server.impl.RemotingServiceImpl$FailureCheckAndFlushThread.run(RemotingServiceImpl.java:654) [hornetq-server.jar:]
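
To make the timing in the live server log easier to read, here is a small sketch that measures the gap between the last "Replication: sending" line (15:36:50,728) and the connection failure (15:37:50,730). The timestamps are copied from the log above; the parsing helper is illustrative only and assumes both entries fall on the same day.

```python
from datetime import datetime


def parse_log_time(stamp):
    """Parse a HornetQ log timestamp such as '15:36:50,728' (HH:MM:SS,millis)."""
    return datetime.strptime(stamp, "%H:%M:%S,%f")


# Timestamps copied from the live server log in this post.
last_replication_line = parse_log_time("15:36:50,728")
connection_failure = parse_log_time("15:37:50,730")

gap_seconds = (connection_failure - last_replication_line).total_seconds()

if __name__ == "__main__":
    print(f"Gap between last replication line and timeout: {gap_seconds:.3f} s")
```

The gap comes out at almost exactly one minute, which is consistent with a fixed timeout expiring rather than a random network drop, though the logs alone do not prove that.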