13 Replies Latest reply on Feb 29, 2012 4:24 PM by gabrielp

    Failover - Reconnection of the client not working

    gabrielp

      Hi all, I'm having an issue with the failover mechanism of HornetQ related to the client losing the session and not being able to continue inserting messages in the queue.

      The client is a jmeter test that uses a Java Request to insert messages in a queue. It connects directly to the endpoint (url:1499) and inserts messages there. The tests we are doing for the failover consists in killing or shutting down the live server while we are sending 2000 messages.

       

      We see the backup server becoming live

      2012-01-24 09:15:00,772 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.MessagingPartition] [AsynchViewChangeHandler Thread] Dead members: 1 ([63.131.131.231:1499])

      2012-01-24 09:15:00,772 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.MessagingPartition] [AsynchViewChangeHandler Thread] New Members : 0 ([])

      2012-01-24 09:15:00,772 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.MessagingPartition] [AsynchViewChangeHandler Thread] All Members : 1 ([63.131.131.231:1299])

      ...

      INFO: Backup Server is now live

       

      but the rest of the messages are rejected with the following exception:

       

      24/01/2012 14:58:39 org.hornetq.core.logging.impl.JULLogDelegate warn

      ADVERTENCIA: Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]

      javax.jms.JMSException: Connection failure detected. Unblocking a blocking call that will never get a response

                at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:286)

                at org.hornetq.core.client.impl.ClientProducerImpl.doSend(ClientProducerImpl.java:287)

                at org.hornetq.core.client.impl.ClientProducerImpl.send(ClientProducerImpl.java:142)

                at org.hornetq.jms.client.HornetQMessageProducer.doSend(HornetQMessageProducer.java:451)

                at org.hornetq.jms.client.HornetQMessageProducer.send(HornetQMessageProducer.java:199)

                at com.xxx.testing.sampler.XDtoPublisherSampler.runTest(XDtoPublisherSampler.java:106)

                at org.apache.jmeter.protocol.java.sampler.JavaSampler.sample(JavaSampler.java:161)

                at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:411)

                at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:297)

                at java.lang.Thread.run(Thread.java:619)

      Caused by: HornetQException[errorCode=5 message=Connection failure detected. Unblocking a blocking call that will never get a response]

                ... 10 more

      24/01/2012 14:58:39 org.hornetq.core.logging.impl.JULLogDelegate warn

      ADVERTENCIA: Connection failure has been detected: The connection was disconnected because of server shutdown [code=4]

      javax.jms.IllegalStateException: Session is closed

                at org.hornetq.jms.client.HornetQSession.checkClosed(HornetQSession.java:1008)

                at org.hornetq.jms.client.HornetQSession.createObjectMessage(HornetQSession.java:165)

                at com.xxx.testing.sampler.XDtoPublisherSampler.createObjectMessage(XDtoPublisherSampler.java:82)

                at com.xxx.testing.sampler.XDtoPublisherSampler.runTest(PlayphoneDtoPublisherSampler.java:104)

                at org.apache.jmeter.protocol.java.sampler.JavaSampler.sample(JavaSampler.java:161)

                at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:411)

                at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:297)

                at java.lang.Thread.run(Thread.java:619)

       

       

       

       

      The live hornetq instance has the following config:

       

      hornetq-configuration.xml

       

      <clustered>true</clustered>

           <failover-on-shutdown>true</failover-on-shutdown>

           <shared-store>true</shared-store> 

           <allow-failback>true</allow-failback>

         <journal-directory>${jboss.hornetq.file.dir}/journal</journal-directory>

         <connectors>

            <connector name="netty">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

               <param key="host"  value="${jboss.bind.address:localhost}"/>

               <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>

            </connector>

            <connector name="backup">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

               <param key="host"  value="${hornetq.backup.host.address:localhost}"/>

               <param key="port"  value="${hornetq.remoting.netty.backup.port:5445}"/>

            </connector>

      </connectors>

         <acceptors>

            <acceptor name="netty">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>

               <param key="host"  value="${jboss.bind.address:localhost}"/>

               <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>

            </acceptor>

      </acceptors>

       

         <cluster-connections>

              <cluster-connection name="prep-hornetq-cluster">

                      <address>jms</address>

                      <connector-ref>netty</connector-ref>

                      <retry-interval>500</retry-interval>

                      <use-duplicate-detection>true</use-duplicate-detection>

                      <forward-when-no-consumers>true</forward-when-no-consumers>

                      <max-hops>1</max-hops>

                      <static-connectors>

                              <connector-ref>backup</connector-ref>

                      </static-connectors>

              </cluster-connection>

         </cluster-connections>

       

       

      hornetq-jms.xml

         <connection-factory name="NettyConnectionFactory">

            <xa>false</xa>

            <connectors>

               <connector-ref connector-name="netty"/>

            </connectors>

            <entries>

               <entry name="/ConnectionFactory"/>

            </entries>

            <reconnect-attempts>-1</reconnect-attempts>

         </connection-factory>

       

       

      The script for starting the hornetq is the following

      /run.sh -c all-with-hornetq-live -b ux2.host.net -Dserver.host.name=ux2.host.net -Dpp.web.port=8404 -Djboss.service.binding.set=ports-04 -g MessagingPartition -Djboss.messaging.ServerPeerID=0 -Djboss.server.log.dir=/usr/software/runtime/hornetql/logs -Dpp.app.folder=/usr/software/runtime/hornetql/app -Djboss.hornetq.file.dir=/usr/software/runtime/hornetq -Dhornetq.server-id=0 -Dhornetq.remoting.netty.port=5445 -Dhornetq.remoting.netty.batch.port=5455 -Dhornetq.backup.host.address=ux2.host.net -Dhornetq.remoting.netty.backup.port=5446 &

       

       

      And the backup server is configured like this:

       

      hornetq-configuration.xml

         <clustered>true</clustered>

       

       

         <failover-on-shutdown>true</failover-on-shutdown>

       

       

         <shared-store>true</shared-store>

       

       

         <allow-failback>true</allow-failback>

       

       

         <backup>true</backup>

       

       

         <journal-directory>${jboss.hornetq.file.dir}/journal</journal-directory>

       

      ...

            <connector name="netty">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

               <param key="host"  value="${jboss.bind.address:localhost}"/>

               <param key="port"  value="${hornetq.remoting.netty.port:5446}"/>

            </connector>

       

             <connector name="backup">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

               <param key="host"  value="${hornetq.backup.host.address:localhost}"/>

               <param key="port"  value="${hornetq.remoting.netty.backup.port:5445}"/>

            </connector>

       

      ...

            <acceptor name="netty">

               <factory-class>org.hornetq.core.remoting.impl.netty.NettyAcceptorFactory</factory-class>

               <param key="host"  value="${jboss.bind.address:localhost}"/>

               <param key="port"  value="${hornetq.remoting.netty.port:5446}"/>

            </acceptor>

       

      ...

         <cluster-connections>

              <cluster-connection name="prep-hornetq-cluster">

                      <address>jms</address>

                      <connector-ref>netty</connector-ref>

                      <retry-interval>500</retry-interval>

                      <use-duplicate-detection>true</use-duplicate-detection>

                      <forward-when-no-consumers>true</forward-when-no-consumers>

                      <max-hops>1</max-hops>

                      <static-connectors>

                              <connector-ref>backup</connector-ref>

                      </static-connectors>

              </cluster-connection>

         </cluster-connections>

       

      hornetq-jms.xml

       

         <connection-factory name="NettyConnectionFactory">

            <xa>false</xa>

            <connectors>

               <connector-ref connector-name="netty"/>

            </connectors>

            <entries>

               <entry name="/ConnectionFactory"/>

            </entries>

         </connection-factory>

       

       

      The script

       

      ./run.sh -c all-with-hornetq-backup -b ux2.host.playp.net -Dserver.host.name=ux2.host.net -Dpp.web.port=8204 -Djboss.service.binding.set=ports-02 -g MessagingPartition -Djboss.messaging.ServerPeerID=1 -Djboss.server.log.dir=/usr/software/runtime/hornetqb/logs -Dpp.app.folder=/usr/software/runtime/hornetqb/app -Djboss.hornetq.file.dir=/usr/software/runtime/hornetq -Dhornetq.server-id=1 -Dhornetq.remoting.netty.port=5446 -Dhornetq.remoting.netty.batch.port=5456 -Dhornetq.backup.host.address=ux2.host.playp.net -Dhornetq.remoting.netty.backup.port=5445 &

       

       

       

      Thanks for your help

        • 1. Re: Failover - Reconnection of the client not working
          clebert.suconic

          I - What version? of hornetq / JBoss

          II - What connection factory are you using on client?

           

           

          just FYI:

           

          012-01-24 09:15:00,772 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.MessagingPartition] [AsynchViewChangeHandler Thread] Dead members: 1 ([63.131.131.231:1499])

           

           

          ^^ this is not HornetQ code

           

           

          I couldn't read the XMLs.. you can use attachments on the advanced editor at this forum.

          • 2. Re: Failover - Reconnection of the client not working
            gabrielp

            I attached the xml configuration files to my first post.

            I'm using HornetQ 2.2.5.Final and JBoss 5.0.1.GA

             

            In the client i tried with ConnectionFactory and /ConnectionFactory.

            • 3. Re: Failover - Reconnection of the client not working
              clebert.suconic

              You should add <ha>true</ha> to your connection-factory:;

               

               

                 <connection-factory name="ConnectionFactory">

                    <connectors>

                       <connector-ref connector-name="netty-connector"/>

                    </connectors>

                    <entries>

                       <entry name="ConnectionFactory"/>

                    </entries>

               

               

                     <ha>true</ha>

              • 4. Re: Failover - Reconnection of the client not working
                clebert.suconic

                Look at the transaction-failover example

                • 5. Re: Failover - Reconnection of the client not working
                  gabrielp

                  Thanks for your answer, but it's still not working.

                  I also looked at the example. Since I'm not allowed to use broadcast / discovery groups, I'm only using a cluster-connection.

                   

                  do you think my issue is related to that?

                   

                  Thanks

                  • 6. Re: Failover - Reconnection of the client not working
                    clebert.suconic

                    It's the backup who announces itself to the live...

                     

                     

                    On the live node, remove the reference to the backup...

                     

                     

                    On the backup node, change the reference to the backup (huh? that's wrong here), and make it referencing the live node.

                    • 7. Re: Failover - Reconnection of the client not working
                      gabrielp

                      Now on the live server I have:

                       

                            <connector name="netty">

                               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

                               <param key="host"  value="${jboss.bind.address:localhost}"/>

                               <param key="port"  value="${hornetq.remoting.netty.port:5445}"/>

                            </connector>

                       

                         <cluster-connections>

                              <cluster-connection name="prep-hornetq-cluster">

                                      <address>jms</address>

                                      <connector-ref>netty</connector-ref>

                                      <retry-interval>500</retry-interval>

                                      <use-duplicate-detection>true</use-duplicate-detection>

                                      <forward-when-no-consumers>false</forward-when-no-consumers>

                                      <max-hops>1</max-hops>

                              </cluster-connection>

                         </cluster-connections>

                       

                      On the backup server:

                       

                            <connector name="netty">

                               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

                               <param key="host"  value="${jboss.bind.address:localhost}"/>

                               <param key="port"  value="${hornetq.remoting.netty.port:5446}"/>

                            </connector>

                       

                       

                            <connector name="primary">

                               <factory-class>org.hornetq.core.remoting.impl.netty.NettyConnectorFactory</factory-class>

                               <param key="host"  value="${hornetq.backup.host.address:localhost}"/>

                               <param key="port"  value="${hornetq.remoting.netty.backup.port:5445}"/>

                            </connector>

                       

                      I changed the name of this connector to primary

                      That reference is now used in the cluster connection

                       

                         <cluster-connections>

                              <cluster-connection name="prep-hornetq-cluster">

                                      <address>jms</address>

                                      <connector-ref>netty</connector-ref>

                                      <retry-interval>500</retry-interval>

                                      <use-duplicate-detection>true</use-duplicate-detection>

                                      <forward-when-no-consumers>true</forward-when-no-consumers>

                                      <max-hops>1</max-hops>

                                      <static-connectors>

                                              <connector-ref>primary</connector-ref>

                                      </static-connectors>

                              </cluster-connection>

                         </cluster-connections>

                       

                       

                      And the connection factory on both servers is now:

                         <connection-factory name="NettyConnectionFactory">

                            <xa>false</xa>

                            <connectors>

                               <connector-ref connector-name="netty"/>

                            </connectors>

                            <entries>

                               <entry name="/ConnectionFactory"/>

                            </entries>

                            <ha>true</ha>

                            <retry-interval>1000</retry-interval>

                            <reconnect-attempts>5</reconnect-attempts>

                         </connection-factory>

                       

                       

                      Should the connector names be unique in each server?  NO matter what change I make, I still cannot make the failover work. Do I have to change something in the client?

                       

                      Thanks

                      • 8. Re: Failover - Reconnection of the client not working
                        clebert.suconic

                        The name doesn't really matter... as long as the backup is announcing itself to the live node.

                         

                         

                        You need to make sure the backup is announcing itself properly.

                        • 9. Re: Failover - Reconnection of the client not working
                          clebert.suconic

                          In your place, I would check trace logs (or maybe even do some debug if the server is announcing itself properly).

                           

                          You could maybe try the latest branch update.

                          • 10. Re: Failover - Reconnection of the client not working
                            gabrielp

                            Ok, so I enabled TRACE logging, and after the backup server is started this is all the log i see in the live server.

                             

                            2012-01-25 06:04:45,814 TRACE [org.hornetq.utils.UTF8Util] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])] Reading string with utfSize=36

                            2012-01-25 06:04:45,814 TRACE [org.hornetq.utils.UTF8Util] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])] Reading string with utfSize=58

                            2012-01-25 06:04:45,815 TRACE [org.hornetq.utils.UTF8Util] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])] Reading string with utfSize=22

                            2012-01-25 06:04:45,815 DEBUG [org.hornetq.core.client.impl.Topology] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])] adding = 08c1fcbe-439e-11e1-ab05-0019bb364c28:Pair[a=null, b=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5446&host=ux2-alexprep-playp-net]

                            2012-01-25 06:04:45,815 DEBUG [org.hornetq.core.client.impl.Topology] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])] before----------------------------------

                            2012-01-25 06:04:45,815 DEBUG [org.hornetq.core.client.impl.Topology] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])]           08c1fcbe-439e-11e1-ab05-0019bb364c28 => TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net, b=null]]

                                      nodes=1          members=1

                            2012-01-25 06:04:45,815 DEBUG [org.hornetq.core.client.impl.Topology] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])] Topology updated=true

                            2012-01-25 06:04:45,815 DEBUG [org.hornetq.core.client.impl.Topology] [Old I/O server worker (parentId: 911104169, [id: 0x364e58a9, ux2.alexprep.playp.net/63.131.131.231:5445])]           08c1fcbe-439e-11e1-ab05-0019bb364c28 => TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net, b=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5446&host=ux2-alexprep-playp-net]]

                                      nodes=2          members=1

                             

                            Is this what you mean when you say that the backup server is announcing itself properly?

                             

                             

                            When I kill the live server (during my test), the backup server logs the following:

                            2012-01-25 06:12:03,342 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-data-4.hq

                            2012-01-25 06:12:03,566 DEBUG [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Flushing deletes during loading, deleteCount = 20001

                            2012-01-25 06:12:03,600 DEBUG [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] flush delete done

                            2012-01-25 06:12:03,630 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-data-5.hq

                            2012-01-25 06:12:03,649 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying reconnection attempt 1

                            2012-01-25 06:12:03,649 DEBUG [org.hornetq.core.remoting.impl.netty.NettyConnector] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Started Netty Connector version 3.2.3.Final-r${buildNumber}

                            2012-01-25 06:12:03,649 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying to connect at the main server using connector :org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net

                            2012-01-25 06:12:03,650 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Main server is not up. Hopefully there's a backup configured now!

                            2012-01-25 06:12:03,774 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-data-6.hq

                            2012-01-25 06:12:03,904 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-data-7.hq

                            2012-01-25 06:12:04,107 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-data-8.hq

                            2012-01-25 06:12:04,467 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-data-9.hq

                            2012-01-25 06:12:04,939 TRACE [org.hornetq.core.journal.impl.JournalFilesRepository] [Thread-14] Adding free file JournalFileImpl: (hornetq-data-9.hq id = 180, recordID = 180)

                            2012-01-25 06:12:04,966 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-data-10.hq

                            2012-01-25 06:12:05,462 TRACE [org.hornetq.core.journal.impl.JournalFilesRepository] [Thread-14] Adding free file JournalFileImpl: (hornetq-data-10.hq id = 181, recordID = 181)

                            2012-01-25 06:12:05,469 TRACE [org.hornetq.core.journal.impl.JournalFilesRepository] [Thread-14] pushing openFile JournalFileImpl: (hornetq-data-9.hq id = 182, recordID = 182)

                            2012-01-25 06:12:05,469 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] posCount on JournalFileImpl: (hornetq-data-1.hq id = 21, recordID = 21) = 26339

                            2012-01-25 06:12:05,469 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-1.hq id = 21, recordID = 21) into JournalFileImpl: (hornetq-data-1.hq id = 21, recordID = 21) = 24258

                            2012-01-25 06:12:05,469 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-2.hq id = 94, recordID = 94) into JournalFileImpl: (hornetq-data-1.hq id = 21, recordID = 21) = 1971

                            2012-01-25 06:12:05,469 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] posCount on JournalFileImpl: (hornetq-data-2.hq id = 94, recordID = 94) = 23475

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-2.hq id = 94, recordID = 94) into JournalFileImpl: (hornetq-data-2.hq id = 94, recordID = 94) = 21508

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-3.hq id = 95, recordID = 95) into JournalFileImpl: (hornetq-data-2.hq id = 94, recordID = 94) = 1967

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] JournalFileImpl: (hornetq-data-2.hq id = 94, recordID = 94) Can't be reclaimed because JournalFileImpl: (hornetq-data-1.hq id = 21, recordID = 21) has negative values

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] posCount on JournalFileImpl: (hornetq-data-3.hq id = 95, recordID = 95) = 16345

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-3.hq id = 95, recordID = 95) into JournalFileImpl: (hornetq-data-3.hq id = 95, recordID = 95) = 14288

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-4.hq id = 96, recordID = 96) into JournalFileImpl: (hornetq-data-3.hq id = 95, recordID = 95) = 2057

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] JournalFileImpl: (hornetq-data-3.hq id = 95, recordID = 95) Can't be reclaimed because JournalFileImpl: (hornetq-data-2.hq id = 94, recordID = 94) has negative values

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] posCount on JournalFileImpl: (hornetq-data-4.hq id = 96, recordID = 96) = 13333

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-4.hq id = 96, recordID = 96) into JournalFileImpl: (hornetq-data-4.hq id = 96, recordID = 96) = 11188

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-5.hq id = 97, recordID = 97) into JournalFileImpl: (hornetq-data-4.hq id = 96, recordID = 96) = 2145

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] JournalFileImpl: (hornetq-data-4.hq id = 96, recordID = 96) Can't be reclaimed because JournalFileImpl: (hornetq-data-3.hq id = 95, recordID = 95) has negative values

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] posCount on JournalFileImpl: (hornetq-data-5.hq id = 97, recordID = 97) = 11733

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-5.hq id = 97, recordID = 97) into JournalFileImpl: (hornetq-data-5.hq id = 97, recordID = 97) = 9896

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-6.hq id = 98, recordID = 98) into JournalFileImpl: (hornetq-data-5.hq id = 97, recordID = 97) = 1837

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] JournalFileImpl: (hornetq-data-5.hq id = 97, recordID = 97) Can't be reclaimed because JournalFileImpl: (hornetq-data-4.hq id = 96, recordID = 96) has negative values

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] posCount on JournalFileImpl: (hornetq-data-6.hq id = 98, recordID = 98) = 10321

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-6.hq id = 98, recordID = 98) into JournalFileImpl: (hornetq-data-6.hq id = 98, recordID = 98) = 8398

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-7.hq id = 99, recordID = 99) into JournalFileImpl: (hornetq-data-6.hq id = 98, recordID = 98) = 1923

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] JournalFileImpl: (hornetq-data-6.hq id = 98, recordID = 98) Can't be reclaimed because JournalFileImpl: (hornetq-data-5.hq id = 97, recordID = 97) has negative values

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] posCount on JournalFileImpl: (hornetq-data-7.hq id = 99, recordID = 99) = 8951

                            2012-01-25 06:12:05,470 TRACE [org.hornetq.core.journal.impl.Reclaimer] [Thread-14] Negative from JournalFileImpl: (hornetq-data-7.hq id = 99, recordID = 99) into JournalFileImpl: (hornetq-data-7.hq id = 99, recordID = 99) = 7109

                            2012-01-25 06:12:05,652 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying reconnection attempt 2

                            2012-01-25 06:12:05,652 DEBUG [org.hornetq.core.remoting.impl.netty.NettyConnector] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Started Netty Connector version 3.2.3.Final-r${buildNumber}

                            2012-01-25 06:12:05,652 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying to connect at the main server using connector :org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net

                            2012-01-25 06:12:05,653 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Main server is not up. Hopefully there's a backup configured now!

                            2012-01-25 06:12:05,690 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-jms-1.jms

                            2012-01-25 06:12:05,698 TRACE [org.hornetq.core.journal.impl.JournalFilesRepository] [Thread-14] Adding free file JournalFileImpl: (hornetq-jms-1.jms id = 99, recordID = 99)

                            2012-01-25 06:12:05,703 TRACE [org.hornetq.core.journal.impl.JournalImpl] [Thread-14] Loading file hornetq-jms-2.jms

                            2012-01-25 06:12:05,711 TRACE [org.hornetq.core.journal.impl.JournalFilesRepository] [Thread-14] Adding free file JournalFileImpl: (hornetq-jms-2.jms id = 100, recordID = 100)

                            2012-01-25 06:12:05,718 TRACE [org.hornetq.core.journal.impl.JournalFilesRepository] [Thread-14] pushing openFile JournalFileImpl: (hornetq-jms-2.jms id = 102, recordID = 102)

                            2012-01-25 06:12:05,753 ERROR [STDERR] [Thread-14] Jan 25, 2012 6:12:05 AM org.hornetq.core.logging.impl.JULLogDelegate info

                            INFO: trying to deploy queue jms.queue.DLQ

                            2012-01-25 06:12:05,820 ERROR [STDERR] [Thread-14] Jan 25, 2012 6:12:05 AM org.hornetq.core.logging.impl.JULLogDelegate info

                            INFO: trying to deploy queue jms.queue.ExpiryQueue

                            2012-01-25 06:12:05,829 ERROR [STDERR] [Thread-14] Jan 25, 2012 6:12:05 AM org.hornetq.core.logging.impl.JULLogDelegate info

                            INFO: trying to deploy queue jms.queue.inboundQueue

                            2012-01-25 06:12:05,839 ERROR [STDERR] [Thread-14] Jan 25, 2012 6:12:05 AM org.hornetq.core.logging.impl.JULLogDelegate info

                            INFO: trying to deploy queue jms.queue.outboundQueue

                            2012-01-25 06:12:05,848 ERROR [STDERR] [Thread-14] Jan 25, 2012 6:12:05 AM org.hornetq.core.logging.impl.JULLogDelegate info

                            INFO: trying to deploy queue jms.queue.otaQueue

                            2012-01-25 06:12:06,010 INFO  [org.hornetq.core.remoting.impl.netty.NettyAcceptor] [Thread-14] Started Netty Acceptor version 3.2.3.Final-r${buildNumber} ux2.alexprep.playp.net:5446 for CORE protocol

                            2012-01-25 06:12:06,013 WARN  [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Failed to connect to server.

                            2012-01-25 06:12:06,022 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying reconnection attempt 0

                            2012-01-25 06:12:06,022 DEBUG [org.hornetq.core.remoting.impl.netty.NettyConnector] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Started Netty Connector version 3.2.3.Final-r${buildNumber}

                            2012-01-25 06:12:06,022 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying to connect at the main server using connector :org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net

                            2012-01-25 06:12:06,023 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Main server is not up. Hopefully there's a backup configured now!

                            2012-01-25 06:12:06,035 ERROR [STDERR] [Thread-14] Jan 25, 2012 6:12:06 AM org.hornetq.core.logging.impl.JULLogDelegate info

                            INFO: Backup Server is now live

                            2012-01-25 06:12:06,525 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying reconnection attempt 1

                            2012-01-25 06:12:06,525 DEBUG [org.hornetq.core.remoting.impl.netty.NettyConnector] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Started Netty Connector version 3.2.3.Final-r${buildNumber}

                            2012-01-25 06:12:06,525 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying to connect at the main server using connector :org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net

                            2012-01-25 06:12:06,526 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Main server is not up. Hopefully there's a backup configured now!

                             

                             

                            2012-01-25 06:12:03,649 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying reconnection attempt 1

                            2012-01-25 06:12:03,649 DEBUG [org.hornetq.core.remoting.impl.netty.NettyConnector] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Started Netty Connector version 3.2.3.Final-r${buildNumber}

                            2012-01-25 06:12:03,649 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Trying to connect at the main server using connector :org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net

                            2012-01-25 06:12:03,650 DEBUG [org.hornetq.core.client.impl.ClientSessionFactoryImpl] [Thread-2 (group:HornetQ-client-global-threads-1235586206)] Main server is not up. Hopefully there's a backup configured now!

                             

                            I don't know why it is trying to connect to the main server, since this is the backup??

                            • 11. Re: Failover - Reconnection of the client not working
                              clebert.suconic

                              I don't see the word "announcing backup" anywhere on your logs... can you check at the actual file if you see it...

                               

                              We have added more detailed information on the latest releases... maybe you could download this tag here:

                               

                              https://anonsvn.jboss.org/repos/hornetq/tags/HornetQ_2_2_5_Final

                               

                               

                              And change ClusterManagerImpl at line 853, and add some information about the announcement being made.

                               

                               

                              Meanwhile.. check these IPs:

                              ux2.alexprep.playp.net/63.131.131.231:5445])]           08c1fcbe-439e-11e1-ab05-0019bb364c28 => TopologyMember[connector=Pair[a=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5445&host=ux2-alexprep-playp-net, b=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=5446&host=ux2-alexprep-playp-net]]

                               

                               

                              See if they are correct.. not inverted on your configs.. etc.

                              • 12. Re: Failover - Reconnection of the client not working
                                clebert.suconic

                                Were you able to make it work?...

                                 

                                 

                                If you're still having issues... please post all  your configs and I will take another look.

                                • 13. Re: Failover - Reconnection of the client not working
                                  gabrielp

                                  Hi, finally we were able to found why the failover wasn't working.

                                  In the hornet-jms.xml two connection factories shared the same name but different entry name:

                                   

                                  <connection-factory name="NettyConnectionFactory">

                                        <xa>true</xa>

                                        <connectors>

                                           <connector-ref connector-name="netty"/>

                                        </connectors>

                                        <entries>

                                           <entry name="/XAConnectionFactory"/>

                                        </entries>
                                     </connection-factory>
                                    
                                     <connection-factory name="NettyConnectionFactory">
                                        <xa>false</xa>
                                        <connectors>
                                           <connector-ref connector-name="netty"/>
                                        </connectors>
                                        <entries>

                                           <entry name="/ConnectionFactory"/>
                                        </entries>
                                        <reconnect-attempts>5</reconnect-attempts>

                                     </connection-factory>

                                   

                                  We used the file as is from the distribution, and we never expected that this would produce the whole error (the queues were working normally expect for the failover). By changing the first one to NettyConnectionFactoryXA it got solved.

                                   

                                  Hope this helps somebody else.

                                   

                                  Thanks for your help Clubert