4 Replies Latest reply on Jul 16, 2008 10:44 AM by garytully

    Pb with pure master/slave configuration

    frouleau

      Hi,

       

      I am evaluating Fuse Message Broker for production use in a pure master/slave architecture, so I am using fuse-5.0.0.17 to prototype.

       

      On host A, I have a master broker with the default config. On host B, the slave uses the same config file, except that the broker element also sets masterConnectorURI="tcp://hostA:61616" and shutdownOnMasterFailure="false".

       

      I started the master, then the slave, and everything looked good.

       

      Then I launched 2 consumers on 1 queue on host A and a producer on host B for the same queue, using the examples "ant consumer -Durl=tcp://localhost:61616 -Dmax=50000" and "ant producer -Durl=tcp://hostA:61616 -Dmax=100000 -Ddurable=true".

       

      After a few messages (around 500), I get the following exception in the logs:

       

      2008-07-10 16:43:11,138 /PORTFROULEAU#1 ERROR Service                        - Async error occurred: javax.jms.JMSException: Slave broker out of sync with master: Dispatched message (ID:PORTFROULEAU-1203-1215700952760-0:0:1:1:516) was not in the pending list

      javax.jms.JMSException: Slave broker out of sync with master: Dispatched message (ID:PORTFROULEAU-1203-1215700952760-0:0:1:1:516) was not in the pending list

      at org.apache.activemq.broker.region.PrefetchSubscription.processMessageDispatchNotification(PrefetchSubscription.java:171)

      at org.apache.activemq.broker.region.AbstractRegion.processDispatchNotification(AbstractRegion.java:405)

      at org.apache.activemq.broker.region.RegionBroker.processDispatchNotification(RegionBroker.java:593)

      at org.apache.activemq.broker.BrokerFilter.processDispatchNotification(BrokerFilter.java:201)

      at org.apache.activemq.broker.BrokerFilter.processDispatchNotification(BrokerFilter.java:201)

      at org.apache.activemq.broker.BrokerFilter.processDispatchNotification(BrokerFilter.java:201)

      at org.apache.activemq.broker.MutableBrokerFilter.processDispatchNotification(MutableBrokerFilter.java:208)

      at org.apache.activemq.broker.TransportConnection.processMessageDispatchNotification(TransportConnection.java:454)

      at org.apache.activemq.command.MessageDispatchNotification.visit(MessageDispatchNotification.java:77)

      at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:293)

      at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:181)

      at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:104)

      at org.apache.activemq.transport.TransportFilter.onCommand(TransportFilter.java:68)

      at org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:205)

      at org.apache.activemq.thread.DedicatedTaskRunner.runTask(DedicatedTaskRunner.java:98)

      at org.apache.activemq.thread.DedicatedTaskRunner$1.run(DedicatedTaskRunner.java:36)

       

       

      I have tried with non-durable messages and with only 1 consumer, and I still get the error. I looked for a similar bug in Jira and found AMQ-1585, but the corrective patch was already applied to the version I am using.

       

      Any ideas? Is this a bug or a misconfiguration?

      Regards,

       

      Edited by: frouleau on Jul 10, 2008 11:13 AM

        • 1. Re: Pb with pure master/slave configuration
          garytully

          Hi frouleau,

           

          The problem may be the configuration of the multicast discovery network in the default activemq.xml. For pure master/slave, the network connection between the master and slave is automatically configured by the presence of the masterConnectorURI.

           

          Any additional networking between the two brokers just gets in the way.

           

          Try commenting out or removing the networkConnectors section in the configuration if it is still present.

           

          There is more information at http://activemq.apache.org/masterslave.html#MasterSlave-ConfiguringPureMasterSlave
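
          As a rough sketch of the suggestion above, the slave's configuration might end up looking like the fragment below. It assumes that the default activemq.xml of this release declares a multicast network connector named "default-nc" and a discoveryUri on the transport connector; both are assumptions and worth checking against your own file. The brokerName "slave" is a placeholder, and the bare <beans> element mirrors the shortened form used in this thread.

```xml
<beans>
  <broker xmlns="http://activemq.org/config/1.0" brokerName="slave"
          dataDirectory="${activemq.base}/data"
          masterConnectorURI="tcp://hostA:61616"
          shutdownOnMasterFailure="false">

    <transportConnectors>
      <!-- Assumption: the default entry also carried
           discoveryUri="multicast://default"; it is left out here so the
           slave is not advertised over multicast. -->
      <transportConnector name="openwire" uri="tcp://localhost:61616"/>
    </transportConnectors>

    <!-- Assumed default section, commented out: the master/slave link is
         created from masterConnectorURI, so no network connector is needed.
    <networkConnectors>
      <networkConnector name="default-nc" uri="multicast://default"/>
    </networkConnectors>
    -->

  </broker>
</beans>
```

          The master would keep its own file without the masterConnectorURI and shutdownOnMasterFailure attributes, with the same networkConnectors section removed.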

           

          Hope this helps,

          Gary.

          • 2. Re: Pb with pure master/slave configuration
            frouleau

            Hi,

             

            I have checked and done further testing. I still have the problem, but it seems better (at least in the nominal case) if I run the consumer with -Ddurable=true. Looking into the code, the only difference is the call to connection.setClientID(String). Without it, I always get the error within a few hundred messages.

             

            Here is the slave config file:

             

            <beans>
              <broker xmlns="http://activemq.org/config/1.0" brokerName="PORTFROULEAU"
                      dataDirectory="${activemq.base}/data"
                      masterConnectorURI="tcp://192.168.137.131:61616"
                      shutdownOnMasterFailure="false">
                <transportConnectors>
                  <transportConnector name="openwire" uri="tcp://localhost:61616"/>
                </transportConnectors>
              </broker>
            </beans>

             

            Regards,

             

            Edited by: frouleau on Jul 15, 2008 8:29 AM

             

            Edited by: frouleau on Jul 15, 2008 8:31 AM

            • 3. Re: Pb with pure master/slave configuration
              frouleau

              Hmm,

               

              I have completely removed the transport section and it seems better.
              I still get sync errors between master and slave if a client disconnects unexpectedly (for example, in a crash). I then have to stop and restart both the master and the slave to recover properly.

               

              Is this the normal behavior?

               

              Regards,

              • 4. Re: Pb with pure master/slave configuration
                garytully

                In the case of a client crash, the client should be directed to reconnect to the master when it restarts, by using the randomize=false attribute on the connection URI:

                failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false
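
                For a client that is wired up through Spring rather than the ant examples, the same URI could be set on the connection factory. This is a minimal sketch under that assumption: the bean id "jmsFactory" and the host names are placeholders, while brokerURL is the standard property of ActiveMQConnectionFactory.

```xml
<!-- Hypothetical Spring bean; the id and host names are placeholders. -->
<bean id="jmsFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <!-- The master is listed first; randomize=false makes the client try the
       brokers in the listed order instead of picking one at random. -->
  <property name="brokerURL"
            value="failover://(tcp://masterhost:61616,tcp://slavehost:61616)?randomize=false"/>
</bean>
```

                With the ant examples used earlier in this thread, the same URI would presumably be passed through -Durl, quoted so that the shell leaves the parentheses alone.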

                 

                Is it after the reconnection that you see the sync errors?