5 Replies Latest reply on Dec 18, 2008 7:41 AM by s.gasse

    JBM Messages stuck in Cluster Environment

    s.gasse

      Our system consists of two physical multicore machines running Red Hat Enterprise
      Server 64 bit, Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_16-b02, mixed mode)
      and JBoss 4.3.0.GA-CP02. Each physical machine runs two instances of JBoss, for a
      total of four JBoss instances. The four JBoss instances are partitioned into 2
      clusters, and both clusters contain 2 JBoss instances from different physical
      machines.

      One of the cluster partitions (called "messaging") is dedicated to providing JMS
      services, based on JBoss Messaging, for clients running inside JBoss on the other
      cluster partition (called "mule"). The "messaging" partition uses the "production"
      cluster configuration, with two important changes:

      In the
      ${JBOSS_INSTANCE}/deploy/jboss-messaging.sar/connection-factories-service.xml, we
      have made the following changes (to improve message load-balancing across
      consumers):

      <mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
       ...
       <attribute name="SupportsFailover">true</attribute>
       <attribute name="SupportsLoadBalancing">true</attribute>
      
       <attribute name="PrefetchSize">0</attribute>
       <attribute name="SlowConsumers">true</attribute>
      </mbean>


      In the <JBOSS_SERVER>/deploy/jboss-messaging.sar/messaging-service.xml, we have made
      the following changes (to effectively disable DLQ functionality):

      <mbean code="org.jboss.jms.server.ServerPeer"
      name="jboss.messaging:service=ServerPeer" xmbean-dd="xmdesc/ServerPeer-xmbean.xml">
      ...
       <attribute name="DefaultMaxDeliveryAttempts">2147483647</attribute> <!--
      Integer.MAX_VALUE since it doesn't support infinite redeliveries -->
      
       <attribute name="DefaultRedeliveryDelay">1000</attribute> <!-- 1 second -->
      
       <attribute name="SuckerPassword">thePassword</attribute>
       ...
      </mbean>
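To put those two attribute values in perspective, here is a rough back-of-the-envelope sketch. It assumes every delivery attempt fails and that the full redelivery delay elapses before each attempt; the class and method names are made up for illustration and are not part of JBoss Messaging.

```java
// Hypothetical helper, not part of JBoss Messaging: estimates how long a
// message would keep being redelivered before it could reach the DLQ.
final class RedeliveryHorizon {

    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Whole days spent waiting if every one of maxDeliveryAttempts attempts
    // fails and each is preceded by redeliveryDelayMillis (an assumption).
    static long horizonDays(int maxDeliveryAttempts, long redeliveryDelayMillis) {
        // int * long is evaluated as long, so this does not overflow here.
        return maxDeliveryAttempts * redeliveryDelayMillis / MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        long days = horizonDays(Integer.MAX_VALUE, 1000L);
        System.out.println("Days until max attempts is reached: " + days);
    }
}
```

With the values above this comes to roughly 68 years, which is why the setting effectively disables the DLQ.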


      The JMS clients (on the "mule" instance) use the multicast method to
      connect/discover the JMS service (on the "messaging" instance). They use the
      following settings:

      jms.connection.factory.jndi.name=/ClusteredConnectionFactory
      jms.xaconnection.factory.jndi.name=/ClusteredXAConnectionFactory
      
      jnp.jms.partition.name=Messaging
      jnp.jms.partition.udpGroup=228.9.3.2
      jnp.jms.partition.discoveryPort=1102
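For readers unfamiliar with multicast discovery, the following is a minimal sketch of how settings like these might be turned into a JNDI environment for the JBoss naming client. The `jnp.jms.partition.*` keys above look application-specific, so mapping them onto the JNP client properties `jnp.partitionName`, `jnp.discoveryGroup` and `jnp.discoveryPort` is an assumption, and the class name `JmsDiscoveryEnv` is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: builds a JNDI environment that relies on multicast
// discovery (no java.naming.provider.url is set on purpose).
final class JmsDiscoveryEnv {

    static Properties build(String partitionName, String udpGroup, int discoveryPort) {
        Properties env = new Properties();
        env.setProperty("java.naming.factory.initial",
                "org.jnp.interfaces.NamingContextFactory");
        env.setProperty("java.naming.factory.url.pkgs",
                "org.jboss.naming:org.jnp.interfaces");
        // Assumed mapping from the application's own keys to JNP client keys.
        env.setProperty("jnp.partitionName", partitionName);
        env.setProperty("jnp.discoveryGroup", udpGroup);
        env.setProperty("jnp.discoveryPort", Integer.toString(discoveryPort));
        return env;
    }

    public static void main(String[] args) {
        // Values taken from the settings listed above.
        Properties env = build("Messaging", "228.9.3.2", 1102);
        env.list(System.out);
        // With the JBoss client jars on the classpath, this environment could
        // be passed to new javax.naming.InitialContext(env) and used to look
        // up /ClusteredConnectionFactory.
    }
}
```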


      We have managed to isolate and reproduce within a few tries the following issue: if
      all consumers (from the "mule cluster") for a clustered queue are connected to node
      A of the "messaging" cluster and the producers (from the "mule cluster") on the same
      queue post messages to node B of the cluster, then the messages remain on node B.


      In the JMX console we can observe that:
      1. Clustered Queue X on "messaging" node A has 8 consumers,
      MessageCount=Delivering=Scheduled=0.
      2. Clustered Queue X on "messaging" node B has 0 consumers, MessageCount=2,
      Delivering=Scheduled=0.
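
The condition observed above can be expressed as a simple check that a monitoring job might run against per-node queue statistics. This is only a sketch: the `NodeStats` type and the way the counts are obtained (for example, by reading them from the JMX console by hand) are assumptions, not an existing JBoss Messaging API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical monitoring check: flags nodes that hold messages for a
// clustered queue but have no local consumers while consumers exist elsewhere.
final class StrandedMessageCheck {

    static final class NodeStats {
        final String node;
        final int consumerCount;
        final int messageCount;

        NodeStats(String node, int consumerCount, int messageCount) {
            this.node = node;
            this.consumerCount = consumerCount;
            this.messageCount = messageCount;
        }
    }

    static List<String> strandedNodes(List<NodeStats> stats) {
        boolean consumersSomewhere = false;
        for (NodeStats s : stats) {
            if (s.consumerCount > 0) {
                consumersSomewhere = true;
            }
        }
        List<String> stranded = new ArrayList<String>();
        for (NodeStats s : stats) {
            // Messages are waiting here, nobody local can take them,
            // yet another node has consumers that could.
            if (consumersSomewhere && s.consumerCount == 0 && s.messageCount > 0) {
                stranded.add(s.node);
            }
        }
        return stranded;
    }
}
```

Fed with the numbers from the list above (node A: 8 consumers, 0 messages; node B: 0 consumers, 2 messages), the check reports node B.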

      We have waited for the message sucker to move the messages to node A, but this
      hasn't happened. If we shut down node A, all consumers move over to node B and
      consume the messages. We have yet to run a test for the case when we shut down node
      B.

      In our production environment, we have not encountered this problem very often (2-3
      times in the last few months), because we have a large number of consumers per queue
      (30-80) and they are almost evenly distributed on the two "messaging" cluster nodes.
      In the coming weeks, we will add more functionality to our ESB, and our requirements
      do not tolerate this rate of failures.

      Since the system is supposed to go live in January, any ideas or hints would be very helpful!

      Thanks in advance.

      Sebastian

        • 1. Re: JBM Messages stuck in Cluster Environment
          clebert.suconic

          I will start with two basic questions...


          Are you sure you have set clustered=true in messaging-service.xml?

          And are you sure you have your queues set as clustered?



          Also, are your clients remote clients or MDBs?

          • 2. Re: JBM Messages stuck in Cluster Environment
            clebert.suconic

             

            "clebert.suconic@jboss.com" wrote:

            Are you sure you have set clustered=true on messaging-service.xml.


            I meant to say -persistent-service.xml:

             <mbean code="org.jboss.messaging.core.jmx.MessagingPostOfficeService"
             name="jboss.messaging:service=PostOffice"
             xmbean-dd="xmdesc/MessagingPostOffice-xmbean.xml">
            
            ....
            
             <attribute name="Clustered">true</attribute>
            
            
            


            • 3. Re: JBM Messages stuck in Cluster Environment
              s.gasse

              Hi,

              thanks for the reply. The problem is somehow solved now, since our system uses message selectors and we found out that this is actually not supported by a clustered JBM implementation. The temporary workaround will be to run JBM as a singleton service.

              Thanks again for your time.

              Sebastian

              • 4. Re: JBM Messages stuck in Cluster Environment
                timfox

                 

                "s.gasse" wrote:
                Hi,

                thanks for the reply. The problem is somehow solved now, since our system uses message selectors and we found out that this is actually not supported by a clustered JBM implementation.


                Message selectors only work on the *local* destination.

                This has been discussed many times over the years, but in general using multiple message selectors on queues is considered an anti-pattern.

                Why? Because you need to scan the entire queue to see if one matches every time you deliver a message. ==> slow!

                On a cluster that's compounded even further. You'd have to scan every message on every server to see if one matches. ==> Horrible performance.
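
The scanning cost described above can be illustrated with a toy model. This is a sketch only: it uses a plain in-memory deque of strings rather than JBM's actual queue implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Predicate;

// Toy model of selector-based delivery on a single queue (not JBM code).
final class SelectorScanCost {

    // Walks the queue from the head, removes the first message accepted by
    // the selector, and returns how many messages had to be inspected.
    // If nothing matches, the whole queue has been inspected for nothing.
    static int inspectedUntilMatch(Deque<String> queue, Predicate<String> selector) {
        int inspected = 0;
        for (Iterator<String> it = queue.iterator(); it.hasNext();) {
            String message = it.next();
            inspected++;
            if (selector.test(message)) {
                it.remove();
                return inspected;
            }
        }
        return inspected;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<String>();
        for (int i = 0; i < 1000; i++) {
            queue.add("A");
        }
        queue.add("B");
        // A consumer selecting only "B" has to walk past every "A" first.
        System.out.println(inspectedUntilMatch(queue, m -> m.equals("B")));
        // A consumer without a selector just takes the head.
        System.out.println(inspectedUntilMatch(queue, m -> true));
    }
}
```

A common alternative is to give each kind of message its own queue, so that every consumer simply takes the head of its queue and no scan is needed.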

                BTW this is not JBM specific. You'll find the same issues with pretty much any messaging system.

                If you're effectively using selectors to "select" messages from a "table", then you're basically using the messaging system like a database, and that's not what messaging systems are for.

                You're basically expecting the messaging system to behave like a clustered database, which is not what it is designed to do!

                Perhaps something like Oracle RAC would be a better fit.

                • 5. Re: JBM Messages stuck in Cluster Environment
                  s.gasse

                  ... agreed on that one. Unfortunately I'm not responsible for architectural decisions in this project. So don't kill the messenger ;-)

                  Thanks again,

                  Sebastian