5 Replies Latest reply on Dec 18, 2008 7:41 AM by s.gasse

JBM Messages stuck in Cluster Environment

s.gasse Dec 15, 2008 5:59 AM

Our system consists of two physical multicore machines running Red Hat Enterprise
Server 64 bit, Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_16-b02, mixed mode)
and JBoss 4.3.0.GA-CP02. Each physical machine runs two instances of JBoss, for a
total of four JBoss instances. The four JBoss instances are partitioned into 2
clusters, and both clusters contain 2 JBoss instances from different physical
machines.

One of the cluster partitions (called "messaging") is dedicated to providing JMS
services, based on JBoss Messaging, for clients running inside JBoss on the other
cluster partition (called "mule"). The "messaging" partition uses the "production"
cluster configuration, with two important changes:

In the
${JBOSS_INSTANCE}/deploy/jboss-messaging.sar/connection-factories-service.xml, we
have made the following changes (to improve message load-balancing across
consumers):

<mbean code="org.jboss.jms.server.connectionfactory.ConnectionFactory"
 ...
 <attribute name="SupportsFailover">true</attribute>
 <attribute name="SupportsLoadBalancing">true</attribute>

 <attribute name="PrefetchSize">0</attribute>
 <attribute name="SlowConsumers">true</attribute>
</mbean>

In the <JBOSS_SERVER>/deploy/jboss-messaging.sar/messaging-service.xml, we have made
the following changes (to effectively disable DLQ functionality):

<mbean code="org.jboss.jms.server.ServerPeer"
name="jboss.messaging:service=ServerPeer" xmbean-dd="xmdesc/ServerPeer-xmbean.xml">
...
 <attribute name="DefaultMaxDeliveryAttempts">2147483647</attribute> <!--
Integer.MAX_VALUE since it doesn't support infinite redeliveries -->

 <attribute name="DefaultRedeliveryDelay">1000</attribute> <!-- 1 second -->

 <attribute name="SuckerPassword">thePassword</attribute>
 ...
</mbean>

The JMS clients (on the "mule" instance) use the multicast method to
connect/discover the JMS service (on the "messaging" instance). They use the
following settings:

jms.connection.factory.jndi.name=/ClusteredConnectionFactory
jms.xaconnection.factory.jndi.name=/ClusteredXAConnectionFactory

jnp.jms.partition.name=Messaging
jnp.jms.partition.udpGroup=228.9.3.2
jnp.jms.partition.discoveryPort=1102
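
For readers unfamiliar with multicast discovery, the settings above could be turned into a JNDI environment roughly as sketched below. This is only an illustration under assumptions: the jnp.jms.partition.* keys are application-specific, and their mapping onto the JBoss naming keys jnp.partitionName, jnp.discoveryGroup and jnp.discoveryPort is assumed here, not taken from the post. The class DiscoveryEnv and its build method are hypothetical names. The sketch only builds the environment; the actual lookup needs the JBoss client jars and is left as a comment.

```java
import java.util.Hashtable;

class DiscoveryEnv {
    // Builds a JNDI environment for multicast (HA-JNDI) discovery.
    // Assumption: the application's jnp.jms.partition.* settings feed these keys.
    static Hashtable<String, String> build(String partition, String group, int port) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
        env.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
        env.put("jnp.partitionName", partition);             // cluster partition to discover
        env.put("jnp.discoveryGroup", group);                // multicast group address
        env.put("jnp.discoveryPort", String.valueOf(port));  // multicast discovery port
        return env;
    }

    public static void main(String[] args) {
        Hashtable<String, String> env = build("Messaging", "228.9.3.2", 1102);
        System.out.println(env);
        // With the JBoss client jars on the classpath, the factory would then be
        // obtained with something like:
        //   new javax.naming.InitialContext(env).lookup("/ClusteredConnectionFactory");
    }
}
```

Because no provider URL is set, a client configured this way relies entirely on multicast discovery to find a node of the partition.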

We have managed to isolate and reproduce within a few tries the following issue: if
all consumers (from the "mule cluster") for a clustered queue are connected to node
A of the "messaging" cluster and the producers (from the "mule cluster") on the same
queue post messages to node B of the cluster, then the messages remain on node B.

In the JMX console we can observe that:
1. Clustered Queue X on "messaging" node A has 8 consumers,
MessageCount=Delivering=Scheduled=0.
2. Clustered Queue X on "messaging" node B has 0 consumers, MessageCount=2,
Delivering=Scheduled=0.

We have waited for the message sucker to move the messages to node A, but this
hasn't happened. If we shut down node A, all consumers move over to node B and
consume the messages. We have yet to run a test for the case where we shut down node
B.

In our production environment, we have not encountered this problem very often (2-3
times in the last few months), because we have a large number of consumers per queue
(30-80) and they are almost evenly distributed on the two "messaging" cluster nodes.
In the coming weeks, we will add more functionality to our ESB, and our requirements
do not tolerate this rate of failures.

Since the system is supposed to go live in January, any ideas or hints would be very helpful!

Thanks in advance.

Sebastian

1. Re: JBM Messages stuck in Cluster Environment

clebert.suconic Dec 15, 2008 9:11 AM (in response to s.gasse)

I will start with two basic questions...

Are you sure you have set clustered=true in messaging-service.xml?

And are you sure you have your queues set as clustered?

Also... are your clients remote clients or MDBs?
2. Re: JBM Messages stuck in Cluster Environment

clebert.suconic Dec 15, 2008 10:17 AM (in response to s.gasse)
"clebert.suconic@jboss.com" wrote:

Are you sure you have set clustered=true on messaging-service.xml.

I meant to say the -persistence-service.xml file (its name is prefixed with the database type):

<mbean code="org.jboss.messaging.core.jmx.MessagingPostOfficeService"
 name="jboss.messaging:service=PostOffice"
 xmbean-dd="xmdesc/MessagingPostOffice-xmbean.xml">
 ....
 <attribute name="Clustered">true</attribute>
3. Re: JBM Messages stuck in Cluster Environment

s.gasse Dec 18, 2008 5:44 AM (in response to s.gasse)

Hi,

thanks for the reply. The problem is somehow solved now: our system uses message selectors, and we found out that this is actually not supported by a clustered JBM implementation. The temporary workaround will be to run JBM as a singleton service.

Thanks again for your time.

Sebastian
4. Re: JBM Messages stuck in Cluster Environment

timfox Dec 18, 2008 5:55 AM (in response to s.gasse)

"s.gasse" wrote:
Hi,

thanks for the reply. The problem is somehow solved now, since our system uses message selectors and we found out, that this is actually not supported by a clustered JBM implementation.

Message selectors only work on the "local" destination.

This has been discussed many times over the years, but in general using multiple message selectors on a queue is considered an anti-pattern.

Why? Because you need to scan the entire queue to see if one matches every time you deliver a message. ==> slow!

On a cluster that's compounded even further. You'd have to scan every message on every server to see if one matches. ==> Horrible performance.

BTW this is not JBM specific. You'll find the same issues with pretty much any messaging system.

If you're effectively using selectors to "select" messages from a "table", then you're basically using the messaging system like a database, and that's not what messaging systems are for.

You're basically expecting the messaging system to behave like a clustered database, which is not what it is designed to do!

Perhaps something like Oracle RAC would be a better fit.
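
The scanning cost described in this reply can be illustrated with a toy model. This is a sketch under stated assumptions, not JBM's actual delivery code: the queue is a plain in-memory list, messages are strings, and the class SelectorScan with its receive method is a hypothetical name. It only shows that a selective consumer may have to examine every queued message before it finds a match.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class SelectorScan {
    // Number of messages examined by the most recent receive() call.
    static int examined;

    // Returns (and removes) the first message matching the selector,
    // scanning from the head of the queue; returns null if none matches.
    static String receive(List<String> queue, Predicate<String> selector) {
        examined = 0;
        Iterator<String> it = queue.iterator();
        while (it.hasNext()) {
            String msg = it.next();
            examined++;
            if (selector.test(msg)) {
                it.remove();
                return msg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> queue = new ArrayList<>();
        for (int i = 0; i < 999; i++) {
            queue.add("type=A");
        }
        queue.add("type=B"); // the only message the selective consumer wants

        String got = receive(queue, m -> m.equals("type=B"));
        // prints "type=B after examining 1000 messages"
        System.out.println(got + " after examining " + examined + " messages");
    }
}
```

In a cluster, the same scan would have to run against the queue on every node, which is the compounding effect mentioned above.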
5. Re: JBM Messages stuck in Cluster Environment

s.gasse Dec 18, 2008 7:41 AM (in response to s.gasse)

... agreed on that one. Unfortunately I'm not responsible for architectural decisions in this project. So don't kill the messenger ;-)

Thanks again,

Sebastian