6 Replies Latest reply on Nov 20, 2007 1:43 PM by chipschoch

Message stranded in cluster

chipschoch Nov 13, 2007 2:27 PM

JBoss AS 4.2.1, JBM 1.4.0.CR2.

I have a two-node cluster with a distributed queue. Each node runs a service that consumes from the queue. I have client applications that connect to the clustered queue and post messages. The messages appear to be distributed between the partial queues on each node; however, only the messages on one of the nodes are being consumed. From my logging I can see that those messages are being consumed by the services running on both nodes. Listing messages from the jmx-console shows a bunch of unconsumed messages sitting in the queue on the other node.

Is there some configuration that turns on balancing between the nodes? Shouldn't all the messages be getting consumed? My consumer service uses the following to connect to the queue:

<mbean code="org.jboss.jms.jndi.JMSProviderLoader"
 name="jboss.messaging:service=JMSProviderLoader,name=DefaultJMSProvider">
 <attribute name="ProviderName">DefaultJMSProvider</attribute>
 <attribute name="ProviderAdapterClass">
 org.jboss.jms.jndi.JNDIProviderAdapter
 </attribute>
 <!-- The combined connection factory -->
 <attribute name="FactoryRef">ClusteredXAConnectionFactory</attribute>
 <!-- The queue connection factory -->
 <attribute name="QueueFactoryRef">ClusteredXAConnectionFactory</attribute>
 <!-- The topic factory -->
 <attribute name="TopicFactoryRef">ClusteredXAConnectionFactory</attribute>
 <!-- Access JMS via HAJNDI -->
 <attribute name="Properties">
 java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
 java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
 java.naming.provider.url=${jboss.bind.address:localhost}:1100
 jnp.disableDiscovery=false
 jnp.partitionName=${jboss.partition.name:DefaultPartition}
 jnp.discoveryGroup=${jboss.partition.udpGroup:230.0.0.4}
 jnp.discoveryPort=1102
 jnp.discoveryTTL=16
 jnp.discoveryTimeout=5000
 jnp.maxRetries=1
 </attribute>
 </mbean>

1. Re: Message stranded in cluster

timfox Nov 13, 2007 2:35 PM (in response to chipschoch)

Looks like you are sending messages to both of the partial queues, but both your service instances are connected to the same node for consuming.

For this kind of topology you probably only want each consumer to consume from its local node? If so, then you should just use the standard connection factory (/ConnectionFactory), not /ClusteredConnectionFactory. This is how MDBs are configured, since an MDB also only wants to consume from its local node.

/ClusteredConnectionFactory will round robin connections between nodes which is probably not what you want.
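For a consumer service deployed inside each cluster node, a minimal sketch of the provider loader above switched to the local, non-clustered factory might look like this. It is modelled on the default jms-ds.xml and assumes the factory is bound locally as java:/XAConnectionFactory (check connection-factories-service.xml in your deployment). With no Properties attribute, the lookup should go to the server's own JNDI rather than HA-JNDI.

```xml
<!-- Sketch: consume from the local node only.
     Assumes the local JBM factory is bound as java:/XAConnectionFactory. -->
<mbean code="org.jboss.jms.jndi.JMSProviderLoader"
   name="jboss.messaging:service=JMSProviderLoader,name=DefaultJMSProvider">
   <attribute name="ProviderName">DefaultJMSProvider</attribute>
   <attribute name="ProviderAdapterClass">
      org.jboss.jms.jndi.JNDIProviderAdapter
   </attribute>
   <!-- Non-clustered factory: connections stay on this node -->
   <attribute name="FactoryRef">java:/XAConnectionFactory</attribute>
   <attribute name="QueueFactoryRef">java:/XAConnectionFactory</attribute>
   <attribute name="TopicFactoryRef">java:/XAConnectionFactory</attribute>
   <!-- No Properties attribute: the lookup uses this server's local JNDI -->
</mbean>
```

Each node then has a consumer on its own partial queue, so no message depends on redistribution to get consumed.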

Also, you're using 1.4.0.CR2. This is a CR (non-production) release and has been superseded by 1.4.0.SP1. We're bringing out an SP2 soon too. I recommend you upgrade to that.

2. Re: Message stranded in cluster

chipschoch Nov 13, 2007 4:38 PM (in response to chipschoch)

Thanks Tim. I changed to XAConnectionFactory for the services that consume from the local node, and that works fine. However, my cluster runs on Red Hat, and I also have two non-clustered JBoss servers running on Windows. Each of these runs a service that consumes messages posted to the clustered queue on the Red Hat cluster. They connect using the following:

<mbean code="org.jboss.jms.jndi.JMSProviderLoader"
 name="jboss.messaging:service=JMSProviderLoader,name=ConversionJMSProvider">
 <attribute name="ProviderName">ConversionJMSProvider</attribute>
 <attribute name="ProviderAdapterClass">
 org.jboss.jms.jndi.JNDIProviderAdapter
 </attribute>

 <attribute name="FactoryRef">ClusteredXAConnectionFactory</attribute>
 <attribute name="QueueFactoryRef">ClusteredXAConnectionFactory</attribute>
 <attribute name="TopicFactoryRef">ClusteredXAConnectionFactory</attribute>

 <attribute name="Properties">
 java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
 java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
 java.naming.provider.url=172.17.20.60:1100, 172.17.20.61:1100
 jnp.disableDiscovery=false
 jnp.partitionName=dev.application
 jnp.discoveryGroup=228.1.2.4
 jnp.discoveryPort=1102
 jnp.discoveryTTL=16
 jnp.discoveryTimeout=5000
 jnp.maxRetries=1
 </attribute>
 </mbean>

It appears that both of these Windows servers are connecting to the first provider in the list. Messages that are posted to the second node (i.e. the partial queue on server 172.17.20.61) are never consumed.

I thought that clients of the clustered queue could connect to whichever node they found first and they would get messages from any partial queue. Is this incorrect?

Thanks.

3. Re: Message stranded in cluster

timfox Nov 14, 2007 7:34 AM (in response to chipschoch)

"chip_schoch" wrote:

I thought that clients of the clustered queue could connect to whichever node they found first and they would get messages from any partial queue. Is this incorrect?

This is correct, but you want to avoid unnecessary redistribution if you can. Redistribution is designed for the case where you have asymmetric producers and consumers.

In your case it seems it would make more sense for each of your windows servers to consume from different redhat servers since you have one less network round trip.

I.e. if each windows server is consuming from its own redhat server then JBM just needs to shift the message from redhat server --> windows server.

If both windows servers are consuming from the same redhat server, then messages not only have to get from redhat server 1 to windows servers 1 and 2, but also from redhat server 2 --> redhat server 1 --> windows servers 1 and 2.
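A minimal sketch of that pinned topology for the first Windows server is shown below. It reuses the provider name from the config above, but assumes plain (non-HA) JNDI on the default port 1099 of 172.17.20.60 and that the non-clustered factory is bound there as XAConnectionFactory; both are assumptions to verify against your deployment. The second Windows server would point at 172.17.20.61 instead.

```xml
<!-- Sketch for Windows server 1, pinned to Red Hat node 172.17.20.60.
     Windows server 2 would use 172.17.20.61 in provider.url. -->
<mbean code="org.jboss.jms.jndi.JMSProviderLoader"
   name="jboss.messaging:service=JMSProviderLoader,name=ConversionJMSProvider">
   <attribute name="ProviderName">ConversionJMSProvider</attribute>
   <attribute name="ProviderAdapterClass">
      org.jboss.jms.jndi.JNDIProviderAdapter
   </attribute>
   <!-- Non-clustered factory: the connection stays on the node it was looked up from -->
   <attribute name="FactoryRef">XAConnectionFactory</attribute>
   <attribute name="QueueFactoryRef">XAConnectionFactory</attribute>
   <attribute name="TopicFactoryRef">XAConnectionFactory</attribute>
   <!-- Plain JNDI on one node (port 1099 assumed), no multicast discovery -->
   <attribute name="Properties">
      java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
      java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
      java.naming.provider.url=172.17.20.60:1099
      jnp.disableDiscovery=true
   </attribute>
</mbean>
```

The trade-off is the one raised later in the thread: each new converter has to be assigned to a node by hand.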

Although it's a sub-optimal topology it should still work - things to check:

1) Are the queues deployed with clustered = true?
2) Are the nodes in the cluster seeing each other? Have a look at the JGroups console output as they start: when you start the second node, you should see the first one register it.
3) Maybe there is an issue with the CR release you are using? Have you tried the latest release?
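For check 1, a clustered queue in JBM 1.4 is typically declared in a destinations service file along these lines; the queue name here is hypothetical. The final reply in this thread reports that the clustered setting is also recorded in the jbm_postoffice table when a queue is first deployed, so the attribute should be set before each node first starts.

```xml
<!-- Sketch: a clustered (distributed) queue. Queue name is hypothetical. -->
<mbean code="org.jboss.jms.server.destination.QueueService"
   name="jboss.messaging.destination:service=Queue,name=ConversionQueue"
   xmbean-dd="xmdesc/Queue-xmbean.xml">
   <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
   <depends>jboss.messaging:service=PostOffice</depends>
   <!-- Set to true on every node so each node hosts a partial queue -->
   <attribute name="Clustered">true</attribute>
</mbean>
```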

Hope that helps :)
4. Re: Message stranded in cluster

chipschoch Nov 15, 2007 1:04 PM (in response to chipschoch)

We use the Windows servers to run some conversion services that use third-party libraries that are only available on Windows. As the load increases we will be adding more conversion instances, so it really will be asymmetrical.

It seems a bit self-defeating if we need to manually manage which Linux server in the cluster each Windows converter connects to. It seems we would be better off running JMS as an HA singleton in the cluster to avoid the unnecessary redistribution you describe.

I will try the latest release. As for the other questions:

1. Yes the queues have the 'Clustered' attribute set to true.
2. The clusters are seeing each other.

I actually have a singleton service running on one of the Linux servers that queues messages to a clustered queue. It uses the ClusteredXAConnectionFactory and appears to be round-robining the messages it queues, which would indicate that the cluster is working correctly. It is the consumers that are only getting messages off one of the partial queues.
5. Re: Message stranded in cluster

chipschoch Nov 20, 2007 10:19 AM (in response to chipschoch)

Ok, I finally got everything up and running on JBAS 4.2.2.GA and JBM 1.4.0.SP1.

I use a ClusteredConnectionFactory to push messages to a clustered queue, and it is round-robining them to the partial queues as expected.

Each of the services consuming these messages pushes another message on to another clustered queue using XAConnectionFactory, which results in each node pushing the message to its own partial queue.

The consumers of these messages are non-clustered windows servers that are both connected to one of the nodes in the cluster using the ClusteredConnectionFactory available through ha-jndi.

I expected the message sucker to move messages from the node with no consumers to the node with consumers, but it is not doing so. Is this because those messages were pushed using the XAConnectionFactory instead of the ClusteredConnectionFactory?

Thanks
6. Re: Message stranded in cluster

chipschoch Nov 20, 2007 1:43 PM (in response to chipschoch)

Apparently I started one of my nodes for the first time before I set the clustered flag in the queue definition: when I looked at the jbm_postoffice records, one node had clustered set to false. After deleting those records and restarting, the queue is set to clustered and everything appears to be working as expected.

I guess changing that flag after a queue has been established does not have any effect.