6 Replies Latest reply on Nov 20, 2007 1:43 PM by chipschoch

    Message stranded in cluster

    chipschoch

      JBoss AS 4.2.1, JBM 1.4.0.CR2.

      I have a two-node cluster with a distributed queue. Each node is running a service that is a consumer of the queue. I have client applications that connect to the clustered queue and post messages. The messages appear to be distributed between the [partial] queues on each node, however only the messages on one of the nodes are getting consumed. From my logging I can see that the messages that are getting consumed are being consumed by the services running on both nodes. Listing messages from jmx-console shows a bunch of unconsumed messages sitting in the queue on one of the nodes.

      Is there some configuration that turns on the balancing between nodes? Shouldn't they all be getting consumed? My consumer service uses the following to connect to the queue.

      <mbean code="org.jboss.jms.jndi.JMSProviderLoader"
       name="jboss.messaging:service=JMSProviderLoader,name=DefaultJMSProvider">
       <attribute name="ProviderName">DefaultJMSProvider</attribute>
       <attribute name="ProviderAdapterClass">
       org.jboss.jms.jndi.JNDIProviderAdapter
       </attribute>
       <!-- The combined connection factory -->
       <attribute name="FactoryRef">ClusteredXAConnectionFactory</attribute>
       <!-- The queue connection factory -->
       <attribute name="QueueFactoryRef">ClusteredXAConnectionFactory</attribute>
       <!-- The topic factory -->
       <attribute name="TopicFactoryRef">ClusteredXAConnectionFactory</attribute>
       <!-- Access JMS via HAJNDI -->
       <attribute name="Properties">
       java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
       java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
       java.naming.provider.url=${jboss.bind.address:localhost}:1100
       jnp.disableDiscovery=false
       jnp.partitionName=${jboss.partition.name:DefaultPartition}
       jnp.discoveryGroup=${jboss.partition.udpGroup:230.0.0.4}
       jnp.discoveryPort=1102
       jnp.discoveryTTL=16
       jnp.discoveryTimeout=5000
       jnp.maxRetries=1
       </attribute>
       </mbean>
      




        • 1. Re: Message stranded in cluster
          timfox

          Looks like you are sending messages to both of the partial queues, but both your service instances are connected to the same node for consuming.

          For this kind of topology you probably want each consumer to consume only from its local node. If so, you should use the standard connection factory, /ConnectionFactory, rather than /ClusteredConnectionFactory. This is how MDBs are configured, since an MDB clearly also only wants to consume from its local node.

          /ClusteredConnectionFactory will round robin connections between nodes which is probably not what you want.

          Also, you're using 1.4.0.CR2. This is a CR (non-production) release and is superseded by 1.4.0.SP1. We're bringing out an SP2 soon too. I recommend you upgrade to that.
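
          A minimal sketch of that change, assuming the provider definition from the first post is otherwise kept as-is. It uses XAConnectionFactory, the non-clustered counterpart of the XA factory in the original definition; plain ConnectionFactory would be the non-XA equivalent.

          ```xml
          <mbean code="org.jboss.jms.jndi.JMSProviderLoader"
                 name="jboss.messaging:service=JMSProviderLoader,name=DefaultJMSProvider">
            <attribute name="ProviderName">DefaultJMSProvider</attribute>
            <attribute name="ProviderAdapterClass">org.jboss.jms.jndi.JNDIProviderAdapter</attribute>
            <!-- Non-clustered factory: no round robin of connections between nodes -->
            <attribute name="FactoryRef">XAConnectionFactory</attribute>
            <attribute name="QueueFactoryRef">XAConnectionFactory</attribute>
            <attribute name="TopicFactoryRef">XAConnectionFactory</attribute>
            <!-- Properties attribute: carried over unchanged from the original definition -->
          </mbean>
          ```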

          • 2. Re: Message stranded in cluster
            chipschoch

            Thanks Tim. I changed to XAConnectionFactory for the services that consume from the local node and that works fine. However, my cluster is running under RedHat but I have two non-clustered JBoss servers running under windows. Each of these is running a service that consumes messages posted to the clustered queue running under redhat. They are connecting using the following:


            <mbean code="org.jboss.jms.jndi.JMSProviderLoader"
             name="jboss.messaging:service=JMSProviderLoader,name=ConversionJMSProvider">
             <attribute name="ProviderName">ConversionJMSProvider</attribute>
             <attribute name="ProviderAdapterClass">
             org.jboss.jms.jndi.JNDIProviderAdapter
             </attribute>
            
             <attribute name="FactoryRef">ClusteredXAConnectionFactory</attribute>
             <attribute name="QueueFactoryRef">ClusteredXAConnectionFactory</attribute>
             <attribute name="TopicFactoryRef">ClusteredXAConnectionFactory</attribute>
            
             <attribute name="Properties">
             java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
             java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
             java.naming.provider.url=172.17.20.60:1100, 172.17.20.61:1100
             jnp.disableDiscovery=false
             jnp.partitionName=dev.application
             jnp.discoveryGroup=228.1.2.4
             jnp.discoveryPort=1102
             jnp.discoveryTTL=16
             jnp.discoveryTimeout=5000
             jnp.maxRetries=1
             </attribute>
             </mbean>


            It appears that both of these windows servers are connecting to the first provider in the list. Messages that are posted to the second provider (i.e. the partial queue on server 172.17.20.61) are never getting consumed.

            I thought that clients of the clustered queue could connect to whichever node they found first and they would get messages from any partial queue. Is this incorrect?

            Thanks.

            • 3. Re: Message stranded in cluster
              timfox

               

              "chip_schoch" wrote:


              I thought that clients of the clustered queue could connect to whichever node they found first and they would get messages from any partial queue. Is this incorrect?



              This is correct, but you want to avoid unnecessary redistribution if you can. Redistribution is designed for the case where you have asymmetric producers and consumers.

              In your case it seems it would make more sense for each of your windows servers to consume from different redhat servers since you have one less network round trip.

              I.e. if each windows server is consuming from its own redhat server then JBM just needs to shift the message from redhat server --> windows server.

              If both windows servers are consuming from the same redhat server then messages not only have to get from redhat server 1 to windows server 1 and 2, but also from redhat server 2 --> redhat server 1 --> windows server 1 and 2.
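
              A rough sketch of pinning one windows server to one redhat server, based on the ConversionJMSProvider definition above. It assumes each node's plain (non-HA) JNDI listens on the default port 1099 and that the non-clustered XAConnectionFactory is bound there; neither is confirmed in this thread, so check them against your own configuration. The second windows server would point at 172.17.20.61 instead.

              ```xml
              <!-- Windows server 1: consumes only from the partial queue on 172.17.20.60 -->
              <mbean code="org.jboss.jms.jndi.JMSProviderLoader"
                     name="jboss.messaging:service=JMSProviderLoader,name=ConversionJMSProvider">
                <attribute name="ProviderName">ConversionJMSProvider</attribute>
                <attribute name="ProviderAdapterClass">org.jboss.jms.jndi.JNDIProviderAdapter</attribute>
                <attribute name="FactoryRef">XAConnectionFactory</attribute>
                <attribute name="QueueFactoryRef">XAConnectionFactory</attribute>
                <attribute name="TopicFactoryRef">XAConnectionFactory</attribute>
                <attribute name="Properties">
                  java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
                  java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
                  <!-- Assumption: 1099 is the node's local JNDI port (1100 is HA-JNDI) -->
                  java.naming.provider.url=172.17.20.60:1099
                  jnp.disableDiscovery=true
                </attribute>
              </mbean>
              ```

              The trade-off is that the mapping of windows servers to redhat servers has to be maintained by hand.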

              Although it's a sub-optimal topology it should still work - things to check:

              1) Are the queues deployed as clustered = true?
              2) Is each node in the cluster seeing each other? Have a look at the console output from jgroups as they start, when you start the second one you should see the first one registering that.
              3) Maybe there is an issue with the CR release you are using? Have you tried the latest release?
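
              For check 1, a clustered queue definition in JBM 1.4 looks roughly like the following. This is a sketch that assumes the stock destinations-service.xml layout; the queue name exampleQueue is a placeholder, not one from this thread.

              ```xml
              <mbean code="org.jboss.jms.server.destination.QueueService"
                     name="jboss.messaging.destination:service=Queue,name=exampleQueue"
                     xmbean-dd="xmdesc/Queue-xmbean.xml">
                <depends optional-attribute-name="ServerPeer">jboss.messaging:service=ServerPeer</depends>
                <depends>jboss.messaging:service=PostOffice</depends>
                <!-- Must be true on every node that deploys this queue -->
                <attribute name="Clustered">true</attribute>
              </mbean>
              ```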

              Hope that helps :)

              • 4. Re: Message stranded in cluster
                chipschoch

                We use the windows servers to run some conversion services that use third-party libs that are only available on windows. As the load increases we will be adding more conversion instances, so it really will be asymmetrical.

                It seems a bit self defeating if we need to manually manage which linux server in the cluster a windows converter connects to. It seems that we would be better off running jms as an ha-singleton in the cluster to avoid the unnecessary redistribution you describe.

                I will try the latest release. As for the other questions:

                1. Yes the queues have the 'Clustered' attribute set to true.
                2. The clusters are seeing each other.

                I actually have a singleton service that is running on one of the linux servers that queues messages to a clustered queue. It uses the ClusteredXAConnectionFactory and appears to be round-robining the messages it queues, which would indicate that the cluster is working correctly. It is the consumers that are only getting messages off one of the partial queues.

                • 5. Re: Message stranded in cluster
                  chipschoch

                  Ok, I finally got everything up and running on JBAS 4.2.2.GA and JBM 1.4.0.SP1.

                  I use a ClusteredConnectionFactory to push messages to a clustered queue, and it is round-robining them to the partial queues as expected.

                  Each of the services consuming these messages pushes another message on to another clustered queue using XAConnectionFactory, which results in each node pushing the message to its own partial queue.

                  The consumers of these messages are non-clustered windows servers that are both connected to one of the nodes in the cluster using the ClusteredConnectionFactory available through ha-jndi.

                  I expected the messagesucker to move messages from the node with no consumer to the node with consumers but it is not. Is this because those messages were pushed using the XAConnectionFactory instead of the ClusteredConnectionFactory?

                  Thanks

                  • 6. Re: Message stranded in cluster
                    chipschoch

                    Apparently I started one of my nodes the first time before I set the clustered flag in the queue definition, because when I looked at the jbm_postoffice records one node had clustered set to false. After deleting the records and restarting, the queue is set to clustered and it appears that everything is working as expected.

                    I guess changing that flag after a queue has been established does not have any effect.