8 Replies Latest reply on Jan 17, 2008 3:43 AM by timfox

    Messaging 1.3 Clustering question

    nbz

      I am having trouble either configuring things or understanding how they are supposed to work.

      I have a cluster of 2 nodes (ServerPeer 0 and 1). JBoss Messaging 1.3 has been set up on both with the clustering configuration (using the correct clustered-oracle-persistence.xml configuration).

      The 2 nodes see each other just fine. If I take one down, I see the other one being notified and updating its cluster map, and vice versa.

      I have defined one clustered queue (which I want to behave as a virtual queue). To do this, I defined a destination for this queue on each of the 2 nodes. I assume it needs to be created on both nodes, even though it is supposed to appear as one virtual queue to the client. Is this correct?

      The client connects to the queue using the ClusteredConnectionFactory and a HAJNDI lookup to something like <IPofServer0:1100, IPofServer1:1100>.

      After sending 500 messages, I noticed that all of them show up on the queue of Server 0 and none on Server 1, even with the policy set to round robin. Furthermore, if I take Server 0 down, the 500 messages are still NOT accessible on Server 1. My understanding was that JBoss Messaging clustering would allow for the messages to either be replicated or somehow made available to the client even if one of the servers went down. Is this correct? What am I missing?

        • 1. Re: Messaging 1.3 Clustering question
          tim.shaw

          Same for me with Messaging 1.4 and AS 4.2.2.GA

          Again, I presume I'm missing something ...

          Took an out-of-the-box configuration, using a combination of the ejb3mdb and cluster examples - deploying the MDB to both nodes and changing the client to send lots of messages (10000+) results in all messages appearing on one (either) node, and failover not working.

          Anyone got a simple load-balance test configuration which works please?

          Thanks

          tim

          • 2. Re: Messaging 1.3 Clustering question
            timfox

            The examples directory contains examples of distributed queue, topic and failover

            • 3. Re: Messaging 1.3 Clustering question
              tim.shaw

              Yes, but the distributed queue example opens 2 connections (which are given out round-robin as expected) and sends a single message on one and receives it on the other. This may be a distributed queue, but doesn't show load balancing.

              What I am trying to do is send 200000 messages (or, for testing, 100) onto a single queue and have the clustered machines service these (via MDBs) using a load-balancer. I am happy with a round-robin for now :-)
              I believe this is a valid use case!

              Is this possible? I assumed it was ... MQ would service the messages round-robin, and JBM is supposed to be better!

              Are there any examples of such a configuration around? I can only get one of the machines to service the messages - which one depends on the ClusteredConnectionFactory of course.

              Thanks

              tim

              • 4. Re: Messaging 1.3 Clustering question
                timfox

                If you have a distributed queue with an MDB consuming from that queue - one on each node, and send messages to that queue on a particular node, then the local consumer (i.e. the local MDB) will always get the messages by preference.

                This makes sense, since there's no point sending the messages to different nodes if the local consumer can cope with them happily - this would just be unnecessary network traffic.

                Consumers on other nodes will consume the messages only if the local consumer either doesn't exist, or is "busy".

                What does "busy" mean? Each consumer maintains a local buffer of messages (default size 150) from which it consumes. Once that buffer is full, the consumer is busy. For a fast consumer, the buffer would never get full so it would never be busy.

                You can alter this value (see prefetchSize in the doco).
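
The local-preference rule described above can be sketched as a small simulation. This is only a toy model, not JBM's routing code: the class `Consumer`, the functions `route` and `simulate`, and the tick-based send/consume rates are all hypothetical names and assumptions made for illustration. The only value taken from the thread is the default buffer size of 150.

```python
from collections import deque


class Consumer:
    """Toy consumer with a bounded local buffer (the 'prefetch' buffer)."""

    def __init__(self, prefetch_size=150):
        self.prefetch_size = prefetch_size
        self.buffer = deque()
        self.consumed = 0

    def is_busy(self):
        # A consumer counts as busy once its local buffer is full.
        return len(self.buffer) >= self.prefetch_size

    def consume(self, n):
        # Process up to n buffered messages.
        for _ in range(min(n, len(self.buffer))):
            self.buffer.popleft()
            self.consumed += 1


def route(message, local, remotes):
    """Prefer the local consumer; use a remote one only if local is busy."""
    if local is not None and not local.is_busy():
        local.buffer.append(message)
        return "local"
    for remote in remotes:
        if not remote.is_busy():
            remote.buffer.append(message)
            return "remote"
    return "queued"  # nobody can take it right now; it stays on the queue


def simulate(total, send_per_tick, consume_per_tick, prefetch_size):
    """Send `total` messages to one node; return (local, remote) consumed counts."""
    local = Consumer(prefetch_size)
    remote = Consumer(prefetch_size)
    backlog = deque()
    sent = 0
    while local.consumed + remote.consumed < total:
        # The producer adds new messages to the node it is connected to.
        for _ in range(min(send_per_tick, total - sent)):
            backlog.append(sent)
            sent += 1
        # The node hands out as many messages as consumers can accept.
        while backlog:
            if route(backlog[0], local, [remote]) == "queued":
                break
            backlog.popleft()
        local.consume(consume_per_tick)
        remote.consume(consume_per_tick)
    return local.consumed, remote.consumed
```

In this model, `simulate(500, 10, 10, 150)` (a consumer that keeps up with the producer) returns `(500, 0)`: the local buffer never fills, so the remote node sees nothing. With a tiny buffer and a slow consumer, `simulate(500, 100, 5, 5)` returns `(250, 250)`, because the local consumer is busy most of the time and the overflow goes to the other node.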

                Also make sure you're using 1.4.0.SP3 and JBoss Remoting 2.2.2.SP4

                Hope that clears things up.

                • 5. Re: Messaging 1.3 Clustering question
                  nbz

                  Ok. That makes sense in terms of the local consumption. There are a couple more oddities in running clustered messaging (I'm on 1.4 now).

                  I'm having trouble understanding if the 2 nodes in a cluster act as some sort of primary/secondary, or if they're both equal primaries. In the event of a basic failover (one of the 2 appservers goes down), things fail over to the other just fine. But what happens if, say, both servers are running, but are somehow unable to connect to each other over the network (so each thinks the other is unavailable)? What happens to the messages stored in the shared database?

                  Under normal circumstances, if node A held, let's say, 100 messages and then went down, the 100 messages would "appear" on node B after failover was complete (being read from the shared database).
                  What happens in the scenario where both servers stay up, but can't see each other on the network?

                  • 6. Re: Messaging 1.3 Clustering question
                    nbz

                    The second oddity I noticed is this:

                    Failover will occur cleanly if I kill -9 the process of one of the 2 nodes.

                    On the other hand, doing a graceful shutdown of the appserver on a given node will NOT trigger failover. Any messages that were held on that given queue will NOT appear on the other site. Why is that?

                    • 7. Re: Messaging 1.3 Clustering question
                      timfox

                       

                      "nbakizada" wrote:


                      I'm having trouble understanding if the 2 nodes in a cluster act as some sort of primary/secondary, or if they're both equal primaries. In the event of a basic failover (one of the 2 appservers goes down), things fail over to the other just fine. But what happens if, say, both servers are running, but are somehow unable to connect to each other over the network (so each thinks the other is unavailable)? What happens to the messages stored in the shared database?


                      You're referring to network partitioning commonly known as "split brain".

                      In this case both nodes would fail over for each other. This can result in duplicate delivery of messages.
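
To see why a partition leads to duplicates, here is a deliberately simplified model of two nodes sharing one message store. It is not how JBM implements failover internally; `SharedStore`, `Node`, and `split_brain_deliveries` are hypothetical names, and the assumption is only that each node can read the other's messages from the shared database once it believes the other node is dead.

```python
from collections import Counter


class SharedStore:
    """Toy stand-in for the shared database: message id -> owning node id."""

    def __init__(self):
        self.messages = {}

    def add(self, message_id, node_id):
        self.messages[message_id] = node_id

    def messages_for(self, node_id):
        return [m for m, owner in self.messages.items() if owner == node_id]


class Node:
    def __init__(self, node_id, store):
        self.node_id = node_id
        self.store = store
        self.delivered = []

    def deliver_own(self):
        # Normal operation: deliver the messages this node holds.
        self.delivered.extend(self.store.messages_for(self.node_id))

    def fail_over_for(self, other_id):
        # This node believes `other_id` is dead and takes over its messages.
        self.delivered.extend(self.store.messages_for(other_id))


def split_brain_deliveries(messages_per_node=3):
    """Both nodes stay up but each assumes the other has failed."""
    store = SharedStore()
    for i in range(messages_per_node):
        store.add(f"A-{i}", 0)
        store.add(f"B-{i}", 1)
    node_a, node_b = Node(0, store), Node(1, store)
    for node, other in ((node_a, 1), (node_b, 0)):
        node.deliver_own()          # still alive, so still delivering
        node.fail_over_for(other)   # and also taking over for the "dead" peer
    return Counter(node_a.delivered + node_b.delivered)
```

In this model every message ends up with a delivery count of 2: once from its real owner, which never actually went down, and once from the peer that failed over for it.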

                      Note that the exact same issue would arise with JBoss MQ or any other product, so it is not specific to JBM. It's a general network condition.

                      To mitigate this you can use IP bonding to add redundancy at the IP level (see the JGroups wiki), and code your consumers to be resilient to duplicate messages.
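
One common way to make a consumer resilient to duplicates is to remember which message IDs it has already handled and skip repeats. The sketch below is a minimal, in-memory illustration of that idea; `DeduplicatingConsumer`, `on_message`, and `max_remembered` are hypothetical names, and the ID passed in is assumed to be something stable per message (for JMS that could be the JMSMessageID or an application-level key).

```python
from collections import OrderedDict


class DeduplicatingConsumer:
    """Wraps a handler so that a given message ID is processed only once."""

    def __init__(self, handler, max_remembered=10000):
        self.handler = handler
        self.max_remembered = max_remembered
        self._seen = OrderedDict()  # remembers IDs in arrival order

    def on_message(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self.handler(body)
        self._seen[message_id] = True
        # Bound memory use by forgetting the oldest remembered ID.
        if len(self._seen) > self.max_remembered:
            self._seen.popitem(last=False)
        return True
```

Note that an in-memory set like this only protects a single consumer instance. After a split brain the two copies of a message may well arrive at consumers on different nodes, so in practice the record of processed IDs would likely need to live somewhere shared and durable, for example in the same database transaction as the work the handler performs.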



                      • 8. Re: Messaging 1.3 Clustering question
                        timfox

                         

                        "nbakizada" wrote:
                        The second oddity I noticed is this:

                        Failover will occur cleanly if I kill -9 the process of one of the 2 nodes.

                        On the other hand, doing a graceful shutdown of the appserver on a given node will NOT trigger failover. Any messages that were held on that given queue will NOT appear on the other site. Why is that?


                        There is a FAQ on this.