2 Replies Latest reply on Oct 2, 2006 4:29 PM by Brian Stansberry

    JMS Queue access lost when JMS fails over

    Kerry Barnes Newbie

      We have a cluster of two JBoss-4.0.4-GA(patch-1) servers (reduced to two to make the issue easier to see). We set these up from the zip file and then applied EJB3-RC8 to get all the clustering pre-configured (mainly HA-JMS and the EJB3 implementation).

      Anyway, under ideal conditions everything works perfectly. We have a myriad of JMS queues, and when the system comes up they deploy on a single box, while on the other box we get the messages that they are waiting. We process our transactions and they propagate through the system as expected: MDBs process the messages, and some of these beans spawn new messages onto additional queues. Again, all works as advertised.

      Now, the machine in master mode fails. The queues deploy on the waiting machine and everything "looks" kosher. So we bring the other machine back into the cluster, and it accepts its role as standby; we see the queues attempt to deploy but instead drop into their waiting state.

      Our processes start to run again, and again the initial messages are distributed among the machines. Now, however, those beans that spawn new messages into different queues get a JNDI error of queue not bound. It's like they are looking in the local JNDI for the queue and not in HA-JNDI, but I can't see anywhere this could be set. I checked all the ear, jar, and war files for a jndi.properties file and found none. I thought I would find one pointing to localhost:1099, but no dice on the easy answer. And of course, before we failed over, the queues were seen on both boxes.

      So we drop the one box that hasn't failed, so that the queues redeploy on the original master box, and then bring that machine back up so everything is like it was at the start of this, and everything starts to work again! Frustrating to say the least.

      Can anyone give me a rough idea of where to even start looking into this one? I have pretty much exhausted my idea pool.

        • 1. Re: JMS Queue access lost when JMS fails over
          Kerry Barnes Newbie

          Follow-up. We got around this issue by supplying the Hashtable parameter when constructing our InitialContext and specifying all of the hosts and their HA-JNDI ports under the PROVIDER_URL key.

          This seems like a hack since now I need to maintain these in a properties file (since we deploy this application to different sites).
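The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the host names, the system property name `example.hajndi.providerUrl`, and the class name are hypothetical, and it assumes the standard JBoss 4 naming factory classes and the default HA-JNDI port 1100.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class HaJndiEnvironment {

    // Hypothetical property name; the per-site host list could equally
    // be read from a properties file, as the post describes.
    static final String PROVIDER_URL_PROPERTY = "example.hajndi.providerUrl";

    // Builds the environment passed to InitialContext. The provider URL is a
    // comma-separated list of host:port pairs, one per cluster node.
    static Hashtable<String, String> buildEnvironment(String providerUrl) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.URL_PKG_PREFIXES,
                "org.jboss.naming:org.jnp.interfaces");
        env.put(Context.PROVIDER_URL, providerUrl);
        return env;
    }

    // Looks up a name through HA-JNDI. Requires the JBoss client classes on
    // the classpath and a reachable cluster node, so it is not called in main.
    static Object lookup(String providerUrl, String jndiName)
            throws NamingException {
        Context ctx = new InitialContext(buildEnvironment(providerUrl));
        try {
            return ctx.lookup(jndiName);
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) {
        // Hypothetical default hosts, overridable per site.
        String url = System.getProperty(PROVIDER_URL_PROPERTY,
                "node1.example.com:1100,node2.example.com:1100");
        System.out.println(buildEnvironment(url));
    }
}
```

Keeping the host list in a single externally supplied value means only that one setting changes between sites, which is the maintenance cost the post refers to.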

          • 2. Re: JMS Queue access lost when JMS fails over
            Brian Stansberry Master

            Are your clients running inside the application server itself? If so, just providing localhost:1100 as the provider URL should work. Or are you using the -b switch to bind ports to a particular IP? In that case localhost won't work.
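For in-server clients, the suggestion above might look like the sketch below. It assumes that the -b switch exposes the bind address through the `jboss.bind.address` system property and that HA-JNDI listens on the default port 1100; the class and method names are hypothetical.

```java
public class LocalHaJndiUrl {

    // Default HA-JNDI port in a stock JBoss 4 "all" configuration (assumed here).
    static final int HA_JNDI_PORT = 1100;

    // Returns a provider URL for a client running inside the server itself.
    // Falls back to localhost when no specific bind address is configured.
    static String localProviderUrl() {
        // Assumption: -b sets this system property.
        String bind = System.getProperty("jboss.bind.address");
        boolean unbound = bind == null || bind.isEmpty() || bind.equals("0.0.0.0");
        String host = unbound ? "localhost" : bind;
        return host + ":" + HA_JNDI_PORT;
    }

    public static void main(String[] args) {
        System.out.println(localProviderUrl());
    }
}
```

The resulting string would be used as the PROVIDER_URL value, so no list of remote hosts needs to be maintained per site.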

            "we see the queues attempt to deploy but instead drop into their waiting state."

            That doesn't sound right; queues shouldn't even attempt to deploy unless they are on the master node. Are you deploying your queues from the deploy-hasingleton directory?