6 Replies Latest reply on Apr 9, 2010 11:42 AM by Robert West

    Issue with JMS message loss on JBoss shutdown

    Robert West Newbie

      I've been working on developing an approach to improve our JMS infrastructure.  At present, we still use the default JMS broker internal to JBoss for our main production application, in spite of having 10 servers running for different purposes and clients.  None are clustered, so each server individually processes all of the messages that server publishes.  We also have other products that are starting to need a messaging infrastructure, and we anticipate needing our various products to communicate with each other through it.

       

      As a result, I've been putting together an approach to implement an external messaging broker that our Java and non-Java systems can share, ensuring HA for the level of reliability we need for our applications.  I've been evaluating both HornetQ and ActiveMQ for these purposes, experimenting with configurations, failover, reliability, ease of monitoring, etc.  However, I've come across a rather difficult problem with both, which I suspect may have as much or more to do with JBoss than either of the broker implementations (hence my post here).

       

      In both cases, for one of my tests I configured a single standalone external broker with a standard JCA in JBoss.  I deploy (among other things) a simple MDB that receives a text message, grabs a couple of properties along with the text, and simply logs the message.  The bean uses AUTO_ACKNOWLEDGE and container managed transactions, with a persistent queue.  I have a simple web page that allows me to publish an arbitrary number of messages at a time.  For some simplistic variability in the processing speed, all even messages take an extra second to process (via a Thread.sleep() call).

       

      When I start up the broker and JBoss and publish messages, everything runs fine.  However, if I shut down JBoss after all messages have been published but before all messages have been processed, during the shutdown process I will receive some number of errors from the broker's JCA code indicating an error trying to deliver a message to the MDB subsystem because the EJB container is shutting down/shut down.  The number of errors can vary slightly due to timing issues, and obviously is slightly different between ActiveMQ and HornetQ, but the implication is the same in both cases: every message that results in one of these errors is lost.  They are not sent back to the broker to be tried again, but are treated as if the message was properly acknowledged.

       

      If only one broker had an issue, I would suspect an error in that broker's JCA implementation.  However, since both brokers fail in nearly the exact same way (the only practical difference seems to be that ActiveMQ either grabs more messages at once or is faster at grabbing more, resulting in most cases in losing all of the messages, as opposed to only 1-4 lost with HornetQ), it seemed to make more sense to start out here.  I can cross-post to the brokers' forums if people feel that is necessary.

       

      I've attached two tar files, one for HornetQ and one for ActiveMQ.  Both contain the broker configuration used, everything that was deployed to the JBoss deploy directory in terms of the JCA functionality, the JBoss server log from a representative test run and Eclipse projects for my testing ear with full source code.  The only material bit of JBoss config not contained in the tar files is that I added "-Dorg.hornetq.logger-delegate-factory-class-name=org.hornetq.integration.logging.Log4jLogDelegateFactory" to JAVA_OPTS in run.sh along with DEBUG logging for org.hornetq and WARN logging for org.hornetq.utils.UTF8Util in the jboss-log4j.xml config file (UTF8Util is extremely spammy in terms of logging, and doesn't seem to be relevant for this test).  Without that configuration, the log entries indicating that messages failed to be delivered will not appear (probably because they're going to a hidden/not configured JUL logger).

       

      I'm hoping there's something simple wrong with my configuration, but I've spent two days trying various things and haven't been able to track anything down as of yet.  I've tried clustered and non-clustered brokers, I've tried tweaking various ActivationConfig properties and JCA properties, setting values that are supposed to be the default and tweaking prefetch sizes and the like, but nothing has had any impact on this issue.  I could probably be content with some duplicate delivery, but missing messages are not an option for the kinds of messages we're passing around.  If I've missed any potentially relevant configuration files, please let me know.

       

      Version info:

      JBoss 5.1.0 GA (we use a local build, and have updated JBoss Cache to 3.2.1.GA, JBossTS to 4.6.1.GA_CP03 and JGroups to 2.6.13.GA to pick up bug fixes for issues we have encountered in production)

      ActiveMQ 5.3.0 GA (tried 5.3.1 GA, but there is a mutex bug that was preventing MDB processing from picking up after a broker failover)

      HornetQ 2.0.0 GA

      Java 1.6.0_17

      I've been running my tests on Mac OS 10.6.2.  I can run them on a Linux distro if someone thinks that might be relevant.  We use CentOS in production, so that's the eventual target platform.

        • 1. Re: Issue with JMS message loss on JBoss shutdown
          Robert West Newbie

          At the suggestion of one of my developers, I reran the tests with an embedded HornetQ broker to see if the same behavior exists there.  We have no intention of using that sort of configuration, as we need the broker to be independent of any particular application and there would appear to be no reason to run a full JBoss instance just to deploy a broker, but it makes for an interesting test case (particularly as if I understand the plans correctly, HornetQ will be the default internal broker for JBoss in the future).


          Unfortunately, I observed the same behavior.  On JBoss shutdown, there would be three or four failures to deliver messages after the EJB infrastructure was shut down, and on restart those messages never got delivered.  I've included a similar tar here that contains the hornetq-ra.rar and hornetq.sar directories from the deploy directory (I think these are still the defaults from all-with-hornetq, actually, aside from defining my queues) along with the JBoss logs and the tweaked sample application (to account for the queues being in JNDI).

           

          Edit: Meant to add that I was reviewing my notes from the last time I was looking at this project (the project was sidelined for a while due to higher priorities) back when we were still using JBoss 4.2.2.  I don't have detailed logs and what-not from that effort, but I know we got to the point where we were running tests with multiple JBoss instances publishing and consuming messages (using the same sample application) against an ActiveMQ setup that involved two brokers using shared file storage HA.  We were causing failures in both JBoss and ActiveMQ using normal shutdown, kill -9, and pulling the network plug, and we never observed any message loss.  I believe that was using ActiveMQ 5.2.0, so not a large amount of change on that side.  At the time, I was unable to get a working HornetQ JCA configuration, so no tests were performed with HornetQ.  It's possible we missed it, but since we were specifically looking to verify no messages were lost during failover, I tend to doubt we would have.  That would imply that something that changed between JBoss 4.2.2 and JBoss 5.1.0 may be at issue here (the shutdown process itself, maybe?).

           

          Message was edited by: Robert West

          • 2. Re: Issue with JMS message loss on JBoss shutdown
            John Newman Newbie

            Hi,

             

            I can't really answer your question, as we seem to have a lot of the same type of problems (unclean shutdown, things stuck in the middle, not fully processed, should not have kicked off to begin with).

             

            But I just want to advise you to take a serious look at activemq's bug tracker before committing to it.  Be sure to run your volume and stress tests for weeks, as that's when the problems really start to show up.  If you must use it, do not use 5.2.0, as it has a ton of bugs that were fixed in 5.3.0, but there are still 275+ open bugs in there.  So just be wary of that.  Take a look: http://issues.apache.org/activemq/secure/IssueNavigator.jspa?reset=true&&type=1&pid=10520&status=1&sorter/field=issuekey&sorter/order=DESC  Yikes.  Most notably, this one, https://issues.apache.org/activemq/browse/AMQ-2009 , is absolutely a blocker.  Restart every couple days, woo hoo!

            • 3. Re: Issue with JMS message loss on JBoss shutdown
              Robert West Newbie

              I have no intention of using 5.2.0, it's just what was the latest available a few months back when I started on this effort.  If we move forward with ActiveMQ, it would be with 5.3.0 at present (5.3.1 has a mutex bug that prevents failover from working properly, fixed in 5.4.0).  To be honest, I've found significant issues with both (such as a frequent inability to simply restart a HornetQ broker pair that was cleanly shut down with no connected clients without recopying the data directory from master to slave, and broker failovers requiring consumer restarts for MDBs to pick up processing).  I tend to expect to deal with bugs in whatever solution we pick; that's just the way software works.

               

              However, losing messages regardless of the broker selected or the configuration in use (in-vm, external, external clustered) is a much more serious issue, one that prevents MDBs from being used at all in JBoss 5.1.

              • 4. Re: Issue with JMS message loss on JBoss shutdown
                John Newman Newbie

                That's cool.  Just be advised, the one I linked you to is still open in 5.x.  About once every other week, activemq will spontaneously fall into a coma.

                 

                I think we are going to end up using http instead of jms.  I don't know though, I have so much research to do.  This really starts to get difficult when you start to look at all the error handling scenarios.

                 

                But as far as your problem goes, I'm assuming you already walked through http://activemq.apache.org/integrating-apache-activemq-with-jboss.html ; if not, the answer may be in there.  In my mind the shutdown hook should 'just work' if you've got the configuration right.  AFAIK this is working for us.

                • 5. Re: Issue with JMS message loss on JBoss shutdown
                  Robert West Newbie

                  I believe I've identified the source of an issue, and it's definitely a JBoss bug.  org.jboss.ejb3.BlockContainerShutdownInterceptor will throw a DispatcherConnectException during the interceptor chain processing of a JMS message (before calling the MDB) if the EJB infrastructure is not up.  However, org.jboss.ejb3.mdb.inflow.MessageInflowLocalProxy.delivery() only marks the transaction for rollback if the interceptor chain (and thus the MDB itself) throws an Error or a RuntimeException.  Any other type of throwable will result in a commit(), which ActiveMQ and HornetQ are properly (according to the spec) treating as the indication that the message was successfully received and processed, and thus they ACK the message back to the broker.
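
                  As a rough illustration of the behavior described above, here is a minimal standalone sketch.  It is not the JBoss source: DeliverySketch, Chain, Outcome and ContainerDownException are hypothetical stand-ins, and it assumes DispatcherConnectException is a checked exception (neither an Error nor a RuntimeException), which is what the analysis above implies.  It uses lambdas, so it needs Java 8 or later rather than the Java 6 used in the tests.

```java
// Hypothetical, simplified model of the reported behaviour.
// None of these types are the real JBoss classes.
final class DeliverySketch {

    enum Outcome { COMMIT, ROLLBACK }

    /** Stand-in for a checked exception such as DispatcherConnectException (assumption). */
    static final class ContainerDownException extends Exception {
        ContainerDownException(String msg) { super(msg); }
    }

    /** Stand-in for the interceptor chain plus the MDB's onMessage(). */
    interface Chain {
        void invoke(String message) throws Throwable;
    }

    /** Mirrors the reported logic: only Error and RuntimeException mark the rollback. */
    static Outcome deliver(Chain chain, String message) {
        boolean rollbackOnly = false;
        try {
            chain.invoke(message);
        } catch (Throwable t) {
            if (t instanceof Error || t instanceof RuntimeException) {
                rollbackOnly = true;
            }
            // Any other Throwable falls through without marking the transaction.
        }
        return rollbackOnly ? Outcome.ROLLBACK : Outcome.COMMIT;
    }

    public static void main(String[] args) {
        Chain containerDown = m -> { throw new ContainerDownException("EJB container is shut down"); };
        Chain mdbFailure = m -> { throw new IllegalStateException("MDB failed"); };

        System.out.println(deliver(containerDown, "msg-1")); // COMMIT: the message is ACKed and lost
        System.out.println(deliver(mdbFailure, "msg-2"));    // ROLLBACK: the message is redelivered
    }
}
```

                  In this model the message that never reached the MDB is the one that gets committed, which matches what the logs show during shutdown.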

                   

                  I'm going to try and file a JBoss bug on this point and work on some sort of a patch file as a solution.  The simplest answer is to have MessageInflowLocalProxy.delivery() mark the transaction for rollback if a DispatcherConnectException is thrown, but that seems rather crude to me.  It seems like a better solution would be to modify Invocation (if a modification is even necessary) to enable interceptors to either directly mark the transaction for rollback or provide an indicator to the source of the Invocation that the invocation was interrupted before being executed.
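
                  A sketch of what the two options might look like, again using hypothetical stand-in types rather than the real JBoss classes (RollbackFixSketch, its nested Invocation, markInterrupted() and wasInterrupted() are all made up for illustration, and ContainerDownException stands in for DispatcherConnectException).  This is only meant to show the shape of each approach, not an actual patch.

```java
// Hypothetical sketch of the two fixes discussed above; not JBoss code.
final class RollbackFixSketch {

    enum Outcome { COMMIT, ROLLBACK }

    /** Stand-in for DispatcherConnectException (assumed to be a checked exception). */
    static final class ContainerDownException extends Exception {
        ContainerDownException(String msg) { super(msg); }
    }

    /** Stand-in for Invocation, extended with an "interrupted before execution" flag. */
    static final class Invocation {
        final String message;
        private boolean interrupted;

        Invocation(String message) { this.message = message; }
        void markInterrupted() { interrupted = true; }
        boolean wasInterrupted() { return interrupted; }
    }

    /** Stand-in for the interceptor chain plus the MDB. */
    interface Chain {
        void invoke(Invocation invocation) throws Throwable;
    }

    /** Option 1 (crude): special-case the exception type inside delivery. */
    static Outcome deliverCatchingContainerDown(Chain chain, Invocation inv) {
        boolean rollbackOnly = false;
        try {
            chain.invoke(inv);
        } catch (Throwable t) {
            rollbackOnly = t instanceof Error
                    || t instanceof RuntimeException
                    || t instanceof ContainerDownException;
        }
        return rollbackOnly ? Outcome.ROLLBACK : Outcome.COMMIT;
    }

    /** Option 2: delivery also consults a flag that an interceptor set on the invocation. */
    static Outcome deliverCheckingFlag(Chain chain, Invocation inv) {
        boolean rollbackOnly = false;
        try {
            chain.invoke(inv);
        } catch (Throwable t) {
            rollbackOnly = t instanceof Error || t instanceof RuntimeException;
        }
        if (inv.wasInterrupted()) {
            rollbackOnly = true; // the invocation never reached the MDB
        }
        return rollbackOnly ? Outcome.ROLLBACK : Outcome.COMMIT;
    }

    /** An interceptor that notices the container is down before the MDB is called. */
    static final Chain SHUTDOWN_INTERCEPTOR = inv -> {
        inv.markInterrupted();
        throw new ContainerDownException("EJB container is shut down");
    };
}
```

                  The second form keeps the delivery code unaware of specific exception types, at the cost of touching the invocation object.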