3 Replies Latest reply on Oct 26, 2009 2:17 AM by timfox

    Processing messages in Batches

      Currently every message I take off the queue results in an insert/commit to the database. The idea is to reduce these by processing the messages in batches and using JDBC batch updates to limit the number of commits. When you batch items arriving from some input, you usually don't want a partial batch to wait indefinitely for enough messages to arrive to fill it. So you implement some kind of timer: if enough time has passed without a new message and the batch is still not full, you flush it anyway. I'm keen to get feedback on this approach. Here's what I currently do, and it's probably excessive.

      To process things in batches I simply add each message to a list until the size of the list modulo my batch size is 0. At that point I flush the batch by calling processBatch, which iterates over the list and performs a JDBC batch update.

      Whenever I receive a message I schedule a task on a ScheduledExecutor that will flush the batch (regardless of its size) in 10 seconds. If no message arrives within 10 seconds, the partial batch is flushed. To avoid scheduling one of these tasks for every message, if the time between creating the scheduled task and the arrival of the next message is less than, say, 1 second (suggesting a burst of messages), I simply cancel the scheduled task, because enough messages to fill the batch will very likely arrive and trigger the flush. Whenever I have received enough messages for a full batch I immediately cancel the scheduled task, since the full batch is flushed directly and no longer depends on the scheduled task.

      Anyway, I think you get the idea. The goal is to still process messages relatively quickly when the incoming rate is slow, while minimizing database activity when the rate is fast.

      Is there a better way to accomplish these objectives?

      Flushing the messages entails performing a JDBC update and also acknowledging the last message in the batch, since that implicitly acknowledges all prior unacknowledged messages.
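
      The size-or-timeout flush described above can be sketched roughly as follows. This is only a minimal illustration, not the code from this post: TimedBatcher, flushAction, batchSize and maxWaitMillis are hypothetical names, and the sketch arms a single timer when the first message of a batch arrives instead of scheduling and cancelling a task per message. In a real consumer, flushAction would perform the JDBC batch update and then acknowledge the last message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of HornetQ or JMS.
final class TimedBatcher<T> implements AutoCloseable {
    private final int batchSize;
    private final long maxWaitMillis;
    // Assumed callback: e.g. run the JDBC batch update, then ack the last message.
    private final Consumer<List<T>> flushAction;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "batch-timer");
                t.setDaemon(true); // do not keep the JVM alive just for the timer
                return t;
            });
    private final List<T> pending = new ArrayList<>();
    private ScheduledFuture<?> deadline; // non-null while a partial batch is waiting

    TimedBatcher(int batchSize, long maxWaitMillis, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
        this.flushAction = flushAction;
    }

    // Called for each received message.
    synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush(); // full batch: flush immediately
        } else if (deadline == null) {
            // First message of a new batch: arm one timer for the whole batch.
            Runnable task = this::flush;
            deadline = timer.schedule(task, maxWaitMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Called by add() when the batch is full, or by the timer for a partial batch.
    synchronized void flush() {
        if (deadline != null) {
            deadline.cancel(false);
            deadline = null;
        }
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        flushAction.accept(batch);
    }

    @Override
    public void close() {
        flush(); // flush whatever is left, then stop the timer thread
        timer.shutdownNow();
    }
}
```

      Because add() and flush() are both synchronized, the flush action never runs on the timer thread while the receiving thread is in the middle of add(), and vice versa. With this arrangement a partial batch waits at most roughly maxWaitMillis after its first message arrives.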

        • 1. Re: Processing messages in Batches
          timfox

          This is not really HornetQ specific and is a common pattern. The JMS Bridge, for example, does something similar.

          Bear in mind that you'll probably want to consume the messages in the same transaction in which you write them to the JDBC database, which will involve a JTA transaction.

          Or you could avoid JTA (probably a good idea) and just ack to the JMS system after you've inserted in the database. If the system crashes you can avoid duplicates in the db by doing a conditional insert.
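
          The "insert first, ack afterwards, and make the insert conditional" idea might look something like the sketch below. It is only an illustration under assumptions: IdempotentStore is a hypothetical in-memory stand-in for a table with a unique key on the message ID, and using the message ID as that key is an assumption rather than something stated in this thread. With a real database the same effect would come from a conditional SQL insert (the exact syntax varies by database).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table with a unique key on message ID.
final class IdempotentStore {
    private final Map<String, String> rows = new LinkedHashMap<>();

    // Inserts each (messageId -> body) pair only if the ID is not already stored.
    // Returns the number of rows actually inserted; redelivered IDs are skipped.
    // Message bodies are assumed to be non-null.
    int insertBatch(Map<String, String> batch) {
        int inserted = 0;
        for (Map.Entry<String, String> e : batch.entrySet()) {
            if (rows.putIfAbsent(e.getKey(), e.getValue()) == null) {
                inserted++;
            }
        }
        return inserted;
    }

    int size() {
        return rows.size();
    }
}
```

          If the process crashes after the insert but before the ack, the same messages are redelivered and passed to insertBatch again; the second call inserts nothing, so the table ends up with no duplicates and the ack can then be sent.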

          • 2. Re: Processing messages in Batches

            I've been looking at JMSBridgeImpl and noticed that you are able to acknowledge messages in the BatchTimeChecker thread in addition to the source receiver thread. Is that only because you are using a sync consumer or is this some feature of HornetQ? If you had used a MessageListener would you still have been able to acknowledge the message in another thread besides the delivery thread?

            • 3. Re: Processing messages in Batches
              timfox

              The key point here is that there is never more than one thread accessing the session at any one time.