Processing messages in Batches
steffi Oct 24, 2009 11:38 PM
Currently, every message I take off the queue results in an insert/commit to the database. The idea is to reduce these by processing the messages in batches and using JDBC batch updates to limit the number of commits to the database. When you process batches that come from some input, you sometimes don't want a partial batch to sit waiting for the messages that would fill it. So you implement some kind of timer: if enough time has passed without a message arriving and the batch is still not full, you flush the batch. I'm keen to get feedback on this approach. Here's what I currently do, and it's probably excessive.
To process messages in batches, I simply add them to a list until the size of the list modulo my batch size is 0. At that point I flush the batch by calling processBatch, which cycles through the list and performs a JDBC batch update.
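A minimal sketch of the size-triggered buffering described above, for illustration only. The class name `SizeBatcher`, the `Consumer` callback standing in for processBatch, and the method names are assumptions, not the poster's code. The post's check is `size % batchSize == 0`; the sketch uses `size >= batchSize`, which behaves the same provided the list is cleared after every flush (also an assumption).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical size-triggered buffer; all names here are illustrative. */
class SizeBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> processBatch; // stands in for the post's processBatch
    private final List<T> pending = new ArrayList<>();

    SizeBatcher(int batchSize, Consumer<List<T>> processBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.processBatch = processBatch;
    }

    /** Buffers a message and flushes as soon as the buffer holds a full batch. */
    synchronized void add(T message) {
        pending.add(message);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Hands a copy of the buffered messages to the callback and clears the buffer. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return; // nothing to write
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        processBatch.accept(batch);
    }
}
```

Exposing `flush()` separately from `add()` lets something else, such as a timer, force out a partial batch.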
Whenever I receive a message, I schedule a task on a ScheduledExecutor that will flush the batch (regardless of its size) in 10 seconds. If no message arrives within 10 seconds, the partial batch is flushed. To avoid running one of these tasks for every message I receive, I check the time between creating the scheduled task and the arrival of the next message; if it is less than, say, 1 second (suggesting a burst of messages), I simply cancel the scheduled task, because enough messages to make up a full batch will very likely arrive and flush it. Any time I have received enough messages for a full batch, I immediately cancel the scheduled task, because the batch is now full and no longer depends on the scheduled task to be flushed.
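One way the timer idea might be sketched is shown here. It is a simplification, not the logic described above: each new message cancels the outstanding timer and, unless the batch is now full, schedules a fresh one, so at most one flush task is pending at a time. The 1-second burst heuristic is left out. `TimedBatcher`, `onMessage`, and the `Consumer` callback are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical batcher: flushes on a full batch or after an idle period. */
class TimedBatcher<T> {
    private final int batchSize;
    private final long idleMillis;
    private final Consumer<List<T>> processBatch; // stands in for the post's processBatch
    private final List<T> pending = new ArrayList<>();
    // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "batch-idle-flush");
                t.setDaemon(true);
                return t;
            });
    private ScheduledFuture<?> idleFlush; // at most one outstanding timer

    TimedBatcher(int batchSize, long idleMillis, Consumer<List<T>> processBatch) {
        this.batchSize = batchSize;
        this.idleMillis = idleMillis;
        this.processBatch = processBatch;
    }

    synchronized void onMessage(T message) {
        pending.add(message);
        if (idleFlush != null) {
            idleFlush.cancel(false); // a newer message restarts the idle clock
            idleFlush = null;
        }
        if (pending.size() >= batchSize) {
            flush(); // full batch: write now, no timer needed
        } else {
            Runnable task = this::flush;
            idleFlush = timer.schedule(task, idleMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Safe to call from the timer thread; a stale task finds an empty or partial buffer. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        processBatch.accept(batch);
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

With `new TimedBatcher<>(100, 10_000, batch -> ...)`, a lone message would be written roughly 10 seconds after it arrives, while a burst of 100 would be written as soon as the hundredth message comes in. Because the callback runs while the lock is held, a slow database write blocks incoming messages in this sketch.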
Anyway, I think you get the idea. The goal is to still process messages relatively quickly when the incoming rate is slow, yet minimize database activity when the rate is fast.
Is there a better way to accomplish these objectives?
Flushing the messages entails performing a JDBC update and also acknowledging the last message in the batch, since that implicitly acknowledges all prior unacknowledged messages.
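The flush step could look roughly like the sketch below. It assumes a single-parameter INSERT statement and uses a hypothetical `QueuedMessage` interface in place of whatever message type the queue client provides; neither comes from the post. Only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchFlusher {
    /** Hypothetical stand-in for the queue client's message type. */
    interface QueuedMessage {
        String payload();
        void acknowledge() throws Exception;
    }

    /** Writes the whole batch in one transaction, then acknowledges the last message. */
    static void flush(Connection conn, String insertSql,
                      List<? extends QueuedMessage> batch) throws Exception {
        if (batch.isEmpty()) {
            return;
        }
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit for the whole batch
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            for (QueuedMessage m : batch) {
                ps.setString(1, m.payload()); // assumes a single "?" placeholder
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // nothing is acknowledged if the write fails
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        // Acknowledge only after the commit succeeded; per the post, acknowledging
        // the last message also covers the earlier unacknowledged ones.
        batch.get(batch.size() - 1).acknowledge();
    }
}
```

Committing before acknowledging means a crash between the two steps would most likely lead to redelivery rather than lost messages, so the insert may need to tolerate duplicates.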