14 Replies Latest reply on Apr 21, 2006 7:49 AM by Mark Waschkowski

    selector performance

    Mark Waschkowski Newbie

      Hi,

      I'm going to be placing a large number of messages in a queue (100,000+) and then processing them one by one, and would like to allow the user to see their number of outstanding messages in the queue. There may be other people's messages in the queue as well, so total size of the queue doesn't help here.

      I know from:
      http://wiki.jboss.org/wiki/Wiki.jsp?page=IGetSlowPerformanceWithMessageSelectors
      that JBossMQ will go slowly when using message selectors with a large number of messages in the queue because it uses the 'read and skip' approach, essentially going through each one, one by one.

      Is JBoss Messaging going to exhibit the same behavior, or will I be able to get a fast count of the number of messages in the queue using a message selector (or something similar)?

      Thanks,

      Mark

        • 1. Re: selector performance
          Tim Fox Master

          Hi Mark-

          There is an outstanding task to implement indexes over JBossMessaging queues in a similar way to that suggested for JBoss MQ.

          http://jira.jboss.com/jira/browse/JBMESSAGING-275

          In the current release this is not implemented so I would currently expect similar performance to JBoss MQ for selectors that don't match many messages.

          However I am not sure I fully understand how you are intending to use a selector to count the number of messages in a queue.

          Could you explain this in more detail please?

          Perhaps if I understand your use case better I can suggest a more performant way to fulfill it.

          • 2. Re: selector performance
            Mark Waschkowski Newbie

            Hi Tim,

            Ah yes, optimize the indexes, ok. That would certainly help, and would be a nice feature to have.

            My intention? I was planning on using a QueueBrowser with a selector to get only the messages I was interested in and counting them (as per section 5.9 of the JMS spec). However, I noticed (from my performance tests) that it takes as long to read a queue to get a count as it does to place the messages there in the first place! That is not the behavior I was expecting, but from what I have found out, it is apparently how a QueueBrowser commonly works.
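
To make the cost of this approach concrete, here is a minimal in-memory sketch of counting by "read and skip". It does not use a JMS provider: the `Message` class, the `jobId` property, and `countMatching` are hypothetical stand-ins. With the real API the equivalent would be `session.createBrowser(queue, selector)` followed by a walk over `browser.getEnumeration()`, and, as described above, a provider without indexes would have to do much the same scan internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// In-memory stand-in for browsing a queue; no JMS provider is involved.
class BrowseCountSketch {

    // Hypothetical message shape: only the property the selector looks at.
    static final class Message {
        final String jobId;

        Message(String jobId) {
            this.jobId = jobId;
        }
    }

    // "Read and skip": every message is visited and tested, so the cost
    // grows with the total queue depth, not with the number of matches.
    static long countMatching(List<Message> queue, Predicate<Message> selector) {
        long count = 0;
        for (Message m : queue) {
            if (selector.test(m)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Message> queue = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            // Two jobs share the queue; job "A" owns every fourth message.
            queue.add(new Message(i % 4 == 0 ? "A" : "B"));
        }
        long outstanding = countMatching(queue, m -> "A".equals(m.jobId));
        System.out.println("Outstanding messages for job A: " + outstanding);
    }
}
```

Even when only a handful of messages belong to the job of interest, the loop still touches every message in the queue, which is why the count takes about as long as filling the queue did.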

            I believe there may be other implementation specific ways of getting around this, but am unsure of the specifics on this.

            If you know of another way to accomplish this (specific to JBoss or more generally) I would love to hear it!

            Because this was simply not feasible for a queue with a large number of messages in it (it would just take too long), I was planning on a different approach. One that involves me putting in 'counter' messages in every batch of 1000 that would be handled differently - they would update the database and indicate that a batch of 1000 was done. The only problem is ordering - my understanding is (according to the spec) if I put, as part of one transaction, 999 'real' messages and another 'counter' message, that a consumer would treat that particular batch serially (ie. in order) so that the counter message should be handled at the end of that batch, and I could then update my database and say that I've handled 999 messages. Even if the implementation isn't strict on this point, that would still be ok, as long as, in the aggregate, it was somewhat accurate. I haven't tested this approach yet, so maybe you could let me know if this would make sense or not...

            Thanks for any input!

            Mark

            • 3. Re: selector performance
              Ovidiu Feodorov Master

              I am a little bit intrigued by this statement

              mwaschkowski wrote:
              I noticed (from my performance tests) that it takes as long to read a queue to get a count as it does to place the messages there in the first place!


              Have you experienced this with Messaging, for a fact? I would be surprised if this proves to be true, since adding a message to a queue is an O(1) operation, while browsing the queue is O(n).

              • 4. Re: selector performance
                Tim Fox Master

                Actually, this is to be expected.

                Coming back to your use case. From my understanding, your application needs to read 1000 messages from the queue, process them, then write a record into the database to say that a batch of 1000 is complete?

                If so, then why not just count the messages as you process them in the consumer and write the row into the db saying the batch is complete when you reach the count?

                Not sure why you need to insert counter messages...

                • 5. Re: selector performance
                  Mark Waschkowski Newbie

                  Actually, I produce a batch of 1000 at a time, but I am going to consume each message one by one. I can only optimize from the production side...

                  Mark

                  PS. Ovidiu, my performance tests were with MQ, not Messaging, although it sounds like the same thing is going to happen with Messaging

                  • 6. Re: selector performance
                    Mark Waschkowski Newbie

                    As well, I was hoping to avoid 100,000 database hits, one with each consumption of a message...

                    Mark

                    • 7. Re: selector performance
                      Tim Fox Master

                      Sorry still don't follow.

                      Why would you have 100 000 database hits?

                      Could you explain your use case again?

                      • 8. Re: selector performance
                        Mark Waschkowski Newbie

                        Sorry Tim, let me summarize.

                        Basically, I would like to place a bunch of messages onto a queue and then process them one by one and, in an alternate reality, write an sql statement like:
                        select * from jms where id=x and processing_status = pending
                        whenever a user visited a web page just to let them know how far along the job was. Of course, JMS doesn't work this way, but this is the idea.

                        Now, here are my specific implementation details:
                        1) Going to be putting 100k+ messages onto a queue
                        2) Going to be putting the 100k+ messages onto the queue in batches of, say, 1000. Each batch of 1000 is going to be part of a single JMS transaction
                        3) There are going to be listeners on the queue that will process each of the messages from the batches one by one
                        i.e. the listeners on the queue will not be optimized to handle multiple messages at once; the listeners will handle each message individually (and this is required, I can't change this)
                        4) I would like to be able to monitor the overall progress of the processing of the 100k+ messages

                        Item 4 was the problem because JMS doesn't act like an RDBMS and can't quickly give me a count of something based on some criteria, at least through the standard JMS API, although different implementations could allow for it (through optimizations, proprietary means etc.).

                        Of course, I could just have each of the listeners update the database and increment a counter as they process each message. However, I don't really want to do this, as it takes a lot of processing time for such a fine-grained update, so that's why I was thinking along the lines of just updating the counter every 1000 messages. As well, in the future, the 100k+ messages could turn out to be 1M+ messages, and when you have multiple clients each running jobs simultaneously you will end up with a huge number of database hits just to increment a counter! Just doesn't smell right to me...but I could be confused or missing something too.

                        Any suggestions welcome!

                        Thanks,

                        Mark

                        • 9. Re: selector performance
                          Tim Fox Master

                          I agree - updating a db on processing of each message doesn't sound like a very scalable solution.

                          Having a consumer update the db after having processed 1000 messages/sec seems somewhat better but I still don't understand why you'd need a special "counter message" to do this - can the consumer not keep count itself?

                          Another solution would be to send a special "progress" message to another queue when the listener has processed 1000 or 10000 messages (or whatever).

                          Your application could then inspect the "progress" queue to see the current progress. The progress messages could perhaps be non-persistent (depending on your application requirements), so not requiring any db hits (unless you have a lot of them).
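
The reading side of this "progress queue" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: a `java.util.Queue` stands in for the JMS progress queue, and the `Progress` class, `drain`, and `processedSoFar` are hypothetical names, not part of JBoss Messaging or the JMS API. The monitor folds each progress message into a per-job running total, so answering "how far along is my job?" becomes a map lookup instead of a scan of the work queue.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// In-memory sketch; the Queue below stands in for a JMS "progress" queue.
class ProgressMonitorSketch {

    // Hypothetical progress message: "this many more messages of this job are done".
    static final class Progress {
        final String jobId;
        final int processed;

        Progress(String jobId, int processed) {
            this.jobId = jobId;
            this.processed = processed;
        }
    }

    // Running totals per job, kept in memory by the monitoring application.
    private final Map<String, Long> totals = new ConcurrentHashMap<>();

    // Consumes every progress message currently waiting and adds it to the totals.
    void drain(Queue<Progress> progressQueue) {
        Progress p;
        while ((p = progressQueue.poll()) != null) {
            totals.merge(p.jobId, (long) p.processed, Long::sum);
        }
    }

    // Cheap lookup for the web page; returns 0 for a job with no progress yet.
    long processedSoFar(String jobId) {
        return totals.getOrDefault(jobId, 0L);
    }

    public static void main(String[] args) {
        Queue<Progress> progressQueue = new ConcurrentLinkedQueue<>();
        progressQueue.add(new Progress("job-1", 1000));
        progressQueue.add(new Progress("job-2", 1000));
        progressQueue.add(new Progress("job-1", 1000));

        ProgressMonitorSketch monitor = new ProgressMonitorSketch();
        monitor.drain(progressQueue);
        System.out.println("job-1 processed so far: " + monitor.processedSoFar("job-1"));
    }
}
```

Because the totals live only in memory here, they would be lost on a restart of the monitoring application; whether that is acceptable depends on the application requirements mentioned above.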

                          • 10. Re: selector performance
                            Tim Fox Master

                            On a related note, "count" functionality is almost always an expensive operation - especially in highly scalable systems.

                            This is even true for RDBMSs, since counts often have to be re-calculated each time (often by table scan), because maintaining counters for the table requires locking, which is a barrier to scalability.

                            • 11. Re: selector performance
                              Mark Waschkowski Newbie

                              >I agree - updating a db on processing of each message doesn't sound like a very scalable solution.
                              OK, good.

                              >Having a consumer update the db after having processed 1000 messages/sec seems somewhat better but I still don't understand why you'd need a special "counter message" to do this - can the consumer not keep count itself?
                              Well, I'm not sure about that. The consumer will be handling messages from various jobs, and messages from different jobs may be posted simultaneously. Are you suggesting that the consumer hold state and keep a map along the lines of ["job id", count] and then when count reaches a certain size, then do a counter update? If so, how would the listener know when all the messages for a particular job were done?
                              ie. the listener (with a threshold of 1000) got to a count of 812 (for example) and then there were no more messages...
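
One way to picture the ["job id", count] idea and the "stuck at 812" problem is the sketch below. It is a minimal illustration, not something proposed in this thread: `JobProgressCounter`, `ProgressSink`, and the explicit `flush()` call are all hypothetical, and the sketch assumes a single counter instance shared by the listeners in one JVM. The sink could be a database update or a progress message; partial counts are written out only when something (an idle timer, a shutdown hook) calls `flush()`.

```java
import java.util.HashMap;
import java.util.Map;

// Per-job counter that reports progress in batches instead of per message.
class JobProgressCounter {

    // Hypothetical destination for progress, e.g. a DB update or a progress message.
    interface ProgressSink {
        void addProcessed(String jobId, int count);
    }

    private final int threshold;
    private final ProgressSink sink;
    // Messages processed per job since the last report.
    private final Map<String, Integer> pending = new HashMap<>();

    JobProgressCounter(int threshold, ProgressSink sink) {
        this.threshold = threshold;
        this.sink = sink;
    }

    // Called once per consumed message; reports only when the threshold is reached.
    synchronized void onMessageProcessed(String jobId) {
        int n = pending.merge(jobId, 1, Integer::sum);
        if (n >= threshold) {
            pending.remove(jobId);
            sink.addProcessed(jobId, n);
        }
    }

    // Reports partial counts (the "812" case); assumed to be called when the
    // listener has been idle for a while or is shutting down.
    synchronized void flush() {
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            sink.addProcessed(e.getKey(), e.getValue());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        JobProgressCounter counter = new JobProgressCounter(1000,
                (jobId, count) -> System.out.println(jobId + " +" + count));
        for (int i = 0; i < 1812; i++) {
            counter.onMessageProcessed("job-1");
        }
        counter.flush();
    }
}
```

The trade-off is that counts held in memory are lost if the consumer dies before reporting, so the reported progress is approximate rather than exact.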

                              >Another solution would be to send a special "progress" message to another queue when the listener has processed 1000 or 10000 messages (or whatever).
                              Yes, this would be fine and is similar to a db update.

                              • 12. Re: selector performance
                                Mark Waschkowski Newbie

                                 


                                >On a related note, "count" functionality is almost always an expensive operation - especially in highly scalable systems.

                                >This is even true for RDBMSs, since counts often have to be re-calculated each time (often by table scan), because maintaining counters for the table requires locking, which is a barrier to scalability.


                                Interesting. Table scans == bad. At least there we have the option of indexes. I wish I could do some work on putting indexes into the JBoss Messaging system. Maybe someday.

                                Looking forward to hearing what you have to say about the consumer handling the message count...

                                Best,

                                Mark


                                • 13. Re: selector performance
                                  Tim Fox Master

                                  If you are using a pool of MDBs to process the messages from the queue, then there is no guarantee that the messages are processed in the order in which they sit in the queue (unless you set the pool size to 1).

                                  Also ordering is not guaranteed after failure. So if one of your MDBs fails, or the server goes down then the messages that had been delivered but not acknowledged yet will be cancelled and go back on the front of the queue, potentially changing the order.

                                  So relying on the position of a counter message seems risky to me.
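
The failure case can be traced with a small deterministic simulation. This is only a sketch under assumptions: an `ArrayDeque` stands in for the queue, the two consumers are modelled as sequential steps rather than real threads or MDBs, and the message names are made up. It follows the scenario described above, where a delivered but unacknowledged message is cancelled and put back on the front of the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Step-by-step model of redelivery; no real consumers or JMS provider involved.
class RedeliveryOrderSketch {

    // Returns the order in which messages finish processing.
    static List<String> simulate() {
        Deque<String> queue = new ArrayDeque<>(Arrays.asList("m1", "m2", "COUNTER"));
        List<String> completed = new ArrayList<>();

        // Consumer X receives m1, processes it, and acknowledges it.
        completed.add(queue.pollFirst());

        // Consumer X receives m2 but has not acknowledged it yet.
        String inFlight = queue.pollFirst();

        // Consumer Y receives the counter message and acknowledges it.
        completed.add(queue.pollFirst());

        // Consumer X fails: m2 is cancelled and goes back on the front of the queue.
        queue.addFirst(inFlight);

        // m2 is redelivered and completes only now, after the counter message.
        completed.add(queue.pollFirst());

        return completed;
    }

    public static void main(String[] args) {
        System.out.println("Completion order: " + simulate());
    }
}
```

In this trace the counter message completes while a "real" message from the same batch is still outstanding, so a database update triggered by the counter would overstate progress at that moment.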

                                  • 14. Re: selector performance
                                    Mark Waschkowski Newbie

                                    Yes, I understand. It is risky and implementation dependent.

                                    Well, I guess I will have each listener update the database with the results, doesn't sound like there is any way around this, not...

                                    Thanks for the suggestions,

                                    Mark