11 Replies Latest reply on Sep 22, 2009 9:33 AM by clebert.suconic

    Scheduling of messages versus, say Quartz

    genman

      I am designing an application to potentially poll information from millions of users, approximately once every ten minutes to an hour.

      It seems that EJB 3 timers don't really scale out that well and rely on using a singleton. And seeing this bug, I figure so much for that approach, at least under JBoss: EJBTHREE-1330.

      The idea I had was to schedule the first poll operation using JMS, do the polling in an MDB, then enqueue a new scheduled message, rescheduling the next job. It seems like it'd work better in the end than an EJB timer anyway, as it's distributed.
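
The consume-then-reschedule loop described above can be sketched without any broker. This is a minimal simulation, not HornetQ or JMS code: the heap stands in for the scheduled queue, and `run_polling`, `interval_s`, and `horizon_s` are hypothetical names chosen for illustration.

```python
import heapq

def run_polling(users, interval_s, horizon_s):
    """Simulate self-rescheduling polls; each 'message' is (due_time, user)."""
    # One scheduled message per user, all initially due at t=0.
    queue = [(0, user) for user in users]
    heapq.heapify(queue)
    polled = []  # (time, user) in the order the polls fire
    while queue and queue[0][0] <= horizon_s:
        due, user = heapq.heappop(queue)   # the consumer receives the message
        polled.append((due, user))         # ... performs the poll ...
        heapq.heappush(queue, (due + interval_s, user))  # ... and reschedules
    return polled

polls = run_polling(["alice", "bob"], interval_s=600, horizon_s=1200)
```

Note that there is always exactly one outstanding scheduled message per user, so the number of scheduled messages held by the broker equals the number of users.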

      But I'm not familiar with how HornetQ might behave dealing with millions of these guys. (Are scheduled messages paged out, unlike JBossMQ?)

      Would Quartz be better for this type of work? Seems it's not distributed, either: QUARTZ-722.

        • 1. Re: Scheduling of messages versus, say Quartz
          ataylor

          HornetQ does page messages, but they are paged at routing time. This means that if a scheduled message is paged, it won't be routed to the queue until it is depaged, so the scheduled time can't be guaranteed: if the scheduled time arrives while the message is still paged, it won't be delivered at that time. This is OK by the JMS spec, since delivery is only best effort.

          Depending on how you schedule messages it may work, if the messages arrive in order of time scheduled then it should be ok.
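
The effect of arrival order can be illustrated with a toy model. It is a sketch under loud assumptions and not HornetQ's implementation: memory holds a fixed number of messages rather than bytes, all messages arrive at time 0, overflow goes to a FIFO page store, and each delivery is acknowledged instantly, which frees one slot and depages one message. The function name `simulate` is hypothetical.

```python
import heapq
from collections import deque

def simulate(scheduled_times, capacity):
    """Return how late (in seconds) each message is delivered, in arrival order."""
    in_memory = []   # min-heap of (scheduled_time, index): routed messages
    paged = deque()  # FIFO page store: messages not yet routed
    for i, t in enumerate(scheduled_times):
        if len(in_memory) < capacity:
            heapq.heappush(in_memory, (t, i))
        else:
            paged.append((t, i))
    lateness = [0] * len(scheduled_times)
    now = 0
    while in_memory:
        due, i = heapq.heappop(in_memory)
        now = max(now, due)          # never delivered before its scheduled time
        lateness[i] = now - due
        if paged:                    # the ACK frees a slot, so depage one message
            heapq.heappush(in_memory, paged.popleft())
    return lateness

in_order = simulate([10, 20, 30, 40], capacity=2)
out_of_order = simulate([100, 200, 10, 20], capacity=2)
```

In this model, messages that arrive in scheduled order are all delivered on time, while the out-of-order run delivers the two early messages 90 and 80 seconds late because they sat in the page store behind later ones.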

          • 2. Re: Scheduling of messages versus, say Quartz
            timfox

             

            "ataylor" wrote:
            HornetQ does page messages, but they are paged at routing time. This means that if a scheduled message is paged, it won't be routed to the queue until it is depaged, so the scheduled time can't be guaranteed: if the scheduled time arrives while the message is still paged, it won't be delivered at that time. This is OK by the JMS spec, since delivery is only best effort.


            Scheduled delivery is not part of the JMS spec. It's above and beyond JMS functionality.

            The description says a message is scheduled every 10 mins to one hour, but then says there might be millions of scheduled messages, which seems to be contradictory.
            I don't really understand what is being described here.

            • 3. Re: Scheduling of messages versus, say Quartz
              genman

              I said that a million users are polled every 10 minutes to an hour. As part of the polling operation for a single user, a new scheduled message is enqueued to fire off again 10 minutes to an hour in the future.
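
For a sense of scale, the steady-state message rate implied by this design is simple arithmetic. The sketch assumes exactly one million users with polls spread evenly over the interval; `messages_per_second` is a hypothetical helper.

```python
def messages_per_second(users, interval_s):
    """Average scheduled messages fired per second if polls are spread evenly."""
    return users / interval_s

fast = messages_per_second(1_000_000, 10 * 60)  # every ten minutes
slow = messages_per_second(1_000_000, 60 * 60)  # every hour
print(round(fast), round(slow))
```

Under those assumptions the broker delivers roughly 1,667 messages per second at the ten-minute interval and roughly 278 per second at the one-hour interval, with one million scheduled messages outstanding at any moment.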

              This wouldn't work with JBoss's EJB timer implementation.

              It would work with Quartz, but Quartz lacks scalability. I would like these polling operations to be distributed as best as possible.

              I take it from ataylor's comment that the paging algorithm only depages based on message ordering. This would probably be okay for my application, but might result in a situation where sooner-to-be-delivered messages are held up. In which case, could the depage algorithm somehow be altered or improved to depage based on delivery timestamp?
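
The difference between the two depage orders in question can be sketched as follows. This is purely illustrative: `depage_order` and its `by_timestamp` flag are hypothetical, and nothing in this thread establishes that HornetQ's page store can be reordered this way.

```python
import heapq

def depage_order(paged_messages, by_timestamp=False):
    """paged_messages: list of (scheduled_time, name) in arrival order."""
    if not by_timestamp:
        # Arrival order: whatever was paged first comes back first.
        return [name for _, name in paged_messages]
    # Hypothetical alternative: earliest scheduled delivery comes back first.
    heap = list(paged_messages)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

paged = [(3600, "next-hour"), (60, "next-minute"), (600, "ten-minutes")]
fifo = depage_order(paged)
by_time = depage_order(paged, by_timestamp=True)
```

Ordering by timestamp needs an index over everything that has been paged, which is an extra cost that a plain arrival-order store does not pay.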

              • 4. Re: Scheduling of messages versus, say Quartz
                clebert.suconic

                 

                This would probably be okay for my application, but might result in a situation where sooner-to-be-delivered messages are held up. In which case, could the depage algorithm somehow be altered or improved to depage based on delivery timestamp?



                The page algorithm is based on memory. When you remove messages from the memory (by expiry or ACK), more messages are depaged.


                So, I would say you would be okay if you ACK messages properly.

                • 5. Re: Scheduling of messages versus, say Quartz
                  genman

                  I understand that paging is done by memory but the choice of which messages to page out first must either be random or ordered, e.g. by size, age, or other quality. I'm wondering if this could be controlled somehow.

                  • 6. Re: Scheduling of messages versus, say Quartz
                    timfox

                    I'm still not clear if you're creating a new scheduled message per *user*, or just one every 10 minutes.

                    If you're creating millions of scheduled messages, that won't scale, irrespective of whether the messages are paged or not.

                    To be honest the whole use case has a bad smell about it to me. But I guess I don't understand the business level use case enough to recommend a better approach.

                    • 7. Re: Scheduling of messages versus, say Quartz
                      genman

                      Not to disclose too many details, the idea is we're polling potentially millions of users every ten minutes or so. For what I can't discuss...

                      One alternative solution to using JMS is to simply store every user in a database table. Then you create a method to divide the work by selecting N rows per machine. I've done this before but it's not a great solution because relational databases don't really scale and it isn't easy to fairly divide work up in a coordinated fashion. Work sharing is hard.
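
One common way to split rows across machines is a stable hash of the user ID modulo the machine count. The sketch below assumes string user IDs and a fixed, known number of machines; `owner` and `my_share` are hypothetical names, and in SQL the same idea would typically be a modulo predicate on a numeric key.

```python
import zlib

def owner(user_id, machine_count):
    """Map a user to a machine index using a hash that is stable across processes."""
    return zlib.crc32(user_id.encode("utf-8")) % machine_count

def my_share(user_ids, machine_index, machine_count):
    """Return the users this machine is responsible for polling."""
    return [u for u in user_ids if owner(u, machine_count) == machine_index]

users = [f"user{i}" for i in range(1000)]
shares = [my_share(users, m, 4) for m in range(4)]
```

Every user lands on exactly one machine, but the scheme does nothing about machines joining, leaving, or falling behind, which is the coordination problem described above.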

                      Anyway, once you do your SQL select and get N million rows, you're ultimately going to map each row into a separate transaction. And handling lots of little transactions is a good use case for JMS. So why not stick to JMS for everything?

                      (Admittedly, it's a fundamentally wrong approach to poll N million users. Push notification originating from N million users is a lot better design.)

                      Wrong design or not, I think HornetQ *should* be intelligent about which messages are paged in/out. If it pages out messages that have scheduled delivery next year after ones currently waiting on the queue, that's wrong.

                      • 8. Re: Scheduling of messages versus, say Quartz
                        ataylor

                        How would you decide which paged messages should be paged in or out? Yes, in your example you could assume that a message scheduled for next year should be paged and not depaged again for some considerable time.

                        However, let's say a message was scheduled for 10 minutes' time. If no paging has occurred at the point the message is routed, it is routed to the queue. Then consider that an influx of new messages arrives and paging occurs: there could be non-scheduled messages paged that could have been routed.

                        Alternatively, let's say the scheduled message is paged before it is routed, and then at some point depaging occurs. How can you determine whether the message should be depaged? You don't know how long it will be before the queue is drained enough for the next depaging event to occur. It could be 9 minutes, which is fine since you can depage it then, but it could be an hour or even longer.

                        • 9. Re: Scheduling of messages versus, say Quartz
                          genman

                          Perhaps there isn't a strategy that fits every possible use case.

                          So I guess I'm wondering if there is a way to configure or program my own paging strategy such that:

                          0. Regular messages are paged in in order of priority and/or age
                          1. Scheduled messages are always depaged last, in order of age (newest first)

                          Now if there's a limitation to how the page file is handled, I'd like to know.

                          • 10. Re: Scheduling of messages versus, say Quartz
                            genman

                            Short of downloading and trying to figure out the code, is there any documentation on how the page algorithm was put together? I suppose I could just examine the code, though.

                            • 11. Re: Scheduling of messages versus, say Quartz
                              clebert.suconic

                              I wrote this WIKI page during development. (It was still called JBM back then).

                              http://www.jboss.org/community/wiki/JBossMessaging2Paging

                              I should probably get that info and place it as javadoc.

                              Some of the facts probably have changed... like we don't have global page any more.