7 Replies Latest reply on Oct 20, 2009 1:33 PM by kapitanpetko

    An asynchronous singleton processing component?

    fraktalek
      Hi, I'm trying to solve the following problem, to no avail so far... I'm still quite new to Seam and Java EE so I might be overlooking the solution.

      Clients produce data which I need to process sequentially and asynchronously. Asynchronously because the processing may take a long time and clients can't be forced to wait until it's done. Sequentially because there are interdependencies - if two update tasks ran concurrently then something might be missed. The order is not so important though.

      The updates need not be persistent - in case of a server crash there is a way of computing what might have been missed but it would be quite costly which is why I can't use it for incremental online processing.

      So, when it comes to asynchronicity, Seam seems to offer asynchronous events (caught via @Observer), messaging with JMS and MDBs, and the timer service.

      I don't think I can use asynchronous events because first I would have to make the @Observ-ing method synchronized which is something I shouldn't do in a JavaEE environment, right? And second, even if I made it synchronized, the processing of one update may take a long time, definitely and easily longer than the default synchronized timeout (1000ms) and it just doesn't seem correct to set the timeout to some big number (it's almost impossible to estimate the upper bound of the processing time, if there is some).

      Because I can't estimate the processing time I can't really schedule the updates using a timer service. And I don't want to persist them and run the timer in regular intervals to check if there is some update pending because it could hinder performance (some updates may take a long time, but I think that usually it will be quite fast).

      The problem with JMS and MDBs is that MDBs are meant for concurrent processing which is something I do not want. I want to process the updates serially and there doesn't seem to be any clean way of imposing this constraint on an MDB.

      So I thought of using JMS without MDBs - create a JMS Queue, one Seam POJO component as a JMS producer, another Seam POJO component as a JMS consumer. I am not using MDBs so I can be sure about the number of threads that are used (possibly many for producers, but only one consumer) and because they are POJOs I can be sure that the container won't create instances as it likes, right? (as it would in the case of MDBs)
      This kind of works, but unfortunately only kind-of. The problem is probably the call of the consumer's onMessage() method, which seems to happen outside of the application context, so injection of the entity manager does not work: the entity manager is null, and if I try to get it via entityManager = (EntityManager) Component.getInstance("entityManager") then I get an IllegalStateException saying that there is no active application context, if I remember correctly. (Side note: I made both the producer and the consumer application-scoped Seam components.) So one question would be how to get my hands on the entityManager from inside of onMessage()? A JNDI lookup of the EntityManagerFactory? But this probably would not be sufficient anyway, because from onMessage() I need to call other Seam components which do the actual processing (and to do that they use the database heavily). How come there's no active application context? I assume that it's because of the asynchronous call made by the JMS provider (JBoss), but isn't this a "bug" in Seam? I can inject Seam components in MDBs, right? So why shouldn't it be possible in a simple JMS consumer?

      I think this can't be such an uncommon use case, so how is it usually solved? Thanks in advance for any insight.
        • 1. Re: An asynchronous singleton processing component?
          kapitanpetko

          You could use an @Asynchronous method. That will run in a separate thread, in the background, and can take as long as it needs to.
          If you are really worried about background jobs having an impact on your app, you should use a separate server for batch processing.


          Not sure if your solution is optimal, but you can set up the context you need to get the EntityManager by calling Lifecycle.beginCall()/endCall(). If you decide to go the POJO JMS way, JBSEAM-4237 might be helpful.


          To use regular MDBs, you should be able to configure your app server to only create one MDB per queue (not 100% sure about this, though), so that you can guarantee sequential processing. Or, simply, make sure you only send a message after the previous one has been processed: check the DB, send a reply message after processing is done, etc.


          HTH



          • 2. Re: An asynchronous singleton processing component?
            fraktalek

            Thanks for the comments. I chose to go the @Asynchronous way at least for now. But this requires me to make the @Asynchronous method synchronized if I do not want the tasks to run in parallel, right? At the same time I don't want to make the whole component synchronized.


            It would be very nice to see the JBSEAM-4237 implemented in the standard distribution.

            • 3. Re: An asynchronous singleton processing component?
              kapitanpetko

              Jakub Kotowski wrote on Oct 19, 2009 17:51:


              Thanks for the comments. I chose to go the @Asynchronous way at least for now. But this requires me to make the @Asynchronous method synchronized if I do not want the tasks to run in parallel, right? At the same time I don't want to make the whole component synchronized.


              If you call an @Asynchronous method multiple times, multiple background threads will be created. Making it synchronized will change nothing and should probably be avoided anyway. If you want to make sure that the next job doesn't start before the previous one has finished, one way is to use Quartz's StatefulJob. That is not natively supported by Seam, so you will need to integrate it. Another way is to use your own checks, like a flag in the database.



              It would be very nice to see the JBSEAM-4237 implemented in the standard distribution.


              Please vote if you haven't already.


              • 4. Re: An asynchronous singleton processing component?
                fraktalek

                If you call an @Asynchronous method multiple times, multiple background threads will be created. Making it synchronized will change nothing and should probably be avoided anyway.


                Why should it change nothing? It must work at least to synchronize the threads, why shouldn't it? And I know it should be avoided, that's why I started exploring all the other options.



                Another way is to use your own checks, like a flag in the database.

                But I can hardly use my own checks without some kind of synchronization, right? And it doesn't matter if it's stored in the database or not.



                If you want to make sure that the next job doesn't start before the previous one has finished, one way is to use Quartz's StatefulJob. That is not natively supported by Seam, so you will need to integrate it.


                I've already come to the conclusion that what I need is not really supported so I'll need to work around it somehow. I chose @Asynchronous with synchronized because it seems to be the easiest to implement... I may later try to change it to the JMS/POJO way, or perhaps Quartz's StatefulJob - but the integration you're mentioning probably also means using the Lifecycle begin and end calls, right?



                Please vote if you haven't already.

                I have already :) I'm thinking about asking my colleagues to vote for it too.


                • 5. Re: An asynchronous singleton processing component?
                  kapitanpetko

                  Jakub Kotowski wrote on Oct 20, 2009 10:42:



                  If you call an @Asynchronous method multiple times, multiple background threads will be created. Making it synchronized will change nothing and should probably be avoided anyway.


                  Why should it change nothing? It must work at least to synchronize the threads, why shouldn't it? And I know it should be avoided, that's why I started exploring all the other options.



                  What I meant was that even if you put synchronized on the async method, new jobs will be started. Every job instance gets its own component instance, so synchronized doesn't really help here. Unless, of course, you have an application-scoped component; then you will get the same instance every time.
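
                  To illustrate the point about instances: a synchronized instance method locks on "this", so it only serializes calls that go through the same object. The sketch below is a minimal illustration with plain threads rather than Seam; SyncWorker and SyncScopeDemo are made-up names, and the 200 ms sleep merely stands in for slow processing.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a component whose job method is synchronized.
class SyncWorker {
    private final AtomicInteger inside;    // threads currently in doWork()
    private final AtomicInteger maxInside; // highest overlap observed so far

    SyncWorker(AtomicInteger inside, AtomicInteger maxInside) {
        this.inside = inside;
        this.maxInside = maxInside;
    }

    // Locks on "this", so only calls on the SAME instance exclude each other.
    synchronized void doWork() {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(200); // simulate slow processing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        inside.decrementAndGet();
    }
}

class SyncScopeDemo {
    // Starts two jobs at once and returns how many were inside doWork() together.
    static int maxConcurrent(boolean sharedInstance) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        SyncWorker first = new SyncWorker(inside, maxInside);
        SyncWorker second = sharedInstance ? first : new SyncWorker(inside, maxInside);

        Thread t1 = new Thread(first::doWork);
        Thread t2 = new Thread(second::doWork);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxInside.get();
    }

    public static void main(String[] args) {
        System.out.println("shared instance:    " + maxConcurrent(true));
        System.out.println("separate instances: " + maxConcurrent(false));
    }
}
```

                  With one shared instance the two jobs never overlap; with one instance per job they normally do, which is the situation described above for components that are not application-scoped.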




                  Another way is to use your own checks, like a flag in the database.

                  But I can hardly use my own checks without some kind of synchronization, right? And it doesn't matter if it's stored in the database or not.


                  Instead of synchronization, you could simply exit if the previous job has not finished yet. Something like:


                  @Asynchronous
                  public void doWork() {
                    boolean isWorking = getIsWorkingFromDb();
                  
                    if (isWorking) {
                      return;
                    }
                  
                    setIsWorkingToDb(true);
                    // real work goes here
                    setIsWorkingToDb(false);
                  }
                  




                  I've already come to the conclusion that what I need is not really supported so I'll need to work around it somehow. I chose @Asynchronous with synchronized because it seems to be the easiest to implement... I may later try to change it to the JMS/POJO way, or perhaps Quartz's StatefulJob - but the integration you're mentioning probably also means using the Lifecycle begin and end calls, right?


                  I was thinking more along the lines of overriding the QuartzDispatcher or scheduling your jobs directly with Quartz. Neither is too hard to do.




                  Please vote if you haven't already.

                  I have already :) I'm thinking about asking my colleagues to vote for it too.



                  Thanks. Btw, you can use the component in JBSEAM-4237 as is: just drop it in one of your packages and configure it in components.xml.

                  • 6. Re: An asynchronous singleton processing component?
                    fraktalek

                    Nikolay Elenkov wrote on Oct 20, 2009 11:08:



                    Jakub Kotowski wrote on Oct 20, 2009 10:42:



                    If you call an @Asynchronous method multiple times, multiple background threads will be created. Making it synchronized will change nothing and should probably be avoided anyway.


                    Why should it change nothing? It must work at least to synchronize the threads, why shouldn't it? And I know it should be avoided, that's why I started exploring all the other options.



                    What I meant was that even if you put synchronized on the async method, new jobs will be started. Every job instance gets its own component instance, so synchronized doesn't really help here. Unless, of course, you have an application-scoped component; then you will get the same instance every time.


                    Yes, the component is application-scoped.





                    Another way is to use your own checks, like a flag in the database.

                    But I can hardly use my own checks without some kind of synchronization, right? And it doesn't matter if it's stored in the database or not.


                    Instead of synchronization, you could simply exit if the previous job has not finished yet. Something like:

                      @Asynchronous 
                      public void doWork() {
                    1    boolean isWorking = getIsWorkingFromDb();
                    2
                    3    if (isWorking) {
                    4      return;
                    5    }
                    6
                    7    setIsWorkingToDb(true);
                    8    // real work goes here
                    9    setIsWorkingToDb(false);
                      }
                    



                    Ok, let's see:



                    • Thread-1 enters doWork()

                    • Thread-1 retrieves isWorking, which is false, so

                    • Thread-1 evaluates 3 and skips the return

                    • on 6, i.e. before continuing with 7, Thread-1 gets re-scheduled

                    • Thread-2 enters doWork()

                    • Thread-2 retrieves isWorking, which is still false

                    • Thread-2 evaluates 3 and skips the return

                    • then both Thread-1 and Thread-2 set isWorking to true and happily execute the real work in parallel




                    Where is the mistake in my reasoning?
                    Maybe this worst-case scenario is quite unlikely but it is not impossible.
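
                    The interleaving above is a check-then-act race: the read on line 1 and the write on line 7 are separate steps. One way to close the gap is to turn the check and the update into a single atomic operation. Below is a minimal in-memory sketch, assuming a single JVM and one shared (application-scoped) instance; SingleRunGuard and its methods are hypothetical names, and a flag kept in the database would need an equivalent atomic conditional update instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard: at most one caller at a time gets to do the real work.
class SingleRunGuard {
    private final AtomicBoolean working = new AtomicBoolean(false);
    private final AtomicInteger completedRuns = new AtomicInteger();

    // Returns true if this call did the work, false if another run was active.
    boolean doWork(Runnable realWork) {
        // compareAndSet reads and writes the flag in one atomic step,
        // so two threads can never both see "false" and proceed.
        if (!working.compareAndSet(false, true)) {
            return false;
        }
        try {
            realWork.run();
            completedRuns.incrementAndGet();
            return true;
        } finally {
            working.set(false); // reset the flag even if the work throws
        }
    }

    int completedRuns() {
        return completedRuns.get();
    }

    public static void main(String[] args) {
        SingleRunGuard guard = new SingleRunGuard();
        // A nested call made while the first run is still active is rejected.
        guard.doWork(() -> System.out.println("nested accepted: " + guard.doWork(() -> { })));
        System.out.println("completed runs: " + guard.completedRuns());
    }
}
```

                    Note that a rejected call is simply dropped, not queued, so this only fits cases where skipping an update is acceptable.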




                    I've already come to the conclusion that what I need is not really supported so I'll need to work around it somehow. I chose @Asynchronous with synchronized because it seems to be the easiest to implement... I may later try to change it to the JMS/POJO way, or perhaps Quartz's StatefulJob - but the integration you're mentioning probably also means using the Lifecycle begin and end calls, right?


                    I was thinking more along the lines of overriding the QuartzDispatcher or scheduling your jobs directly with Quartz. Neither is too hard to do.

                    Hmm, I should explore this option.





                    Please vote if you haven't already.

                    I have already :) I'm thinking about asking my colleagues to vote for it too.



                    Thanks. Btw, you can use the component in JBSEAM-4237 as is: just drop it in one of your packages and configure it in components.xml.

                    Maybe I'll try this first then. Thanks.

                    • 7. Re: An asynchronous singleton processing component?
                      kapitanpetko

                      Jakub Kotowski wrote on Oct 20, 2009 11:49:




                        @Asynchronous 
                        public void doWork() {
                      1    boolean isWorking = getIsWorkingFromDb();
                      2
                      3    if (isWorking) {
                      4      return;
                      5    }
                      6
                      7    setIsWorkingToDb(true);
                      8    // real work goes here
                      9    setIsWorkingToDb(false);
                        }
                      



                      Ok, let's see:


                      • Thread-1 enters doWork()

                      • Thread-1 retrieves isWorking, which is false, so

                      • Thread-1 evaluates 3 and skips the return

                      • on 6, i.e. before continuing with 7, Thread-1 gets re-scheduled

                      • Thread-2 enters doWork()

                      • Thread-2 retrieves isWorking, which is still false

                      • Thread-2 evaluates 3 and skips the return

                      • then both Thread-1 and Thread-2 set isWorking to true and happily execute the real work in parallel




                      Where is the mistake in my reasoning?
                      Maybe this worst-case scenario is quite unlikely but it is not impossible.



                      You are assuming that Thread-1 and Thread-2 are started (almost) simultaneously. If you are scheduling repeatable jobs that do some kind of processing, chances are they will repeat every few minutes. So by the time Thread-2 (Job 2) is started, isWorking will have been updated in the database (you need to do this in a separate transaction, etc).
                      But then again, if there is a chance that your asynchronous method may be invoked simultaneously by two separate clients, that probably won't work.


                      There are other ways to achieve what you are trying to do with 'standard' JEE/Seam components, like JMS polling (only get a message from the queue, after you are done processing the previous one), but you might want to look at some specialized batch processing solutions (Spring Batch?).
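
                      As a rough illustration of the polling idea (take the next message only after the previous one has been processed), here is a sketch that uses a plain java.util.concurrent.BlockingQueue in place of a JMS queue. The class SequentialProcessor, its methods, and the empty-Optional shutdown signal are assumptions made for the example. Starting your own threads inside a Java EE container is generally discouraged, so treat this as a model of the control flow, not a deployment recipe.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sequential processor: many producers, exactly one consumer thread.
class SequentialProcessor {
    // An empty Optional is the shutdown signal ("poison pill").
    private final BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();
    private final Thread consumer;

    SequentialProcessor(Consumer<String> handler) {
        consumer = new Thread(() -> drain(handler), "update-consumer");
    }

    void start() {
        consumer.start();
    }

    // Called by any number of client threads; returns immediately.
    void submit(String update) {
        queue.add(Optional.of(update));
    }

    // Lets already queued updates finish, then stops the consumer thread.
    void shutdown() {
        queue.add(Optional.empty());
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void drain(Consumer<String> handler) {
        try {
            while (true) {
                Optional<String> next = queue.take(); // blocks until something arrives
                if (!next.isPresent()) {
                    return;
                }
                handler.accept(next.get()); // the next take() happens only after this returns
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SequentialProcessor processor =
                new SequentialProcessor(update -> System.out.println("processing " + update));
        processor.start();
        processor.submit("update-1");
        processor.submit("update-2");
        processor.shutdown();
    }
}
```

                      Because only one thread ever calls the handler, updates cannot overlap no matter how many clients submit at once, and clients never wait for the processing itself.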