7 Replies Latest reply on Nov 6, 2009 2:14 AM by kukeltje

    JobScheduler going crazy - ignoring wait

      Hi guys,

      I've got the same code (EXACTLY the same code) running on my windows machine, and on Ubuntu.

      On windows, my java version is 1.6.0_13 on ubuntu it's 1.6.0_06.

      My problem is that even using identical code, the scheduler on the ubuntu box goes crazy.

      Here's my debug logs from the windows machine:

      2009-08-27 14:35:37,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:35:37,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-28 09:31:00.0
      2009-08-27 14:35:37,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.DispatcherThread - DispatcherThread will wait for max 5000ms on org.jbpm.pvm.internal.jobexecutor.JobExecutor@127e2ee
      2009-08-27 14:35:42,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.DispatcherThread - DispatcherThread woke up
      2009-08-27 14:35:42,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:35:42,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:35:42,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:35:42,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-28 09:31:00.0
      2009-08-27 14:35:42,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.DispatcherThread - DispatcherThread will wait for max 5000ms on org.jbpm.pvm.internal.jobexecutor.JobExecutor@127e2ee
      2009-08-27 14:35:47,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.DispatcherThread - DispatcherThread woke up
      2009-08-27 14:35:47,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:35:47,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:35:47,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:35:47,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-28 09:31:00.0
      2009-08-27 14:35:47,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.DispatcherThread - DispatcherThread will wait for max 5000ms on org.jbpm.pvm.internal.jobexecutor.JobExecutor@127e2ee
      2009-08-27 14:35:52,947 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.DispatcherThread - DispatcherThread woke up
      


      As you can see, it is running the default three threads, with each one waiting politely for around 5 seconds.

      Here's the log from the ubuntu box:

      2009-08-27 14:25:26,438 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,439 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,439 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,439 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,439 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,439 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,439 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,440 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,441 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,441 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,441 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,441 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,442 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,442 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,442 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,442 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,443 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,443 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,443 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,443 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,443 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,443 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,444 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,445 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,445 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,445 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,445 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,445 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,445 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,446 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,446 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      2009-08-27 14:25:26,446 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - next due date is 2009-08-17 10:48:36.0
      2009-08-27 14:25:26,446 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - start querying first acquirable job...
      2009-08-27 14:25:26,446 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.AcquireJobsCmd - locking jobs []
      2009-08-27 14:25:26,446 [DispatcherThread] DEBUG org.jbpm.pvm.internal.jobexecutor.GetNextDueDateCmd - getting next due date...
      


      It's not waiting!

      The only difference between the two is that on windows I'm running this in Eclipse, and on the Ubuntu box it's nicely jar'd up etc. However, all the required files are still on the classpath, and every other aspect of the workflow engine runs like clockwork.

      If you have any ideas, I'd be very grateful.

      Thanks.

      - Rachel (Kit)

        • 1. Re: JobScheduler going crazy - ignoring wait

          OK so I found the issue.

          After an unsafe shutdown, if a job is in the "Aquired" state in the DB, then the job executor will look for the next job to process and assume that a thread is currently dealing with that Aquired job.

          So, basically, I had to delete that job (or I guess I could have set it back to waiting), and the problem went away. Good thing this is a development environment.

          I'm going to update my shutdown code to correctly shutdown the job scheduler, and add a checker for this on startup.

          • 2. Re: JobScheduler going crazy - ignoring wait
            kukeltje

            Could you keep us posted? Afaik, jBPM should be able to recover from this in a specific amount of time. If this is not your experience, please report that here to.

            • 3. Re: JobScheduler going crazy - ignoring wait

              I'll try to re-create the problem and if I can I'll post a full bug report. I'll let you know next week if I've managed to repeat the problem or not.

              • 4. Re: JobScheduler going crazy - ignoring wait

                Actually, I'll wait for 4.1 (due 1/9 I believe) and make sure I can reproduce it with that version.

                • 5. Re: JobScheduler going crazy - ignoring wait
                  kukeltje

                  ok, we'll wait :-)

                  • 6. Re: JobScheduler going crazy - ignoring wait

                    Ok - so now I'm wondering if I need to report it or if it should be more of an administration and monitoring hassle.

                    Here's how to re-create it - if you think I should do a full report with code I will, otherwise perhaps someone can suggest a good administration strategy for this other than the most horrible "do a select every now and again to check".

                    So, here's the JPDL file:

                    <process name="Dummy" xmlns="http://jbpm.org/4.0/jpdl">
                    <start name="a">
                    <transition to="b"/>
                    </start>
                    <state name="b">
                    <on event="timeout">
                     <timer duedate="10 seconds"/>
                     <event-listener class="rachels.test.actionClasses.Sleep">
                     </event-listener>
                    </on>
                    <transition to="c" name="toc">
                    </transition>
                    </state>
                    <end name="c"/>
                    </process>
                    


                    The sleep class simply sleeps for 30s.

                    So, if you start a process instance of Dummy, after 10 seconds the Sleep class will do its thing.

                    If you kill the jvm mid-sleep, the job has already been acquired, but hasn't completed.

                    So, something was fixed in the code in 4.1 so now the job scheduler isn't going crazy which is nice.

                    BUT - this job is sitting there acquired and not necessarily completed in the DB.


                    • 7. Re: JobScheduler going crazy - ignoring wait
                      kukeltje

                      The jobscheduler has a second thread that exactly does what you 'propose': do a select every now and then and check. So it should be released after the max_job_lock period (or whatever it is called). Please search a little in the docs, the config or even code to see what it actually is (do not have any of it at hand here)