1 Reply Latest reply on Jun 12, 2017 12:08 PM by cfang

    JobOperator.start(jobname, jobparms) hangs

    richardmoore

      We had a production issue where jobs hung for 2 hours at the JobOperator.start(jobname, jobparms) statement. I checked the job tables to see whether a jobinstanceid had been created, but none had:

       

      select ji.jobname, je.*
      from job_instance ji, job_execution je
      where je.jobinstanceid = ji.jobinstanceid
      and je.createtime between '2017-06-10 16:59:00' and '2017-06-10 20:00:00'
      order by je.createtime;

       

      We killed the Linux processes (I verified that they had stopped and that no other javabatch jobs were running) and tried restarting them several times over the 2-hour period, after which they started running. Is there a way to put a timeout on this, so that if the jobinstanceid cannot be created within X seconds the call gives up?

       

      We have not been able to reproduce this issue, but getting a timeout would be great.

        • 1. Re: JobOperator.start(jobname, jobparms) hangs
          cfang

          There are many factors that can delay the start of a batch job execution: OS, JVM, DB connection, thread availability. Some of them are outside JBeret's control. In some cases, when it hangs, there is nothing JBeret can do, because the JBeret process itself is hanging. So I think it's easier to implement the timeout on the client side, where the client can measure the duration of specific parts of the execution against its own criteria. Even if we implemented a timeout mechanism in JBeret, it would probably be too generic to fit your use case.
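
          A minimal sketch of what such a client-side timeout could look like, using only java.util.concurrent. The class and method names (StartWithTimeout, startWithTimeout) are hypothetical and not part of JBeret or the batch API. The start call is passed in as a Callable so the example runs on its own; in a real client it would presumably be something like () -> jobOperator.start(jobname, jobparms). Note the assumption that giving up on the caller's side is enough: cancelling only interrupts the worker thread, and a call blocked on a DB connection may ignore the interrupt and still start the job later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StartWithTimeout {

    /**
     * Runs startCall on a separate thread and waits at most the given time
     * for it to return an execution id. Hypothetical helper, not a JBeret API.
     */
    public static long startWithTimeout(Callable<Long> startCall, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        // Daemon thread, so a start call that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "job-start");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<Long> future = executor.submit(startCall);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                // Best effort: interrupts the worker, which may or may not unblock it.
                future.cancel(true);
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for: () -> jobOperator.start(jobname, jobparms)
        long id = startWithTimeout(() -> 42L, 1, TimeUnit.SECONDS);
        System.out.println("started execution " + id);

        try {
            // Simulates a start call that hangs longer than the timeout.
            startWithTimeout(() -> { Thread.sleep(5_000); return 1L; }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("gave up waiting for start");
        }
    }
}
```

          After a timeout, the client would probably want to query the job tables (as in the original post) before retrying, since the abandoned start call may still have created a job instance.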