5 Replies Latest reply on Feb 4, 2017 2:37 PM by Cheng Fang

    Job Recovery and Exception handling

    João Santana Newbie

      How should I handle the following situations in a data migration (i.e., ETL) context?


      1. ItemReader/ItemWriter connection error: for some reason the reader/writer cannot connect to the data source (e.g., JDBC). The job should try to reconnect (n times), then notify the administrator (e.g., by email).
      2. The ItemReader may find a lot of data, and the transaction may not allow the time it takes to process it all.
      3. Can the job find out that its last execution was not successful? It would need to run again.


      Today I have a daily data-loading scenario in PL/SQL; I would like to migrate it to Java and take advantage of Java EE.

        • 1. Re: Job Recovery and Exception handling
          Cheng Fang Master

          1. In a Java EE application server environment (e.g., JBoss EAP or WildFly), resource management is done by the application server, and JBeret just makes use of the managed resources such as datasources.  So the best place for datasource health monitoring is the application server.  I'd imagine there is already some mechanism for monitoring/alerting on the health of a datasource in the appserver.


          2. You can configure the datasource used by the ItemReader to be a non-JTA datasource, so reads will not participate in the transaction and are not subject to the timeout.  Even for a JTA datasource, you can still configure the transaction timeout, either in the datasource configuration or in JBeret, by setting the step property javax.transaction.global.timeout={seconds} (the default is 180 seconds).  Example:

          <step id="MyGlobalStep">
            <properties>
              <property name="javax.transaction.global.timeout" value="600"/>
            </properties>
          </step>




          3. Finding which job execution has failed should be the responsibility of the batch client, for example by querying the JobOperator.

          It takes a few steps: first get the JobInstances belonging to a job name (this returns a list of JobInstances, sorted latest first), then get all JobExecutions belonging to the latest JobInstance, and finally check the status of those JobExecutions.


          public List<JobInstance> getJobInstances(String jobName, int start, int count)
              throws NoSuchJobException, JobSecurityException;

          public List<JobExecution> getJobExecutions(JobInstance instance)
              throws NoSuchJobInstanceException, JobSecurityException;
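Putting the steps above together, here is a minimal sketch; it assumes a running batch runtime, and the class and helper names are illustrative, not part of any API:

```java
import java.util.Comparator;
import java.util.List;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
import javax.batch.runtime.BatchStatus;
import javax.batch.runtime.JobExecution;
import javax.batch.runtime.JobInstance;

public class LastExecutionCheck {

    // A FAILED or STOPPED execution did not complete and can be restarted.
    static boolean needsRestart(BatchStatus status) {
        return status == BatchStatus.FAILED || status == BatchStatus.STOPPED;
    }

    // Walks the steps described above: latest JobInstance for the job name,
    // then its JobExecutions, then the status of the most recent execution.
    // Only works inside a batch runtime environment.
    static boolean lastRunNeedsRestart(String jobName) {
        JobOperator op = BatchRuntime.getJobOperator();
        // getJobInstances returns the latest instances first, so ask for one.
        List<JobInstance> instances = op.getJobInstances(jobName, 0, 1);
        List<JobExecution> executions = op.getJobExecutions(instances.get(0));
        // Pick the most recent execution by create time, in case the
        // list order is not guaranteed by the implementation.
        JobExecution latest = executions.stream()
                .max(Comparator.comparing(JobExecution::getCreateTime))
                .orElseThrow(IllegalStateException::new);
        return needsRestart(latest.getBatchStatus());
    }
}
```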


          The jberet-rest-api module also has an API for doing that directly.


          • 2. Re: Job Recovery and Exception handling
            João Santana Newbie

            1. In this case I can take another advantage of Java EE. Regarding the database connection, I understand that monitoring the pool is a great solution. But, after all, what I need is to monitor the jobs themselves: whether they executed successfully, and to be notified when one of them cannot finish its task.


            2. If it is just reading, I think it suits me to use a non-JTA pool.


            3. This is a very critical point in my solution: I really need to keep track of job executions and notify an administrator of any problems encountered. Maybe querying the job repository (JDBC) directly is a good idea?

            • 3. Re: Job Recovery and Exception handling
              Cheng Fang Master

              You may want to consider implementing a JobListener to monitor job execution status.  You can do whatever suits you in the beforeJob and afterJob methods, including alerting administrators upon failure, recording the failed job executions for restarting, etc.
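A minimal sketch of such a listener follows; the class name and the alert hook are illustrative assumptions, and the listener must be referenced from the job XML to take effect:

```java
import javax.batch.api.listener.AbstractJobListener;
import javax.batch.runtime.BatchStatus;
import javax.batch.runtime.context.JobContext;
import javax.inject.Inject;
import javax.inject.Named;

@Named
public class FailureAlertListener extends AbstractJobListener {

    @Inject
    JobContext jobContext;

    // Alert only on a status that indicates the job did not complete.
    static boolean shouldAlert(BatchStatus status) {
        return status == BatchStatus.FAILED;
    }

    @Override
    public void afterJob() throws Exception {
        if (shouldAlert(jobContext.getBatchStatus())) {
            // Illustrative hook: wire this to your mail session or alerting
            // system, and/or record jobContext.getExecutionId() somewhere
            // for a later restart.
            System.err.printf("Job %s (execution %d) failed%n",
                    jobContext.getJobName(), jobContext.getExecutionId());
        }
    }
}
```

To register it, add a listeners element under the job element in the job XML, e.g. <listeners> <listener ref="failureAlertListener"/> </listeners>.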

              • 4. Re: Job Recovery and Exception handling
                João Santana Newbie

                Good! It works!


                "recording the failed job executions for restarting" ??

                • 5. Re: Job Recovery and Exception handling
                  Cheng Fang Master

                  "recording the failed job executions for restarting" ??


                  If your app already has a place to store info, such as a messaging queue or data grid, you can store the failed job execution ids there for easy retrieval later, when you restart them.  This is just for convenience: this info can always be retrieved from the job repository via the JobOperator, as shown in the post above.
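Once you have a failed execution id (from your own store or from the job repository), restarting it is a single JobOperator call. A sketch, assuming a running batch runtime; the class and helper names are illustrative:

```java
import java.util.Properties;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;

public class RestartFailedJob {

    // Restarts a previously failed or stopped execution; returns the id
    // of the new execution started for the same JobInstance. Only works
    // inside a batch runtime environment.
    static long restart(long failedExecutionId) {
        JobOperator op = BatchRuntime.getJobOperator();
        return op.restart(failedExecutionId, new Properties());
    }

    // Small pure helper for logging the outcome (illustrative).
    static String restartMessage(long oldId, long newId) {
        return "Restarted execution " + oldId + " as execution " + newId;
    }
}
```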