In JBeret there is an AbstractJobRepository that stores instances of jobs, job instances and job executions. In WildFly a JobRepository has the lifecycle of the batch subsystem. This means that, no matter the storage type, the job, job instance and job execution for every batch job started are never eligible for GC and live in memory until the server is reloaded or restarted. At some point we'd eventually end up with an OOME.
I think we have three options to solve this.
1. Require the user to purge the jobs with org.jberet.repository.PurgeBatchlet. This may not be user friendly, but it could be set up in a scheduled EJB.
2. In WildFly, create a new repository each time the job repository is requested. This would add overhead for some job repositories, such as the JdbcJobRepository, which would need to look up configuration files, check whether the tables already exist, reload the data via queries, etc.
3. Fix the AbstractJobRepository so it does not keep instances of every job, or require each job repository to implement its own persistence for the jobs, job instances and job executions. This would mean removing the global maps that store the job information from the AbstractJobRepository. For something like the JdbcJobRepository some sort of caching maps could be used, though eviction would need to happen at some point.
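For option 1, the purge job itself could be as small as the following job XML. The job and step ids here are illustrative; PurgeBatchlet also takes properties to control what gets purged, which I've omitted:

```xml
<job id="purge-job" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="purge-step">
        <batchlet ref="org.jberet.repository.PurgeBatchlet"/>
    </step>
</job>
```

A scheduled EJB would then only need to start it periodically via BatchRuntime.getJobOperator().start("purge-job", null).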
I'm curious to see what others think of these solutions, or whether they have other ideas. I'd personally lean towards option 3, as option 2 would be expensive when doing things like querying jobs. Option 1 would also be okay, but it puts the responsibility on the user to ensure jobs get purged.
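For option 3, the caching maps could start as something as simple as a size-bounded, access-ordered LinkedHashMap. This is only a sketch (the class name and eviction policy are my assumptions, not JBeret code), and a real implementation would need to write evicted entries through to the backing store first:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded LRU cache for job data. Once the configured capacity is
// exceeded, the least-recently-used entry is evicted; a real implementation
// would flush it to the backing store (e.g. the JDBC tables) before dropping it.
class BoundedJobCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedJobCache(int maxEntries) {
        // accessOrder = true so iteration order reflects recency of use
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Returning true tells LinkedHashMap to drop the eldest entry.
        return size() > maxEntries;
    }
}
```

The upside over the current global maps is that memory use is bounded regardless of how many jobs have run since the last server reload.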
For completeness, here are the definitions of job instances and job executions. A job itself is essentially an object that describes the data in the job XML.
A JobInstance refers to the concept of a logical job run. Consider a batch job that should be run once at the end of the day, such as an 'EndOfDay' job. There is one 'EndOfDay' Job, but each individual run of the Job must be tracked separately. In the case of this job, there will be one logical JobInstance per day. For example, there will be a January 1st run, and a January 2nd run. If the January 1st run fails the first time and is run again the next day, it is still the January 1st run. Usually this corresponds with the data it is processing as well, meaning the January 1st run processes data for January 1st, etc. Therefore, each JobInstance can have multiple executions (JobExecution is discussed in more detail below), and one or many JobInstances corresponding to a particular Job can be running at a given time.
The definition of a JobInstance has absolutely no bearing on the data that will be loaded. It is entirely up to the ItemReader implementation used to determine how data will be loaded. For example, in the EndOfDay scenario, there may be a column on the data that indicates the 'effective date' or 'schedule date' to which the data belongs. So, the January 1st run would only load data from the 1st, and the January 2nd run would only use data from the 2nd. Because this determination will likely be a business decision, it is left up to the ItemReader to decide. What using the same JobInstance will determine, however, is whether or not the 'state' from previous executions will be available to the new run. Using a new JobInstance will mean 'start from the beginning' and using an existing instance will generally mean 'start from where you left off'.
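The reader-side 'effective date' decision described above can be sketched like this. A real JSR-352 reader would implement javax.batch.api.chunk.ItemReader; the Trade type, its scheduleDate field, and the filtering rule are illustrative assumptions:

```java
import java.time.LocalDate;
import java.util.Iterator;
import java.util.List;

// Sketch of reader-side date filtering: the reader, not the JobInstance,
// decides which data belongs to the run. Names here are illustrative.
class EndOfDayReader {
    static class Trade {
        final String id;
        final LocalDate scheduleDate;
        Trade(String id, LocalDate scheduleDate) {
            this.id = id;
            this.scheduleDate = scheduleDate;
        }
    }

    private final Iterator<Trade> source;

    EndOfDayReader(List<Trade> allTrades, LocalDate effectiveDate) {
        // Only records belonging to this run's effective date are exposed,
        // so the January 1st run never sees January 2nd data.
        this.source = allTrades.stream()
                .filter(t -> t.scheduleDate.equals(effectiveDate))
                .iterator();
    }

    // Mirrors ItemReader#readItem: next item, or null when input is exhausted.
    Trade readItem() {
        return source.hasNext() ? source.next() : null;
    }
}
```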
A JobExecution refers to the technical concept of a single attempt to run a Job. Each time a job is started or restarted, a new JobExecution is created, belonging to the same JobInstance.
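The instance/execution relationship can be sketched with a toy model (illustrative classes and names, not JBeret's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the relationship above: one JobInstance per logical run, and a
// new JobExecution id recorded for every start or restart attempt of that run.
class LogicalJobInstance {
    private static final AtomicLong EXECUTION_IDS = new AtomicLong();

    final String jobName;                        // e.g. "EndOfDay"
    final List<Long> executionIds = new ArrayList<>();

    LogicalJobInstance(String jobName) {
        this.jobName = jobName;
    }

    // Called on the initial start and again on every restart of this logical run.
    long newExecution() {
        long id = EXECUTION_IDS.incrementAndGet();
        executionIds.add(id);
        return id;
    }
}
```

So the January 1st instance that failed and was restarted the next day ends up with two executions, both belonging to the same instance.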
James R. Perkins