WildFly supports batch processing through its batch-jberet subsystem, which can be configured to use either an in-memory or a jdbc job repository. The in-memory job repository, the default, requires no extra configuration and allows for quick batch application development and deployment. For more advanced batch applications that call for persistent job data, or job data shared between multiple WildFly instances, a jdbc job repository is required. In this blog post, I'll walk through the key steps to configure and use a jdbc job repository backed by a PostgreSQL database.
Configure PostgreSQL JDBC Module, Driver and Datasource in WildFly
Have a PostgreSQL database server installed either locally or remotely, accessible to WildFly. For writing this post, I have a locally running PostgreSQL server with the following configuration:
version             | 10
host                | localhost
port                | 5432 (the default port number)
database user       | postgres
database password   | none
database name       | postgres (same as database user)
jdbc connection url | jdbc:postgresql://localhost/postgres
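Note that the connection URL above omits the port, because 5432 is PostgreSQL's default. A fully qualified URL spells the port out; the small sketch below (the class and method names are illustrative, not part of WildFly or the PostgreSQL driver) shows how the pieces combine:

```java
// Illustrative sketch: composing a PostgreSQL JDBC connection URL.
// jdbc:postgresql://localhost/postgres and
// jdbc:postgresql://localhost:5432/postgres are equivalent,
// since 5432 is the default port.
public class PgUrl {
    static String url(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        System.out.println(url("localhost", 5432, "postgres"));
    }
}
```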
Download PostgreSQL jdbc driver jar from the vendor website. Choose the version that is compatible with your PostgreSQL database server.
Create a JBoss module for the PostgreSQL jdbc driver jar in WildFly, by creating the module directory structure, copying the jdbc driver jar into it, and creating a module.xml for the new module:
$ mkdir -p $JBOSS_HOME/modules/org/postgresql/driver/main/
$ cp ~/Downloads/postgresql-42.2.2.jar $JBOSS_HOME/modules/org/postgresql/driver/main/
Create module.xml under the org/postgresql/driver/main/ directory, with the following content:
<module xmlns="urn:jboss:module:1.3" name="org.postgresql.driver">
<resources>
<resource-root path="postgresql-42.2.2.jar" />
</resources>
<dependencies>
<module name="javax.api"/>
<module name="javax.transaction.api"/>
</dependencies>
</module>
Create the PostgreSQL jdbc driver and datasource resources in WildFly with the CLI:
$ cd $JBOSS_HOME/bin
$ ./jboss-cli.sh --connect

[standalone@localhost:9990 /] /subsystem=datasources/jdbc-driver=postgres:add(driver-name=postgres, driver-module-name=org.postgresql.driver, driver-class-name=org.postgresql.Driver, driver-xa-datasource-class-name=org.postgresql.xa.PGXADataSource)

[standalone@localhost:9990 /] data-source add --name=PostgresDS --jndi-name=java:jboss/PostgresDS --driver-name=postgres --connection-url=jdbc:postgresql://localhost/postgres --user-name=postgres --enabled=true --use-java-context=true --jta=true
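For reference, the two CLI commands above should produce configuration roughly like the following in standalone.xml (the exact subsystem namespace version and attribute layout vary between WildFly releases, so treat this as a sketch rather than something to paste verbatim):

```xml
<subsystem xmlns="urn:jboss:domain:datasources:5.0">
    <datasources>
        <datasource jndi-name="java:jboss/PostgresDS" pool-name="PostgresDS"
                    enabled="true" use-java-context="true">
            <connection-url>jdbc:postgresql://localhost/postgres</connection-url>
            <driver>postgres</driver>
            <security>
                <user-name>postgres</user-name>
            </security>
        </datasource>
        <drivers>
            <driver name="postgres" module="org.postgresql.driver">
                <driver-class>org.postgresql.Driver</driver-class>
                <xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
            </driver>
        </drivers>
    </datasources>
</subsystem>
```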
Configure and Access Batch Subsystem in WildFly
Create a batch jdbc job repository using the PostgreSQL datasource, and register it as the default batch job repository:
/subsystem=batch-jberet/jdbc-job-repository=jdbc:add(data-source=PostgresDS)
/subsystem=batch-jberet:write-attribute(name=default-job-repository, value=jdbc)
:reload
To view the current configuration for WildFly batch subsystem:
/subsystem=batch-jberet:read-resource(recursive=true)
{
    "outcome" => "success",
    "result" => {
        "default-job-repository" => "jdbc",
        "default-thread-pool" => "batch",
        "restart-jobs-on-resume" => true,
        "security-domain" => undefined,
        "in-memory-job-repository" => {"in-memory" => {}},
        "jdbc-job-repository" => {"jdbc" => {"data-source" => "PostgresDS"}},
        "thread-factory" => undefined,
        "thread-pool" => {"batch" => {
            "keepalive-time" => {
                "time" => 30L,
                "unit" => "SECONDS"
            },
            "max-threads" => 10,
            "name" => "batch",
            "thread-factory" => undefined
        }}
    }
}
To view batch job data for a specific application:
/deployment=restAPI.war/subsystem=batch-jberet:read-resource(recursive=true, include-runtime=true)
{
    "outcome" => "success",
    "result" => {
        "job-xml-names" => [
            "restJob2.xml",
            "restJob3.xml",
            "restJob1.xml",
            "submitted.xml",
            "restJobWithParams.xml",
            "org.jberet.test.infinispanRepository.xml"
        ],
        "job" => {
            "restJob2" => {
                "instance-count" => 2,
                "job-xml-names" => ["restJob2.xml"],
                "running-executions" => 0,
                "execution" => {
                    "25" => {
                        "batch-status" => "COMPLETED",
                        "create-time" => "2018-06-05T18:07:20.858+0000",
                        "end-time" => "2018-06-05T18:07:20.870+0000",
                        "exit-status" => "COMPLETED",
                        "instance-id" => 23L,
                        "last-updated-time" => "2018-06-05T18:07:20.870+0000",
                        "start-time" => "2018-06-05T18:07:20.862+0000"
                    },
                    "2" => {
                        "batch-status" => "COMPLETED",
                        "create-time" => "2018-06-05T18:02:07.183+0000",
                        "end-time" => "2018-06-05T18:02:07.218+0000",
                        "exit-status" => "COMPLETED",
                        "instance-id" => 2L,
                        "last-updated-time" => "2018-06-05T18:02:07.218+0000",
                        "start-time" => "2018-06-05T18:02:07.190+0000"
                    }
                }
            },
...
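The job XML files listed above (restJob1.xml and friends) are packaged inside the application archive under META-INF/batch-jobs (WEB-INF/classes/META-INF/batch-jobs in a war); their content does not appear in the CLI output. For orientation, a minimal job definition looks something like the following sketch, where the batchlet ref sleepBatchlet is a hypothetical CDI bean name, not one taken from the restAPI.war application:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="restJob1" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="restJob1.step1">
        <batchlet ref="sleepBatchlet"/>
    </step>
</job>
```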
Query Batch Job Data with PostgreSQL Client Tool
Another way to access batch job data is to query the batch jdbc job repository with a PostgreSQL client tool, such as psql. This offers direct access to the underlying database of the batch job repository, and therefore should be used with great caution.
To start psql and connect with user postgres and database postgres:
$ psql -U postgres
postgres=#
To view the available tables in the postgres database:
postgres=# \dt
                List of relations
 Schema |        Name         | Type  |  Owner
--------+---------------------+-------+----------
 public | job_execution       | table | postgres
 public | job_instance        | table | postgres
 public | partition_execution | table | postgres
 public | step_execution      | table | postgres
(4 rows)
To view the schema of a specific table:
postgres=# \d job_execution
                                            Table "public.job_execution"
     Column      |           Type           | Collation | Nullable |                        Default
-----------------+--------------------------+-----------+----------+-------------------------------------------------------
 jobexecutionid  | bigint                   |           | not null | nextval('job_execution_jobexecutionid_seq'::regclass)
 jobinstanceid   | bigint                   |           | not null |
 version         | integer                  |           |          |
 createtime      | timestamp with time zone |           |          |
 starttime       | timestamp with time zone |           |          |
 endtime         | timestamp with time zone |           |          |
 lastupdatedtime | timestamp with time zone |           |          |
 batchstatus     | character varying(30)    |           |          |
 exitstatus      | character varying(512)   |           |          |
 jobparameters   | character varying(3000)  |           |          |
 restartposition | character varying(255)   |           |          |
Indexes:
    "job_execution_pkey" PRIMARY KEY, btree (jobexecutionid)
Foreign-key constraints:
    "fk_job_execution_job_instance" FOREIGN KEY (jobinstanceid) REFERENCES job_instance(jobinstanceid) ON DELETE CASCADE
Referenced by:
    TABLE "step_execution" CONSTRAINT "fk_step_exe_job_exe" FOREIGN KEY (jobexecutionid) REFERENCES job_execution(jobexecutionid) ON DELETE CASCADE
To query job execution data in reverse chronological order (most recent first):
postgres=# select jobexecutionid, endtime, batchstatus from job_execution order by jobexecutionid desc limit 10;
 jobexecutionid |          endtime           | batchstatus
----------------+----------------------------+-------------
             28 | 2018-06-10 11:08:32.531-04 | COMPLETED
             27 | 2018-06-05 14:07:21.009-04 | COMPLETED
             26 | 2018-06-05 14:07:20.923-04 | COMPLETED
             25 | 2018-06-05 14:07:20.87-04  | COMPLETED
             24 | 2018-06-05 14:07:20.315-04 | COMPLETED
             23 | 2018-06-05 14:07:20.281-04 | COMPLETED
             22 | 2018-06-05 14:07:19.715-04 | COMPLETED
             21 | 2018-06-05 14:07:19.192-04 | FAILED
             20 | 2018-06-05 14:07:18.618-04 | COMPLETED
             19 | 2018-06-05 14:07:18.01-04  | COMPLETED
(10 rows)
To view job instance data:
postgres=# select * from job_instance order by jobinstanceid desc limit 5;
 jobinstanceid | version |      jobname      | applicationname
---------------+---------+-------------------+-----------------
            26 |         | restJob1          | restAPI
            25 |         | restJob1          | restAPI
            24 |         | restJob1          | restAPI
            23 |         | restJob2          | restAPI
            22 |         | restJobWithParams | restAPI
(5 rows)
To view step execution data:
postgres=# select stepexecutionid, stepname, starttime, batchstatus, commitcount from step_execution order by stepexecutionid desc limit 10;
 stepexecutionid |        stepname         |         starttime          | batchstatus | commitcount
-----------------+-------------------------+----------------------------+-------------+-------------
              28 | restJob1.step1          | 2018-06-10 11:08:32.508-04 | COMPLETED   |           0
              27 | restJob1.step1          | 2018-06-05 14:07:20.996-04 | COMPLETED   |           0
              26 | restJob1.step1          | 2018-06-05 14:07:20.913-04 | COMPLETED   |           0
              25 | restJob2.step1          | 2018-06-05 14:07:20.864-04 | COMPLETED   |           0
              24 | restJobWithParams.step1 | 2018-06-05 14:07:20.308-04 | COMPLETED   |           0
              23 | restJobWithParams.step1 | 2018-06-05 14:07:20.272-04 | COMPLETED   |           0
              22 | restJobWithParams.step1 | 2018-06-05 14:07:19.71-04  | COMPLETED   |           0
              21 | restJobWithParams.step1 | 2018-06-05 14:07:19.182-04 | FAILED      |           0
              20 | restJobWithParams.step1 | 2018-06-05 14:07:18.609-04 | COMPLETED   |           0
              19 | restJobWithParams.step1 | 2018-06-05 14:07:18.004-04 | COMPLETED   |           0
(10 rows)
Summary
This post demonstrates how to configure a PostgreSQL driver, datasource, and batch jdbc job repository with WildFly CLI commands. Batch application developers can use CLI commands not only to configure WildFly subsystems, but also to access batch data and perform certain batch processing operations. SQL tools such as PostgreSQL's psql offer a more direct way to access, and sometimes even modify, batch job data (e.g., manually updating the batch status of crashed applications). Such direct access to the batch job repository should therefore be exercised with great caution, so as not to corrupt the job repository data.
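As a concrete example of the kind of manual repair mentioned above: after a server crash, an execution can be left permanently in STARTED status, and marking it FAILED makes the job instance restartable. The statement below is a hedged sketch of such a fix; the jobexecutionid value 21 is illustrative, and you should verify the row (and consider the matching step_execution rows) before updating anything:

```sql
-- Hypothetical post-crash cleanup: mark one stale execution FAILED.
-- Double-check the jobexecutionid first; a wrong update corrupts job state.
update job_execution
   set batchstatus = 'FAILED',
       exitstatus  = 'FAILED',
       endtime     = current_timestamp
 where jobexecutionid = 21
   and batchstatus = 'STARTED';
```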