2018

This project is read only now. Read more.

jBeret

June 2018 Previous month Next month

PostgreSQL JDBC Job Repository in WildFly

Posted by cfang Jun 10, 2018

WildFly supports batch processing through its batch-jberet subsystem, which can be configured to use either a in-memory or jdbc job repository. The in-memory batch job repository, the default one, requires no extra configuration and allows for quick batch application development and deployment. For more advanced batch processing applications that call for persistence job data and access to shared job data between multiple WildFly instances, a jdbc job repository is required. In this blog post, I'll go through key steps how to configure and use a jdbc job repository backed by PostgreSQL database.

Configure PostgreSQL JDBC Module, Driver and Datasource in WildFly

Have a PostgreSQL database server installed either locally or remotely, accessible to WildFly. For writing this post, I have a locally running PostgreSQL server with the following configuration:

version	10
host	localhost
port	5432 (the default port number)
database user	postgres
database password	none
database name	postgres (same as database user)
jdbc connection url	jdbc:postgresql://localhost/postgres

Download PostgreSQL jdbc driver jar from the vendor website. Choose the version that is compatible with your PostgreSQL database server.

Create a JBoss module for PostgreSQL jdbc driver jar in WildFly, by creating the module directory structure, copying jdbc driver jar to here and creating module.xml for this new module:

$ mkdir -p $JBOSS_HOME/modules/org/postgresql/driver/main/
$ cp ~/Downloads/postgresql-42.2.2.jar $JBOSS_HOME/modules/org/postgresql/driver/main/

Create module.xml under org/postgresql/main/ directory, with the following content:

<module xmlns="urn:jboss:module:1.3" name="org.postgresql.driver">
 <resources>
  <resource-root path="postgresql-42.2.2.jar" />
 </resources>
 <dependencies>
  <module name="javax.api"/>
  <module name="javax.transaction.api"/>
 </dependencies>
</module>

Create PostgreSQL jdbc driver and datasource resources in WildFly with CLI:

$ cd $JBOSS_HOME/bin
$ ./jboss-cli.sh --connect

[standalone@localhost:9990 /] /subsystem=datasources/jdbc-driver=postgres:add(driver-name=postgres, driver-module-name=org.postgresql.driver, driver-class-name=org.postgresql.Driver, driver-xa-datasource-class-name=org.postgresql.xa.PGXADataSource)

[standalone@localhost:9990 /] data-source add --name=PostgresDS --jndi-name=java:jboss/PostgresDS --driver-name=postgres --connection-url=jdbc:postgresql://localhost/postgres --user-name=postgres --enabled=true --use-java-context=true --jta=true

Configure and Access Batch Subsystem in WildFly

Create batch jdbc job repository using PostgreSQL datasource, and register it as the default batch job repository:

/subsystem=batch-jberet/jdbc-job-repository=jdbc:add(data-source=PostgresDS)
/subsystem=batch-jberet/:write-attribute(name=default-job-repository, value=jdbc)
:reload

To view the current configuration for WildFly batch subsystem:

/subsystem=batch-jberet:read-resource(recursive=true)
{
    "outcome" => "success",
    "result" => {
        "default-job-repository" => "jdbc",
        "default-thread-pool" => "batch",
        "restart-jobs-on-resume" => true,
        "security-domain" => undefined,
        "in-memory-job-repository" => {"in-memory" => {}},
        "jdbc-job-repository" => {"jdbc" => {"data-source" => "PostgresDS"}},
        "thread-factory" => undefined,
        "thread-pool" => {"batch" => {
            "keepalive-time" => {
                "time" => 30L,
                "unit" => "SECONDS"
            },
            "max-threads" => 10,
            "name" => "batch",
            "thread-factory" => undefined
        }}
    }
}

To view batch job data for a specific application:

/deployment=restAPI.war/subsystem=batch-jberet:read-resource(recursive=true, include-runtime=true)

{
    "outcome" => "success",
    "result" => {
        "job-xml-names" => [
            "restJob2.xml",
            "restJob3.xml",
            "restJob1.xml",
            "submitted.xml",
            "restJobWithParams.xml",
            "org.jberet.test.infinispanRepository.xml"
        ],
        "job" => {
            "restJob2" => {
                "instance-count" => 2,
                "job-xml-names" => ["restJob2.xml"],
                "running-executions" => 0,
                "execution" => {
                    "25" => {
                        "batch-status" => "COMPLETED",
                        "create-time" => "2018-06-05T18:07:20.858+0000",
                        "end-time" => "2018-06-05T18:07:20.870+0000",
                        "exit-status" => "COMPLETED",
                        "instance-id" => 23L,
                        "last-updated-time" => "2018-06-05T18:07:20.870+0000",
                        "start-time" => "2018-06-05T18:07:20.862+0000"
                    },
                    "2" => {
                        "batch-status" => "COMPLETED",
                        "create-time" => "2018-06-05T18:02:07.183+0000",
                        "end-time" => "2018-06-05T18:02:07.218+0000",
                        "exit-status" => "COMPLETED",
                        "instance-id" => 2L,
                        "last-updated-time" => "2018-06-05T18:02:07.218+0000",
                        "start-time" => "2018-06-05T18:02:07.190+0000"
                    }
                }
            },

...

Query Batch Job Data with PostgreSQL Client Tool

Another way to access batch job data is to query batch jdbc job repository with PostgreSQL client tool, such as psql. This offers a direct access to the underlying database for the batch job repository, and therefore shoud be used with great caution.

To start psql and connect with user postgres and database postgres:

$ psql --U postgres
postgres=#

To view available tables in postgres database:

postgres=# \dt
                List of relations
 Schema |        Name         | Type  |  Owner
--------+---------------------+-------+----------
 public | job_execution       | table | postgres
 public | job_instance        | table | postgres
 public | partition_execution | table | postgres
 public | step_execution      | table | postgres
(4 rows)

To vew table schemas for a specific table:

postgres=# \d job_execution
                                               Table "public.job_execution"
     Column      |           Type           | Collation | Nullable |                        Default
-----------------+--------------------------+-----------+----------+-------------------------------------------------------
 jobexecutionid  | bigint                   |           | not null | nextval('job_execution_jobexecutionid_seq'::regclass)
 jobinstanceid   | bigint                   |           | not null |
 version         | integer                  |           |          |
 createtime      | timestamp with time zone |           |          |
 starttime       | timestamp with time zone |           |          |
 endtime         | timestamp with time zone |           |          |
 lastupdatedtime | timestamp with time zone |           |          |
 batchstatus     | character varying(30)    |           |          |
 exitstatus      | character varying(512)   |           |          |
 jobparameters   | character varying(3000)  |           |          |
 restartposition | character varying(255)   |           |          |
Indexes:
    "job_execution_pkey" PRIMARY KEY, btree (jobexecutionid)
Foreign-key constraints:
    "fk_job_execution_job_instance" FOREIGN KEY (jobinstanceid) REFERENCES job_instance(jobinstanceid) ON DELETE CASCADE
Referenced by:
    TABLE "step_execution" CONSTRAINT "fk_step_exe_job_exe" FOREIGN KEY (jobexecutionid) REFERENCES job_execution(jobexecutionid) ON DELETE CASCADE

To query job execution data in chronological order:

postgres=# select jobexecutionid, endtime, batchstatus from job_execution order by jobexecutionid desc limit 10;
 
jobexecutionid |          endtime           | batchstatus
----------------+----------------------------+-------------
             28 | 2018-06-10 11:08:32.531-04 | COMPLETED
             27 | 2018-06-05 14:07:21.009-04 | COMPLETED
             26 | 2018-06-05 14:07:20.923-04 | COMPLETED
             25 | 2018-06-05 14:07:20.87-04  | COMPLETED
             24 | 2018-06-05 14:07:20.315-04 | COMPLETED
             23 | 2018-06-05 14:07:20.281-04 | COMPLETED
             22 | 2018-06-05 14:07:19.715-04 | COMPLETED
             21 | 2018-06-05 14:07:19.192-04 | FAILED
             20 | 2018-06-05 14:07:18.618-04 | COMPLETED
             19 | 2018-06-05 14:07:18.01-04  | COMPLETED
(10 rows)

To view job instance data:

postgres=# select * from job_instance order by jobinstanceid desc limit 5;

 jobinstanceid | version |      jobname      | applicationname
---------------+---------+-------------------+-----------------
            26 |         | restJob1          | restAPI
            25 |         | restJob1          | restAPI
            24 |         | restJob1          | restAPI
            23 |         | restJob2          | restAPI
            22 |         | restJobWithParams | restAPI
(5 rows)

To view step execution data:

postgres=# select stepexecutionid, stepname, starttime, batchstatus, commitcount from step_execution order by stepexecutionid desc limit 10;
 stepexecutionid |        stepname         |         starttime          | batchstatus | commitcount
-----------------+-------------------------+----------------------------+-------------+-------------
              28 | restJob1.step1          | 2018-06-10 11:08:32.508-04 | COMPLETED   |           0
              27 | restJob1.step1          | 2018-06-05 14:07:20.996-04 | COMPLETED   |           0
              26 | restJob1.step1          | 2018-06-05 14:07:20.913-04 | COMPLETED   |           0
              25 | restJob2.step1          | 2018-06-05 14:07:20.864-04 | COMPLETED   |           0
              24 | restJobWithParams.step1 | 2018-06-05 14:07:20.308-04 | COMPLETED   |           0
              23 | restJobWithParams.step1 | 2018-06-05 14:07:20.272-04 | COMPLETED   |           0
              22 | restJobWithParams.step1 | 2018-06-05 14:07:19.71-04  | COMPLETED   |           0
              21 | restJobWithParams.step1 | 2018-06-05 14:07:19.182-04 | FAILED      |           0
              20 | restJobWithParams.step1 | 2018-06-05 14:07:18.609-04 | COMPLETED   |           0
              19 | restJobWithParams.step1 | 2018-06-05 14:07:18.004-04 | COMPLETED   |           0
(10 rows)

Summary

This post demonstrates how to configure PostgreSQL driver, datasource, and batch jdbc job repository with WildFly CLI commands. Batch application developers can use CLI commands to not only configure WildFly subsystems, but also access batch data and perform certain batch processing operations. SQL tools such as PostgreSQL psql offer a more direct way of access and sometimes even modify batch job data (e.g., manually updating batch status for crashed applications). Therefore, such direct access to batch job repository should be done with great caution so as not to corrupt batch job repository data.

Filter Blog

By date:

By tag:

JBossDeveloper