
Distributed Task Management

tkutz_isobar Jul 10, 2008 9:58 AM

I am looking for feedback on a pattern we have started using, as we would prefer to have it reviewed before we convert more of our system over to using it.

The scenario we face is that we have a fairly straightforward workflow, without any true decision points along the way, only failure paths and a couple of parallelization points. Because the workflow itself is both simple and static, we haven't seen the need to go for a full-blown workflow engine or ESB. The part of the processing that we are concerned with is the parallelization. At a certain point in processing the overall job, we are able to parallelize, and request that several instances of the same component be launched, each working on a subset of the total dataset to be processed. The workflow must then be notified as each of those jobs completes, and it waits for all of them before moving on to the following step.

The first implementation used JMS to launch the parallel tasks, with temporary queues as the reply-to destination of the messages, allowing each job to respond to the originator as it starts and finishes. However, this has a couple of problems:
1 - Temporary queues do not appear to support failover, at least on the JMS implementation we are using (JBoss Messaging 1.4), making this solution intolerant of node failure within the cluster.
2 - Since both parent and child tasks are run through services which throttle the processing, and the parent task is still alive while waiting for the child tasks to run, the waiting parent occupies a processing slot without doing any work.

To deal with this, we attempted a new pattern. In this pattern, a job originator launches one StatefulSessionBean (SFSB) for the job, and then sends one JMS message into a Queue for each of the child tasks. The child tasks respond by publishing messages for start, finish and/or failure to a Topic, with the job id as a message attribute which can be used to filter the messages. The SFSB subscribes to the topic, using a filter so that it sees only messages for the job it was created for, and polls for messages from the child processes.
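To make the filtering part concrete, here is a minimal sketch of how the filter string might be built. The class name JobSelectors and the property name "jobId" are made up for illustration; our real attribute name may differ. The only thing it relies on is that JMS message selectors use SQL-92 style string literals, where a single quote inside a literal is written as two single quotes.

```java
/** Hypothetical helper: builds a JMS message selector for one job's status messages. */
final class JobSelectors {
    private JobSelectors() {}

    /**
     * Returns a selector matching messages whose "jobId" string property
     * (an assumed property name) equals the given job id.
     */
    static String forJob(String jobId) {
        // Selector string literals are single-quoted; an embedded quote is doubled.
        String escaped = jobId.replace("'", "''");
        return "jobId = '" + escaped + "'";
    }
}
```

The child task would set the matching property with message.setStringProperty("jobId", id) before publishing, and the subscriber would pass the string above as the selector argument of Session.createConsumer(destination, selector).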

The polling is implemented using an EJB3 timer on a stateless session bean, which in turn looks up the SFSB indicated by the object attached to the timer, and calls its polling method. The return value of the polling method determines whether a new timer is scheduled. Between calls, the SFSB proxy is stored in JNDI, under a well-defined context and a job-specific name. It is removed when the job has either succeeded or failed, where failure includes a timeout if the child jobs did not all complete within a specified window.
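Stripped of the container pieces (timer, JNDI lookup, topic subscription), the state the SFSB keeps between polls could be sketched roughly as below. This is a simplified model, not our actual bean: the class name JobTracker, its methods, and the rule that a single child failure fails the whole job are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, container-free model of the per-job state held between polls. */
final class JobTracker {
    enum Status { RUNNING, SUCCEEDED, FAILED, TIMED_OUT }

    private final Set<String> pending;   // child task ids not yet finished
    private final long deadlineMillis;   // end of the allowed completion window
    private Status status = Status.RUNNING;

    JobTracker(Set<String> taskIds, long startMillis, long windowMillis) {
        this.pending = new HashSet<String>(taskIds);
        this.deadlineMillis = startMillis + windowMillis;
    }

    /** Called for each "finish" message received for this job. */
    void onFinished(String taskId) {
        if (status == Status.RUNNING) {
            pending.remove(taskId);
        }
    }

    /** Called for each "failure" message; assumes one failure fails the job. */
    void onFailed(String taskId) {
        if (status == Status.RUNNING) {
            status = Status.FAILED;
        }
    }

    /** Returns true if the caller should schedule another timer. */
    boolean poll(long nowMillis) {
        if (status == Status.RUNNING) {
            if (pending.isEmpty()) {
                status = Status.SUCCEEDED;
            } else if (nowMillis >= deadlineMillis) {
                status = Status.TIMED_OUT;
            }
        }
        return status == Status.RUNNING;
    }

    Status status() {
        return status;
    }
}
```

In this model the timer callback would drain any waiting topic messages into onFinished/onFailed, call poll with the current time, and either schedule the next timer or remove the JNDI entry depending on the result.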

So far, this is working well, but we are wondering if there are any pitfalls with the approach, particularly with respect to storing the SFSB handle in JNDI, or the use of a Topic with message filters to manage message routing.

If anyone here has any comments, they'd be greatly appreciated.