2006

 

Read this blog carefully and you're in for a PAY RAISE. Workflow and business process technology will be essential in developing next-generation applications. Knowledge about it is scarce, so it is an easy way to get ahead of your colleagues.

 

There is a very clear difference between concurrency in a workflow and Java concurrency. And yet, on a regular basis I meet people who get them mixed up in quite ugly ways. So, assuming that everyone reads this blog, this post will wipe out all related confusion in one go.

 

I assume that the reader is familiar with the basic idea of a Thread in Java. I will explain in detail what workflow is and how it relates to the notion of a Java Thread.

 

Now, let's look at what workflow is first and then proceed to process concurrency. A workflow or business process always contains a description of a state machine. That is one of the two fundamental differences between workflow and plain Java. When a state machine is created, it enters the initial state. After that, signals can be applied to the state machine, causing it to move to a new state.
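As a rough illustration of that idea, here is a minimal state machine sketch in plain Java. The class `SaleStateMachine`, its states and its signals are hypothetical names I made up for this example; they do not come from any workflow product.

```java
import java.util.Map;

// Hypothetical example: class, state and signal names are assumptions for illustration.
final class SaleStateMachine {
    enum State { CREATED, AWAITING_PAYMENT, PAID }
    enum Signal { SEND_INVOICE, PAYMENT_RECEIVED }

    // Transition table: current state -> (signal -> next state).
    private static final Map<State, Map<Signal, State>> TRANSITIONS = Map.of(
        State.CREATED, Map.of(Signal.SEND_INVOICE, State.AWAITING_PAYMENT),
        State.AWAITING_PAYMENT, Map.of(Signal.PAYMENT_RECEIVED, State.PAID));

    // A newly created state machine enters the initial state.
    private State current = State.CREATED;

    State current() {
        return current;
    }

    // Applies a signal, which moves the state machine to a new state.
    void signal(Signal signal) {
        State next = TRANSITIONS.getOrDefault(current, Map.of()).get(signal);
        if (next == null) {
            throw new IllegalStateException(signal + " is not accepted in state " + current);
        }
        current = next;
    }

    public static void main(String[] args) {
        SaleStateMachine machine = new SaleStateMachine();
        machine.signal(Signal.SEND_INVOICE);
        machine.signal(Signal.PAYMENT_RECEIVED);
        System.out.println(machine.current());
    }
}
```

Note that nothing here needs a thread of its own: the machine just sits in a state until somebody calls `signal`.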

 

For more background on state machines, see Wikipedia's explanation of Finite State Machines or this intro to UML State Diagrams.

 

Many parts of a business application can be expressed in terms of a state machine. Typical examples are handling an insurance claim, hiring a new employee, or the procedure for taking a company public. All of these are some form of execution that spans a long time. All of them include long wait states in which your server application is waiting for someone else or something else to initiate continuation of the overall execution. With Java alone, the good ol' top-down approach can't be applied, because Java (like any other imperative programming language) has no built-in way to program a persistent wait state: a blocked thread does not survive a server restart. Hence you cannot express the overall process as one top-down piece of Java. The solution is to use a workflow language in which you can express processes. The process has a state machine nature, so it can express the overall, long-running process.
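To make the wait state idea a bit more concrete, here is a minimal sketch, under my own assumptions, of what happens around it. The class `PersistentWaitSketch`, its states, and the in-memory map that stands in for a database table are all hypothetical. The point is only that no thread is blocked while the process waits: one short piece of work records the wait state, and a later, separate piece of work picks it up again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the map below is a stand-in for a database table.
final class PersistentWaitSketch {
    enum State { AWAITING_APPROVAL, APPROVED, REJECTED }

    // Assumed storage: process id -> current state of that process.
    private final Map<String, State> table = new HashMap<>();

    // First short unit of work: create the process, record the wait state, return.
    void start(String processId) {
        table.put(processId, State.AWAITING_APPROVAL);
    }

    // Second short unit of work, possibly days later: load, compute next state, store.
    State decide(String processId, boolean approved) {
        State current = table.get(processId);
        if (current != State.AWAITING_APPROVAL) {
            throw new IllegalStateException("process " + processId + " is not waiting for a decision");
        }
        State next = approved ? State.APPROVED : State.REJECTED;
        table.put(processId, next);
        return next;
    }

    State stateOf(String processId) {
        return table.get(processId);
    }

    public static void main(String[] args) {
        PersistentWaitSketch engine = new PersistentWaitSketch();
        engine.start("claim-1");
        // ... no thread is running for claim-1 while it waits ...
        System.out.println(engine.decide("claim-1", true));
    }
}
```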

 

On a side note, I just HATE the term 'long running transactions' that is sometimes used for this. It just sends people barking up the wrong tree. A long-running process is made up of many SHORT-LIVED transactions with long waiting periods in between.

 

The first thing we need to highlight is the transactional nature of state machines. A traditional state machine is always in exactly one state. Then, because of some signal input, it moves to the next state. A state machine can never be 'somewhere halfway through a transition'. State transitions are considered instantaneous. This maps perfectly to the ACID properties of database transactions.

 

 

Now comes the most difficult part. A traditional state machine is not sufficient to express real-life processes. Real-life processes can have multiple concurrent paths of execution, whereas a traditional state machine only has one. The best example of concurrent paths of execution is the billing and shipping path in a sale process as shown in the picture above. Each of those can have several sequential steps. But the billing path can execute completely independently of the shipping path.

 

The key point of this post is that the state machine semantics remain intact even when multiple concurrent paths of execution are involved. Suppose that for a given sale, we are halfway down the shipping and billing paths. The whole sale process still acts as a state machine. A signal that comes in will bring the process into a new state. E.g. when a payment notification arrives from the bank, the billing path will transition to the next state, while the shipping path will remain where it was. The overall state of the process is the combination of the states of all the paths of execution.
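Here is a small sketch of that combined state, again in plain Java and again with made-up names (`ConcurrentPathsSketch`, the two paths, the two signals and the state labels are all my assumptions). The overall process state is simply one state per path, and a signal advances only the path it belongs to.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: path, signal and state names are assumptions for illustration.
final class ConcurrentPathsSketch {
    enum Path { BILLING, SHIPPING }
    enum Signal { PAYMENT_NOTIFICATION, SHIPPER_ACCEPTED }

    // The overall process state is the combination of one state per path.
    private final Map<Path, String> state = new EnumMap<>(Path.class);

    ConcurrentPathsSketch() {
        state.put(Path.BILLING, "awaiting payment");
        state.put(Path.SHIPPING, "awaiting shipper");
    }

    // A signal moves the whole process to a new overall state by advancing one path.
    void signal(Signal signal) {
        switch (signal) {
            case PAYMENT_NOTIFICATION:
                advance(Path.BILLING, "awaiting payment", "paid");
                break;
            case SHIPPER_ACCEPTED:
                advance(Path.SHIPPING, "awaiting shipper", "in transit");
                break;
        }
    }

    private void advance(Path path, String expected, String next) {
        if (!expected.equals(state.get(path))) {
            throw new IllegalStateException(path + " is not in state '" + expected + "'");
        }
        state.put(path, next);
    }

    // Returns a copy of the overall state, one entry per path.
    Map<Path, String> snapshot() {
        return new EnumMap<>(state);
    }

    public static void main(String[] args) {
        ConcurrentPathsSketch sale = new ConcurrentPathsSketch();
        sale.signal(Signal.PAYMENT_NOTIFICATION);
        // Billing has moved on; shipping is still where it was.
        System.out.println(sale.snapshot());
    }
}
```

There are two 'concurrent' paths in this process, yet the code is strictly single-threaded. That is exactly the distinction this post is about.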

 

Now, we have already seen two important ingredients: processes are in fact state machines, and state transitions map perfectly onto transactions. Inside the transaction, all we have to do is calculate the next state from a given state and the signal that is applied. This calculation does not require any multithreaded computing at all. In fact, the transactional nature of a state transition implies that a single thread is the most natural fit. Working with multiple threads on one transaction is nasty business.

 

To further illustrate the difference between process concurrency and Java concurrency, let's look at the synchronization needed at the level of the workflow process. Consider 2 signals that can potentially arrive at the same time. E.g. one signal is expected from the shipper, accepting the order to carry the goods from the warehouse to the customer. Another signal that can arrive simultaneously is the bank's notification of payment by the customer. Suppose that there is one global process variable that keeps track of the total number of signals received. So the state of the process execution and the process variable 'nbrOfSignals' are stored in the database.

 

Here's the scenario of how things can go wrong: the two signals arrive simultaneously, resulting in 2 separate database transactions being started. Then, both transactions read the 'nbrOfSignals' variable from the database, increment it and update it. Then both of these transactions will try to commit. What is the outcome? The answer is that it depends on the configuration of your database isolation level and the locking strategy used (pessimistic or optimistic). Nothing new has to be invented for handling process concurrency. It all comes down to handling database concurrency.
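To show what is at stake, here is a deliberately simplified, single-threaded simulation of that interleaving. Everything in it is an assumption for illustration: `OptimisticLockSketch` and its `Row` with a version column are hypothetical, there is no real database involved, and real outcomes depend on your database, isolation level and locking strategy. It contrasts a blind write (where one increment gets lost) with an optimistic version check (where the stale writer is detected and retries).

```java
// Hypothetical sketch: simulates two interleaved transactions without a real database.
final class OptimisticLockSketch {
    // One row of an assumed process-variable table: the value plus a version column.
    static final class Row {
        final int nbrOfSignals;
        final int version;

        Row(int nbrOfSignals, int version) {
            this.nbrOfSignals = nbrOfSignals;
            this.version = version;
        }
    }

    private Row stored = new Row(0, 0);

    Row read() {
        return stored;
    }

    // Blind write: no check at all, so the last writer wins.
    void blindUpdate(int newValue) {
        stored = new Row(newValue, stored.version + 1);
    }

    // Optimistic write: succeeds only if nobody committed since 'readEarlier' was read.
    boolean updateIfUnchanged(Row readEarlier, int newValue) {
        if (stored.version != readEarlier.version) {
            return false;
        }
        stored = new Row(newValue, stored.version + 1);
        return true;
    }

    // Both transactions read 0, both write 1: one signal is lost.
    static int lostUpdate() {
        OptimisticLockSketch db = new OptimisticLockSketch();
        Row shipperTx = db.read();
        Row bankTx = db.read();
        db.blindUpdate(shipperTx.nbrOfSignals + 1);
        db.blindUpdate(bankTx.nbrOfSignals + 1);
        return db.read().nbrOfSignals;
    }

    // The second writer notices its read is stale, re-reads and retries.
    static int withOptimisticLocking() {
        OptimisticLockSketch db = new OptimisticLockSketch();
        Row shipperTx = db.read();
        Row bankTx = db.read();
        db.updateIfUnchanged(shipperTx, shipperTx.nbrOfSignals + 1);
        if (!db.updateIfUnchanged(bankTx, bankTx.nbrOfSignals + 1)) {
            Row fresh = db.read();
            db.updateIfUnchanged(fresh, fresh.nbrOfSignals + 1);
        }
        return db.read().nbrOfSignals;
    }

    public static void main(String[] args) {
        System.out.println("blind writes:       " + lostUpdate());
        System.out.println("optimistic locking: " + withOptimisticLocking());
    }
}
```

In a real application the version check would typically be done by the database or the persistence layer rather than by hand; the sketch only shows why some such mechanism is needed.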

 

For more on isolation levels and database locking, see the Wikipedia explanation of isolation and this excerpt about transaction isolation from an O'Reilly EJB book.

 

In summary: multithreaded computing does not have anything to do with process concurrency. Process executions move from one state to the next in a database transaction. One such transaction can always be calculated in one simple thread. Each signal is handled in one database transaction. So when multiple incoming signals can arrive simultaneously, the database synchronization features (locking and isolation levels) can be leveraged to handle process concurrency.

 

If you got this far and are still following, MENTION IT AT YOUR NEXT EVALUATION and you can be sure that you're in for a big pay raise :). Actually, I'm serious! Knowing how business processes relate to plain Java and database transactions is KEY to making the software for your project simple, robust and maintainable.