Skip navigation

Twins Father

August 2010 Previous month Next month



Many my workmates and viewers ask me about the detail about the "Parallelism" I mentioned in my previous blog post  Because what they think and practice it is impossible from their  perspectives, and without the details they even cannot accept the "JBPM  sharding" solutions. Then I realized I hided too much things, without  that level details something about the new ideas could not be practicable.




  • Parallelism

Here I will not lend some expert explaination, I will use my little poor words skill ().  If we make the whole system behavior (SLA, Stability) is log(N) same  when dealing  single request or multiple requests. Simplifies with the  examples:



Single Request

(Total Time)

10 Requests

(Total Time)

100 Request

(Total Time)

Parallelism10 Seconds10.XX seconds11.XX seconds
None Parallelism10 Seconds15 seconds30 seconds



  • Linear Parallelism

Linear Parallelism.PNG



Normal JBPM operation contains 3 steps:

1. Start New Processes and signal to the first nodes

2. Do the business logics with each node defined.

3. Singal the process continue execution to next nodes


So here we will fundmentally two different choices will come up, others will be little variant with the following two.


1. Everything is conceptual inside-bpm operations


2. Everything is conceptual outside-bpm operations





So which was the parrallelism version? Of course "second" was the one we want.




1. Any invocation/interaction with JBPM should be break-up to two pieces: 1. Trigger , 2. Execution. Trigger leverage with asynchronous messaging protocol, jboss does great jobs on that part so HornetQ/Jboss Message is your first choice.


2. The each invocation/interaction input could be searilized to some text format stream, so business logic will only meet some identifiers stream and true business state information will load from storage (DB, Cache, etc).


3. The same datacenter for jBPM and execution logics will be important, otherwise the network partition will become some bottelnecks.Please ref:


4. Transaction ISOLATION prefer the eventually consistency, rather than read_committed. As snapshot information will be largely used in whole process executions.So tranditional transaction isolation will become the stone-block for parallelism.



1.  Easy model the transparancey layer wrapper jbpm. So the JBPM sharding will become possible implictly.

2.  Object seralization is not acceptable anymore only applicable is identiifer of object seralization.

3.  So the individual request processing a little longer than none-parallelism mode.



1.  Select correct phase for split-up is more important, so SEDA model could help you re-think your system architecture. Please ref:


2.  Select correct seralization mode is also important, TEXT/String will be applicable solutions. But memory consumption for sting in java is very bad, you should try your best banlance the memory consumption and parallelism maximum.


3.  Only could be put inside the same location Data Center. If cross network locations, the pattern is not applicable. We may need other tools for help like Inter-Data Grid service or Clouding services.


4.  You should build your own diagnostic tools for tracing, debuging and deployments. But that means you can learn more, good thing isn't it from DEV perspectives. But managers may hate that.


So right now you understand a little more detail about my previous two blog messages regarding parallelism? Some simple principles comes online:

1. try your best split/isolations.

2. seralization is not always your victim.

3. network communciation isn't always your victim.

4. synchronous or in-jvm execution isn't always means the parellelism

5. research, research, .....

6. reading source codes is good habit

7. slowness, none-parellelsim isn't tools fault, actually is your mind.

8. you should always has your own libraries, because your always don't have enough time for practices.

9. try your best balance the performance(SLA), parellelism, and high avaibilities.



SEDA, is a design for  highly-concurrent servers based on a hybrid of event-driven and   thread-driven concurrency. The idea is to break the server logic into a  series of stages connected with  queues; each stage has a (small and  dynamically-sized) thread pool to process  incoming events, and passes  events to other stages.




  1. Your system processing could be model as ansynchronous processing.
  2. You'd better not expect the SEDA model could give your reliable return results (No directly return value).
  3. You  system processing could be break into several stages/states. And the  transition of each stage/state could have dependencies or just state  transitions.
  4. Each stage/state could be repeatable if the input was same. (That will reduce a lot of complexity on the recovery part)
  5. Performance is not your highest priority in your mind




  1. State transition need to be visible and manageable. And multiple versions request may be facing in most situations
  2. Cannot find feasible data communication channel for data transfering when stage/state swapping.
  3. You cannot find out one feasible, lightweight, portable multiple thread exectuion components.
  4. Recovery will be even more harder as the execution logic manually break-up






  1. JBPM 3.2.7 or later. I would rather more expect the JBPM 3.2.9 version with improvements with sql change and fork/join improvement
  2. HornetQ 2.0 or later. The reliable asynchronous messaging processing system.
  3. CommonJ pattern + Command Pattern. The pattern support muli-threading system and replaying features
  4. IFINISPAN jboss version data grid cache solutions.

[Need Consideration]


  1. JBPM performance issue, please refer my blog for new way scale up/out jbpm ref:
  2. Seralization issue please use MapMessage or TextMessage when you want transfer data using message protocol. JSON or JAXB was good solutions for that.
  3. Reliable Multi-Thread components:
    • CommonJ pattern.
    • Data Grid
    • Computing Grid
  4. Recovery may need pre-persist the input for each stage/state execution. Then Command pattern (REDO) part could help that recovery.


  1. Parellelism will be largely put inside the SEDA models. So your system could survive in huge load or small load
  2. Scalability already be built-in. So add more servers will help you a lot or upgrade your single machine power may also good. (Scale-up/Scale-out)
  3. JBPM help you on the state transition visible and manageable, versioning already be build-in
  4. INFINISPAN will help you on data grid part, so very soon it will provide much reliable data grid execution platform for you.
  5. Plug-in and extension will be easily, like Drools, etc.
JBPM (JBoss Business Process Management) is the one of famous open  source BPM tool in the world! Even the orignial founder of JBPM already  moved to another company but still you need admit JBPM still be good as  your expectation. But as usual most BPM trigger pattern hold for  user/role interaction model or some explictly rule model or some  business calendar pattern, so it's not suitable for some OLTP/Emergency  action need. So from my research and sprawlling in the internet, I don't  find lot interesting information about the scalability and performance  discussion about JBPM. So we dig into and find our way break through the  barrier blocking the JBPM cannot support OLTP/Emergency action need. We  did some interesting practice and get huge benefit from that, I will  share that information for all of you.
Our company main handle the communication/negotiation between  planner and supplier in the meeting planning fields. And most actions  finished by the online web operations, so when each peer finish their  operations we need trigger some designated processes then notify the  another peers about the detail information as the format with Email, SMS  or FAX. The process includes serveral little complex rules, and each  step output will impact next steps behavior in most situations. Then we  find it mostly match with BPM pattern. So we decide use the JBPM plus  MVEL. May someone will jump out say why not Drools? It's too complex for  our small business requirements at least from my perspectives.
So what we learned from:
  • Which version supposed to be trust in many situations?
The answer is JBPM 3.X. Why? Because the founder of JBPM left JBoss  when the date of JBPM 4.X supposed to be released, many concept and api  was designed very well. But the backend storage design still leverage  the old model of JBPM 3.X (and I quite cannot believe the JBoss guys  also design so badly about the schema of DB and Hibernate especially  from performance perspective). After carfully and painful practices we  gave up the JBPM 4.X and moved back to JBPM 3.X. Current in prod we are  using JBPM 3.2.7
  • Please don't put any execution of business logic inside JBPM VM
That doesn't mean you cannot do that, only reason if you use that  you will bring lots of contention into JBPM backend model (especially DB  parts) and also memory. So try your best push out the exeution outside  the JBPM VM with messaging system or some distributed invocation.  Current in prod we are using JMS (HornetQ is really good  messaging-oriented product really supposed to be tried)
  • Please don't simply persist you map variables requests into JPBM
If you are familiar with JBPM, jpbm try to transit the variables as  map format (similar with java.util.Map) during process execution. So in  most cases I do see many person try to simply leverage with protocol  transit the variables, that was the most victim of performance devil. So  we still need that protocol but please use that smartly, rather than  simply persist the Map information iteratively change to only persist  one key/value with unique business key/ whole serializd map. You will  get huge performance indicates.
Before                                                                                           Recommend
//Initialize the Map                                                                         //Initialize the Map
Map<String, ?> variables;                                                               Map<String, ?> variables;
.... ....                                                                                              .... ....
variables.put("hello", 10);                                                                 variables.put("hello", 10); 
variables.put("hello1", 20);                                                               variables.put("hello1", 20);
variables.put("hello1", new HashMap());                                            variables.put("hello1", new HashMap());
... ...                                                                                                ..... ....
//Call JBPM api                                                                                //Call JBPM api
ProcessInterface pi = getProcessInterface();                                   ProcessInterface pi = getProcessInterface();                          
ContextInterface ci = pi.getContextInterface();                               ContextInterface ci = pi.getContextInterface();
for (Map.Entry<String, ?> aEntry : variables.entrySet()) {               String serilizedText = ToStringUtil.
   ci.setVariable(aEntry.getKey(), aEntry.getValue());                      Context.setVariable("BusinessKey", serilizedText);

After at least 10 rounds of performance testing we got the following comparable results:
                           total  invocation(start process) per  hour                   Avg Time (milliseconds)
Before                  1,000                                                                        8,982
Recommend         2,000                                                                       412
Almost we got 20X increasing from performance perspective.
[Sharding !!!]
Sorry for so late we came to the main point, here introduce the sharding.
  • JBPM clustering
  • JBPM sharding

  • Every node should consist the same process definitions (not mandatory, I will discuss some variants)
  • You should have your own distributor or LB controller (inter-jvm or  intra-jvm will be fine, only difference you want scale-up or scale-out)
  • Try you best be aware of your single node capacity, if your  application request far beyond the single node or clustering nodes  capacity then try this pattern.
  • The load of your application (for JBPM part) will divided into X  parts (X means how many sharding you want to setup) and also keep the  performance of single nodes under specific loads. Increase total  capacities.
  • Maintainence is harder than before, some deployment or other useful tools need be developed by yourselves.
  • Issue diagnostic will be even harder than before. You should have  some centralized logging system helpping you dig into the process  existed within which nodes, etc.
  • Archiving strategy need clearly definied and controlled. Otherwise you will be messed up.
Suitable areas:
  • Can be used in any areas, but need resolve the process locality/stickness issues. I will discuss that later.
  • Distributor design
    • Intra-JVM
Within single JVM we started multiple JBPM shardings, the distributor will be easily become the java method calling
    • Inter-JVM
Running multiple JVMS holds for different JBPM shardings, the distributor will be designed as remote method calling:
  1. Web Services
  2. Socket
  3. JMS
Current in prod we use the intra-Jvm design
  • Distribution Algorithm
    • Consistent Hashing LB
    • Weight LB
    • Round robin LB
  • Process Locality/Stickness
    • tight coupling with JBPM
JBPM ProcessInstance structure cotain one not used field "key",  so when we create process instance in jbpm we can put the sharding ID of  jbpm into process instance. So later we need populate that key  information when do node event handling and signal process continuation.
   Very easy and useful, not spof
   Hard to diagnostic
Current in prod we use that design, but we will log the process creation information into logging system
    • Loose coupling with JBPM
Create one centralized JBPM process DNS lookup  table, similar with other DB sharding design. But you also meet spof and  maintainence issues.
  • Our final design
  • Results
                                                                                    Invocation  Cnt (1 hour)                        Avg Invocation Time(Milliseconds)
Single  JBPM                                                                  9,000                                                  412
JBPM Clustering (4 JBPM nodes + 1 JBPM  storage)        10,000                                                  350
JBPM sharding   (4 JBPM nodes + 4 JBPM  storages)       38,000                                                 370
+ Intra-JVM distributor
So we can see the throughputs and performance increase a lot.
  • Other variants
    • Computing Grid + JBPM sharding (TBD)
    • Clouding + JBPM sharding (JBPM clouding)  (TBD)
Please forgive me poor english, that skill need to be long-going improvements as I'm Chinese guy.
All testing setting:
1. JVM: jdk 1.6 u21
2. Marchine: Single machine (VM), with dual core cpu and 4G memory
  1. JBPM site:
  2. JBPM 3.2.7 docs:
  3. JBPM Clusering references:
  4. Consistent Hashing:
  5. CommonJ pattern:
  6. Terracotta:
  7. My twins photo:
[My Bio]
I am living and working in Shanghai of China, my  current company was StarCite. I worked as technical leader in that  company and my most forcus was scalability and performance design.
You can contact me via email: