1 2 3 Previous Next

Twins Father

32 posts

How JBoss Products Contribute to SEDA Models

Posted by andy.song Aug 12, 2010

[Introduction]

SEDA, is a design for highly-concurrent servers based on a hybrid of event-driven and thread-driven concurrency. The idea is to break the server logic into a series of stages connected with queues; each stage has a (small and dynamically-sized) thread pool to process incoming events, and passes events to other stages. http://matt-welsh.blogspot.com/2010/07/retrospective-on-seda.html

[Pre-conditions]

Your system processing could be model as ansynchronous processing.
You'd better not expect the SEDA model could give your reliable return results (No directly return value).
You system processing could be break into several stages/states. And the transition of each stage/state could have dependencies or just state transitions.
Each stage/state could be repeatable if the input was same. (That will reduce a lot of complexity on the recovery part)
Performance is not your highest priority in your mind

[Chanllenges]

State transition need to be visible and manageable. And multiple versions request may be facing in most situations
Cannot find feasible data communication channel for data transfering when stage/state swapping.
You cannot find out one feasible, lightweight, portable multiple thread exectuion components.
Recovery will be even more harder as the execution logic manually break-up

[Solutions]

JBPM 3.2.7 or later. I would rather more expect the JBPM 3.2.9 version with improvements with sql change and fork/join improvement
HornetQ 2.0 or later. The reliable asynchronous messaging processing system.
CommonJ pattern + Command Pattern. The pattern support muli-threading system and replaying features
IFINISPAN jboss version data grid cache solutions.

[Need Consideration]

JBPM performance issue, please refer my blog for new way scale up/out jbpm ref: http://community.jboss.org/people/andy.song/blog/2010/08/06/new-way-scale-upout-the-jbpm-without-jbpm-clustering-jbpm-sharding
Seralization issue please use MapMessage or TextMessage when you want transfer data using message protocol. JSON or JAXB was good solutions for that.
Reliable Multi-Thread components:
- CommonJ pattern.
- Data Grid
- Computing Grid
Recovery may need pre-persist the input for each stage/state execution. Then Command pattern (REDO) part could help that recovery.

[Benefitical]

Parellelism will be largely put inside the SEDA models. So your system could survive in huge load or small load
Scalability already be built-in. So add more servers will help you a lot or upgrade your single machine power may also good. (Scale-up/Scale-out)
JBPM help you on the state transition visible and manageable, versioning already be build-in
INFINISPAN will help you on data grid part, so very soon it will provide much reliable data grid execution platform for you.
Plug-in and extension will be easily, like Drools, etc.

New way scale up/out the JBPM without JBPM clustering [JBPM Sharding]

Posted by andy.song Aug 6, 2010

[Introduction]

JBPM (JBoss Business Process Management) is the one of famous open source BPM tool in the world! Even the orignial founder of JBPM already moved to another company but still you need admit JBPM still be good as your expectation. But as usual most BPM trigger pattern hold for user/role interaction model or some explictly rule model or some business calendar pattern, so it's not suitable for some OLTP/Emergency action need. So from my research and sprawlling in the internet, I don't find lot interesting information about the scalability and performance discussion about JBPM. So we dig into and find our way break through the barrier blocking the JBPM cannot support OLTP/Emergency action need. We did some interesting practice and get huge benefit from that, I will share that information for all of you.

[Background]

Our company main handle the communication/negotiation between planner and supplier in the meeting planning fields. And most actions finished by the online web operations, so when each peer finish their operations we need trigger some designated processes then notify the another peers about the detail information as the format with Email, SMS or FAX. The process includes serveral little complex rules, and each step output will impact next steps behavior in most situations. Then we find it mostly match with BPM pattern. So we decide use the JBPM plus MVEL. May someone will jump out say why not Drools? It's too complex for our small business requirements at least from my perspectives.

[Lessons]

So what we learned from:

Which version supposed to be trust in many situations?

The answer is JBPM 3.X. Why? Because the founder of JBPM left JBoss when the date of JBPM 4.X supposed to be released, many concept and api was designed very well. But the backend storage design still leverage the old model of JBPM 3.X (and I quite cannot believe the JBoss guys also design so badly about the schema of DB and Hibernate especially from performance perspective). After carfully and painful practices we gave up the JBPM 4.X and moved back to JBPM 3.X. Current in prod we are using JBPM 3.2.7

Please don't put any execution of business logic inside JBPM VM

That doesn't mean you cannot do that, only reason if you use that you will bring lots of contention into JBPM backend model (especially DB parts) and also memory. So try your best push out the exeution outside the JBPM VM with messaging system or some distributed invocation. Current in prod we are using JMS (HornetQ is really good messaging-oriented product really supposed to be tried)

Please don't simply persist you map variables requests into JPBM

If you are familiar with JBPM, jpbm try to transit the variables as map format (similar with java.util.Map) during process execution. So in most cases I do see many person try to simply leverage with protocol transit the variables, that was the most victim of performance devil. So we still need that protocol but please use that smartly, rather than simply persist the Map information iteratively change to only persist one key/value with unique business key/ whole serializd map. You will get huge performance indicates.

Before Recommend

//Initialize the Map //Initialize the Map

Map<String, ?> variables; Map<String, ?> variables;

.... .... .... ....

variables.put("hello", 10); variables.put("hello", 10);

variables.put("hello1", 20); variables.put("hello1", 20);

variables.put("hello1", new HashMap()); variables.put("hello1", new HashMap());

... ... ..... ....

//Call JBPM api //Call JBPM api

ProcessInterface pi = getProcessInterface(); ProcessInterface pi = getProcessInterface();

ContextInterface ci = pi.getContextInterface(); ContextInterface ci = pi.getContextInterface();

for (Map.Entry<String, ?> aEntry : variables.entrySet()) { String serilizedText = ToStringUtil.

serializedByJason(variables);
ci.setVariable(aEntry.getKey(), aEntry.getValue()); Context.setVariable("BusinessKey", serilizedText);
}

After at least 10 rounds of performance testing we got the following comparable results:

total invocation(start process) per hour Avg Time (milliseconds)

Before 1,000 8,982

Recommend 2,000 412

Almost we got 20X increasing from performance perspective.

[Sharding !!!]

Sorry for so late we came to the main point, here introduce the sharding.

JBPM clustering

JBPM sharding

Pros:

Every node should consist the same process definitions (not mandatory, I will discuss some variants)
You should have your own distributor or LB controller (inter-jvm or intra-jvm will be fine, only difference you want scale-up or scale-out)
Try you best be aware of your single node capacity, if your application request far beyond the single node or clustering nodes capacity then try this pattern.

Cons:

The load of your application (for JBPM part) will divided into X parts (X means how many sharding you want to setup) and also keep the performance of single nodes under specific loads. Increase total capacities.
Maintainence is harder than before, some deployment or other useful tools need be developed by yourselves.
Issue diagnostic will be even harder than before. You should have some centralized logging system helpping you dig into the process existed within which nodes, etc.
Archiving strategy need clearly definied and controlled. Otherwise you will be messed up.

Suitable areas:

Can be used in any areas, but need resolve the process locality/stickness issues. I will discuss that later.

[Knowledges]

Distributor design
- Intra-JVM

Within single JVM we started multiple JBPM shardings, the distributor will be easily become the java method calling

Inter-JVM

Running multiple JVMS holds for different JBPM shardings, the distributor will be designed as remote method calling:

Web Services
Socket
JMS

Current in prod we use the intra-Jvm design

Distribution Algorithm
- Consistent Hashing LB
- Weight LB
- Round robin LB
Process Locality/Stickness
- tight coupling with JBPM

JBPM ProcessInstance structure cotain one not used field "key", so when we create process instance in jbpm we can put the sharding ID of jbpm into process instance. So later we need populate that key information when do node event handling and signal process continuation.

pros:

Very easy and useful, not spof

cons:

Hard to diagnostic

Current in prod we use that design, but we will log the process creation information into logging system

Loose coupling with JBPM

Create one centralized JBPM process DNS lookup table, similar with other DB sharding design. But you also meet spof and maintainence issues.

Our final design

Results

Invocation Cnt (1 hour) Avg Invocation Time(Milliseconds)

Single JBPM 9,000 412

JBPM Clustering (4 JBPM nodes + 1 JBPM storage) 10,000 350

JBPM sharding (4 JBPM nodes + 4 JBPM storages) 38,000 370

+ Intra-JVM distributor

So we can see the throughputs and performance increase a lot.

Other variants
- Computing Grid + JBPM sharding (TBD)

Clouding + JBPM sharding (JBPM clouding) (TBD)

[Note]

Please forgive me poor english, that skill need to be long-going improvements as I'm Chinese guy.

All testing setting:

1. JVM: jdk 1.6 u21

2. Marchine: Single machine (VM), with dual core cpu and 4G memory

[Reference]

JBPM site: http://jboss.org/jbpm
JBPM 3.2.7 docs: http://docs.jboss.com/jbpm/v3.2/userguide/html_single/
JBPM Clusering references: http://www.theserverside.com/news/1363588/Scalability-and-Performance-of-jBPM-Workflow-Engine-in-a-JBoss-Cluster
Consistent Hashing: http://michaelnielsen.org/blog/consistent-hashing/
CommonJ pattern: http://commonj.myfoo.de/
Terracotta: http://www.terracotta.org/
My twins photo: http://cid-afd8b274e5b99db9.photos.live.com/browse.aspx/10%e6%9c%885%e6%97%a5

[My Bio]

I am living and working in Shanghai of China, my current company was StarCite. I worked as technical leader in that company and my most forcus was scalability and performance design.

You can contact me via email: hqxsn@hotmail.com

MSN: hqxsn@sina.com

Filter Blog

By date:

By tag:

JBossDeveloper

Twins Father

How JBoss Products Contribute to SEDA Models

New way scale up/out the JBPM without JBPM clustering [JBPM Sharding]

Actions

Filter Blog