A new way to traverse a Java object graph efficiently

Posted by andy.song Dec 26, 2010

Normally there are only two ways to traverse a Java object graph.

(Assume the Java objects we are talking about conform to the JavaBeans specification, i.e. getXXX/setXXX accessors.)

* Direct ("prototype") Java method calls

For instance:

A a = new A();

String name = a.getName();

Benefit:

* Fast and efficient

* No arbitrary casts needed

Drawback:

* Binding is static; the property to read cannot be chosen dynamically

* Maintenance effort is high, and the calling code must change whenever the class changes.

* Java reflection

For instance:

Method method = A.class.getMethod("getName");

A a = new A();

String name = (String) method.invoke(a);

Benefit:

* Flexible: the property can be chosen at runtime

Drawback:

* Slow, and may increase permanent generation GC activity.

* Type safety is not guaranteed.
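To make the trade-off concrete, here is a small self-contained sketch that reads the same property once through a direct call and once through reflection, where the property name is only a runtime string. The class `A` and the helper `readProperty` are illustrative assumptions, not part of any library.

```java
import java.lang.reflect.Method;

// Illustrative JavaBean with one readable/writable property.
class A {
    private String name = "andy";
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class PropertyAccess {
    // Direct call: bound at compile time, no cast, fastest.
    static String direct(A a) {
        return a.getName();
    }

    // Reflective call (hypothetical helper): the property is chosen at runtime by name,
    // but callers must cast the result and mistakes only surface at runtime.
    static Object readProperty(Object bean, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method m = bean.getClass().getMethod(getter);
            return m.invoke(bean); // the target object is passed as the first argument
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No readable property: " + property, e);
        }
    }
}
```

The reflective version is what makes dynamic traversal possible, and the unchecked `Object` result is exactly the type-safety drawback listed above.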

So if you want to traverse a Java object graph while balancing efficiency and performance, the only way is direct method calls. But many of us cannot afford the maintenance effort.

So what can we do? We can leverage scripting tools. There are three choices: JAXB + XML parsing, JXPath, and MVEL.

* JAXB + XML parsing

First use JAXB to serialize (marshal) the object into XML, and then use an XML parsing tool to query it.

* JXPath

Uses XPath-like expressions to navigate the object graph. It is an Apache Commons project.

* MVEL

An embeddable expression language that evaluates Java-like expressions against objects at runtime.

In the end we chose MVEL because the library is small and its execution speed is very fast, only a little slower than direct JVM calls. For instance:

A a = new A();

String name = (String)MVEL.eval("getName()", a);

But several challenges remain for object traversal.

First, how can we learn the structure (fields) of an object?

The answer is annotations; here we can leverage JAXB's XmlType.

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "A", propOrder = { "name", "title" })
public class A {

    private String name;

    private String title;
}

Then we can call A.class.getAnnotation(XmlType.class).propOrder() to learn the object's fields.

But several things still need to be handled:

* Class hierarchy: Class.getSuperclass()

* Static fields: Class.getDeclaredFields()

* Every collection field needs to hold a value, because of a limitation of Java reflection.
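The hierarchy and static-field points can be handled with plain reflection. The sketch below walks upward with Class.getSuperclass(), collects every declared field, and skips static fields using java.lang.reflect.Modifier. FieldCollector and the Base/Derived classes are hypothetical names used only for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class FieldCollector {
    // Collects "DeclaringClass.fieldName" for every instance field,
    // walking from the given class up to (but excluding) Object.
    static List<String> instanceFields(Class<?> type) {
        List<String> result = new ArrayList<String>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue; // static fields belong to the class, not to the object graph
                }
                result.add(c.getSimpleName() + "." + f.getName());
            }
        }
        return result;
    }
}

// Hypothetical hierarchy used to exercise the collector.
class Base {
    private String name;
    static int counter;
}

class Derived extends Base {
    private String title;
}
```

Note that getDeclaredFields() does not promise any particular order, so if field order matters, take it from propOrder rather than from reflection.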

So you can generate a collection of field entries with the following structure:

public class AutoFieldEntry {

    private String simpleFieldName;
    private String typeInfo;

    private String fullFieldPath;
    private String xpathFieldPath;
    private String mvelFieldPath;

    private String fieldReferenceName;

    *** ***

}

Then you have the structure of the whole object.

Second, how can we avoid arbitrary casts?

* Generics help:

public <T> T getValue(String simpleFieldName, Class<T> type, A container) {

    AutoFieldEntry entry = getFieldsEntry(simpleFieldName);

    if (entry.getTypeInfo().equals(type.getName())) {
        // The declared type matches, so the cast is checked rather than arbitrary
        return type.cast(MVEL.eval(entry.getMvelFieldPath(), container));
    } else {
        throw new IllegalArgumentException("The type you want doesn't match the class definition");
    }
}

Once all of this is done, you have a container that gives arbitrary access to object attributes. It reduces the disadvantages of reflection, and its usage is closer to direct Java method calls, with acceptable performance.
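Putting the two answers together, here is a minimal runnable sketch of the typed accessor. To keep it free of external dependencies it swaps MVEL.eval for plain java.lang.reflect field access, and FieldRegistry and Person are hypothetical names. The field map plays the role of the AutoFieldEntry collection.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: simple field name -> Field, built once per class.
class FieldRegistry {
    private final Map<String, Field> entries = new HashMap<String, Field>();

    FieldRegistry(Class<?> type) {
        // Walk the hierarchy; a subclass field shadows a superclass field of the same name.
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!entries.containsKey(f.getName())) {
                    f.setAccessible(true);
                    entries.put(f.getName(), f);
                }
            }
        }
    }

    // Typed read: the declared field type is checked first, so the caller
    // gets a T back without an unchecked cast of its own.
    <T> T getValue(String simpleFieldName, Class<T> type, Object container) {
        Field f = entries.get(simpleFieldName);
        if (f == null) {
            throw new IllegalArgumentException("Unknown field: " + simpleFieldName);
        }
        if (!f.getType().getName().equals(type.getName())) {
            throw new IllegalArgumentException("The type you want doesn't match the class definition");
        }
        try {
            return type.cast(f.get(container));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Sample bean; wrapper types are used so Class.cast works for every field.
class Person {
    private String name = "andy";
    private Integer age = 30;
}
```

For example, `registry.getValue("age", Integer.class, person)` returns an Integer directly. Asking for the wrong type fails fast with a clear message, not with a ClassCastException somewhere later.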

Example for understanding:

Suitable areas:

1. Data binding: Struts, UI rendering, rule engines, and similar areas.

We plan to open-source the code later, so you can refer to it then.

References:

1. JXPath: http://commons.apache.org/jxpath/

2. MVEL: http://mvel.codehaus.org/

3. Visitor for object graph traversal: http://stackoverflow.com/questions/3361608/java-object-graph-visitor-library

How will you convince your boss of the ROI of performance tuning?

Posted by andy.song Dec 25, 2010

We often meet this situation: the boss always needs to balance technical initiatives (especially performance tuning) against business features, and account managers or sales will always say that more features mean more revenue. How do we make a competitive argument?

Here is one interesting point: performance tuning reduces everyday running costs and is greener. As we all know, an idle computer is rather like a hibernating bear: the cost of that state is very low, and so is its power consumption. But when something runs on the computer, its power consumption goes up. The goal of performance tuning is to keep that high-power state as short as possible. So please do the following steps:

1. First, measure the average power consumption of your server in its idle state.

2. Second, without tuning, measure the average power consumption of your server while running the load.

3. Third, with tuning, measure the average power consumption of your server while running the same load.

4. Fourth, derive how much energy is saved per hour for each millisecond of performance gained.

5. Fifth, calculate the money saved by those energy savings.
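Steps 4 and 5 come down to simple arithmetic. This sketch models power in watts and energy in kWh, and assumes the only thing tuning changes is that each request keeps the server in its loaded state for fewer milliseconds. It also treats per-request busy time as additive, which is a simplification on a multi-core server. All inputs are hypothetical numbers to replace with your own measurements.

```java
class TuningRoi {
    // Energy saved per day, in kWh, when each request spends savedMsPerRequest
    // fewer milliseconds at loadWatts instead of idleWatts.
    static double kwhSavedPerDay(double loadWatts, double idleWatts,
                                 double savedMsPerRequest, long requestsPerDay) {
        double savedHours = savedMsPerRequest * requestsPerDay / 1000.0 / 3600.0;
        double extraKw = (loadWatts - idleWatts) / 1000.0;
        return extraKw * savedHours;
    }

    // Money saved per day at a given electricity price per kWh.
    static double moneySavedPerDay(double kwhPerDay, double pricePerKwh) {
        return kwhPerDay * pricePerKwh;
    }
}
```

For example, with a 300 W loaded server, 150 W when idle, 20 ms saved per request, and one million requests per day, the server spends about 5.56 fewer hours per day at load. That is roughly 0.83 kWh saved per server per day, before multiplying by your server count and electricity price.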

Then you can say something like: "If we improve performance by XXX milliseconds, we can save approximately XXX kWh per hour." Each kWh costs # dollars, so we save approximately # dollars in total per day. And by reducing the number of CS calls, we save another # dollars per day.

At that point, I think your argument will be readily accepted by your boss and sales, because all of us share the same purpose: reducing cost and increasing revenue (money).

One more hidden gotcha in messaging system design

Posted by andy.song Dec 16, 2010

Keep long-lived consumers and producers healthy

Most messaging systems suggest reusing established connections as much as you can, to avoid the significant overhead of creating them. But many of them don't tell us how to keep the connection, or the other underlying objects it depends on, healthy. If you don't take care of that, you will see a lot of unexpected behavior. There are two possible solutions:

Reactive Mode

You assume that a connection or its underlying dependent objects will become unhealthy at some point, so you regularly check how each connection or dependent object is used. Once a connection or dependent object has not been used for a while, you reload or reinitialize it.

Benefit: it meets the connection-health requirement without big changes to your infrastructure.

Drawback: the right moment to reload or reinitialize is hard to decide, and this approach fails in many cases.
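A minimal sketch of the reactive mode follows. It assumes a hypothetical Reconnectable connection type and an injectable clock so the idle check can be tested. A real client would wrap its JMS or socket connection and call check() from a scheduled task.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical stand-in for a messaging connection.
interface Reconnectable {
    void close();
}

// Reactive mode: record the last use, and rebuild the connection once it
// has been idle for at least maxIdleMillis.
class IdleConnectionHolder<C extends Reconnectable> {
    private final Supplier<C> factory;
    private final LongSupplier clock;
    private final long maxIdleMillis;
    private C connection;
    private long lastUsed;

    IdleConnectionHolder(Supplier<C> factory, LongSupplier clock, long maxIdleMillis) {
        this.factory = factory;
        this.clock = clock;
        this.maxIdleMillis = maxIdleMillis;
        this.connection = factory.get();
        this.lastUsed = clock.getAsLong();
    }

    // Callers borrow the connection through here so usage is recorded.
    synchronized C use() {
        lastUsed = clock.getAsLong();
        return connection;
    }

    // Called periodically; returns true if the connection was reinitialized.
    synchronized boolean check() {
        if (clock.getAsLong() - lastUsed < maxIdleMillis) {
            return false;
        }
        connection.close();
        connection = factory.get();
        lastUsed = clock.getAsLong();
        return true;
    }
}
```

The hard part noted in the drawback shows up here as the single maxIdleMillis value. Too small and you churn connections; too large and you still hit the stale ones.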

Proactive Mode

Many messaging systems are built on traditional TCP/IP, which means the underlying transport is a TCP (socket) connection. Keeping the messaging connection and its dependent objects healthy therefore comes down to keeping the TCP (socket) connection healthy, and here you can use heartbeats.

Benefit: it fulfills the health requirement and works in many situations.

Drawback: it requires big changes to your infrastructure.
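The proactive mode can be sketched as a heartbeat loop: every period, send a small ping over the connection, and reconnect after a number of consecutive misses. The Pingable interface, the miss threshold and the period are assumptions for illustration. If your broker has its own heartbeat settings, prefer those.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical connection that can send a lightweight heartbeat frame.
interface Pingable {
    boolean ping();      // true if the peer answered in time
    void reconnect();    // tear down and re-establish the socket
}

class Heartbeat {
    private final Pingable connection;
    private final int maxMisses;
    private int misses;

    Heartbeat(Pingable connection, int maxMisses) {
        this.connection = connection;
        this.maxMisses = maxMisses;
    }

    // One heartbeat tick; returns true if a reconnect was triggered.
    synchronized boolean tick() {
        if (connection.ping()) {
            misses = 0;
            return false;
        }
        if (++misses >= maxMisses) {
            connection.reconnect();
            misses = 0;
            return true;
        }
        return false;
    }

    // Runs tick() at a fixed rate on a background thread; the caller shuts it down.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::tick, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Requiring several consecutive misses before reconnecting avoids tearing down a healthy connection because of one slow reply.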

So a future goal for many messaging systems is to reduce the overhead of creating connections and their dependent objects, and to keep them healthy, either configurably or transparently, to meet the requirements of long-lived (or large-scale) message-driven systems.

Ephemeral Port and Windows MaxUserPort revisit

Posted by andy.song Dec 16, 2010

[Ephemeral Port]

An ephemeral port is a temporary client-side port. A TCP connection is identified by the 4-tuple {client_ip, client_port, server_ip, server_port}, and the ephemeral port is the client_port part of it.

So if each client machine can use 5000 ephemeral ports, the maximum number of concurrent connections to one server IP and port is:

(number of clients) * 5000

The following is quoted from: http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html

Ephemeral ports are temporary ports assigned by a machine's IP stack, and are assigned from a designated range of ports for this purpose. When the connection terminates, the ephemeral port is available for reuse, although most IP stacks won't reuse that port number until the entire pool of ephemeral ports have been used. So, if the client program reconnects, it will be assigned a different ephemeral port number for its side of the new connection.

[Windows MaxUserPort]

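By default, Windows limits this range through MaxUserPort, which is 5000 on Windows Server 2003. Windows Vista, Windows Server 2008 and later instead use a dynamic port range of 16384 ports (49152 to 65535).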

[Consideration]

Question #1: Do I always need to tune the port range?

Answer #1: Yes, unless you know your clients use intranet connections and each connection is a short request/response that finishes very quickly. You should also take the timeout settings into consideration.

For example:

10000 intranet clients; server port range is 5000.

If each connection finishes in under 500 ms, you won't need to tune the port range. Otherwise you will.
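One way to check such an example is a rough Little's-law estimate. This technique is added here; it is not from the original post. The number of ports in use at once is about the connection rate multiplied by how long each port stays occupied. That hold time is the connection lifetime plus the TIME_WAIT period, when this side closes first. The rates and the 240-second TIME_WAIT below are sample inputs; check your OS's actual setting.

```java
class PortBudget {
    // Little's law: ports in use ~= connection rate * time each port is held.
    static double portsInUse(double connectionsPerSecond,
                             double connectionMillis, double timeWaitSeconds) {
        double heldSeconds = connectionMillis / 1000.0 + timeWaitSeconds;
        return connectionsPerSecond * heldSeconds;
    }

    // True if the estimate fits inside the available ephemeral port range.
    static boolean fits(double connectionsPerSecond, double connectionMillis,
                        double timeWaitSeconds, int portRange) {
        return portsInUse(connectionsPerSecond, connectionMillis, timeWaitSeconds) <= portRange;
    }
}
```

With 500 ms connections and a 240 s TIME_WAIT, 20 new connections per second already need about 4810 ports. So in practice the TIME_WAIT period, more than the 500 ms itself, decides whether a 5000-port range is enough.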

Several performance tuning tips for web services

Posted by andy.song Dec 15, 2010

Turn on Gzip for web service request and response

When constructing client stub instances, we can do the following (assuming CXF or JAX-WS RI):

AServicePortType portType = super.getPort(AServiceQName, AServicePortType.class);

BindingProvider bp = (BindingProvider) portType;
bp.getRequestContext().put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY,
        "<service address>");

Map<String, List<String>> httpHeaders = new HashMap<String, List<String>>();

// For the request: announce that the request body is gzip-compressed
httpHeaders.put("Content-Encoding", Collections.singletonList("gzip"));

// For the response: tell the server we accept gzip-compressed responses
httpHeaders.put("Accept-Encoding", Collections.singletonList("gzip"));

bp.getRequestContext().put(MessageContext.HTTP_REQUEST_HEADERS, httpHeaders);

This can save a lot of network bandwidth and speed up transfers over the network. But the server needs some configuration changes to accept or recognize gzip content.
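To get a feel for the bandwidth saving, here is a small sketch that gzips a repetitive SOAP-style payload with java.util.zip. This is the same gzip format that the Content-Encoding header above announces. The envelope text is made up, and real savings depend on your messages.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipPayload {
    static byte[] compress(String body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray(); // read after close(), so the gzip trailer is written
    }

    static String decompress(byte[] gz) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // A made-up SOAP-like envelope with many repeated elements.
    static String sampleEnvelope(int items) {
        StringBuilder sb = new StringBuilder("<soap:Envelope><soap:Body><getOrdersResponse>");
        for (int i = 0; i < items; i++) {
            sb.append("<order><id>").append(i).append("</id><status>SHIPPED</status></order>");
        }
        return sb.append("</getOrdersResponse></soap:Body></soap:Envelope>").toString();
    }
}
```

XML with many repeated tags compresses very well, which is why gzip pays off most for large, list-like responses.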

Cache or pool the service stub handlers

For JBossWS, CXF and JAX-WS RI, you can cache or pool the client service stub handlers and benefit from the JAXB cache, so overall performance improves a lot. With XFire you cannot, because it causes continuous permanent generation GC issues.

There are two choices for caching or pooling:

* Session based

Involves the HTTP session, so it is more like a pool.

* Dispatcher

Involves some load balancing, so it is more like a cache.
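Here is a minimal pool sketch. It assumes stubs are expensive to create but safe to reuse by one thread at a time; whether a given JAX-WS proxy is thread-safe depends on the stack, so check your implementation's documentation. StubPool is a hypothetical helper built on java.util.concurrent.ArrayBlockingQueue, and the factory would be something like the getPort(...) call above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool of client stubs.
class StubPool<S> {
    private final BlockingQueue<S> idle;

    StubPool(int size, Supplier<S> factory) {
        idle = new ArrayBlockingQueue<S>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the creation cost (and JAXB setup) once, up front
        }
    }

    // Borrow a stub, run the call, and always return the stub to the pool.
    <R> R withStub(Function<S, R> call) {
        S stub;
        try {
            stub = idle.take(); // blocks until a stub is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a stub", e);
        }
        try {
            return call.apply(stub);
        } finally {
            idle.offer(stub); // capacity equals pool size, so this always succeeds
        }
    }

    int available() {
        return idle.size();
    }
}
```

Usage would look like `pool.withStub(port -> port.someOperation(request))`, where someOperation stands for whatever your generated port type exposes.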

Reduce network communication as much as you can

* Provide coarse-grained APIs

This is not 100% true. For example, if 99% of customers need only one field of a service model, should the API be coarse-grained or fine-grained? But in general, coarse-grained APIs should be the goal.

* Know your capacity

Define the boundaries of your services, which means knowing your server capacity. Prepare for load balancing and partitioning from the very beginning.

Service aggregator

* Multiple correlated service requests can be aggregated together; consider a gateway service.

The client then makes one call to the gateway, which fans out the correlated requests over the intranet. The average performance of multiple intranet calls is better than that of multiple Internet calls.
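A gateway-style aggregator can be sketched with CompletableFuture. The client makes one call, and the gateway runs the correlated backend calls in parallel and merges the results. The backend lambdas are hypothetical stand-ins for real service stubs, and the pool size of 8 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class ServiceAggregator {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Fans out all backend calls in parallel and waits for every result,
    // so the caller pays roughly the slowest call instead of the sum.
    List<String> aggregate(List<Supplier<String>> backendCalls) {
        List<CompletableFuture<String>> futures = new ArrayList<CompletableFuture<String>>();
        for (Supplier<String> call : backendCalls) {
            futures.add(CompletableFuture.supplyAsync(call, pool));
        }
        List<String> results = new ArrayList<String>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join()); // preserves the order of the requests
        }
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```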

Tuning your environment

* Turn on large memory page support

This reduces page (TLB) misses and keeps memory from being swapped between physical memory and disk.

* Turn on NUMA, turn off hyper-threading, and turn on Turbo Boost

* Configure the JVM with the following:

** Turn on NUMA support: -XX:+UseNUMA -XX:NUMAPageScanRate=0 -XX:-UseAdaptiveNUMAChunkSizing

** Give the 64-bit JVM enough memory: -Xmx16g -Xms16g -Xmn4g

** Turn on the parallel old GC if supported: -XX:ParallelGCThreads=50 -XX:+UseParallelOldGC

** Turn on large page support: -XX:+UseLargePages -XX:LargePageSizeInBytes=64m

SQL GROUP BY Common Traps Review

Posted by andy.song Dec 15, 2010

Column Restriction

If you choose to also include expressions that reference columns but do not include an aggregate function, you must list all columns you use this way in the GROUP BY clause.

One of the most common mistakes is to assume that you can reference columns in nonaggregate expressions as long as the columns come from unique rows.

For example:

Table: Student

Columns:

ID      Numeric  Primary key
Name    String   Not Null
Age     Int      Not Null
Class   String   Not Null
Score   Int      Not Null

Suppose we want the number of classes taken and the average score for each student, including ID and Name.

Failing query:

select ID, Name, Count(*) as NumOfClass, Sum(Score)/Count(*) as AvgScore
from Student
group by ID

Correct query:

select ID, Name, Count(*) as NumOfClass, Sum(Score)/Count(*) as AvgScore
from Student
group by ID, Name

More reasonable query:

select Name, Count(*) as NumOfClass, Sum(Score)/Count(*) as AvgScore
from Student
group by Name
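For readers more at home in Java than SQL, here is an in-memory analogue of the corrected query using the Stream API. Grouping by both ID and Name mirrors GROUP BY ID, Name: every non-aggregated output column is part of the grouping key. The Enrollment class and the sample rows are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// One row of the Student table: a student's score in one class.
class Enrollment {
    final int id;
    final String name;
    final String clazz;
    final int score;

    Enrollment(int id, String name, String clazz, int score) {
        this.id = id;
        this.name = name;
        this.clazz = clazz;
        this.score = score;
    }
}

class GroupByDemo {
    // In-memory analogue of:
    //   SELECT ID, Name, COUNT(*), AVG(Score) FROM Student GROUP BY ID, Name
    // The key holds every non-aggregated column; the value holds the aggregates.
    static Map<String, String> summarize(List<Enrollment> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                (Enrollment r) -> r.id + "|" + r.name,
                Collectors.collectingAndThen(
                        Collectors.summarizingInt((Enrollment r) -> r.score),
                        (IntSummaryStatistics s) -> s.getCount() + " classes, avg " + s.getAverage())));
    }
}
```

If the key held only the ID, there would be no well-defined Name to output per group. That is the same reason the database rejects the failing query.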

Grouping on Expressions

One of the most common mistakes is to attempt to group on the expression you create in the SELECT clause rather than on the individual columns. Remember that the GROUP BY clause must refer to columns created by the FROM and WHERE clauses. It cannot use an expression you create in your SELECT clause.

For example:

Wrong SQL:

SELECT Customers.CustLastName || ', ' ||
Customers.CustFirstName AS CustomerFullName,
Customers.CustStreetAddress || ', ' ||
Customers.CustCity || ', ' ||
Customers.CustState || ' ' ||
Customers.CustZip AS CustomerFullAddress,
MAX(Engagements.StartDate) AS LatestDate,
SUM(Engagements.ContractPrice)
AS TotalContractPrice
FROM Customers
INNER JOIN Engagements
ON Customers.CustomerID =
Engagements.CustomerID
WHERE Customers.CustState ='WA'
GROUP BY CustomerFullName,
CustomerFullAddress

Correct SQL:

SELECT CE.CustomerFullName,
CE.CustomerFullAddress,
MAX(CE.StartDate) AS LatestDate,
SUM(CE.ContractPrice)
AS TotalContractPrice
FROM
(SELECT Customers.CustLastName || ', ' ||
Customers.CustFirstName AS CustomerFullName,
Customers.CustStreetAddress || ', ' ||
Customers.CustCity || ', ' ||
Customers.CustState || ' ' ||
Customers.CustZip AS CustomerFullAddress,
Engagements.StartDate,
Engagements.ContractPrice
FROM Customers
INNER JOIN Engagements
ON Customers.CustomerID =
Engagements.CustomerID
WHERE Customers.CustState ='WA')
AS CE
GROUP BY CE.CustomerFullName,
CE.CustomerFullAddress

Referenced from "SQL Queries for Mere Mortals, Second Edition".

A general solution to replace UUIDs

Posted by andy.song Dec 9, 2010

Most distributed systems choose UUIDs as object identifiers. But a UUID string is 36 characters long, which takes 72 bytes to store as Java chars. Its values are also random and non-sequential, which hurts search and lookup in storage. Some systems do provide sequential UUIDs, but those are tied to the specific system. So here I want to propose a new way to replace them.

The new ID includes the following parts:

1. Server identifier

2. Component identifier

3. Thread identifier (thread groups and thread pools need to be considered too)

4. Time (choose the granularity, such as seconds or milliseconds, based on your system load)

Note: time should be centrally synchronized; otherwise the IDs are not universally unique. Please refer to:

http://community.jboss.org/people/andy.song/blog/2010/12/09/the-time-of-your-machine-can-be-trust

For example:

Server	Component	Thread	NanoTime
1	2	111	6183640443687

So the ID has 18 decimal digits in total and is numeric, which fits in a Java long. In general the IDs are roughly sequential, so using this kind of ID can improve your system's performance a lot.
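The layout in the example can be packed into a single Java long by decimal position: 1 digit of server, 1 of component, 3 of thread and 13 of time. This sketch follows that layout. The digit widths come from the example, and the range checks are my own additions. As with the original scheme, uniqueness still relies on the time part never repeating for the same server, component and thread.

```java
class CompositeId {
    private static final long TIME_LIMIT = 10_000_000_000_000L; // 10^13: 13 time digits
    private static final long THREAD_LIMIT = 1_000L;            // 10^3: 3 thread digits
    private static final long COMPONENT_LIMIT = 10L;            // 1 component digit
    private static final long SERVER_LIMIT = 10L;               // 1 server digit

    // Layout: [server:1][component:1][thread:3][time:13] -> at most 18 decimal digits,
    // well below Long.MAX_VALUE (about 9.22 * 10^18).
    static long compose(int server, int component, int thread, long time) {
        check(server, SERVER_LIMIT, "server");
        check(component, COMPONENT_LIMIT, "component");
        check(thread, THREAD_LIMIT, "thread");
        check(time, TIME_LIMIT, "time");
        return ((server * COMPONENT_LIMIT + component) * THREAD_LIMIT + thread) * TIME_LIMIT + time;
    }

    private static void check(long value, long limit, String part) {
        if (value < 0 || value >= limit) {
            throw new IllegalArgumentException(part + " out of range: " + value);
        }
    }
}
```

With the example values, `compose(1, 2, 111, 6183640443687L)` yields 121116183640443687, which reads as the four columns concatenated.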

This is not new: Twitter currently uses the same idea, and one of their engineers open-sourced the library (Snowflake) at:

https://github.com/twitter/snowflake/tree/1cd0af14db9efa7972a9ed605661a7b70962914a/src

Can the "time" of your machine be trusted?

Posted by andy.song Dec 9, 2010

The OS provides time support automatically, so we are used to it. But in distributed computing scenarios, where multiple servers work on large batches of requests, can you trust each machine's clock? The answer is "no". Each machine's clock is driven by its own hardware, so some clocks run faster and some run slower. That can introduce unexpected behavior if you don't handle it.

From the machine's perspective, this is where time synchronization comes in, for example the Windows Time Service.

But what about the software perspective? The answer is a centralized or distributed time service.

The challenges are:

* Network communication latency

* The precision you want to achieve (seconds, milliseconds, nanoseconds, etc.)

* Machine-level time service coordination is still needed, for example when you distribute the system geographically

* Is a single point of failure (SPOF) involved? Possibly, but you can overcome it with other design choices.
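To account for network latency when reading a central time service, one classic technique is Cristian's algorithm. It is a swapped-in illustration rather than something the post prescribes. The idea is to assume the server's reading was taken halfway through the round trip. TimeService is a hypothetical interface, and the local clock is passed in so the arithmetic can be checked.

```java
import java.util.function.LongSupplier;

// Hypothetical remote time service returning its clock in milliseconds.
interface TimeService {
    long currentTimeMillis();
}

class ClockOffset {
    // Returns (estimated server time - local time) in milliseconds.
    // The server reading is assumed to happen at the midpoint of the round trip,
    // so the estimate's error is bounded by half the round-trip time.
    static long estimateOffset(TimeService server, LongSupplier localClock) {
        long sent = localClock.getAsLong();
        long serverTime = server.currentTimeMillis();
        long received = localClock.getAsLong();
        long roundTrip = received - sent;
        long serverNow = serverTime + roundTrip / 2;
        return serverNow - received;
    }
}
```

For example, if the request leaves at local time 1000, the server answers 5000, and the reply arrives at local time 1100, the estimated offset is 3950 ms. The 100 ms round trip is also why the precision you can achieve is limited by network latency.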

The 6 hidden gotchas in messaging system design

Posted by andy.song Dec 9, 2010

1. The OS ephemeral port range limit when designing long-running queue/topic consumers

2. The impact of TCP TIME_WAIT and CLOSE_WAIT on creating new connections to the messaging server

3. Producers and consumers are clients in TCP/IP terms, so if they do not actively close their TCP connections, the close is left to the server side. That hurts the messaging server's capacity.

4. Fixed message size or free message size? Fixed message sizes reduce fragmentation in the message server's persistence store, which brings more benefit when large numbers of concurrent messages flow in and out. Compressed or uncompressed? Compression adds some overhead on the client side but brings more benefit on the messaging server side, so it is worth compressing messages when you can.

5. Consider general message headers; recommended:

* Source

* Destination

* Version

* SendTime

* ReceiveTime

* TransactionId or CorrelationId

6. Message compatibility.
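The recommended headers could be carried by a small immutable envelope like the one below. The field set follows item 5. The class name, the field types, and using the version field for the compatibility check in item 6 are my assumptions, not a standard.

```java
// Hypothetical immutable message header (item 5), with a version field
// that supports a simple compatibility check (item 6).
final class MessageHeader {
    final String source;
    final String destination;
    final int version;
    final long sendTime;
    final long receiveTime;      // 0 until the consumer stamps it
    final String correlationId;  // or a transaction id

    MessageHeader(String source, String destination, int version,
                  long sendTime, long receiveTime, String correlationId) {
        this.source = source;
        this.destination = destination;
        this.version = version;
        this.sendTime = sendTime;
        this.receiveTime = receiveTime;
        this.correlationId = correlationId;
    }

    // Consumers stamp the receive time without mutating the original header.
    MessageHeader received(long now) {
        return new MessageHeader(source, destination, version, sendTime, now, correlationId);
    }

    // One possible policy (assumed): accept messages not newer than the consumer's version.
    boolean compatibleWith(int consumerVersion) {
        return version <= consumerVersion;
    }

    long transitMillis() {
        return receiveTime - sendTime;
    }
}
```

Carrying SendTime and ReceiveTime together also gives you per-message transit time for free. That helps when diagnosing the capacity problems in items 1 to 3.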

Do you have more advice?

Should Java's "new" go inside or outside a synchronized block?

Posted by andy.song Dec 1, 2010

[Introduction]

Recently I joined an interesting discussion with my colleagues: "What exactly happens in a Java "new" operation? Should "new" go inside or outside a synchronized block?" I was surprised that nobody could answer correctly. So here I list the possible semantics of "new".

A a = new A();

1. Load the class definition of class "A"

* The class loader of the calling class tries to load class "A"

* Read the class information into memory (native memory or the VM heap? My answer is both.)

* Link and prepare the class so the JVM can execute its bytecode (interpreted at first, and JIT-compiled later if it becomes hot)

2. Calculate the initial memory needed for the instance of class "A"

* The calculation covers:

** Static variables (allocated once with the class, not per instance)

** Non-static variables

Note: remember the difference between a reference and an object; otherwise you cannot understand the concept.

3. Analyze the class hierarchy of "A", then repeat steps 4, 5 and 6 until every class in the hierarchy has been handled:

4. Use the size calculated in the previous step and allocate memory from the OS with the system-level C function "malloc".

* malloc is not necessarily thread-safe; on some platforms certain flags need to be set

* The malloc implementation is OS-dependent.

Note: Sun's JVM already covers those issues, so this step is 99% thread-safe.

5. Create the reference on the stack, and make the reference point to the object.

6. Call the object's constructor and perform initialization:

* Static initialization

* Non-static initialization

* Constructor body

Note: any of the above steps could be a victim of thread-unsafety.

So whether "new" is thread-safe has no fixed answer: it depends on whether the class and object instantiation path is thread-safe.

[Back to topic]

So the strategy for placing the "new" operator inside or outside a synchronized block is:

* Always put the "new" operator inside the synchronized block, except when:

* You fully understand or designed the class and have already made sure the whole path is thread-safe; only then can you put the "new" operator outside the synchronized block.
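The LinkedBlockingQueue change cited in the references (allocating the node before taking the lock) is a good example of the exception. In this minimal sketch, with a hypothetical SimpleLinkedQueue, the node is created outside the lock because no other thread can see it until it is linked in, and its constructor touches no shared state. Only the linking happens inside the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal singly linked FIFO queue guarded by one lock (hypothetical example).
class SimpleLinkedQueue<E> {
    private static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private Node<E> head = new Node<E>(null); // dummy head
    private Node<E> tail = head;
    private int count;

    void put(E e) {
        // "new" outside the lock: the node is thread-confined until it is linked.
        Node<E> node = new Node<E>(e);
        lock.lock();
        try {
            tail.next = node; // publication happens under the lock
            tail = node;
            count++;
        } finally {
            lock.unlock();
        }
    }

    E poll() {
        lock.lock();
        try {
            Node<E> first = head.next;
            if (first == null) {
                return null;
            }
            head = first; // first becomes the new dummy head
            count--;
            return first.item;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```

Moving the allocation out shortens the time each thread holds the lock, which matters most when many producers contend for it.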

References:

LinkedBlockingQueue.java revision 1.54, "move value allocation outside the lock scope.": http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/LinkedBlockingQueue.java?view=log

Thread Safe Collection Framework interesting notes

Posted by andy.song Nov 21, 2010

1. In many situations, the JDK concurrent collections framework is your first choice, except on:

IBM machines
Azul Systems machines
Other JDK vendors' JVMs

2. Libraries are not universal or portable when high scalability or performance is required; you need to understand your machine and system better.

3. Many collections define entry/node data structures whose allocation is already guarded, so make sure not to put the "new" operation inside the lock scope.

4. Use the latest version of the JDK concurrent library whenever you can; its maintainers keep tracking the best results for you.

5. Little tricks can bring huge returns, for instance:

Do you know why a synchronized HashMap is still better than Hashtable under some conditions? HashMap compares keys with "==" first and only then calls "equals", and that's why the concurrent library provides ConcurrentHashMap rather than a ConcurrentHashTable.

6. Evaluate your requirements (read-only, write-only, read-mostly/write-rarely, or roughly equal reads and writes), then look for the solution that suits you best. Don't blindly trust other people's test results. (Of course, you should be skeptical of my results too.)

7. The JDK uses CAS (compare-and-swap) to resolve memory consistency, and that was the key breakthrough for our concurrent programming toolkits. But CAS has limitations, so STM (Software Transactional Memory) and HTM (Hardware Transactional Memory) can help resolve consistency in highly concurrent environments.

8. Have you considered memory fragmentation when using collections, especially when you put/get/remove large amounts of data and each item consumes a lot of memory? If so, store keys separately from the information they map to. One interesting implementation for reference is Clojure's persistent vector and map.
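To illustrate point 7, here is the basic CAS retry loop that the JDK exposes through AtomicLong.compareAndSet: read the current value, compute the new one, and retry if another thread changed it in between. The bounded-counter rule is a made-up example of a multi-step update that a plain incrementAndGet cannot express.

```java
import java.util.concurrent.atomic.AtomicLong;

class BoundedCounter {
    private final AtomicLong value = new AtomicLong();
    private final long max;

    BoundedCounter(long max) {
        this.max = max;
    }

    // Lock-free increment that never exceeds max; returns false when full.
    boolean tryIncrement() {
        while (true) {
            long current = value.get();
            if (current >= max) {
                return false;
            }
            if (value.compareAndSet(current, current + 1)) {
                return true; // our update won
            }
            // another thread changed the value first: re-read and retry
        }
    }

    long get() {
        return value.get();
    }
}
```

The limitation mentioned in point 7 is visible here: CAS covers a single variable, so an invariant spanning several variables needs locks or transactional memory.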

Any thoughts? I hope we can build a long list for future reference. That's my dream.

Doug Lea JSR166 ConcurrentHashMap vs Cliff Click NonBlockingHashMap

Posted by andy.song Nov 21, 2010

[Introduction]

Concurrent versions of HashMap are widely used in multithreaded programming, and most of them are based on the JDK. What about alternatives?

Today we compare Cliff Click's NonBlockingHashMap with the JDK ConcurrentHashMap.

[Lab Environment]

Laptop: Dell E6400

CPU: Intel Core 2 (2 cores)

Memory: 4G DDR2

JDK: Sun JDK 1.6 u21


OS: Windows XP sp3

[Testing Scenario]

Start 16 threads that concurrently write strings (UUIDs), 1000 per thread, into a single shared map instance; at the same time, start 16 threads that concurrently read the strings, 1000 per thread, from the same map instance.
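The scenario can be reproduced with a small harness like the following. The post does not show its benchmark code, so the timing method (System.nanoTime around each operation, then averaged) and the class name are my assumptions. The harness takes any Map, so ConcurrentHashMap, the JSR166 build, or NonBlockingHashMap (which implements Map) can be plugged in. Micro-benchmarks like this are sensitive to JIT warm-up, so run several rounds.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class MapBenchmark {
    // Returns {average put ns, average get ns} over threads * perThread operations each.
    static double[] run(final Map<String, String> map, int threads, int perThread) {
        final AtomicLong putNanos = new AtomicLong();
        final AtomicLong getNanos = new AtomicLong();
        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threads * 2);

        for (int t = 0; t < threads; t++) {
            // Each writer/reader pair shares its own list of UUID keys.
            final String[] keys = new String[perThread];
            for (int i = 0; i < perThread; i++) {
                keys[i] = UUID.randomUUID().toString();
            }
            new Thread(() -> { // writer
                try {
                    await(start);
                    for (String k : keys) {
                        long t0 = System.nanoTime();
                        map.put(k, k);
                        putNanos.addAndGet(System.nanoTime() - t0);
                    }
                } finally {
                    done.countDown();
                }
            }).start();
            new Thread(() -> { // reader: may miss keys its writer has not stored yet
                try {
                    await(start);
                    for (String k : keys) {
                        long t0 = System.nanoTime();
                        map.get(k);
                        getNanos.addAndGet(System.nanoTime() - t0);
                    }
                } finally {
                    done.countDown();
                }
            }).start();
        }
        start.countDown(); // release all threads at once
        await(done);
        double ops = (double) threads * perThread;
        return new double[] {putNanos.get() / ops, getNanos.get() / ops};
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```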

I tested 3 rounds for each version:

Operation	Version	Round 1 (AVG)	Round 2 (AVG)	Round 3 (AVG)
Get	JSR 166 ConcurrentHashMap	3440.396	5284.996	3259.3
Get	Cliff NonBlockingHashMap	2935.274	3158.488	7471.13
Put	JSR 166 ConcurrentHashMap	36421688.57	11300718.2	8250990.84
Put	Cliff NonBlockingHashMap	91931746.36	63642538.64	57540781

Note:

1. All results are in nanoseconds.

2. When running with the JSR166 jar, don't forget to add the JVM parameter "-Xbootclasspath/p:D:/tmp/distribution/lib/concurrent/jsr166.jar", changing the path to yours.

3. Because the hash maps pre-allocate memory, remember to add the "-Xmx" JVM parameter.

[Summary]

1. JSR166 ConcurrentHashMap performed comparatively better than Cliff Click's NonBlockingHashMap on a system with few CPU cores.

2. If your workload is mostly reads, you can try NonBlockingHashMap, since read scalability and performance are its strengths.

Very important:

My test conditions were limited: I couldn't get a powerful machine (16 cores or more), so the comparison may not be fair to Cliff's implementation. Also, Azul Systems has its own built-in JVM, so the behavior may change on their machines.

[Reference]

1. Doug Lea : http://gee.cs.oswego.edu/dl/concurrency-interest/

2. Bootclasspath: http://www.tedneward.com/files/Papers/BootClasspath/BootClasspath.pdf

3. JSR 166.jar: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar

4. Cliff's blog: http://www.azulsystems.com/blogs/cliff

5. Cliff's non-blocking collections project: http://sourceforge.net/projects/high-scale-lib/

New JSR166 LinkedBlockingQueue VS JDK 1.6 u21 LinkedBlockingQueue

Posted by andy.song Nov 19, 2010

[Introduction]

Currently almost everyone uses only the JDK-provided concurrent library, since it covers many situations. But if you need something more scalable, the newer version of the concurrent library maintained by Doug Lea is worth considering first.

Today we compare the performance of the two LinkedBlockingQueue versions.

[Lab Environment]

Laptop: Dell E6400

CPU: Intel Core 2 (2 cores)

Memory: 4G DDR2

JDK: Sun JDK 1.6 u21


OS: Windows XP sp3

[Testing Scenario]

Start 16 threads that concurrently write strings (UUIDs), 1000 per thread, into a single shared queue instance; at the same time, start 16 threads that concurrently read the strings, 1000 per thread, from the same queue instance.
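The queue scenario can be sketched the same way with any BlockingQueue, such as the JDK's LinkedBlockingQueue or the JSR166 build placed on the boot classpath. The harness is my assumption about how such numbers could be gathered, since the post does not include its code. Here each reader uses take(), so the average get time includes time spent waiting for producers.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

class QueueBenchmark {
    // Returns {average put ns, average take ns, items consumed}.
    static double[] run(final BlockingQueue<String> queue, int threads, int perThread) {
        final AtomicLong putNanos = new AtomicLong();
        final AtomicLong takeNanos = new AtomicLong();
        final AtomicLong consumed = new AtomicLong();
        final CountDownLatch done = new CountDownLatch(threads * 2);

        for (int t = 0; t < threads; t++) {
            new Thread(() -> { // producer
                try {
                    for (int i = 0; i < perThread; i++) {
                        String item = UUID.randomUUID().toString();
                        long t0 = System.nanoTime();
                        queue.put(item);
                        putNanos.addAndGet(System.nanoTime() - t0);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
            new Thread(() -> { // consumer: equal counts, so every take() is eventually satisfied
                try {
                    for (int i = 0; i < perThread; i++) {
                        long t0 = System.nanoTime();
                        queue.take();
                        takeNanos.addAndGet(System.nanoTime() - t0);
                        consumed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        double ops = (double) threads * perThread;
        return new double[] {putNanos.get() / ops, takeNanos.get() / ops, consumed.get()};
    }
}
```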

I tested 3 rounds for each version:

Operation	Version	Round 1 (AVG)	Round 2 (AVG)	Round 3 (AVG)
Get	JSR 166 LinkedBlockingQueue	259435.8	209333.5	261711.1
Get	JDK LinkedBlockingQueue	805892.1	381774.3	477590.5
Put	JSR 166 LinkedBlockingQueue	47124513.2	94489344.8	289728496
Put	JDK LinkedBlockingQueue	240820336.6	268540401	108976599

Note: all results are in nanoseconds. When running with the JSR166 jar, don't forget to add the JVM parameter "-Xbootclasspath/p:D:/tmp/distribution/lib/concurrent/jsr166.jar", changing the path to yours.

[Summary]

If you need a highly scalable queue implementation, we think the JSR 166 LinkedBlockingQueue should be your first choice.

We think one important change, already made in the JSR166 LinkedBlockingQueue but not yet included in the latest JDK 1.6 library, makes the two versions behave differently:

LinkedBlockingQueue.java revision 1.54, "move value allocation outside the lock scope.": http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/LinkedBlockingQueue.java?view=log

Next time we will compare Dr. Cliff Click's LockFreeHashtable with Doug Lea's ConcurrentHashMap.

[Reference]

1. Doug Lea : http://gee.cs.oswego.edu/dl/concurrency-interest/

2. Bootclasspath: http://www.tedneward.com/files/Papers/BootClasspath/BootClasspath.pdf

3. JSR 166.jar: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar

JDK LinkedBlockingQueue VS Amino LockFreeBlockQueue

Posted by andy.song Nov 18, 2010

[Introduction]

Recently we built a scalable backend message-processing system whose thread/concurrency design needed a queue implementation. I found an interesting replacement: "LockFreeBlockQueue" from the "Amino Project", which was reported to outperform the JDK's "ConcurrentLinkedQueue" (http://www.infoq.com/articles/scalable-java-components). I thought it might improve the performance of the whole system. The project was based on the IBM JDK, so I migrated it to the Sun JDK.

[Lab Environment]

Laptop: Dell E6410

CPU: Intel i5 (4 cores)

Memory: 4G DDR3

JDK: Sun JDK 1.6 u18


OS: windows XP sp3

[Testing Scenario]

Start 16 threads that concurrently write strings (UUIDs), 200 per thread, into a single shared queue instance; at the same time, start 16 threads that concurrently read the strings, 200 per thread, from the same queue instance.

I tested 5 rounds:

Get (read) results, nanoseconds:

Version	Round	Max	Min	Avg
LinkedBlockingQueue (JDK)	1	97384444	1117	249965.7841
LinkedBlockingQueue (JDK)	2	87218068	1117	310944.5438
LinkedBlockingQueue (JDK)	3	78754143	1117	243207.1484
LinkedBlockingQueue (JDK)	4	79043007	1117	225175.4475
LinkedBlockingQueue (JDK)	5	79446969	1117	185973.6063
LockFreeBlockQueue (Amino)	1	94351097	1396	233762.4281
LockFreeBlockQueue (Amino)	2	143679993	1117	520364.2
LockFreeBlockQueue (Amino)	3	87081459	1117	99177.67844
LockFreeBlockQueue (Amino)	4	79612632	1117	223982.8916
LockFreeBlockQueue (Amino)	5	90056697	1117	280250.0991

Note: all results are in nanoseconds. Only read operations are measured, not write operations; write statistics will be added later.

[Summary]

Doug Lea has done a great job on LinkedBlockingQueue: its performance is stable and meets most needs, even under high concurrency.

The Amino project did not perform badly in this test, but it did not clearly outperform the Sun JDK implementation. I didn't test on the IBM JDK, where the results might differ.

One important change has already been made in the JSR166 LinkedBlockingQueue but is not yet included in the latest JDK 1.6 library:

LinkedBlockingQueue.java revision 1.54, "move value allocation outside the lock scope.": http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/LinkedBlockingQueue.java?view=log

Later we will test the latest version of Doug Lea's library.

[Reference]

1. Doug Lea : http://gee.cs.oswego.edu/dl/concurrency-interest/

2. Amino Project: http://sourceforge.net/projects/amino-cbbs/files/cbbs/1.0/amino-java-src-1.0.tar.gz/download?use_mirror=jaist

How can we reach "Linear Parallelism"? (Taking jBPM as an example)

Posted by andy.song Aug 16, 2010

[Introduction]

Many of my workmates and readers asked me about the details of the "parallelism" I mentioned in my previous blog post http://community.jboss.org/people/andy.song/blog/2010/08/06/new-way-scale-upout-the-jbpm-without-jbpm-clustering-jbpm-sharding. From their perspective and experience it seemed impossible, and without the details they could not even accept the "jBPM sharding" solution. Then I realized I had hidden too much: without that level of detail, the new ideas could not be put into practice.

[Glossary]

Parallelism

I won't borrow an expert's explanation; I'll use my own limited words. Parallelism means that the whole system's behavior (SLA, stability) stays nearly the same, growing no faster than log(N), whether it handles a single request or many requests. A simplified example:

Type	Single Request (Total Time)	10 Requests (Total Time)	100 Requests (Total Time)
Parallelism	10 seconds	10.XX seconds	11.XX seconds
No Parallelism	10 seconds	15 seconds	30 seconds


Linear Parallelism

[Details]

A normal jBPM operation contains 3 steps:

1. Start new processes and signal the first node.

2. Execute the business logic defined for each node.

3. Signal the process to continue execution to the next node.

Here, two fundamentally different choices come up; the others are minor variants of these two:

1. Everything is conceptually an inside-BPM operation.

2. Everything is conceptually an outside-BPM operation.

So which one is the parallel version? Of course, the second one is what we want.

[Pros]

1. Any invocation of or interaction with jBPM should be broken into two pieces: (1) trigger and (2) execution. The trigger uses an asynchronous messaging protocol; JBoss does a great job on that part, so HornetQ/JBoss Messaging is your first choice.

2. The input of each invocation/interaction can be serialized to a text-format stream, so the business logic only sees a stream of identifiers, and the real business state is loaded from storage (DB, cache, etc.).

3. Keeping jBPM and the execution logic in the same data center is important; otherwise network partitions will become bottlenecks. Please refer to: http://highscalability.com/blog/2010/8/12/think-of-latency-as-a-pseudo-permanent-network-partition.html

4. For transaction isolation, prefer eventual consistency over READ_COMMITTED. Snapshot information is used heavily throughout process execution, so traditional transaction isolation becomes a stumbling block for parallelism.
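The trigger/execution split in points 1 and 2 can be sketched in plain Java. The trigger is a small text message carrying only identifiers, put on a queue; the executor loads the real state from storage when it processes the message. A BlockingQueue stands in for HornetQ/JBoss Messaging and a map for the DB or cache. The message format and all class names are hypothetical, and no jBPM API is used.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class TriggerExecutionDemo {
    // Stand-in for the asynchronous messaging layer.
    final BlockingQueue<String> triggers = new LinkedBlockingQueue<String>();
    // Stand-in for the DB/cache holding the real business state.
    final Map<String, String> stateStore = new ConcurrentHashMap<String, String>();

    // Trigger: serialize only identifiers as text, e.g. "processId|nodeName".
    void trigger(String processId, String nodeName) {
        triggers.add(processId + "|" + nodeName);
    }

    // Execution: parse the identifiers, load state, run the node, store the result.
    // Returns false when there is nothing to do.
    boolean executeNext() {
        String message = triggers.poll();
        if (message == null) {
            return false;
        }
        String[] ids = message.split("\\|");
        String processId = ids[0];
        String node = ids[1];
        String state = stateStore.getOrDefault(processId, "");
        stateStore.put(processId, state + node + ";"); // hypothetical business logic
        return true;
    }
}
```

Because a trigger carries only identifiers, any number of executors on any machine in the data center can consume the queue. That is the property the sharding idea relies on.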

[Cons]

1. It is easy to model a transparent layer that wraps jBPM, so jBPM sharding implicitly becomes possible.

2. Object serialization is no longer acceptable; only serialization of object identifiers is.

3. Each individual request takes a little longer to process than in non-parallel mode.

[Challenges]

1. Selecting the correct phases to split up is very important; the SEDA model can help you rethink your system architecture. Please refer to: http://community.jboss.org/people/andy.song/blog/2010/08/12/how-jboss-products-contribute-to-seda-models

2. Selecting the correct serialization format is also important. Text/strings are a workable solution, but Java strings consume a lot of memory, so balance memory consumption against maximum parallelism as best you can.

3. This only works within a single data center. Across network locations the pattern does not apply, and we may need other tools such as inter-data-grid services or cloud services.

4. You need to build your own diagnostic tools for tracing, debugging and deployment. That means you learn more, which is a good thing from a developer's perspective, but managers may hate it.

So do you now understand a little more about my previous two blog posts on parallelism? Some simple principles emerge:

1. Split and isolate as much as you can.

2. Serialization is not always the culprit.

3. Network communication is not always the culprit.

4. Synchronous or in-JVM execution does not always mean parallelism.

5. Research, research, ...

6. Reading source code is a good habit.

7. Slowness and lack of parallelism are not the tools' fault; the problem is actually your mindset.

8. Always have your own libraries, because you never have enough time to practice.

9. Balance performance (SLA), parallelism and high availability as best you can.
