
Normally there are only two ways to traverse a Java object graph.


(Assume the objects under discussion conform to the JavaBeans convention of getXXX/setXXX accessors.)


* Direct Java method calls

  For instance:

      A a = new A();

      String name = a.getName();


  Benefit:

      * Fast and efficient

      * No arbitrary casts needed

  Drawback:

      * Only static binding; the access path cannot be specified dynamically

      * Maintenance effort is high, and the calling code may have to change whenever the object structure changes.


* Java reflection

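  For instance:

  // requires: import java.lang.reflect.Method;
  Method method = A.class.getMethod("getName");

  A a = new A();

  // invoke takes the target object first, then the method arguments (none here)
  String name = (String) method.invoke(a);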


  Benefit:

       * Flexible: the access path can be chosen at run time

  Drawback:

       * Slow, and heavy use may put pressure on the permanent generation and cause extra GC.

       * No compile-time type safety.


So if you want to traverse a Java object graph, the only way that balances efficiency and performance is direct method calls. But many of us cannot afford the maintenance effort.


So what can we do? We can use scripting-style tools. Here we have three choices: JAXB + XML parsing, JXPath, and MVEL.

* JAXB + XML parsing

  First use JAXB to marshal (serialize) the object into XML, then use an XML parsing tool to interpret it.

* JXPath

  Uses XPath-like expressions to navigate the object graph. It is an Apache Commons project.

* MVEL

  An embeddable scripting library that evaluates Java-like expressions against an object.

In the end we selected MVEL: the library is small and execution is fast, only a little slower than native JVM method calls. For instance:

A a = new A();

String name = (String)MVEL.eval("getName()", a);


But we still have several challenges for object traversal:

First, how can we discover the details of an object?

The answer is annotations; here we can use XmlType.


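import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlType;

@XmlAccessorType(XmlAccessType.FIELD)
// propOrder lists the fields in the order they should be reported
@XmlType(name = "A", propOrder = { "name", "title" })
public class A {

    private String name;

    private String title;

}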


Then we can call A.class.getAnnotation(XmlType.class).propOrder() to learn the names and order of the object's fields.

But several things need to be handled:

* Class hierarchy: walk it with Class.getSuperclass()

* Static fields: Class.getDeclaredFields() returns them as well, so they have to be filtered out

* All collection fields need to contain a value so that their details can be inspected; this is a limitation of Java reflection
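The first two points can be sketched with plain reflection. This is a minimal sketch, not the author's implementation: the class name FieldCollector and the sample classes Base and Person are made up for illustration. Note that getDeclaredFields() does not guarantee any particular order, which is one reason to take the order from propOrder when the annotation is present.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class FieldCollector {

    /** Returns the instance field names of type and all its superclasses, superclass fields first. */
    static List<String> instanceFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        collect(type, names);
        return names;
    }

    private static void collect(Class<?> type, List<String> names) {
        if (type == null || type == Object.class) {
            return; // top of the hierarchy reached
        }
        collect(type.getSuperclass(), names); // handle the class hierarchy first
        for (Field field : type.getDeclaredFields()) {
            // skip static fields and compiler-generated ones
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            names.add(field.getName());
        }
    }

    // Hypothetical sample classes, only here to exercise the collector
    static class Base {
        private String id;
        static int counter;
    }

    static class Person extends Base {
        private String name;
        private String title;
    }
}
```

Calling FieldCollector.instanceFieldNames(FieldCollector.Person.class) yields id, name, and title, and leaves out the static counter.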


So you can build the collection of fields with a structure like the following:

public class AutoFieldEntry {


    private String simpleFieldName;
    private String typeInfo;


    private String fullFieldPath;
    private String xpathFieldPath;
    private String mvelFieldPath;


    private String fieldReferenceName;


    *** ***

}


Then you will have the whole object structures.


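Second, how can we avoid arbitrary object casts?

* Generics help


// Assumes AutoFieldEntry exposes getters for its fields
public <T> T getValue(String simpleFieldName, Class<T> type, A container) throws Exception {

    AutoFieldEntry entry = getFieldsEntry(simpleFieldName);

    // Only evaluate when the requested type matches the recorded field type
    if (entry.getTypeInfo().equals(type.getName())) {

        // Class.cast avoids an unchecked (T) cast
        return type.cast(MVEL.eval(entry.getMvelFieldPath(), container));

    } else {

        throw new Exception("The requested type does not match the class definition");

    }

}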


Once these pieces are in place, you have a container that gives arbitrary access to object attributes. It reduces the disadvantages of reflection, and its usage is closer to direct Java method calls, with acceptable performance.

 

Example for understanding:

 

Docuementation.PNG


Suitable areas:

1. Data binding: Struts, UI rendering, rule engines, and similar areas.


Later we will open-source the code, so you can refer to it.


Reference:

1. JXPath: http://commons.apache.org/jxpath/

2. MVEL:  http://mvel.codehaus.org/

3. Visitor for object graph traversing: http://stackoverflow.com/questions/3361608/java-object-graph-visitor-library

We often meet this situation: the boss has to balance technical initiatives (especially performance tuning) against business features. Account managers and sales will always say that more features mean more revenue. How can we make a competitive argument?


Here I want to make one interesting point: performance tuning can reduce everyday running costs and is greener. As we all know, an idle computer is rather like a hibernating bear: the cost of that state is very low, and its power consumption is low too. But when something runs on the computer, power consumption goes up. So the goal of performance tuning is to keep the time spent in that high-power state as short as possible. Please follow these steps:


1. First, measure the average power consumption of your server in its idle state.

2. Second, without tuning, measure the average power consumption of your server under load.

3. Third, with tuning, measure the average power consumption of your server under load.

4. Fourth, derive the ratio: each millisecond of performance gain = # of power units saved per hour.

5. Fifth, calculate the money saved by the power reduction.
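The arithmetic behind steps 2 to 5 can be sketched as follows. This is only an illustration under stated assumptions: the measurements are average power draw in watts, electricity is billed per kilowatt-hour, and the class name PowerSavings and every number in the usage note are hypothetical.

```java
final class PowerSavings {

    /**
     * Energy saved per day in kilowatt-hours, given the average power draw under load
     * before and after tuning (watts) and the hours per day the server spends under load.
     */
    static double savedKwhPerDay(double untunedLoadWatts, double tunedLoadWatts, double loadHoursPerDay) {
        return (untunedLoadWatts - tunedLoadWatts) * loadHoursPerDay / 1000.0;
    }

    /** Money saved per day for a given price per kilowatt-hour. */
    static double savedDollarsPerDay(double savedKwh, double dollarsPerKwh) {
        return savedKwh * dollarsPerKwh;
    }
}
```

For example, with a made-up 400 W before tuning, 350 W after tuning, and 24 hours of load per day, the saving is (400 - 350) * 24 / 1000 = 1.2 kWh per day; multiply by your own electricity price to get the daily amount.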


Then you can say something like this: "If we cut the response time by XXX milliseconds, we reduce power consumption by approximately XXX per hour." Each unit of power costs # dollars, so we save approximately # dollars per day in total. And by reducing the # of customer service (CS) calls, we save another # dollars per day.


At that stage, I think your arguments will be easily accepted by your boss and by sales, because the purpose for all of us is to reduce cost and increase revenue (money).

  • Keep long-lived consumers and producers healthy

     Most messaging systems suggest reusing an established connection to avoid significant overhead wherever you can. But many of them give us no solution for keeping the connection, or other underlying dependent objects, healthy, so if you do not take care of that you will see a lot of unexpected behavior. There are two possible solutions:

 

    • Reactive mode

                You accept that a connection or other underlying dependent object will become unhealthy at some point, so you regularly check the usage of each one. After a connection or underlying dependent object has not been used for a while, you reload or reinitialize it.

                Benefit:  It meets the connection health requirement and needs no big change to your infrastructure.

                Drawback: The right point for reloading or reinitialization is hard to decide, and the approach fails in many cases.

    • Proactive mode

                Many messaging systems are built on traditional TCP/IP, which means the underlying object is a TCP (socket) connection. So keeping the connection or underlying dependent object healthy translates into keeping the TCP (socket) connection healthy. Here you can use a heartbeat.

                Benefit: It fulfills the health requirement and covers many situations.

                Drawback: It needs a big change to your infrastructure.
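A heartbeat for the proactive mode could look roughly like the sketch below. It is not tied to any messaging product: the class name HeartbeatMonitor, the ping and reconnect callbacks, and the missed-beat threshold are all assumptions made for illustration. The caller supplies how to ping the peer and how to rebuild the connection.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class HeartbeatMonitor implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final BooleanSupplier ping;   // returns true when the peer answered
    private final Runnable reconnect;     // rebuilds the connection
    private final int maxMissed;          // consecutive failures tolerated before reconnecting
    private int missed;                   // expected to be touched by one thread only

    HeartbeatMonitor(BooleanSupplier ping, Runnable reconnect, int maxMissed) {
        this.ping = ping;
        this.reconnect = reconnect;
        this.maxMissed = maxMissed;
    }

    /** Starts sending one heartbeat per period on the scheduler thread. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::beat, period, period, unit);
    }

    /** One heartbeat: reset the counter on success, reconnect after maxMissed failures in a row. */
    void beat() {
        boolean alive;
        try {
            alive = ping.getAsBoolean();
        } catch (RuntimeException e) {
            alive = false; // a failing ping counts as a missed beat
        }
        if (alive) {
            missed = 0;
        } else if (++missed >= maxMissed) {
            missed = 0;
            reconnect.run();
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The threshold avoids reconnecting on a single lost heartbeat; how many misses to tolerate and how often to beat depend on your network and your time-out settings.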

 

 

      So the goal for many messaging systems in the future is to reduce the overhead of creating connections or underlying dependent objects, and to keep them healthy either through configuration or transparently, so that they meet the requirements of long-lived (or large-scale, messaging-driven) systems.

[Ephemeral Port]

 

An ephemeral port is the temporary client-side port of a connection. A connection is identified by 4 parts: {client_ip, client_port, server_ip, server_port}

 

So if the ephemeral port range holds 5000 port numbers for TCP connections, the maximum number of connections the server can accept is:

# of clients * 5000

 

The following is quoted from: http://www.ncftp.com/ncftpd/doc/misc/ephemeral_ports.html


Ephemeral ports are temporary ports assigned by a machine's IP stack, and are  assigned from a designated range of ports for this purpose.  When the connection  terminates, the ephemeral port is available for reuse, although most IP stacks  won't reuse that port number until the entire pool of ephemeral ports have been  used.  So, if the client program reconnects, it will be assigned a different  ephemeral port number for its side of the new connection.

 

 

[Windows MaxUserPort]

 

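By default, Windows limits the highest ephemeral port to 5000 in Windows Server 2003 (the MaxUserPort registry value); Windows Vista and Windows Server 2008 changed the default dynamic port range to 49152-65535.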

 

 

[Consideration]

Question #1:  Do I always need to tune the port range?

 

Answer #1:  Yes, unless you know that your clients are intranet connections and each connection is a short request/response round trip that finishes very quickly. You should also take the time-out settings into consideration.

 

For example:

10000 intranet clients            Server port range is 5000

If each connection finishes in under 500 ms, you won't need to tune the port range. Otherwise you do.

  • Turn on gzip for web service requests and responses

       When constructing client stub instances, we can do the following (assuming CXF or the JAX-WS RI):

 

       AServicePortType portType = super.getPort(AServiceQName, AServicePortType.class);

        BindingProvider bp = (BindingProvider) portType;
        bp.getRequestContext().put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY,
                "<service address>");

 

        Map<String, List<String>> httpHeaders = new HashMap<String, List<String>>();

        // For request
        httpHeaders.put("Content-Encoding", Collections.singletonList("gzip"));

        // For response
        httpHeaders.put("Accept-Encoding", Collections.singletonList("gzip"));

 

        bp.getRequestContext().put(MessageContext.HTTP_REQUEST_HEADERS, httpHeaders);

 

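        This can save a lot of network bandwidth and speed up transfers, but the server needs a configuration change so that it accepts and recognizes gzip content. Note that the headers only declare the encoding: the client runtime must also actually compress the request body (CXF, for example, ships GZIP interceptors for this).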
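To get a feeling for how much gzip can save on a typical XML payload, here is a small sketch that uses only the JDK. It is independent of any web service stack; the class name GzipDemo and the sample payload are invented for illustration, and real SOAP messages will compress differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipDemo {

    /** Compresses the given bytes with gzip. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the original bytes from gzip data. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Hypothetical, highly repetitive XML payload, similar in spirit to a SOAP response. */
    static byte[] samplePayload() {
        StringBuilder xml = new StringBuilder("<orders>");
        for (int i = 0; i < 200; i++) {
            xml.append("<order><id>").append(i).append("</id><status>SHIPPED</status></order>");
        }
        return xml.append("</orders>").toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Comparing GzipDemo.samplePayload().length with GzipDemo.gzip(GzipDemo.samplePayload()).length shows the saving for this payload. The compression costs CPU time on both sides, so measure with your own messages before enabling it everywhere.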

 

  • Cache | Pool the service stub handler

        With JBossWS, CXF, and the JAX-WS RI you can cache or pool the client service stub handler and benefit from the JAXB cache, so overall performance improves a lot. With XFire you cannot, because it causes continuous permanent-generation GC issues.

 

        Cache | pool comes in two flavors:

        *  Session based

           Involves the HTTP session, so it is more like a pool

        *  Dispatcher

           Involves some load balancing, so it is more like a cache

 

  • Reduce network communication as much as you can

        * Provide coarse-grained APIs

        But that is not 100% true. For example, if 99% of customers need just one field of one service model, should the API be coarse-grained or fine-grained? In general, though, a coarse-grained API is the goal.

        * Know your capacity

        Define the box for your services, meaning know your server capacity. Prepare for load balancing and partitioning from the very beginning.

 

  • Service aggregator

       * Multiple correlated service requests can be aggregated together; consider a gateway service.

         The average performance of multiple intranet calls is better than that of multiple internet calls.

 

  • Tune your environment

       * Turn on large memory page support

          This reduces page misses and avoids swapping between physical memory and disk.

       * Turn on NUMA, turn off hyper-threading, and configure Turbo Boost

       * Configure the JVM with the following:

          ** Turn on NUMA support: -XX:+UseNUMA -XX:NUMAPageScanRate=0   -XX:-UseAdaptiveNUMAChunkSizing

          ** Give enough memory on 64-bit: -Xmx16g -Xms16g -Xmn4g

          ** Turn on parallel old GC if supported: -XX:ParallelGCThreads=50  -XX:+UseParallelOldGC

          ** Turn on large page support: -XX:LargePageSizeInBytes=64m

    • Column Restriction

    If you choose to also include expressions that reference columns but do not include an aggregate function, you must list all columns you use this way in the GROUP BY clause.


    One of the most common mistakes is to assume that you can reference columns in nonaggregate expressions as long as the columns come from unique rows.

     

     

    For example:

    Table

      Student


    Columns:

      ID                Numeric   Primary key

      Name          String       Not Null

      Age             Int            Not Null

      Class          String       Not Null

      Score          Int            Not Null

    Suppose we want the average score and the number of classes for each student, including ID and name.


    Failed query (Name is neither aggregated nor listed in GROUP BY):

    select ID, Name, Count(*) as NumOfClass, Sum(Score)/Count(*) as AvgScore

    from Student

    group by ID


    Correct query:

    select ID, Name, Count(*) as NumOfClass, Sum(Score)/Count(*) as AvgScore

    from Student

    group by ID, Name


    More reasonable query:

    select Name, Count(*) as NumOfClass, Sum(Score)/Count(*) as AvgScore

    from Student

    group by Name


    • Grouping on Expressions

    One of the most common mistakes is to attempt to group on the expression you create in the SELECT clause rather than on the individual columns. Remember that the GROUP BY clause must refer to columns created by the FROM and WHERE clauses. It cannot use an expression you create in your SELECT clause.

     

    For example:

    Wrong Sql:

    SELECT Customers.CustLastName || ', ' ||
    Customers.CustFirstName AS CustomerFullName,
    Customers.CustStreetAddress || ', ' ||
    Customers.CustCity || ', ' ||
    Customers.CustState || ' ' ||
    Customers.CustZip AS CustomerFullAddress,
    MAX(Engagements.StartDate) AS LatestDate,
    SUM(Engagements.ContractPrice)
    AS TotalContractPrice
    FROM Customers
    INNER JOIN Engagements
    ON Customers.CustomerID =
    Engagements.CustomerID
    WHERE Customers.CustState ='WA'
    GROUP BY CustomerFullName,
    CustomerFullAddress


    Correct Sql:

    SELECT CE.CustomerFullName,
    CE.CustomerFullAddress,
    MAX(CE.StartDate) AS LatestDate,
    SUM(CE.ContractPrice)
    AS TotalContractPrice
    FROM
    (SELECT Customers.CustLastName || ', ' ||
    Customers.CustFirstName AS CustomerFullName,
    Customers.CustStreetAddress || ', ' ||
    Customers.CustCity || ', ' ||
    Customers.CustState || ' ' ||
    Customers.CustZip AS CustomerFullAddress,
    Engagements.StartDate,
    Engagements.ContractPrice
    FROM Customers
    INNER JOIN Engagements
    ON Customers.CustomerID =
    Engagements.CustomerID
    WHERE Customers.CustState ='WA')
    AS CE
    GROUP BY CE.CustomerFullName,
    CE.CustomerFullAddress


    Referenced from "SQL Queries for Mere Mortals, Second Edition".

    Most distributed systems choose a UUID as the object identifier. But a UUID is 36 characters long, which takes 72 bytes as a Java string, and it is random and non-sequential, so it hurts search/lookup in storage. Some systems do provide sequential UUIDs, but those are tied to a specific system. So here I want to propose a different approach.

     

    The new ID consists of the following parts:

    1. Server identifier

    2. Component identifier

    3. Thread identifier    (thread groups and thread pools also need to be considered)

    4. Time          (the resolution depends on your system load: seconds, milliseconds, etc.)

     

    Note: Time should come from a centrally synchronized source, otherwise the ID is not universally unique. Please refer to:

    http://community.jboss.org/people/andy.song/blog/2010/12/09/the-time-of-your-machine-can-be-trust

     

    For example:

    Server          Component           Thread      NanoTime

    1               2                   111         6183640443687

     

    So the ID is 18 digits in this example, and it is purely numeric. In general it is also sequential, so using this kind of ID can improve your system performance a lot.
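One way to build such an ID in Java is to pack the four parts into a single long. The sketch below is an assumption-laden illustration, not the author's code and not Snowflake's layout: the class name CompositeId, the field widths, and the custom epoch are all hypothetical, and the thread number is assumed to be a small slot number (for example an index inside a thread pool) rather than Thread.getId().

```java
final class CompositeId {
    // Hypothetical field widths: 8 + 6 + 8 + 41 = 63 bits, so the id is always a non-negative long.
    static final int SERVER_BITS = 8;
    static final int COMPONENT_BITS = 6;
    static final int THREAD_BITS = 8;
    static final int TIME_BITS = 41;
    // Hypothetical custom epoch (2010-01-01T00:00:00Z); 41 bits of milliseconds last about 69 years.
    static final long EPOCH_MILLIS = 1262304000000L;
    private static final long TIME_MASK = (1L << TIME_BITS) - 1;

    /** Packs server, component, thread, and time (in that order, as in the text) into one long. */
    static long compose(int server, int component, int thread, long timeMillis) {
        check(server, SERVER_BITS, "server");
        check(component, COMPONENT_BITS, "component");
        check(thread, THREAD_BITS, "thread");
        long time = (timeMillis - EPOCH_MILLIS) & TIME_MASK;
        return ((long) server << (COMPONENT_BITS + THREAD_BITS + TIME_BITS))
                | ((long) component << (THREAD_BITS + TIME_BITS))
                | ((long) thread << TIME_BITS)
                | time;
    }

    private static void check(int value, int bits, String name) {
        if (value < 0 || value >= (1 << bits)) {
            throw new IllegalArgumentException(name + " out of range: " + value);
        }
    }

    static int serverOf(long id) {
        return (int) (id >>> (COMPONENT_BITS + THREAD_BITS + TIME_BITS));
    }

    static int componentOf(long id) {
        return (int) (id >>> (THREAD_BITS + TIME_BITS)) & ((1 << COMPONENT_BITS) - 1);
    }

    static int threadOf(long id) {
        return (int) (id >>> TIME_BITS) & ((1 << THREAD_BITS) - 1);
    }

    static long timeMillisOf(long id) {
        return (id & TIME_MASK) + EPOCH_MILLIS;
    }
}
```

With this layout, IDs produced by the same server, component, and thread grow with time. Uniqueness still relies on each thread producing at most one ID per time unit, or on adding a per-thread sequence number, which this sketch leaves out.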

     

    This is not new: Twitter currently uses the same idea. One of their engineers open-sourced the library (Snowflake) at:

    https://github.com/twitter/snowflake/tree/1cd0af14db9efa7972a9ed605661a7b70962914a/src

    The OS provides the time for you automatically, so you are used to relying on it. But what about distributed computing scenarios, where multiple servers work on a large volume of requests: can you trust each machine's time? The answer is "no". Each machine's clock depends on its own hardware and power, so some run faster and some run slower. That may introduce unexpected behavior if you don't handle it.

     

    From the machine perspective, this is where time synchronization comes on stage, for example the Windows Time Service.

     

    But what about the software perspective? The answer is a "centralized or distributed time service".

     

    Time Service.PNG

     

    The challenges will be:

    * Network communication latency

    * The resolution you want to achieve (seconds, milliseconds, nanoseconds, etc.)

    * Machine-level time synchronization is still needed, for example when you distribute geographically

    * Is a single point of failure (SPOF) involved? Maybe, but you can overcome it with other design measures.

    1.  The ephemeral port range limit of the OS matters when designing long-running queue/topic consumers.

    2.  The TCP TIME_WAIT and CLOSE_WAIT states affect the creation of new connections to the messaging server.

    3.  Producers and consumers are clients in TCP/IP terms, so closing one of their connections is not an active close on the server side. That hurts the messaging server's capacity.

    4.  Equal or free message size? Equal message sizes reduce fragmentation in the message server's persistence layer and bring more benefit when a large number of messages flow in and out concurrently. Compressed or uncompressed? Compression adds some overhead on the client side but benefits the messaging server more, so it is worth compressing messages whenever you can.

    5.  A general message header should be considered. Recommended fields:

         *   Source

         *   Destination

         *   Version

         *   SendTime

         *   ReceiveTime

         *   TransactionId or CorrelationId

    6.  Messages compatibility.
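The header fields recommended in point 5 could be carried by a small immutable class like the sketch below. The class name MessageHeader, the field types, and the version check are assumptions for illustration; in JMS, for example, some of these values would map onto standard headers or message properties instead.

```java
final class MessageHeader {
    final String source;
    final String destination;
    final int version;             // schema version, used for compatibility checks (point 6)
    final long sendTimeMillis;
    final long receiveTimeMillis;  // 0 until the consumer stamps it
    final String correlationId;    // transaction or correlation id

    MessageHeader(String source, String destination, int version,
                  long sendTimeMillis, long receiveTimeMillis, String correlationId) {
        this.source = source;
        this.destination = destination;
        this.version = version;
        this.sendTimeMillis = sendTimeMillis;
        this.receiveTimeMillis = receiveTimeMillis;
        this.correlationId = correlationId;
    }

    /** Returns a copy stamped with the receive time; the original header stays unchanged. */
    MessageHeader received(long receiveTimeMillis) {
        return new MessageHeader(source, destination, version,
                sendTimeMillis, receiveTimeMillis, correlationId);
    }

    /** A consumer can handle the message if it supports this version or a newer one. */
    boolean isSupportedBy(int highestSupportedVersion) {
        return version <= highestSupportedVersion;
    }
}
```

Keeping the header separate from the payload lets routing, tracing, and version checks run without deserializing the message body.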

     

    Do you have more advice?

    [Introduction]

    Recently I joined an interesting discussion with my colleagues: "What exactly happens during the "new" operation in Java? Should "new" go inside or outside a synchronized block?" Surprisingly, I found that nobody could answer correctly. So here I list the likely semantics of "new".


    A a = new A();


    1. Load the class definition of class "A" (needed only the first time the class is used)

        * The class loader of the class that contains the "new" expression tries to load class "A"

        * The class information is read into memory (is it "native memory" or "VM heap"? My answer is both)

        * The class is linked so that the JVM can execute its bytecode, which is interpreted at first and JIT-compiled once it becomes hot

    2. Calculate the initial memory consumption for an instance of class "A"

        * The calculation covers:

          ** Non-static variables, including inherited ones

          ** Static variables are not part of the instance; they are allocated once per class

          note: Remember the difference between a reference and an object, otherwise the concept is hard to understand.

    3. Analyze the hierarchy of class "A", then repeat steps 4, 5, and 6 until every class in the hierarchy has been covered:


    4. Allocate memory of the size calculated in the previous step, in the way the system-level C function "malloc" allocates memory from the OS.

        * malloc is not thread safe unless certain flags are set

        * The malloc implementation is OS dependent

        note: Sun has already covered these issues; HotSpot normally allocates from the Java heap through thread-local allocation buffers, so allocation is thread safe in 99% of cases.

    5. Create the reference on the stack and make it point to the object.

    6. Call the object's constructor and do the initialization

        * Static initialization (runs once, when the class is initialized)

        * Non-static initialization

        * Constructor body

        note: any of the paths above can fall victim to thread-unsafe code.
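The order of the three parts in step 6 can be observed with a small experiment. This is a minimal sketch with made-up names (InitOrderDemo and its nested class A); it records each phase in a list while two instances are created.

```java
import java.util.ArrayList;
import java.util.List;

final class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class A {
        static {
            LOG.add("static initializer");   // runs once, when the class is initialized
        }

        {
            LOG.add("instance initializer"); // runs for every new object, before the constructor body
        }

        A() {
            LOG.add("constructor body");
        }
    }

    /** Creates two instances and returns the recorded phases in execution order. */
    static List<String> run() {
        new A();
        new A();
        return new ArrayList<>(LOG);
    }
}
```

On its first call, run() records the static initializer once, followed by an instance initializer and a constructor body for each of the two objects. Any shared state touched in these three places is where thread safety has to be examined.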


    So the answer to whether "new" is thread safe is uncertain: it depends on whether the class/object instantiation path is thread safe.

     

    [Back to topic]

    So the strategy for putting the "new" operator inside or outside a synchronized block is:

    * Put the "new" operator inside the synchronized block by default.

    * Only if you fully understand (or designed) the class and have made sure the whole instantiation path is thread safe can you put the "new" operator outside the synchronized block.

     

     

    References:

     

    "http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/LinkedBlockingQueue.java?view=log revision 1.54

    move value allocation outside the lock scope."



    1. In many situations the JDK concurrent collections framework is your first choice, except on:

    • IBM machines
    • Azul machines
    • JDKs from other vendors

    2. Libraries are not universal or portable when high scalability or performance is required. You need to understand your machine and system better.

    3. The entry and key data structures of many collections already guard node allocation, so make sure you do not put the "new" operation inside the lock scope.

    4. Keep using the latest version of the JDK concurrent library whenever you can. Its maintainers keep pursuing the best results for you.

    5. A little trick can bring a huge return, for instance:

        Do you know why a synchronized HashMap is still better than Hashtable in some conditions? HashMap compares with "==" first and only then tries "equals". That is the reason the concurrent library provides ConcurrentHashMap rather than a ConcurrentHashtable.

    6. Evaluate your requirements: read only, write only, mostly reads with rare writes, or reads and writes roughly equal. Then look for the solution that fits you best. Do not blindly trust other people's test results. (Of course, you should be suspicious of my results too.)

    7. The JDK uses CAS (compare-and-swap) to resolve memory consistency, and it was the key breakthrough for our concurrent programming toolkits. But it has limitations, so STM (Software Transactional Memory) and HTM (Hardware Transactional Memory) can help you resolve consistency in highly concurrent environments.

    8. Have you considered memory fragmentation when using collections, especially when you put/get/remove large amounts of data and each item consumes a lot of memory? If so, separate the storage of keys from the storage of the information itself. One interesting implementation for your reference is Clojure's persistent vector and map.

     

    Any thoughts from you? I hope we can build a long list for future reference. That's my dream.

    [Introduction]

    Concurrent versions of HashMap are widely used in multithreaded programming. But most of them are based on the JDK; what about other alternatives?

    Today we will compare Cliff Click's NonBlockingHashMap with the JDK ConcurrentHashMap.

     

    [Lab Environment]

     

    Laptop: Dell E6400

    CPU: Intel Core 2  (with 2 core)

    Memory: 4G DDR2

    JDK: Sun JDK 1.6 u21

     

    OS: Windows XP SP3 (added in a later edit)

     

     

    [Testing Scenario]

     

    Start 16 threads that concurrently write strings (UUIDs), each thread writing 1000 strings into a single shared map instance, and at the same time start 16 threads that concurrently read, each thread reading 1000 of the inserted strings from the same map instance.
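The write half of such a test can be sketched as below. This is not the harness used for the numbers in this post; the class name MapPutBenchmark is hypothetical, there is no JVM warm-up, and only puts are timed (reads would be timed the same way with get). Any ConcurrentMap implementation can be passed in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

final class MapPutBenchmark {

    /** Returns the average time of one put in nanoseconds across all writer threads. */
    static double averagePutNanos(ConcurrentMap<String, String> map, int threads, int putsPerThread) {
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong totalNanos = new AtomicLong();
        List<Thread> writers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread writer = new Thread(() -> {
                // Create the keys up front so that UUID generation is not part of the measurement
                String[] keys = new String[putsPerThread];
                for (int i = 0; i < keys.length; i++) {
                    keys[i] = UUID.randomUUID().toString();
                }
                try {
                    start.await(); // all writers begin at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long elapsed = 0;
                for (String key : keys) {
                    long begin = System.nanoTime();
                    map.put(key, key);
                    elapsed += System.nanoTime() - begin;
                }
                totalNanos.addAndGet(elapsed);
            });
            writers.add(writer);
            writer.start();
        }
        start.countDown();
        for (Thread writer : writers) {
            try {
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for writers", e);
            }
        }
        return (double) totalNanos.get() / ((long) threads * putsPerThread);
    }
}
```

A call such as MapPutBenchmark.averagePutNanos(new java.util.concurrent.ConcurrentHashMap<String, String>(), 16, 1000) mirrors the 16 x 1000 writes described above. Timing single operations with System.nanoTime() is noisy, so averages from several rounds are more meaningful than any single figure.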

     

    I tested 3 rounds for each version:

     

    Get
    Version                      Round 1 (AVG)   Round 2 (AVG)   Round 3 (AVG)
    JSR 166 ConcurrentHashMap    3440.396        5284.996        3259.3
    Cliff NonBlockingHashMap     2935.274        3158.488        7471.13

    Put
    Version                      Round 1 (AVG)   Round 2 (AVG)   Round 3 (AVG)
    JSR 166 ConcurrentHashMap    36421688.57     11300718.2      8250990.84
    Cliff NonBlockingHashMap     91931746.36     63642538.64     57540781

     

    Note:

    1. All results are in nanoseconds.

    2. When running with the JSR 166 jar, don't forget to add one JVM parameter: "-Xbootclasspath/p:D:/tmp/distribution/lib/concurrent/jsr166.jar", with the path changed to yours.

    3. HashMap pre-allocates memory, so remember to add the "-Xmx" JVM parameter.

     

    [Summary]

    1. JSR 166 ConcurrentHashMap was comparatively better than Cliff's NonBlockingHashMap on a system with few CPU cores.

    2. If most of your operations are reads, you can try Cliff's NonBlockingHashMap, as read scalability and performance are its strength.

     

    Very important:

    As conditions were limited, I could not find a powerful machine (16 cores or more), so the comparison may not be fair to Cliff's implementation. Also, Azul has its own built-in JVM, so the behavior will change if the test is moved to their machines.

     

     

    [Reference]

    1. Doug Lea : http://gee.cs.oswego.edu/dl/concurrency-interest/

    2. Bootclasspath: http://www.tedneward.com/files/Papers/BootClasspath/BootClasspath.pdf

    3. JSR 166.jar: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar

    4. Cliff's blog: http://www.azulsystems.com/blogs/cliff

    5. Cliff's non-blocking collections project: http://sourceforge.net/projects/high-scale-lib/

    [Introduction]

    Currently almost everyone uses only the concurrent library shipped with the JDK, as it covers many situations. But if you need a more scalable one, I think the newer version of the concurrent library maintained by Doug Lea should be your first consideration.

     

    Today we will compare the performance of two versions of LinkedBlockingQueue.

     

    [Lab Environment]

     

    Laptop: Dell E6400

    CPU: Intel Core 2  (with 2 core)

    Memory: 4G DDR2

    JDK: Sun JDK 1.6 u21

     

    OS: Windows XP SP3 (added in a later edit)

     

     

    [Testing Scenario]

     

    Start 16 threads that concurrently write strings (UUIDs), each thread writing 1000 strings into a single shared queue instance, and at the same time start 16 threads that concurrently read, each thread reading 1000 of the inserted strings from the same queue instance.

     

    I tested 3 rounds for each version:

     

    Get
    Version                        Round 1 (AVG)   Round 2 (AVG)   Round 3 (AVG)
    JSR 166 LinkedBlockingQueue    259435.8        209333.5        261711.1
    JDK LinkedBlockingQueue        805892.1        381774.3        477590.5

    Put
    Version                        Round 1 (AVG)   Round 2 (AVG)   Round 3 (AVG)
    JSR 166 LinkedBlockingQueue    47124513.2      94489344.8      289728496
    JDK LinkedBlockingQueue        240820336.6     268540401       108976599

    Note:  All results are in nanoseconds. When running with the JSR 166 jar, don't forget to add one JVM parameter: "-Xbootclasspath/p:D:/tmp/distribution/lib/concurrent/jsr166.jar", with the path changed to yours.


    [Summary]

    If you need high scalability from a queue implementation, we think JSR 166 LinkedBlockingQueue should be your first choice.

     

    We think one important change has already been made to LinkedBlockingQueue but is not yet included in the latest JDK 1.6 library:

    "http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/LinkedBlockingQueue.java?view=log revision 1.54

    move value allocation outside the lock scope."

    It makes the two versions behave differently.

     

    In the next round we will compare Dr. Cliff Click's LockFreeHashtable with Doug Lea's ConcurrentHashMap.

     

     

    [Reference]

    1. Doug Lea : http://gee.cs.oswego.edu/dl/concurrency-interest/

    2. Bootclasspath: http://www.tedneward.com/files/Papers/BootClasspath/BootClasspath.pdf

    3. JSR 166.jar: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar

     

    [Introduction]

    Recently we built a scalable back-end message processing system. Its thread/concurrency design needs a queue implementation, and I found an interesting candidate: "LockFreeBlockQueue" from the "Amino Project". An article says this implementation outperforms the JDK "ConcurrentLinkedQueue": http://www.infoq.com/articles/scalable-java-components. So I thought it might help us improve overall system performance. The project was based on the IBM JDK, so I migrated it to the Sun JDK.

     

    [Lab Environment]

     

    Laptop: Dell E6410

    CPU: Intel i5  (with 4 core)

    Memory: 4G DDR3

    JDK: Sun JDK 1.6 u18

     

    OS: Windows XP SP3 (added in a later edit)

     

     

    [Testing Scenario]

     

    Start 16 threads that concurrently write strings (UUIDs), each thread writing 200 strings into a single shared queue instance, and at the same time start 16 threads that concurrently read, each thread reading 200 of the inserted strings from the same queue instance.

     

    I tested 5 rounds (Get operations):

    Version                       Round   Max          Min     Avg
    LinkedBlockingQueue (JDK)     1       97384444     1117    249965.7841
    LinkedBlockingQueue (JDK)     2       87218068     1117    310944.5438
    LinkedBlockingQueue (JDK)     3       78754143     1117    243207.1484
    LinkedBlockingQueue (JDK)     4       79043007     1117    225175.4475
    LinkedBlockingQueue (JDK)     5       79446969     1117    185973.6063
    LockFreeBlockQueue (Amino)    1       94351097     1396    233762.4281
    LockFreeBlockQueue (Amino)    2       143679993    1117    520364.2
    LockFreeBlockQueue (Amino)    3       87081459     1117    99177.67844
    LockFreeBlockQueue (Amino)    4       79612632     1117    223982.8916
    LockFreeBlockQueue (Amino)    5       90056697     1117    280250.0991

     

    Note: All results are in nanoseconds. Only read operations were measured, not write operations; write statistics will be added later.

     

    [Summary]

    Doug Lea has done a great job on LinkedBlockingQueue: its performance was stable and could handle most situations, even highly concurrent ones.


    The Amino project did not do badly in the performance test, but the results do not show that it outperforms the Sun JDK implementation. I did not test on the IBM JDK, though; the situation may be different there.

     

    One important change has already been made to LinkedBlockingQueue but is not yet included in the latest JDK 1.6 library:

    http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/main/java/util/concurrent/LinkedBlockingQueue.java?view=log revision 1.54

    move value allocation outside the lock scope.

     

    Later we will test Doug Lea's latest version.

     

    [Reference]

    1. Doug Lea : http://gee.cs.oswego.edu/dl/concurrency-interest/

    2. Amino Project: http://sourceforge.net/projects/amino-cbbs/files/cbbs/1.0/amino-java-src-1.0.tar.gz/download?use_mirror=jaist

    [Introduction]

     

    Many of my colleagues and readers have asked me for details about the "parallelism" I mentioned in my previous blog post http://community.jboss.org/people/andy.song/blog/2010/08/06/new-way-scale-upout-the-jbpm-without-jbpm-clustering-jbpm-sharding. From their perspective and practice it looks impossible, and without the details they cannot even accept the "JBPM sharding" solution. Then I realized I had hidden too much: without that level of detail, the new ideas are not practicable.

     

    [Glossary]

     

    • Parallelism

    I will not borrow an expert's definition here; I will use my own simple words. A system is parallel if its overall behavior (SLA, stability) stays roughly the same, growing no faster than log(N), whether it handles a single request or multiple requests. A simplified example:

     

    Type               Single request   10 requests     100 requests
                       (total time)     (total time)    (total time)
    Parallelism        10 seconds       10.XX seconds   11.XX seconds
    Non-parallelism    10 seconds       15 seconds      30 seconds

     

     

    • Linear Parallelism

    Linear Parallelism.PNG

    [Details]

     

    A normal JBPM operation contains 3 steps:

    1. Start a new process and signal it to the first node

    2. Run the business logic defined for each node

    3. Signal the process to continue execution at the next node

     

    Fundamentally, two different choices come up here; the others are small variants of these two.

     

    1. Everything is conceptually an inside-BPM operation

    N-Parralism.PNG

    2. Everything is conceptually an outside-BPM operation

     

    L-Parralism.PNG

     

     

    So which one is the parallel version? The second, of course, is the one we want.

     

    [Pros]

     

    1. Any invocation of or interaction with JBPM should be broken into two pieces: 1. trigger, 2. execution. The trigger uses an asynchronous messaging protocol; JBoss does a great job in that area, so HornetQ/JBoss Messaging is your first choice.

     

    2. The input of each invocation/interaction can be serialized into a text stream, so the business logic only receives a stream of identifiers, and the real business state is loaded from storage (DB, cache, etc.).

     

    3. Keeping jBPM and the execution logic in the same data center is important; otherwise the network partition becomes a bottleneck. Please see: http://highscalability.com/blog/2010/8/12/think-of-latency-as-a-pseudo-permanent-network-partition.html

     

    4. For transaction isolation, prefer eventual consistency over read_committed. Snapshot information is used heavily throughout process execution, so traditional transaction isolation becomes a stumbling block for parallelism.
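Points 1 and 2 can be illustrated with a toy model in plain Java. This is only a sketch of the idea: an in-memory queue stands in for the asynchronous messaging layer (HornetQ in the text), a map stands in for the DB or cache, and the class name TriggerExecutionSplit and its methods are invented for illustration. It contains no jBPM API.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class TriggerExecutionSplit {
    // Stand-in for the storage (DB, cache, etc.) that holds the real business state
    private final Map<String, String> stateStore = new ConcurrentHashMap<>();
    // Stand-in for the asynchronous messaging channel; it carries identifiers only
    private final BlockingQueue<String> triggers = new LinkedBlockingQueue<>();

    /** Trigger side: persist the state, then send only the identifier. */
    void trigger(String processId, String businessState) {
        stateStore.put(processId, businessState);
        triggers.add(processId);
    }

    /** Execution side: take one identifier, load its state, and run the step. Returns null when idle. */
    String executeNext() {
        String processId = triggers.poll();
        if (processId == null) {
            return null; // nothing to do right now
        }
        String state = stateStore.get(processId);
        return "executed " + processId + " with state " + state;
    }
}
```

Because the message holds only an identifier, any worker in the same data center can pick it up and load the state, which is what allows the execution side to be scaled out independently of the trigger side.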

     

    [Cons]

    1.  It is easy to model a transparent layer that wraps jBPM, so JBPM sharding becomes possible implicitly.

    2.  Full object serialization is no longer acceptable; only serialization of object identifiers is applicable.

    3.  Processing an individual request takes a little longer than in non-parallel mode.

     

    [Challenges]

    1.  Selecting the right phase to split at is most important; the SEDA model can help you rethink your system architecture. Please see: http://community.jboss.org/people/andy.song/blog/2010/08/12/how-jboss-products-contribute-to-seda-models

     

    2.  Selecting the right serialization mode is also important. Text/String is a workable solution, but the memory consumption of strings in Java is very bad, so you should do your best to balance memory consumption against maximum parallelism.

     

    3.  Everything has to be placed in the same data center. Across network locations the pattern is not applicable; we may need other tools such as an inter-data-center grid service or cloud services.

     

    4.  You have to build your own diagnostic tools for tracing, debugging, and deployment. That means you can learn more, which is a good thing from a developer's perspective, though managers may hate it.

     

    So now you understand a little more about my previous two blog posts regarding parallelism? Some simple principles emerge:

    1. Do your best to split and isolate.

    2. Serialization is not always the culprit.

    3. Network communication is not always the culprit.

    4. Synchronous or in-JVM execution does not always mean parallelism.

    5. Research, research, ...

    6. Reading source code is a good habit.

    7. Slowness and lack of parallelism are not the tool's fault; they come from your mindset.

    8. You should always keep your own libraries, because you never have enough time to practice.

    9. Do your best to balance performance (SLA), parallelism, and high availability.