2 Replies. Latest reply on Apr 26, 2012, 2:30 PM by yelin666

    data replication performance issue in 5.1.0 & 5.1.3

    yelin666

      We encountered a data replication performance issue in both Infinispan 5.1.3.FINAL and 5.1.0.FINAL. I attached a simple example that reproduces the issue. The issue occurs more frequently on Windows platforms (I tried XP and Server 2003); on Linux, it may occur only once every few hours. It also happens more frequently with TCP than with UDP, so my example uses TCP.

       

      To use the attached example, unzip it to a Windows directory and follow these steps:

      1. Go to the cache-write-test directory and run "mvn install".
      2. Go to the cache-write-test\target\release directory, run "run.bat", and leave it running.
      3. Open another cmd window, go to the same directory, run "run.bat >out.log", and leave it running for 20-30 minutes.
      4. Open another cmd window, go to the same directory, and run "grep --regex="[0123456789]\{3,\} milliseconds" out.log". This shows every replication time of 100 milliseconds or more, whereas a normal replication takes no time (0 milliseconds). Some replications take more than 800 milliseconds.
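If grep is not available on the Windows machine, the filtering in step 4 can also be done in Java. This is a minimal sketch, assuming each timed line in out.log contains a number followed by " milliseconds", as the grep expression implies; the class name SlowReplicationFilter and the extra maximum calculation are illustrative additions, not part of the attached example.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlowReplicationFilter {
    // Same idea as the grep expression: a run of three or more digits followed by
    // " milliseconds", i.e. any logged time of 100 ms or more (assuming no leading zeros).
    private static final Pattern SLOW = Pattern.compile("(\\d{3,}) milliseconds");

    /** Returns the log lines that contain a time of 100 ms or more. */
    public static List<String> slowLines(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (SLOW.matcher(line).find()) {
                result.add(line);
            }
        }
        return result;
    }

    /** Returns the largest time of 100 ms or more found in the lines, or -1 if there is none. */
    public static long maxSlowMillis(List<String> lines) {
        long max = -1;
        for (String line : lines) {
            Matcher m = SLOW.matcher(line);
            while (m.find()) {
                max = Math.max(max, Long.parseLong(m.group(1)));
            }
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        // out.log is written by "run.bat >out.log", so read it with the platform charset.
        List<String> lines = Files.readAllLines(
                Paths.get(args.length > 0 ? args[0] : "out.log"), Charset.defaultCharset());
        for (String line : slowLines(lines)) {
            System.out.println(line);
        }
        System.out.println("max: " + maxSlowMillis(lines) + " ms");
    }
}
```

Run it from the release directory with "java SlowReplicationFilter out.log"; besides listing the slow lines, it reports the worst time seen.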

       

      Could you please suggest what might cause the large jitter in the replication time? My JGroups configuration is in cache-writer-test\src\main\resources\my-jgroups-tcp.xml; please let me know if you see any issues in it.

       

      I'd appreciate any help with this issue.

       

      Lin

        • 1. Re: data replication performance issue in 5.1.0 & 5.1.3
          belaban

          Hard to say; this could be caused by:

          • Garbage collection. The rule of thumb is that GC eventually stops the application for about 1 sec per GB of heap, although this also depends on the OS/JVM used.
          • Both thread pools are configured to discard a message when all threads are in use. A discarded message has to be retransmitted, and in your configuration the smallest retransmit time is 300 ms.
          • Flow control: when a sender runs out of credits, it is blocked until the receiver(s) send more credits.
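To illustrate the second point, here is a minimal sketch using plain java.util.concurrent, not JGroups' own thread pool code: when every worker is busy and the bounded queue is full, ThreadPoolExecutor.DiscardPolicy silently drops further tasks. In JGroups a dropped message is only recovered by retransmission, which is where the extra delay would come from. The pool sizes and the class name DiscardDemo are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DiscardDemo {
    /**
     * Submits `messages` tasks to a pool of `threads` workers with a bounded queue of
     * `queueSize`, while every worker is stuck. Returns how many tasks were processed.
     */
    public static int processedWithBusyPool(int threads, int queueSize, int messages) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                new ThreadPoolExecutor.DiscardPolicy()); // rejected tasks are dropped silently
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < messages; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate workers blocked on slow processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                processed.incrementAndGet();
            });
        }
        release.countDown(); // unblock the workers; the queued tasks now run
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // 2 busy workers + 3 queue slots: 5 of 10 tasks run, the other 5 are discarded
        System.out.println(processedWithBusyPool(2, 3, 10) + " of 10 processed");
    }
}
```

The discarded tasks simply never run; the sketch does not model the retransmission that JGroups would need to recover them.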

           

          I suggest you enable verbose GC and see if the times when a full GC happened correlate with your hiccups.
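Besides the verbose GC log, one way to check the correlation from inside the test is to read the JVM's collector statistics around each timed operation. A minimal sketch using the standard java.lang.management API; GcCorrelation and the stand-in workload are hypothetical, and it only sees collections in the JVM it runs in, so the standby instance would need its own check.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;

public class GcCorrelation {
    /** Total number of collections so far, summed over all collectors of this JVM. */
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return total;
    }

    /** Accumulated collection time in ms, summed over all collectors of this JVM. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        return total;
    }

    /** Runs the action; returns {elapsed ms, collections during it, collection ms during it}. */
    public static long[] timeWithGc(Runnable action) {
        long count0 = totalGcCount();
        long gcTime0 = totalGcTimeMillis();
        long start = System.nanoTime();
        action.run();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new long[] {elapsed, totalGcCount() - count0, totalGcTimeMillis() - gcTime0};
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one batch of cache updates.
        long[] r = timeWithGc(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100000; i++) {
                sb.append(i);
            }
        });
        System.out.printf("elapsed=%d ms, collections=%d, collection time=%d ms%n", r[0], r[1], r[2]);
    }
}
```

A slow operation with zero collections during it points away from GC. For concurrent collectors such as CMS, the reported collection time may not equal the stop-the-world pause time, so the GC log remains the better source for pause lengths.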

          • 2. Re: data replication performance issue in 5.1.0 & 5.1.3
            yelin666

            Following your suggestion about GC, I updated my script to disable explicit GC, use the CMS collector, and turn on GC logging. I also upgraded to Infinispan 5.1.4.FINAL, which uses JGroups 3.0.9.FINAL. I re-ran the test, and the jitter is NOT related to GC.
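One way to confirm that the updated script's options actually reached the JVM is to print them from inside the process. A minimal sketch, assuming the script passes the HotSpot options -XX:+DisableExplicitGC and -XX:+UseConcMarkSweepGC (the script itself isn't shown, so these exact flags are an assumption); JvmGcSettings is a hypothetical name. CMS was removed in JDK 14, so the collector check only applies to older JVMs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmGcSettings {
    /** Names of the collectors active in this JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** True if the JVM was started with exactly this argument. */
    public static boolean hasJvmArg(String arg) {
        return ManagementFactory.getRuntimeMXBean().getInputArguments().contains(arg);
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        // On a HotSpot JVM started with -XX:+UseConcMarkSweepGC this typically
        // lists ParNew and ConcurrentMarkSweep.
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Explicit GC disabled: " + hasJvmArg("-XX:+DisableExplicitGC"));
    }
}
```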

             

            The points you mentioned about thread pools and flow control are general rules. In my test, one cache instance stands by without any activity (it only supports the data replication), and another instance updates 50 data objects every 2 seconds. So I don't see how those two rules would apply to my test either.
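The workload described above can be sketched as a fixed-rate loop that writes 50 objects every 2 seconds and records how long each batch takes. This is a minimal sketch, not the attached example: PeriodicWriter and the key/value names are hypothetical, and a ConcurrentHashMap stands in for the cache. Infinispan's Cache also implements java.util.concurrent.ConcurrentMap, so a cache could be passed in instead; the batch time would only include replication if the cache uses synchronous replication (an assumption about the configuration).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicWriter {
    /**
     * Every periodMillis, puts objectsPerBatch entries into the map and records how long
     * the batch took. Stops after `batches` batches and returns the timings in ms.
     */
    public static List<Long> run(Map<String, String> cache, int objectsPerBatch,
                                 long periodMillis, int batches) {
        List<Long> timings = Collections.synchronizedList(new ArrayList<Long>());
        CountDownLatch remaining = new CountDownLatch(batches);
        AtomicInteger round = new AtomicInteger();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (remaining.getCount() == 0) {
                return; // all batches done; ignore runs that fire before shutdown
            }
            int r = round.getAndIncrement();
            long start = System.nanoTime();
            for (int i = 0; i < objectsPerBatch; i++) {
                cache.put("key-" + i, "value-" + r); // overwrite the same keys each round
            }
            timings.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
            remaining.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            remaining.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow();
        return new ArrayList<>(timings);
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for the replicated cache
        List<Long> timings = run(cache, 50, 2000, 5);
        System.out.println("batch times (ms): " + timings);
    }
}
```

If put ever throws, ScheduledExecutorService suppresses all later runs and run() would wait forever, so a real harness should catch exceptions inside the task.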

             

            I attached my simple example again with the new script. Now start the first instance with "run 1" and the second instance with "run 2 >out.log". Could you please try running the test on a Windows platform and see whether you can find the problem? Thanks in advance.