12 Replies Latest reply on Aug 12, 2011 11:59 AM by galder.zamarreno

    replication fetchInMemoryState="true" loading time

    itayhindy

      Hi,

       

      I use the following configuration for several caches:

              <clustering mode="replication">

                 <async asyncMarshalling="true" useReplQueue="true" replQueueInterval="100" replQueueMaxElements="1000" />

                 <stateRetrieval timeout="180000" fetchInMemoryState="true" alwaysProvideInMemoryState="true"/>

              </clustering>

       

      Each cache holds roughly the same amount of data (not very much).

       

      There are several servers in the cluster. When I restart a server while requests are coming in to all of the servers, each cache takes a different amount of time to load, and the times vary from restart to restart. The difference is huge.

      For example, cache A takes 300 milliseconds and cache B takes 30 seconds. On the next restart, cache B takes 300 milliseconds and cache A takes 30 seconds.

       

      If no requests reach the servers while one server is restarting, state transfer is quick and finishes in a few milliseconds.

       

      It looks as if the state transfer time depends on acquiring a lock rather than on the actual amount of data.

      Is that correct? How can we be sure that the lock will be acquired and that state transfer will complete successfully? We depend on state transfer finishing successfully so that all data is loaded onto the restarted server.
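      To make the lock-versus-size question concrete, here is a minimal JDK-only model. It is a hypothetical sketch, not Infinispan's JGroupsDistSync: ordinary writes hold the shared side of a "processing" lock, and a simulated state transfer needs the exclusive side. No data is transferred at all, so any wait comes purely from lock contention. The class and method names are made up for illustration. The bounded tryLock shows the general answer to "how can we be sure": you cannot guarantee the lock is acquired, but you can bound the wait and fail explicitly. As far as we understand, the stateRetrieval timeout (180000 ms) in the configuration above plays that role for Infinispan's state transfer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model, NOT Infinispan's JGroupsDistSync: ordinary writes hold the
// shared (read) side of a "processing" lock; a simulated state transfer needs
// the exclusive (write) side.
class ExclusiveWaitModel {

    /**
     * A writer thread holds the shared lock for holdMs while we try to take the
     * exclusive lock, waiting at most timeoutMs.
     * Returns the milliseconds waited, or -1 if the timeout expired first.
     */
    static long timedExclusiveAcquire(long holdMs, long timeoutMs) {
        ReentrantReadWriteLock processingLock = new ReentrantReadWriteLock();
        CountDownLatch writerHoldsLock = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            processingLock.readLock().lock();
            try {
                writerHoldsLock.countDown();
                Thread.sleep(holdMs); // simulated in-flight write; no data involved
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                processingLock.readLock().unlock();
            }
        });

        try {
            writer.start();
            writerHoldsLock.await();
            long start = System.nanoTime();
            // Bounded wait: report failure instead of blocking forever.
            boolean acquired = processingLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (acquired) {
                processingLock.writeLock().unlock(); // real code would transfer state first
            }
            writer.join();
            return acquired ? waited : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("hold 10 ms,   timeout 5 s    -> " + timedExclusiveAcquire(10, 5000));
        System.out.println("hold 300 ms,  timeout 5 s    -> " + timedExclusiveAcquire(300, 5000));
        System.out.println("hold 1000 ms, timeout 100 ms -> " + timedExclusiveAcquire(1000, 100));
    }
}
```

      The wait ranges from almost nothing to the full hold time even though the amount of "state" is identical (zero) in every run. In a live cluster the hold time depends on concurrent traffic, which would be consistent with, though not proof of, the restart-to-restart variation described above.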

       

      Thanks in advance

       

      Here is some more information:

       

      I am running a simple unit test that writes an integer to a replicated cache at a fixed interval (the test code below waits 250 ms between writes). Cache configuration:

       

              <clustering mode="replication">

                 <async asyncMarshalling="true" useReplQueue="true" replQueueInterval="100" replQueueMaxElements="1000" />

                 <stateRetrieval timeout="180000" fetchInMemoryState="true" alwaysProvideInMemoryState="true"/>

              </clustering>

       

      Once the first node is up, I start a second node running the same unit test. Usually it starts fine, but from time to time state transfer takes up to 15 seconds (it normally takes about 200 ms).

       

      When I start a third node with the same unit test, state transfer takes 15 seconds; for the fourth node it takes 60 seconds.

       

      It looks like JGroupsDistSync.acquireProcessingLock(JGroupsDistSync.java:71) is taking a lot of time.
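      To see which thread is holding the lock that acquireProcessingLock waits on, a thread dump taken while a node is stuck in state transfer (jstack, or programmatically through java.lang.management) may say more than the log. Below is a minimal JDK-only sketch; the blocked thread is a stand-in created by the demo (its name is hypothetical), not an Infinispan thread. Note that owners are only reported for exclusive locks; shared (read) holders of a read/write lock do not show up as owners.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class LockWaitDump {

    /** One line per thread that is waiting or blocked on a lock, with the owner if the JVM knows it. */
    static List<String> describeLockWaiters() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // (true, true): include monitors and java.util.concurrent synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            Thread.State state = info.getThreadState();
            boolean waiting = state == Thread.State.WAITING
                    || state == Thread.State.TIMED_WAITING
                    || state == Thread.State.BLOCKED;
            if (waiting && info.getLockName() != null) {
                lines.add(info.getThreadName() + " " + state + " on " + info.getLockName()
                        + " owned by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    /** Parks a stand-in thread on a lock held by the caller, then dumps the waiters. */
    static List<String> demo() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the calling thread plays the role of the lock holder
        Thread waiter = new Thread(() -> { lock.lock(); lock.unlock(); }, "stand-in-state-transfer");
        waiter.setDaemon(true);
        try {
            waiter.start();
            while (waiter.getState() != Thread.State.WAITING) {
                Thread.sleep(10); // wait until the stand-in is parked on the lock
            }
            return describeLockWaiters();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new ArrayList<>();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```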

       

      I am using Infinispan 4.2.0.FINAL, but I also tested with 4.2.1.CR1 and 5.0.0.ALPHA2.

       

      Here is the simple test code:

       

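          import java.util.Random;
          import org.infinispan.AdvancedCache;

          // cacheFactory (the test's cache helper) and object (a plain lock Object)
          // are fields of the enclosing test class, not shown here; @Test comes
          // from the test framework in use.
          @Test
          public void test() {
              final AdvancedCache<Integer, Integer> cache = cacheFactory.getDefaultCache();
              synchronized (object) {
                  try {
                      int i = new Random().nextInt();
                      while (true) {
                          // key i, value i + 1; then pause 250 ms before the next write
                          cache.put(i++, i);
                          object.wait(250);
                      }
                  } catch (InterruptedException e) {
                      // restore the interrupt flag instead of silently swallowing it
                      Thread.currentThread().interrupt();
                  }
              }
          }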
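      For anyone who wants to reproduce the write pattern without a cluster, here is a sketch of the same loop using a ScheduledExecutorService instead of synchronized/wait. It is an assumption-laden stand-in: a ConcurrentHashMap replaces the replicated AdvancedCache, and the class name and parameters are hypothetical. Keys and values follow cache.put(i++, i), i.e. key k gets value k + 1 with int wrap-around.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in: a ConcurrentHashMap replaces the replicated cache.
class PeriodicWriter {

    /** Writes `count` entries, one every periodMs, as key k -> value k + 1 starting at `start`. */
    static Map<Integer, Integer> run(int start, int count, long periodMs) {
        Map<Integer, Integer> cache = new ConcurrentHashMap<>();
        AtomicInteger written = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(count);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            int n = written.getAndIncrement();
            if (n < count) {
                int key = start + n; // int overflow wraps, like i++ in the original loop
                cache.put(key, key + 1);
                done.countDown();
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow(); // stop the periodic task once enough writes are done
        }
        return cache;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cache = run(new Random().nextInt(), 4, 250);
        System.out.println(cache.size() + " entries written");
    }
}
```

      The scheduler avoids holding a monitor for the whole test and using wait() as a sleep; the fixed rate stays at one write per period regardless of how long each put takes, as long as it takes less than the period.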

       

      The log file is attached.