8 Replies Latest reply on Jan 2, 2012 9:29 AM by galder.zamarreno

    performance: N1 --replication-->N2 -- >jdbccachestore

    nikolay1981

      Hi!

       

      The goal is to have one node in the cluster dump data in the background into a cache store, so that we can recover from that store in case of a disaster.

      N1 --replication-->N2 -- >jdbccachestore (mysql)

      N1 runs the following code for 10 seconds:

       for (;;) {
           tm.begin();
           cache.put(key, value);
           tm.commit();
       }
      

      N2 does nothing.


      In the 10-second interval N1 puts ~100K entries into the cache. In the same interval the cache on N2 receives only 1K.

      When I detach the jdbccachestore and run the test again, N2 receives ~100K entries in the 10-second interval.

      In the first test both nodes are kept alive after the 10 seconds, and N2 continues to dump data into the database. If I shut down N1, N2 stops dumping data, as if the remaining entries had not been committed (received?) in the N2 cache.

      So my point is that replication seems to wait while each entry is persisted into the cache store. This is a bit strange, because replication is async and the cache store is async as well.

      Could you suggest anything?

       

      Thanks a lot.

       

      Infinispan 5.1.0 Beta.

      N1 and N2 clustering definition

              <clustering mode="replication">
                  <async asyncMarshalling="true" useReplQueue="true" replQueueInterval="2000" replQueueMaxElements="10000"/>
                  <stateRetrieval fetchInMemoryState="true" timeout="240000"/>
              </clustering>
              <transaction
                      transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
                      syncRollbackPhase="false"
                      syncCommitPhase="false"
                      useEagerLocking="false"/>
      
      

       

      N2 loaders configuration

      <loaders passivation="false" shared="false" preload="true">
                  <loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore"
                          fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
                      <properties>
                          <property name="key2StringMapperClass" value="AATwoWayKey2StringMapper"/>
                          <property name="stringsTableNamePrefix" value="ISPN_B"/>
                          <property name="idColumnName" value="ID"/>
                          <property name="dataColumnName" value="DATA"/>
                          <property name="timestampColumnName" value="TSTAMP"/>
                          <property name="timestampColumnType" value="BIGINT"/>
                          <property name="connectionFactoryClass"
                                    value="org.infinispan.loaders.jdbc.connectionfactory.PooledConnectionFactory"/>
                          <property name="connectionUrl" value="jdbc:mysql://localhost:3306/mydb"/>
                          <property name="userName" value="root"/>
                          <property name="password" value=""/>
                          <property name="driverClass" value="com.mysql.jdbc.Driver"/>
                          <property name="idColumnType" value="VARCHAR(100)"/>
                          <property name="dataColumnType" value="BLOB"/>
                          <property name="dropTableOnExit" value="false"/>
                          <property name="createTableOnStart" value="true"/>
                      </properties>
                      <async enabled="true" flushLockTimeout="15000" threadPoolSize="15"/>
                  </loader>
              </loaders>
      
      
        • 1. Re: performance: N1 --replication-->N2 -- >jdbccachestore
          galder.zamarreno

          You've enabled state transfer, and IIRC, that forces the cache to be synchronous rather than asynchronous.

           

          Try disabling state transfer and see how it goes.

          • 2. Re: performance: N1 --replication-->N2 -- >jdbccachestore
            nikolay1981

            Hi Galder,

             

            Thank you for the reply.

             

            I've disabled state transfer by removing

            <stateRetrieval fetchInMemoryState="true" timeout="240000"/>

            and the result is still the same: cache.put takes 20 ms per entry.

             

            I also changed the configuration so that the JDBC store is shared between the nodes. The result is the same: ~20 ms per entry.

            The thread is blocked on each cache.putAsync call (also with the force-async flag), waiting for the entry to be written to the datastore.

             

            Thanks.

            • 3. Re: performance: N1 --replication-->N2 -- >jdbccachestore
              galder.zamarreno

              Hmmm, can you attach a thread dump when you store an entry? Put it in a file and attach it rather than copy/paste to the forum.

              • 4. Re: performance: N1 --replication-->N2 -- >jdbccachestore
                nikolay1981

                Hi Galder!

                 

                I've narrowed down the test case: I left only one node in the cluster. Configuration attached. I'm using the latest 5.1.0 CR1.

                 

                The logic is the same:

                 

                                        tm.begin();

                                        cacheInfinispan.getAdvancedCache().withFlags(Flag.FORCE_ASYNCHRONOUS).putAsync(i, i);

                                        tm.commit();

                 

                I see that cache updates are done synchronously with the DB updates, despite the async declaration in the configuration file:

                 

                <async enabled="true" flushLockTimeout="1500" threadPoolSize="3"/>

                 

                Thank you

                • 5. Re: performance: N1 --replication-->N2 -- >jdbccachestore
                  galder.zamarreno

                  Ok, I can see what's going on. Basically, the coalesced async store is writing to the store, and the put call is trying to load the previous value from the cache store. Writing to the store acquires a lock, as does reading from it, so the two contend. The easiest way to get around this issue is:

                   

                  If you don't need the previous value (that's what NotifyingFuture.get() will return), you can pass Flag.SKIP_CACHE_LOAD so that the previous value is not loaded from the cache store when it is not present in the cache. That will get rid of the lock contention when calling putAsync().
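To illustrate the contention described in this reply, here is a minimal JDK-only sketch. It is not Infinispan code: the class `StoreLockSketch` and its methods are hypothetical, and a single `ReentrantReadWriteLock` stands in for whatever locking the real store does. It only shows why a put that loads the previous value can stall behind a slow store write, while a put that skips the load does not touch the lock at all.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a cache store that locks for both load and store.
class StoreLockSketch {
    private final ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();

    // Returns true if a put could proceed right away, false if it would have
    // to wait for the store lock. tryLock() is used so the demo never hangs.
    boolean putWouldProceed(boolean skipCacheLoad) {
        if (skipCacheLoad) {
            return true; // previous value is not loaded, so no store lock is needed
        }
        if (!storeLock.readLock().tryLock()) {
            return false; // a blocking load would park here behind the writer
        }
        storeLock.readLock().unlock();
        return true;
    }

    // Tries one put with and one without the previous-value load while a
    // background "async store" thread holds the write lock.
    static boolean[] demo() {
        StoreLockSketch store = new StoreLockSketch();
        CountDownLatch writerHoldsLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            store.storeLock.writeLock().lock();
            try {
                writerHoldsLock.countDown();
                release.await(); // simulate a slow database write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                store.storeLock.writeLock().unlock();
            }
        });
        writer.start();
        try {
            writerHoldsLock.await();
            boolean withLoad = store.putWouldProceed(false);
            boolean skipLoad = store.putWouldProceed(true);
            return new boolean[] { withLoad, skipLoad };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the writer finish
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("put with previous-value load proceeds: " + r[0]);
        System.out.println("put with load skipped proceeds: " + r[1]);
    }
}
```

In the test from post 4, the corresponding change would presumably be to pass both flags in the existing call, i.e. `withFlags(Flag.FORCE_ASYNCHRONOUS, Flag.SKIP_CACHE_LOAD)`, since `withFlags` accepts several flags.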

                  • 6. Re: performance: N1 --replication-->N2 -- >jdbccachestore
                    nikolay1981

                    Hi Galder,

                     

                    Thanks for pointing at Flag.SKIP_CACHE_LOAD. It helped. N1 works at the speed of light.

                     

                    A couple of other issues. I extended my test case back to the original configuration N1 --> N2 --> jdbccachestore, and I see only 7 of the 30 CoalescedAsyncStore threads dumping data into the database. Could you clarify how I can tune this?

                     

                    Also, when N1 has done its job and the data is being dumped by N2, I see an interesting thing: if I kill N1, the data dumping stops. It's very strange. It looks like replication works too slowly, or there are some locking issues.

                    For instance, when 100K entries have just been put into the cache on N1 (CPU ~50%), N2 has dumped only 4K to disk at that moment. Then CPU drops to ~0% and the database gets 1K per second. I kill N1. N2 is alive, but the database gets no further updates.

                     

                    Thank you.

                     

                    Merry Christmas.

                    • 7. Re: performance: N1 --replication-->N2 -- >jdbccachestore
                      galder.zamarreno

                      You're overflowing N2 at the JGroups/transport level as shown by threads blocking on flow control protocols on N1:

                       

                      transport-thread-10@3077 daemon, prio=5, in group 'main', status: 'waiting'
                        java.lang.Thread.State: WAITING
                                  at sun.misc.Unsafe.park(Unsafe.java:-1)
                                  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
                                  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
                                  at org.jgroups.util.CreditMap.decrement(CreditMap.java:157)
                                  at org.jgroups.protocols.MFC.handleDownMessage(MFC.java:104)
                                  at org.jgroups.protocols.FlowControl.down(FlowControl.java:341)
                      

                       

                      The reason N2 might not be sending credits to N1, though, is that N2 is again blocked trying to load previous values from the database:

                       

                      Incoming-1,demoCluster,nykdwm2056268-65313@2778, prio=5, in group 'Thread Pools', status: 'waiting'
                        java.lang.Thread.State: WAITING
                                  at sun.misc.Unsafe.park(Unsafe.java:-1)
                                  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
                                  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
                                  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
                                  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
                                  at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
                                  at org.infinispan.util.concurrent.locks.StripedLock.acquireLock(StripedLock.java:103)
                                  at org.infinispan.loaders.LockSupportCacheStore.lockForReading(LockSupportCacheStore.java:101)
                                  at org.infinispan.loaders.LockSupportCacheStore.load(LockSupportCacheStore.java:128)
                      

                       

                      This appears to be a bug. Assuming you're using the skip cache load flag, it seems like the flag is forgotten on the remote node on tx prepare.

                       

                      More info on FC can be found in: http://www.jgroups.org/javadoc/org/jgroups/protocols/MFC.html and  http://docs.jboss.org/jbossclustering/cluster_guide/5.1/html/jgroups.chapt.html#jgroups-other-fc
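As a rough picture of why N1's transport threads end up waiting, here is a simplified, JDK-only model of credit-based flow control. It is only a sketch under stated assumptions: `CreditSketch` and its methods are hypothetical names, a single `Semaphore` stands in for the sender's credits towards one receiver, and the real JGroups MFC protocol is considerably more elaborate. The point is just that the sender can only send while it has credits, and credits come back only when the receiver has processed messages.

```java
import java.util.concurrent.Semaphore;

// Hypothetical, simplified model of credit-based flow control between
// one sender and one receiver. Not the JGroups implementation.
class CreditSketch {
    private final Semaphore credits;

    CreditSketch(int initialCredits) {
        this.credits = new Semaphore(initialCredits);
    }

    // Sender side: consume credits for a message, or report that the
    // sender would have to wait (a real sender would block here).
    boolean trySend(int messageBytes) {
        return credits.tryAcquire(messageBytes);
    }

    // Receiver side: hand credits back once messages have been processed.
    void replenish(int bytes) {
        credits.release(bytes);
    }

    // Sends fixed-size messages until credits run out or maxMessages is
    // reached, and returns how many messages were sent.
    int sendUntilBlocked(int messageBytes, int maxMessages) {
        int sent = 0;
        while (sent < maxMessages && trySend(messageBytes)) {
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        CreditSketch fc = new CreditSketch(1000);
        System.out.println("sent before stalling: " + fc.sendUntilBlocked(100, 50));
        fc.replenish(300); // receiver finally processed some messages
        System.out.println("sent after replenish: " + fc.sendUntilBlocked(100, 50));
    }
}
```

In this model, a receiver that is stuck (for example on a store lock) never calls `replenish`, so the sender stalls once its initial credits are used up, which matches the shape of the two thread dumps above.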

                      • 8. Re: performance: N1 --replication-->N2 -- >jdbccachestore
                        galder.zamarreno

                        I believe your issue has been indirectly fixed by https://issues.jboss.org/browse/ISPN-1642, at least for the majority of cases, so I'd suggest you give it a go with Infinispan 5.1.0.CR2. There are some corner cases where a proper fix for https://issues.jboss.org/browse/ISPN-1652 is needed though.