8 Replies Latest reply on Jan 2, 2012 9:29 AM by galder.zamarreno

jdbccachestore

nikolay1981 Nov 4, 2011 11:56 AM

Hi!

The goal is to have one node in the cluster dump data into a cache store in the background, so that we can recover from that store in case of disaster.

N1 --replication--> N2 --> jdbccachestore (MySQL)

N1 runs the following code for 10 seconds

 for (;;) {
     tm.begin();
     cache.put(key, value); // key and value stand for the test data
     tm.commit();
 }

N2 does nothing.

In a 10-second interval N1 puts ~100K entries into the cache. In the same interval the cache on N2 receives only 1K.

When I detach the jdbccachestore and run the test again, N2 receives ~100K entries in the 10-second interval.

In the first test, the nodes are kept alive after the 10 seconds and N2 continues to dump data into the database. If I shut down N1, N2 stops dumping data, as if the remaining entries had not yet been committed to (received by?) the N2 cache.

So my point is that replication seems to wait until each entry is persisted into the cache store. This is a bit strange, because replication is async and the cache store is async as well.
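
For illustration, the behaviour expected from an asynchronous (write-behind) cache store can be sketched as below. This is a toy model in plain Java, not Infinispan code: the class `WriteBehindModel`, its methods, and the in-memory map standing in for MySQL are all hypothetical. It only shows the idea that the caller records a change and returns, while a separate flush step writes the latest value per key.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a write-behind store (assumption: not how Infinispan implements it).
class WriteBehindModel {
    // Latest pending value per key; repeated puts to one key coalesce into one entry.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Stands in for the database table.
    private final Map<String, String> database = new HashMap<>();
    private int databaseWrites = 0;

    // Caller path: record the change and return without touching the database.
    synchronized void put(String key, String value) {
        pending.put(key, value);
    }

    // Background path: write each pending key once, then clear the pending set.
    // Returns the number of rows written by this flush.
    synchronized int flush() {
        int written = pending.size();
        database.putAll(pending);
        databaseWrites += written;
        pending.clear();
        return written;
    }

    synchronized int databaseWrites() {
        return databaseWrites;
    }

    synchronized String stored(String key) {
        return database.get(key);
    }

    public static void main(String[] args) {
        WriteBehindModel store = new WriteBehindModel();
        store.put("a", "1");
        store.put("a", "2"); // overwrites the pending value for "a"
        store.put("b", "1");
        System.out.println("rows written by flush: " + store.flush());
        System.out.println("value stored for a: " + store.stored("a"));
    }
}
```

In this model the caller never waits for the database, which is the behaviour the test above was expected to show.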

Could you suggest anything?

Thanks a lot.

Infinispan 5.1.0 Beta.

N1 and N2 clustering definition

        <clustering mode="replication">
            <async asyncMarshalling="true" useReplQueue="true" replQueueInterval="2000" replQueueMaxElements="10000"/>
            <stateRetrieval fetchInMemoryState="true" timeout="240000"/>
        </clustering>
        <transaction
                transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
                syncRollbackPhase="false"
                syncCommitPhase="false"
                useEagerLocking="false"/>

N2 loaders configuration

<loaders passivation="false" shared="false" preload="true">
            <loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore"
                    fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
                <properties>
                    <property name="key2StringMapperClass" value="AATwoWayKey2StringMapper"/>
                    <property name="stringsTableNamePrefix" value="ISPN_B"/>
                    <property name="idColumnName" value="ID"/>
                    <property name="dataColumnName" value="DATA"/>
                    <property name="timestampColumnName" value="TSTAMP"/>
                    <property name="timestampColumnType" value="BIGINT"/>
                    <property name="connectionFactoryClass"
                              value="org.infinispan.loaders.jdbc.connectionfactory.PooledConnectionFactory"/>
                    <property name="connectionUrl" value="jdbc:mysql://localhost:3306/mydb"/>
                    <property name="userName" value="root"/>
                    <property name="password" value=""/>
                    <property name="driverClass" value="com.mysql.jdbc.Driver"/>
                    <property name="idColumnType" value="VARCHAR(100)"/>
                    <property name="dataColumnType" value="BLOB"/>
                    <property name="dropTableOnExit" value="false"/>
                    <property name="createTableOnStart" value="true"/>
                </properties>
                <async enabled="true" flushLockTimeout="15000" threadPoolSize="15"/>
            </loader>
        </loaders>

1. Re: performance: N1 --replication-->N2 -- >jdbccachestore

galder.zamarreno Nov 7, 2011 3:05 AM (in response to nikolay1981)

You've enabled state transfer, and IIRC, that forces the cache to be synchronous rather than asynchronous.

Try disabling state transfer and see how it goes.
2. Re: performance: N1 --replication-->N2 -- >jdbccachestore

nikolay1981 Nov 22, 2011 5:17 PM (in response to galder.zamarreno)

Hi Galder,

Thank you for the reply.

I've disabled state transfer by removing
<stateRetrieval fetchInMemoryState="true" timeout="240000"/>
and the result is still the same: cache.put takes ~20 ms per entry.

I also changed the configuration to have the JDBC store shared between the nodes. The result is the same: ~20 ms per entry.
The thread is blocked on each cache.putAsync (or put with the force-async flag), waiting for the entry to be written to the data store.

Thanks.
3. Re: performance: N1 --replication-->N2 -- >jdbccachestore

galder.zamarreno Dec 6, 2011 10:10 AM (in response to nikolay1981)

Hmmm, can you attach a thread dump when you store an entry? Put it in a file and attach it rather than copy/paste to the forum.
4. Re: performance: N1 --replication-->N2 -- >jdbccachestore

nikolay1981 Dec 16, 2011 4:26 PM (in response to galder.zamarreno)
Hi Galder!

I've narrowed down the test case: I left only one node in the cluster. The configuration is attached. I'm using the latest 5.1.0 CR1.

The logic is the same:

                        tm.begin();
                        cacheInfinispan.getAdvancedCache().withFlags(Flag.FORCE_ASYNCHRONOUS).putAsync(i, i);
                        tm.commit();

I see that cache updates are done synchronously with the DB updates, despite the async declaration in the configuration file:

<async enabled="true" flushLockTimeout="1500" threadPoolSize="3"/>

Thank you

trtest2-cache-config-tcp-510b-clA.xml 2.9 KB

threaddump-nodeA.txt.zip 2.5 KB
5. Re: performance: N1 --replication-->N2 -- >jdbccachestore

galder.zamarreno Dec 19, 2011 7:03 AM (in response to nikolay1981)

Ok, I can see what's going on. Basically, the coalesced async store is writing to the store while the put call is trying to load the previous value from the cache store. Writing to the store acquires a lock, and so does reading. The easiest way to get around this issue is:

If you don't need the previous value (that's what NotifyingFuture.get() will return), you can pass Flag.SKIP_CACHE_LOAD so that the previous value is not loaded from the cache store when it is not present in the cache. That will get rid of the lock contention when calling putAsync().
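
The contention described above can be sketched with a toy model in plain Java. This is not Infinispan code: `StoreLockModel`, its single read/write lock, and the `skipCacheLoad` parameter are hypothetical stand-ins for the store's locking and for Flag.SKIP_CACHE_LOAD. A background flusher holds the write lock; a put that has to load the previous value needs the read lock and times out, while a put that skips the load completes at once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model only (assumption): one lock guards the whole store.
class StoreLockModel {
    private final ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();
    private final Map<Integer, Integer> memory = new ConcurrentHashMap<>();

    // Returns true if the put completed, false if it timed out waiting for the store lock.
    boolean put(int key, int value, boolean skipCacheLoad, long timeoutMs) {
        if (!skipCacheLoad && !memory.containsKey(key)) {
            // The previous value is not in memory, so it would be read from the store.
            // That needs the read lock, which a running flush keeps unavailable.
            try {
                if (!storeLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            storeLock.readLock().unlock(); // the simulated load itself does nothing
        }
        memory.put(key, value);
        return true;
    }

    // A flusher thread holds the write lock while two puts are attempted.
    // Returns {completedWithLoad, completedWithSkip}.
    static boolean[] runScenario() {
        StoreLockModel model = new StoreLockModel();
        CountDownLatch flushing = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread flusher = new Thread(() -> {
            model.storeLock.writeLock().lock();
            try {
                flushing.countDown();
                release.await(); // keep "writing to the database" until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                model.storeLock.writeLock().unlock();
            }
        });
        flusher.start();
        boolean withLoad = false;
        boolean withSkip = false;
        try {
            flushing.await();
            withLoad = model.put(1, 1, false, 100);
            withSkip = model.put(2, 2, true, 100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown();
        }
        try {
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] {withLoad, withSkip};
    }

    public static void main(String[] args) {
        boolean[] result = runScenario();
        System.out.println("put with load completed: " + result[0]);
        System.out.println("put with skip completed: " + result[1]);
    }
}
```

The stack traces in this thread show a StripedLock rather than a single lock, so the real contention is per stripe; the single lock here only keeps the sketch short.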
6. Re: performance: N1 --replication-->N2 -- >jdbccachestore

nikolay1981 Dec 23, 2011 1:58 PM (in response to galder.zamarreno)
Hi Galder,

Thanks for pointing at Flag.SKIP_CACHE_LOAD. It helped. N1 works at the speed of light.

Another couple of issues. I extended my test case back to the original configuration, N1 --> N2 --> jdbccachestore, and I see only 7 of the 30 CoalescedAsyncStore threads dumping data into the database. Could you clarify how I can tune this?

Also, once N1 has finished its job and N2 is still dumping data, I see something interesting: if I kill N1, the data dumping stops. This is very strange. It looks like replication is too slow or there are some locking issues.
For instance, when 100K entries have just been put into the cache on N1 (CPU ~50%), N2 has dumped only 4K to disk. After that, CPU is ~0% and the database gets 1K per second. I kill N1. N2 is still alive, but the database gets no more updates.

Thank you.

Merry Christmas.

N2-threaddump.txt.zip 3.8 KB

N1-threaddump.txt.zip 3.7 KB

7. Re: performance: N1 --replication-->N2 -- >jdbccachestore

galder.zamarreno Jan 2, 2012 3:24 AM (in response to nikolay1981)

You're overflowing N2 at the JGroups/transport level as shown by threads blocking on flow control protocols on N1:

transport-thread-10@3077 daemon, prio=5, in group 'main', status: 'waiting'
  java.lang.Thread.State: WAITING
            at sun.misc.Unsafe.park(Unsafe.java:-1)
            at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2116)
            at org.jgroups.util.CreditMap.decrement(CreditMap.java:157)
            at org.jgroups.protocols.MFC.handleDownMessage(MFC.java:104)
            at org.jgroups.protocols.FlowControl.down(FlowControl.java:341)

The reason N2 might not be sending credits to N1, though, is that N2 is again blocked trying to load previous values from the database:

Incoming-1,demoCluster,nykdwm2056268-65313@2778, prio=5, in group 'Thread Pools', status: 'waiting'
  java.lang.Thread.State: WAITING
            at sun.misc.Unsafe.park(Unsafe.java:-1)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
            at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
            at org.infinispan.util.concurrent.locks.StripedLock.acquireLock(StripedLock.java:103)
            at org.infinispan.loaders.LockSupportCacheStore.lockForReading(LockSupportCacheStore.java:101)
            at org.infinispan.loaders.LockSupportCacheStore.load(LockSupportCacheStore.java:128)

This appears to be a bug. Assuming you're using the skip cache load flag, it seems that on tx prepare the flag is forgotten on the remote node.

More info on FC can be found in: http://www.jgroups.org/javadoc/org/jgroups/protocols/MFC.html and http://docs.jboss.org/jbossclustering/cluster_guide/5.1/html/jgroups.chapt.html#jgroups-other-fc
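
To make the credit mechanism concrete, here is a minimal sketch of credit-based flow control in plain Java. It is not JGroups code: `CreditSender`, `trySend` and `replenish` are hypothetical names, and the credit and message sizes are arbitrary. The sender spends credits per byte sent and can only continue once the receiver hands credits back, which is why a receiver stuck on store loads stalls the sender.

```java
import java.util.concurrent.Semaphore;

// Toy model of credit-based flow control (assumption: simplified, single receiver).
class CreditSender {
    private final Semaphore credits;

    CreditSender(int initialCredits) {
        this.credits = new Semaphore(initialCredits);
    }

    // Tries to send a message of the given size.
    // Returns false when there are not enough credits left;
    // a real sender would block at this point until credits arrive.
    boolean trySend(int messageBytes) {
        return credits.tryAcquire(messageBytes);
    }

    // Called when the receiver reports that it has processed some bytes.
    void replenish(int bytes) {
        credits.release(bytes);
    }

    int available() {
        return credits.availablePermits();
    }

    public static void main(String[] args) {
        CreditSender sender = new CreditSender(1000);
        System.out.println("first send accepted: " + sender.trySend(400));
        System.out.println("second send accepted: " + sender.trySend(400));
        // Only 200 credits remain, so a third 400-byte message cannot go out yet.
        System.out.println("third send accepted: " + sender.trySend(400));
        // The receiver catches up and returns credits for the bytes it processed.
        sender.replenish(800);
        System.out.println("retry accepted: " + sender.trySend(400));
    }
}
```

If the receiver never calls `replenish`, every later `trySend` fails, which corresponds to the transport threads parked in CreditMap.decrement in the N1 dump above.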

8. Re: performance: N1 --replication-->N2 -- >jdbccachestore

galder.zamarreno Jan 2, 2012 9:29 AM (in response to galder.zamarreno)

I believe your issue has been indirectly fixed by https://issues.jboss.org/browse/ISPN-1642, at least for the majority of cases, so I'd suggest you give it a go with Infinispan 5.1.0.CR2. There are some corner cases where a proper fix for https://issues.jboss.org/browse/ISPN-1652 is needed, though.