11 Replies Latest reply on Jul 5, 2012 9:09 PM by galder.zamarreno

    data loss with a shared DIST cache store

    jicken

      Hi,

       

      I am using AS 7.1.1 with a cache configured like https://gist.github.com/2416139

      JBoss is running with a domain HA profile (2 nodes).

       

      Passivation is set to false, so I have a write-through configuration and everything is written to the database upon insertion into the cache.

       

      The scenario:

       

      I have a key distributed on these 2 nodes. The primary owner is server2. The key is updated on server1 though.

       

      server1 is telling me:

       

      22:47:30,380 TRACE [org.infinispan.interceptors.DistributionInterceptor] (http--127.0.0.1-8080-2) Not doing a remote get for key 100002075182001 since entry is mapped to current node (jboss1/mapper-cluster), or is in L1.  Owners are [jboss2/mapper-cluster, jboss1/mapper-cluster]

      22:47:30,389 TRACE [org.infinispan.interceptors.locking.ClusteringDependentLogic] (http--127.0.0.1-8080-2) My address is jboss1/mapper-cluster. Am I main owner? - false

      22:47:30,392 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (http--127.0.0.1-8080-2) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

       

      server2 reports:

       

      22:47:30,486 TRACE [org.infinispan.interceptors.CacheStoreInterceptor] (OOB-20,null) Skipping cache store since the cache loader is shared and we are not the originator.

       

      Ergo .. nothing is written to the cache store. In case of a server restart (or crash) data is lost.

       

      If the primary owner is equal to the originator of the update everything works as expected and the record is persisted.
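The two skip messages above can be read as two checks applied in sequence. The sketch below is a simplified model inferred only from the log lines in this thread, not Infinispan's actual interceptor code; the class and method names are hypothetical. It assumes a node writes to the shared store only if it is both the first owner of the key and the originator of the update.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the behaviour seen in the logs (an assumption,
// not the real DistCacheStoreInterceptor/CacheStoreInterceptor logic).
public class SharedStoreSkipModel {

    // Returns the reason a node skips the shared store, or null if it would write.
    static String skipReason(String self, String originator, List<String> owners) {
        if (!self.equals(owners.get(0))) {
            return "the caller is not the first owner of the key";
        }
        if (!self.equals(originator)) {
            return "we are not the originator";
        }
        return null;
    }

    // Collects the owners that would actually write the entry to the store.
    static List<String> writers(String originator, List<String> owners) {
        List<String> result = new ArrayList<>();
        for (String node : owners) {
            if (skipReason(node, originator, owners) == null) {
                result.add(node);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Owner order taken from the server1 log line: [jboss2, jboss1].
        List<String> owners = Arrays.asList("jboss2", "jboss1");
        System.out.println(writers("jboss1", owners)); // prints []
        System.out.println(writers("jboss2", owners)); // prints [jboss2]
    }
}
```

Under this model an update originating on jboss1 is written by nobody, while an update originating on the primary owner jboss2 is persisted, which matches both observations above.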

       

      Is this the desired behavior?

       

      Thx,

       

      Torben

        • 1. Re: data loss with a shared DIST cache store
          galder.zamarreno

          Hmmm, that smells like a bug. Could you please create an issue in http://issues.jboss.org/browse/ISPN and attach the configuration file and any other info you have there?

           

          Thanks!

          • 2. Re: data loss with a shared DIST cache store
            galder.zamarreno

            The only way I can imagine this happening is if server2 is not configured with distribution, otherwise CacheStoreInterceptor.skip(InvocationContext, VisitableCommand) would not be called.

             

            If this is not the case, attach full test case. You might wanna try plugging Infinispan 5.1.4.FINAL into AS 7.1.1 to see if that solves your issue too.

            • 3. Re: data loss with a shared DIST cache store
              jicken

              Galder Zamarreño wrote:

               

              If this is not the case, attach full test case. You might wanna try plugging Infinispan 5.1.4.FINAL into AS 7.1.1 to see if that solves your issue too.

               

              Galder,

               

              I have created https://github.com/jicken/ispn-shared-dist-cache

               

              It's an Arquillian cluster test. The repo is quite big as it contains two instances of JBoss AS 7.1.1.Final. The test makes use of a PostgreSQL database, so you need to fix the connection settings in both nodes (ispn.xml) before executing the test.

               

              Once that is done, just run 'bash run.sh'.

               

              Please have a look at the GitHub README. I have listed the errors I came across when running this test. I did not manage to reach step 3 of the scenario, as both keys were either distributed on just one node, or on both nodes but each on the wrong one.

               

              Desired distribution would be: key1 on jboss1, key2 on jboss2 to see step 3 failing

               

              If ispn.xml is configured with shared=false the testcases should be OK.

               

              If you have any problems just let me know.

              • 4. Re: data loss with a shared DIST cache store
                jbaris

                I have the same problem with Infinispan 5.1.5.CR1 in embedded mode. I have a cluster with 6 nodes (each one on a dedicated host) with this configuration:

                 

                <default>
                    <jmxStatistics enabled="true" />
                    <clustering mode="distribution">
                        <hash numOwners="3"/>
                        <sync />
                    </clustering>
                    <locking useLockStriping="false" />
                    <deadlockDetection enabled="true" spinDuration="500" />
                    <transaction
                        syncCommitPhase="true" syncRollbackPhase="true" useEagerLocking="false"
                        useSynchronization="false" eagerLockSingleNode="false">
                        <recovery enabled="true" />
                    </transaction>
                    <loaders passivation="false" shared="true" preload="true">
                        <loader class="org.infinispan.loaders.jdbc.mixed.JdbcMixedCacheStore"
                            fetchPersistentState="false" ignoreModifications="false"
                            purgeOnStartup="false">
                            <properties>
                                <property name="tableNamePrefixForStrings" value="ISPN_MIXED_STR_TABLE" />
                                <property name="tableNamePrefixForBinary" value="ISPN_MIXED_BINARY_TABLE" />
                                <property name="idColumnNameForStrings" value="ID_COLUMN" />
                                <property name="idColumnNameForBinary" value="ID_COLUMN" />
                                <property name="dataColumnNameForStrings" value="DATA_COLUMN" />
                                <property name="dataColumnNameForBinary" value="DATA_COLUMN" />
                                <property name="timestampColumnNameForStrings" value="TIMESTAMP_COLUMN" />
                                <property name="timestampColumnNameForBinary" value="TIMESTAMP_COLUMN" />
                                <property name="timestampColumnTypeForStrings" value="BIGINT" />
                                <property name="timestampColumnTypeForBinary" value="BIGINT" />
                                <property name="connectionFactoryClass"
                                    value="org.infinispan.loaders.jdbc.connectionfactory.ManagedConnectionFactory" />
                                <property name="datasourceJndiLocation" value="java:DB2XADS" />
                                <property name="databaseType" value="DB2"/>
                                <property name="idColumnTypeForStrings" value="VARCHAR(255)" />
                                <property name="idColumnTypeForBinary" value="VARCHAR(255)" />
                                <property name="dataColumnTypeForStrings" value="BLOB" />
                                <property name="dataColumnTypeForBinary" value="BLOB" />
                                <property name="dropTableOnExitForStrings" value="false" />
                                <property name="dropTableOnExitForBinary" value="false" />
                                <property name="createTableOnStartForStrings" value="false" />
                                <property name="createTableOnStartForBinary" value="false" />
                            </properties>
                        </loader>
                    </loaders>
                    <expiration wakeUpInterval="-1"/>
                </default>

                At startup, all nodes log something like this:

                2012-05-30 10:09:22,507 DEBUG [org.infinispan.interceptors.InterceptorChain:org.infinispan.interceptors.InterceptorChain.printChainInfo(InterceptorChain.java:76)] Interceptor chain is:

                    >> org.infinispan.interceptors.InvocationContextInterceptor

                    >> org.infinispan.interceptors.CacheMgmtInterceptor

                    >> org.infinispan.interceptors.StateTransferLockInterceptor

                    >> org.infinispan.interceptors.TxInterceptor

                    >> org.infinispan.interceptors.NotificationInterceptor

                    >> org.infinispan.interceptors.locking.OptimisticLockingInterceptor

                    >> org.infinispan.interceptors.EntryWrappingInterceptor

                    >> org.infinispan.interceptors.ClusteredCacheLoaderInterceptor

                    >> org.infinispan.interceptors.DistCacheStoreInterceptor

                    >> org.infinispan.interceptors.DeadlockDetectingInterceptor

                    >> org.infinispan.interceptors.DistributionInterceptor

                    >> org.infinispan.interceptors.CallInterceptor

                 

                After putting an entry on the first node, nodes 1, 2 and 5 change their "numberOfEntries" attribute to 1. But no node increases the "CacheLoaderStores" JMX attribute, and therefore no entry is persisted in the database. Looking at the logs, I see the following:

                Node 1 logs:

                2012-05-30 10:18:41,053 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor:org.infinispan.interceptors.DistCacheStoreInterceptor.skipKey(DistCacheStoreInterceptor.java:185)] Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

                Node 2 logs:

                2012-05-30 10:18:41,570 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor:org.infinispan.interceptors.DistCacheStoreInterceptor.skipKey(DistCacheStoreInterceptor.java:185)] Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

                Node 5 logs:

                2012-05-30 10:18:41,845 TRACE [org.infinispan.interceptors.CacheStoreInterceptor:org.infinispan.interceptors.CacheStoreInterceptor.skip(CacheStoreInterceptor.java:121)] Skipping cache store since the cache loader is shared and we are not the originator.
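To confirm that no node wrote to the store, the "CacheLoaderStores" counter mentioned above can be read with the standard `javax.management` API. The helper below is a minimal sketch: the class name is hypothetical, and the default object-name pattern (`org.infinispan:component=CacheStore,*`) is an assumption about how the MBeans are registered, so it may need adjusting for a given JMX domain or cache name.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper: reads one attribute from every MBean matching a pattern.
public class JmxAttributeReader {

    static Map<String, Object> read(MBeanServerConnection server,
                                    String pattern,
                                    String attribute) throws Exception {
        Map<String, Object> values = new LinkedHashMap<>();
        // queryNames accepts an ObjectName pattern; a null query matches all of them.
        for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
            values.put(name.getCanonicalName(), server.getAttribute(name, attribute));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumed pattern and attribute name; pass your own as arguments if they differ.
        String pattern = args.length > 0 ? args[0] : "org.infinispan:component=CacheStore,*";
        String attribute = args.length > 1 ? args[1] : "CacheLoaderStores";
        // The platform MBean server only sees caches running in this same JVM.
        System.out.println(read(ManagementFactory.getPlatformMBeanServer(), pattern, attribute));
    }
}
```

In embedded mode this would have to run inside the application's JVM; for a separate process, the same `read` method could be given a remote `MBeanServerConnection` instead.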

                 

                By setting the property shared="false", the entry is persisted, but I think this is not appropriate, right?

                Regards

                • 5. Re: data loss with a shared DIST cache store
                  galder.zamarreno

                  Irrespective of what Juan Ignacio says, Torben, is your test right? You have 2 nodes, and 2 as number of owners too. I can't see how the ERROR flavours in the README make sense, because both nodes are owners in this case. However, if you switch to 1 owner, I've seen a test I've built in a similar way fail. I'm investigating.

                   

                  @Juan Ignacio, I've got a test failing. Let's see if a fix can be found and you can try it in your env.

                  • 6. Re: data loss with a shared DIST cache store
                    jicken

                    The intention of owners == 2 was to prevent database access in case of a server crash. Let's say we have 10 nodes, owners would still be 2 (or 3) for redundancy, but not 10.

                     

                    Your 2nd question:

                    The grep in run.sh is wrong, as it doesn't match the following statements from org.infinispan.interceptors.DistCacheStoreInterceptor:

                     

                    [torben@jit] ~/dev/oss/ispn-shared-dist-cache$ grep owner node?/jboss-as-7.1.1.Final/standalone/log/server.log

                    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:38,899 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (http--127.0.0.1-9080-1) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

                    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:38,903 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (http--127.0.0.1-9080-1) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

                    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:41,418 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (OOB-17,null) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

                    node1/jboss-as-7.1.1.Final/standalone/log/server.log:18:25:41,419 TRACE [org.infinispan.interceptors.DistCacheStoreInterceptor] (OOB-17,null) Skipping cache store since the cache loader is shared and the caller is not the first owner of the key

                    [torben@jit] ~/dev/oss/ispn-shared-dist-cache$

                     

                    I've fixed that in run.sh.

                     

                    But here you can see: " ... and the caller is not the first owner of the key"

                     

                    It's not only about being owner, but being the first owner.
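The difference between the two conditions can be shown with a short sketch. The class and method names are hypothetical, and the owner list is simply the one printed in the first post's log; this is an illustration, not Infinispan code.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: "owner" versus "first owner" of a key.
public class OwnerChecks {

    // True if the node holds a copy of the key at all.
    static boolean isOwner(String node, List<String> owners) {
        return owners.contains(node);
    }

    // True only for the node at the head of the owner list (the primary owner).
    static boolean isFirstOwner(String node, List<String> owners) {
        return !owners.isEmpty() && owners.get(0).equals(node);
    }

    public static void main(String[] args) {
        List<String> owners = Arrays.asList("jboss2", "jboss1");
        System.out.println(isOwner("jboss1", owners));      // prints true
        System.out.println(isFirstOwner("jboss1", owners)); // prints false
        System.out.println(isFirstOwner("jboss2", owners)); // prints true
    }
}
```

With 2 nodes and 2 owners, every node passes the owner check, but only one of them passes the first-owner check, and that is the check the log message refers to.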

                    • 7. Re: data loss with a shared DIST cache store
                      galder.zamarreno

                      Torben, I've replicated this on a smaller scale, see https://issues.jboss.org/browse/ISPN-2089

                      • 8. Re: data loss with a shared DIST cache store
                        galder.zamarreno

                        Guys, I've a patch that solves the issue in https://github.com/galderz/infinispan/tree/t_dist_shared_5 - in case you wanna give it a go

                        • 9. Re: data loss with a shared DIST cache store
                          jicken

                          great, thx!!

                          • 10. Re: data loss with a shared DIST cache store
                            ferwasy

                            Galder: https://issues.jboss.org/browse/ISPN-2089 lists 5.1.x as one of the fix versions. Is there a plan to release a 5.1.6 version that includes this patch? If so, do you know when it is planned for release?

                            Kind regards.

                            Fernando.

                            • 11. Re: data loss with a shared DIST cache store
                              galder.zamarreno

                              I'm pretty sure we won't be doing any further 5.1.x releases. Either upgrade to 5.2.x, or buy a JBoss Data Grid (uses 5.1.x) support contract.