3 Replies Latest reply on Feb 24, 2012 8:57 AM by galder.zamarreno

    StateRetrieval problem?

    feci

      Hi,

       

      I'm using the new 5.1.1.FINAL version, and I have a problem with initial state retrieval.

      Sometimes not all values are transferred to a newly started cache. I didn't see this behaviour in earlier releases.

      It happens randomly: some cache starts are OK, while in others some values are missing.

       

      global config:

      new GlobalConfigurationBuilder()
                          // JMX
                          .globalJmxStatistics()
                          .jmxDomain("org.infinispan")
                          .enabled(false)
                          // Transport
                          .transport()
                          .strictPeerToPeer(false) // can't be true in asymmetric clusters
                          .transport(new JGroupsTransport())
                          .distributedSyncTimeout(50000)
                          .clusterName("infinispan")
                          .addProperty("configurationFile", "infinispanJGroups.xml")
                          // Serialization
                          .serialization().marshaller(new VersionAwareMarshaller())
                          .build();

       

       

      default config:

      new ConfigurationBuilder()
                          // Locking
                          .locking()
                          .useLockStriping(false)
                          .isolationLevel(IsolationLevel.READ_COMMITTED)
                          .lockAcquisitionTimeout(10000)
                          .writeSkewCheck(false)
                          .concurrencyLevel(50)
                          // Transaction
                          .transaction()
                          .transactionManagerLookup(new TxManagerLookup(pTransactionHandler))
                          .transactionMode(TransactionMode.NON_TRANSACTIONAL)
                          .syncRollbackPhase(false)
                          .useSynchronization(false)
                          .syncCommitPhase(true)
                          .lockingMode(LockingMode.OPTIMISTIC)
                          .use1PcForAutoCommitTransactions(false)
                          .autoCommit(true)
                          .recovery().disable()
                          // Clustering
                          .clustering()
                          .cacheMode(CacheMode.REPL_SYNC)
                          .sync()
                          .replTimeout(20000)
                          // State transfer
                          .stateTransfer()
                          .fetchInMemoryState(true)
                          .timeout(300000)
                          // Loaders
                          .loaders()
                          .passivation(false)
                          .preload(false)
                          .shared(false)
                          // Expiration
                          .expiration()
                          .wakeUpInterval(10000)
                          // Deadlock detection
                          .deadlockDetection()
                          .build();

       

      cache config:

      new ConfigurationBuilder().read(getDefaultConfig())
                                              .build()

        • 1. Re: StateRetrieval problem?
          galder.zamarreno

          Hmmm, do you have a test for this?

           

          If you can replicate it relatively easily, it'd be handy to enable TRACE logging for org.infinispan package and attach logs for all nodes participating.
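
For reference, one way to capture the finest-level output of the org.infinispan package in a file rather than on the console is sketched below. It assumes the logging calls end up in java.util.logging, which may not match your setup; with log4j the equivalent would be a logger for org.infinispan set to TRACE plus a file appender. The class name and the log file name are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class InfinispanTraceLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so settings on an unreferenced logger could be garbage-collected.
    private static final Logger INFINISPAN = Logger.getLogger("org.infinispan");

    /** Sends everything logged under org.infinispan to the given file at the finest level. */
    public static FileHandler enableTrace(String logFile) throws IOException {
        FileHandler handler = new FileHandler(logFile, true); // true = append
        handler.setLevel(Level.ALL);
        handler.setFormatter(new SimpleFormatter());
        INFINISPAN.setLevel(Level.ALL);
        INFINISPAN.addHandler(handler);
        INFINISPAN.setUseParentHandlers(false); // do not echo to the console
        return handler;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; use a separate file for each node.
        FileHandler handler = enableTrace("infinispan-node1.log");
        Logger.getLogger("org.infinispan.statetransfer").finest("trace logging enabled");
        handler.close();
    }
}
```

Calling enableTrace before the cache manager starts on each node gives one file per node, which can then be zipped and attached.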

          • 2. Re: StateRetrieval problem?
            feci

            Hi,

            at first, sorry I'm providing only debug logs, but trace is way too chatty...

            Here are some logs to clarify what I mean:

             

            Node3 (started at the end):

             

            2012-02-23 16:49:08.166 DEBUG org.infinispan.cacheviews.CacheViewsManagerImpl - directoryService: Node fecko-38467 is joining

            2012-02-23 16:49:08.169 DEBUG org.infinispan.distribution.ch.DefaultConsistentHash - Using 1 virtualNodes to initialize consistent hash wheel

            2012-02-23 16:49:08.170 DEBUG org.infinispan.statetransfer.BaseStateTransferManagerImpl - Applying new state from fecko-22248: received 0 keys

            2012-02-23 16:49:08.170 DEBUG org.infinispan.statetransfer.ReplicatedStateTransferTask - Commencing state transfer 3 on node: fecko-38467. Before start, data container had 0 entries

            2012-02-23 16:49:08.170 DEBUG org.infinispan.statetransfer.StateTransferLockImpl - Blocking new write commands for cache view 3

            2012-02-23 16:49:08.176 DEBUG org.infinispan.statetransfer.BaseStateTransferManagerImpl - Applying new state from fecko-2413: received 11 keys

            2012-02-23 16:49:08.179 DEBUG org.infinispan.cacheviews.CacheViewsManagerImpl - directoryService: Committing cache view CacheView{viewId=3, members=[fecko-22248, fecko-2413, fecko-38467]}

            2012-02-23 16:49:08.179 DEBUG org.infinispan.statetransfer.BaseStateTransferTask - Node fecko-38467 completed state transfer for view 3 in 9 milliseconds!

            2012-02-23 16:49:08.179 DEBUG org.infinispan.statetransfer.StateTransferLockImpl - Unblocking write commands for cache view 3

            2012-02-23 16:49:08.180 DEBUG org.sors.rttp.infinispan.clusterUtils.ClusterUtilsHandler - Cache:directoryService started.

             

            Node2 (started after Node1, before Node3):

             

            2012-02-23 16:49:08.169 DEBUG org.infinispan.statetransfer.ReplicatedStateTransferTask - Commencing state transfer 3 on node: fecko-2413. Before start, data container had 13 entries

            2012-02-23 16:49:08.169 DEBUG org.infinispan.statetransfer.StateTransferLockImpl - Blocking new write commands for cache view 3

            2012-02-23 16:49:08.169 DEBUG org.infinispan.statetransfer.BaseStateTransferManagerImpl - Pushing to nodes [fecko-38467] 11 keys

            2012-02-23 16:49:08.177 DEBUG org.infinispan.statetransfer.BaseStateTransferTask - Node finished pushing data for cache views 3.

            2012-02-23 16:49:08.179 DEBUG org.infinispan.cacheviews.CacheViewsManagerImpl - directoryService: Committing cache view CacheView{viewId=3, members=[fecko-22248, fecko-2413, fecko-38467]}

            2012-02-23 16:49:08.179 DEBUG org.infinispan.statetransfer.BaseStateTransferTask - Node fecko-2413 completed state transfer for view 3 in 10 milliseconds!

            2012-02-23 16:49:08.180 DEBUG org.infinispan.statetransfer.StateTransferLockImpl - Unblocking write commands for cache view 3

             

            Node1 (started first):

             

            2012-02-23 16:49:08.168 DEBUG org.infinispan.statetransfer.ReplicatedStateTransferTask - Commencing state transfer 3 on node: fecko-22248. Before start, data container had 13 entries

            2012-02-23 16:49:08.168 DEBUG org.infinispan.statetransfer.StateTransferLockImpl - Blocking new write commands for cache view 3

            2012-02-23 16:49:08.168 DEBUG org.infinispan.statetransfer.BaseStateTransferManagerImpl - Pushing to nodes [fecko-38467] 0 keys

            2012-02-23 16:49:08.171 DEBUG org.infinispan.statetransfer.BaseStateTransferTask - Node finished pushing data for cache views 3.

            2012-02-23 16:49:08.181 DEBUG org.infinispan.cacheviews.CacheViewsManagerImpl - directoryService: Committing cache view CacheView{viewId=3, members=[fecko-22248, fecko-2413, fecko-38467]}

            2012-02-23 16:49:08.181 DEBUG org.infinispan.statetransfer.BaseStateTransferTask - Node fecko-22248 completed state transfer for view 3 in 12 milliseconds!

            2012-02-23 16:49:08.181 DEBUG org.infinispan.statetransfer.StateTransferLockImpl - Unblocking write commands for cache view 3

            2012-02-23 16:49:08.181 DEBUG org.infinispan.cacheviews.CacheViewsManagerImpl - Successfully installed view CacheView{viewId=3, members=[fecko-22248, fecko-2413, fecko-38467]} for cache directoryService

             

             

            From the logs you can see that Node1 pushes 0 of its 13 entries to Node3, while Node2 pushes 11 of its 13.

            Where are the missing 2 entries?
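
The same check can be automated against the joiner's DEBUG log. The sketch below is only an illustration: it assumes the message formats shown in the logs above, and that the pushes from the existing nodes together should cover every entry they hold. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StateTransferTally {
    private static final Pattern RECEIVED =
            Pattern.compile("Applying new state from (\\S+): received (\\d+) keys");
    private static final Pattern HAD =
            Pattern.compile("data container had (\\d+) entries");

    /** Sums the key counts of all "Applying new state" lines in the joiner's log. */
    public static int totalReceived(List<String> joinerLog) {
        int total = 0;
        for (String line : joinerLog) {
            Matcher m = RECEIVED.matcher(line);
            if (m.find()) {
                total += Integer.parseInt(m.group(2));
            }
        }
        return total;
    }

    /** Reads the entry count from a "Commencing state transfer" line, or -1 if absent. */
    public static int entriesBefore(String line) {
        Matcher m = HAD.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Lines taken from the Node3 (joiner) and Node1 logs above.
        List<String> node3 = Arrays.asList(
                "Applying new state from fecko-22248: received 0 keys",
                "Applying new state from fecko-2413: received 11 keys");
        String node1 = "Commencing state transfer 3 on node: fecko-22248. "
                + "Before start, data container had 13 entries";

        int received = totalReceived(node3);
        int expected = entriesBefore(node1);
        // Prints: received=11 expected=13 missing=2
        System.out.println("received=" + received + " expected=" + expected
                + " missing=" + (expected - received));
    }
}
```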

            • 3. Re: StateRetrieval problem?
              galder.zamarreno

              Tomas Fecko wrote:

               

              Hi,

              at first, sorry I'm providing only debug logs, but trace is way too chatty...

              They might look very chatty to you but not to us. Simply log to a file to avoid the console going mental, zip it and attach it. And attach TRACE logs from all nodes in the cluster.

               

              Thanks