2 Replies Latest reply on Apr 24, 2017 2:25 PM by rjack2

    Replication timeout for localhost-44885 - Infinispan 6.0.1 Cluster

    rjack2

      Folks,

       

      We are seeing this error in our logs for 3 of our 4 servers using an embedded cache.

       

      org.infinispan.util.concurrent.TimeoutException: Replication timeout for localhost-44885

       

      Here is the code causing the error:

       

      protected Session getSession(String authKey) {

              Session session = cacheFactory.getCache().get(authKey);

              return session;

          }

       

      Here is our embedded cache manager setup code:

       

          public EmbeddedCacheManager embeddedCacheManager(List<ICacheConfigurer> configs) {

              log.debug("Building Cache Manager");

              //@formatter:off

              GlobalConfigurationBuilder globalConfigurationBuilder = new GlobalConfigurationBuilder();

              globalConfigurationBuilder

              .globalJmxStatistics().disable().allowDuplicateDomains(true)

              .transport().clusterName("green-prod-lqecommerce")

              .shutdown().hookBehavior(ShutdownHookBehavior.DEFAULT);

              globalConfigurationBuilder.transport().defaultTransport().addProperty("configurationFile", "infinispan/jgroups-ec2.xml");

       

       

              EmbeddedCacheManager manager = new DefaultCacheManager(globalConfigurationBuilder.build());

              for (ICacheConfigurer iCacheConfigurer : configs) {

                  log.debug("Found Config: {}", iCacheConfigurer.getClass());

                  iCacheConfigurer.configure(manager);

              }

       

      Here is our cache configuration:

       

                  public void configure(EmbeddedCacheManager manager) {

                      //@formatter:off

                      ConfigurationBuilder builder = new ConfigurationBuilder();

                      builder

                      .clustering().cacheMode(CacheMode.REPL_SYNC)

                      .invocationBatching().enable(true)

                      .persistence()

                      .passivation(false)

                      .addSingleFileStore()

                      .preload(true)

                      .shared(false)

                      .fetchPersistentState(true)

                      .ignoreModifications(false)

                      .purgeOnStartup(false)

                      .location(centralCachePath)

                      .eviction()

                      .strategy(EvictionStrategy.LIRS)

                      .maxEntries(20000)

                      .expiration()

                      .lifespan(6, TimeUnit.HOURS)

                      .maxIdle(1, TimeUnit.HOURS)

                      .wakeUpInterval(1, TimeUnit.MINUTES);

                      manager.defineConfiguration(ITOKEN_CACHE,

                              builder.build());

                      //@formatter:on

       

       

                  }

       

      Here is our jgroups file:

       

      <config xmlns="urn:org:jgroups"

              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

              xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.4.xsd">

         <TCP

              bind_addr="${jgroups.tcp.address:127.0.0.1}"

              bind_port="${jgroups.tcp.port:9806}"

              loopback="true"

              port_range="0"

              recv_buf_size="20000000"

              send_buf_size="640000"

              max_bundle_size="31k"

              use_send_queues="true"

              sock_conn_timeout="300"

              enable_diagnostics="false"

            

              bundler_type="old"

       

       

              thread_pool.enabled="true"

              thread_pool.min_threads="2"

              thread_pool.max_threads="30"

              thread_pool.keep_alive_time="60000"

              thread_pool.queue_enabled="false"

              thread_pool.queue_max_size="100"

              thread_pool.rejection_policy="Discard"

       

       

              oob_thread_pool.enabled="true"

              oob_thread_pool.min_threads="2"

              oob_thread_pool.max_threads="30"

              oob_thread_pool.keep_alive_time="60000"

              oob_thread_pool.queue_enabled="false"

              oob_thread_pool.queue_max_size="100"

              oob_thread_pool.rejection_policy="Discard"

       

       

              internal_thread_pool.enabled="true"

              internal_thread_pool.min_threads="1"

              internal_thread_pool.max_threads="10"

              internal_thread_pool.keep_alive_time="60000"

              internal_thread_pool.queue_enabled="true"

              internal_thread_pool.queue_max_size="100"

              internal_thread_pool.rejection_policy="Discard"

              />

       

       

         <S3_PING secret_access_key="${jgroups.s3.secret_access_key}" access_key="${jgroups.s3.access_key}" location="${jgroups.s3.bucket:jgroups}" />

       

       

         <MERGE2 max_interval="30000"

                 min_interval="10000"/>

         <FD_SOCK start_port="${jgroups.fd.port:9906}" port_range="0"/>

         <FD timeout="3000" max_tries="5"/>

         <VERIFY_SUSPECT timeout="1500"/>

         <pbcast.NAKACK2

          use_mcast_xmit="false"

                          xmit_interval="1000"

                          xmit_table_num_rows="100"

                          xmit_table_msgs_per_row="10000"

                          xmit_table_max_compaction_time="10000"

                          max_msg_batch_size="100"/>

       

       

         <UNICAST3

                    xmit_table_num_rows="20"

                    xmit_table_msgs_per_row="10000"

                    xmit_table_max_compaction_time="10000"

                    max_msg_batch_size="100"

                    conn_expiry_timeout="0"/>

         <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"

                        max_bytes="400000"/>

         <pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>

         <tom.TOA/> <!-- the TOA is only needed for total order transactions-->

         <UFC max_credits="2m" min_threshold="0.40"/>

         <MFC max_credits="2m" min_threshold="0.40"/>

         <FRAG2 frag_size="30k"  />

         <RSVP timeout="60000" resend_interval="500" ack_on_delivery="false"/>

      </config>

       

      Snippets from Error Log

       

      org.infinispan.util.concurrent.TimeoutException: Replication timeout for localhost-44885

        at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:77)

        at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:547)

        at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:281)

        at org.infinispan.interceptors.distribution.TxDistributionInterceptor.prepareOnAffectedNodes(TxDistributionInterceptor.java:219)

        at org.infinispan.interceptors.distribution.TxDistributionInterceptor.visitPrepareCommand(TxDistributionInterceptor.java:203)

        at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:125)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.CacheWriterInterceptor.visitPrepareCommand(CacheWriterInterceptor.java:103)

        at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:125)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:98)

        at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:112)

        at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:96)

        at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:125)

        at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandIntercept

      at java.lang.Thread.run(Thread.java:745)

      Caused by: org.infinispan.commons.CacheException: Could not prepare.

      at org.infinispan.transaction.synchronization.SynchronizationAdapter.beforeCompletion(SynchronizationAdapter.java:46)
      at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:76)
      at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:368)
      at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:91)
      at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)
      at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1200)
      ... 125 common frames omitted

      Caused by: javax.transaction.xa.XAException: null

      at org.infinispan.transaction.TransactionCoordinator.prepare(TransactionCoordinator.java:139)
      at org.infinispan.transaction.TransactionCoordinator.prepare(TransactionCoordinator.java:101)
      at org.infinispan.transaction.synchronization.SynchronizationAdapter.beforeCompletion(SynchronizationAdapter.java:44)
      ... 130 common frames omitted

      [timestamp=2017-04-20 16:41:13.017] [thread="http-nio-9006-exec-4"] [session="bc23cbc7-4b90-4f3a-f744-6707c7db72ec"] [request="589d2d7a-af99-45b0-91c2-0ce76552efcb"] [level=ERROR] [logger=c.l.s.e.UncheckedExceptionMapper] [class=com.laquinta.services.exception.UncheckedExceptionMapper] [line=49] - Could not commit implicit transaction

      org.infinispan.commons.CacheException: Could not commit implicit transaction

      at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1319)

      Caused by: javax.transaction.RollbackException: ARJUNA016053: Could not commit transaction.

        at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1212)

        at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.commit(BaseTransaction.java:126)

        at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1316)

        ... 123 common frames omitted

      Caused by: org.infinispan.commons.CacheException: Could not prepare.

        at org.infinispan.transaction.synchronization.SynchronizationAdapter.beforeCompletion(SynchronizationAdapter.java:46)

        at com.arjuna.ats.internal.jta.resources.arjunacore.SynchronizationImple.beforeCompletion(SynchronizationImple.java:76)

        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.beforeCompletion(TwoPhaseCoordinator.java:368)

        at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:91)

        at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:162)

        at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.commitAndDisassociate(TransactionImple.java:1200)

        ... 125 common frames omitted

      Caused by: javax.transaction.xa.XAException: null

        at org.infinispan.transaction.TransactionCoordinator.prepare(TransactionCoordinator.java:139)

        at org.infinispan.transaction.TransactionCoordinator.prepare(TransactionCoordinator.java:101)

        at org.infinispan.transaction.synchronization.SynchronizationAdapter.beforeCompletion(SynchronizationAdapter.java:44)

        ... 130 common frames omitted

        • 1. Re: Replication timeout for localhost-44885 - Infinispan 6.0.1 Cluster
          gustavonalle

          It's likely you have a low timeout value for your environment and/or load, you could try to increase it:

           

          builder.clustering().sync().replTimeout( <millis> )

          • 2. Re: Replication timeout for localhost-44885 - Infinispan 6.0.1 Cluster
            rjack2

            Gustavo,

             

            Thank you for responding. It looks the default is 15 seconds for 6.0.1. Shouldn't that be enough time.

             

            We aren't setting it anywhere.

             

            Regards,

             

            Robert

             

            public class LockingConfigurationBuilder extends AbstractConfigurationChildBuilder implements Builder<LockingConfiguration> {

             

             

               private static final Log log = LogFactory.getLog(LockingConfigurationBuilder.class);

             

             

               private int concurrencyLevel = 32;

               private IsolationLevel isolationLevel = IsolationLevel.READ_COMMITTED;

               private long lockAcquisitionTimeout = TimeUnit.SECONDS.toMillis(10);

             

            public class LockingConfigurationBuilder extends AbstractConfigurationChildBuilder implements Builder<LockingConfiguration> {

             

             

               private static final Log log = LogFactory.getLog(LockingConfigurationBuilder.class);

             

             

               private int concurrencyLevel = 32;

               private IsolationLevel isolationLevel = IsolationLevel.READ_COMMITTED;

               private long lockAcquisitionTimeout = TimeUnit.SECONDS.toMillis(10);