5 Replies Latest reply on Jul 15, 2015 9:16 AM by Enrico Olivelli

    Help debugging NearCache and multiple HotRod server architecture

    Enrico Olivelli Newbie

      Hi,

      I'm running Infinispan 7.2.3.Final and I'm having trouble with the near cache.

      When running a cluster with more than one server, after some time (I'm not yet able to reproduce the problem systematically) all the Java HotRod clients stop behaving correctly, that is, they read "stale" values from Infinispan.

      This is the simplest staging environment that reproduces the problem:

      - two java hotrod clients

      - two java hotrod servers (embedded in a java application, which does nothing but host the Infinispan server)

      - the clients are configured to connect to all the servers

       

      each client follows these steps:

      - writes an entry to a cache, with value=A

      - updates the entry in the cache, with value=B

      - reads the entry back from the cache: the returned value is A (stale)
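
      The three steps above can be sketched as a small helper. `RemoteCache` implements `ConcurrentMap`, so the same method can be handed either the real cache or a plain in-memory map for a sanity check; the key and value literals here are just placeholders:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StalenessCheck {

    // Runs the write/update/read sequence from the steps above and
    // returns the value observed by the final read. Passing in the
    // RemoteCache works because it implements ConcurrentMap.
    static String writeUpdateRead(ConcurrentMap<String, String> cache, String key) {
        cache.put(key, "A");   // write an entry with value=A
        cache.put(key, "B");   // update the entry to value=B
        return cache.get(key); // should be "B"; the bug returns the stale "A"
    }

    public static void main(String[] args) {
        // Sanity check against an in-memory map always yields "B";
        // against the broken near cache the same call yields "A".
        String seen = writeUpdateRead(new ConcurrentHashMap<>(), "k");
        System.out.println(seen); // prints "B"
    }
}
```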

       

      Of course this does not always happen; most of the time Infinispan works as expected, but after the system has been fully running for some time every client stops behaving as expected and the buggy behaviour starts.

      Even after restarting the server nodes the clients are not able to recover; the only resolution is to restart the clients.

       

      Even with all the loggers at DEBUG level I cannot find any "exception", either on the servers or on the clients.
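
      Since in 7.x the LAZY near cache is kept consistent via remote client-listener events, one thing that might help is raising the event-related logger categories to TRACE to see whether invalidation events are still arriving once the stale reads begin. A log4j 1.x fragment as a starting point (the exact category names are an assumption on my part):

```
# log4j.properties fragment (log4j 1.x syntax) - category names assumed
# client side: HotRod client and its remote-event machinery
log4j.logger.org.infinispan.client.hotrod=TRACE
log4j.logger.org.infinispan.client.hotrod.event=TRACE
# server side: HotRod endpoint and listener notifications
log4j.logger.org.infinispan.server.hotrod=TRACE
log4j.logger.org.infinispan.notifications=TRACE
```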

       

      Any idea of how I can debug this situation?

       

      client side bootstrap code:

                  org.infinispan.client.hotrod.configuration.ConfigurationBuilder clientBuilder =
                          new org.infinispan.client.hotrod.configuration.ConfigurationBuilder().forceReturnValues(false);
                  clientBuilder.nearCache().mode(NearCacheMode.LAZY).maxEntries(100000);
                  clientBuilder
                          .connectionPool()
                          .exhaustedAction(ExhaustedAction.WAIT)
                          .maxActive(10)
                          .maxIdle(10)
                          .maxTotal(20)
                          .minIdle(1)
                          .minEvictableIdleTime(60000)
                          .timeBetweenEvictionRuns(60000)
                          .numTestsPerEvictionRun(20)
                          .testOnBorrow(false)
                          .testOnReturn(false)   // closing parenthesis was missing here
                          .testWhileIdle(true);
                  clientBuilder
                          .socketTimeout(sockettimeout)
                          .connectionTimeout(connecttimeout);
                  clientBuilder
                          .security()
                          .authentication()
                          .enable()
                          .serverName("localhost")
                          .saslMechanism("CRAM-MD5")
                          .callbackHandler(new SecurityCallbackHandler(username, password, realm));
                  for each server... {
                      clientBuilder
                              .addServer()
                              .host(serveraddress)
                              .port(serverport);
                  }

      RemoteCacheManager remoteCacheManager = new RemoteCacheManager(clientBuilder.build());

       

       

      Server side configuration:

      GlobalConfiguration globalConfig = new GlobalConfigurationBuilder()

                          .globalJmxStatistics()

                          .allowDuplicateDomains(true)

                          .cacheManagerName(instanceName)

                          .transport()

                          .defaultTransport()

                          .clusterName(clustername)

                          .addProperty("configurationFile", configurationFile)   // using default jgroups UDP multicast

                          .machineId(instanceName)

                          .siteId("xxxx")

                          .rackId("xxxxx")

                          .nodeName("uniquenamexxxx")

                          .build();

       

                  Configuration wildcard = new ConfigurationBuilder()

                          .locking().lockAcquisitionTimeout(lockAcquisitionTimeout)

                          .concurrencyLevel(10000).isolationLevel(IsolationLevel.READ_COMMITTED).useLockStriping(true)

                          .clustering()

                          .cacheMode(CacheMode.DIST_SYNC)

                          .l1().lifespan(l1ttl).enabled(l1ttl > 0)                            // the problem happens both with and without L1

                          .hash().numOwners(1).capacityFactor(1)

                          .partitionHandling().enabled(false)

                          .stateTransfer().awaitInitialTransfer(false).timeout(initialTransferTimeout).fetchInMemoryState(false)

                          .storeAsBinary().enabled(false).storeKeysAsBinary(false).storeValuesAsBinary(false)

                          .jmxStatistics().enable()

                          .unsafe().unreliableReturnValues(false)

                          .compatibility().enable()

                          .build();

      EmbeddedCacheManager manager = new DefaultCacheManager(globalConfig, wildcard, false);

      HotRodServer hotrodserver = new HotRodServer();

      HotRodServerConfigurationBuilder configBuilder = new HotRodServerConfigurationBuilder()

                              .host(address)

                              .port(port);

      SimpleServerAuthenticationProvider authProv = new SimpleServerAuthenticationProvider();

      authProv.addUser(username, realm, password);

      configBuilder.authentication().addAllowedMech("CRAM-MD5").serverAuthenticationProvider(authProv).serverName("localhost").enable();

      hotrodserver.start(configBuilder.build(), manager);