4 Replies Latest reply on Apr 16, 2014 8:24 AM by william.burns

    Data inconsistency

    maruta.s

      Hello all,

       

      We are facing a problem where, when running Infinispan REST servers in a cluster, some nodes occasionally end up in a state with inconsistent data.

      A GET on one node returns incorrect data, while other nodes return the correct data.

       

      We are using the following cluster configuration:

       

        <global>
            <transport clusterName="orscluster">
                <properties>
                    <property name="configurationFile" value="ors-cluster-configuration.xml"/>
                </properties>
            </transport>
            <globalJmxStatistics enabled="true" jmxDomain="distCache"/>
        </global>

        <default>
            <locking isolationLevel="REPEATABLE_READ"
                     lockAcquisitionTimeout="30000"
                     writeSkewCheck="false"
                     concurrencyLevel="512"
                     useLockStriping="false"/>

            <clustering mode="distribution">
                <sync replTimeout="120000"/>
                <l1 enabled="true"/>
                <hash numOwners="2"/>
            </clustering>

            <jmxStatistics enabled="true"/>
            <invocationBatching enabled="true"/>
        </default>

       

      Operations over the REST API are also synchronous:

       

        URL url = new URL(address + cache + key.toString());
        connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("PUT");
        connection.setRequestProperty("Content-Type", "application/x-java-serialized-object");
        connection.setDoOutput(true);
        connection.setReadTimeout(readTimout);
        outputStreamWriter = new ObjectOutputStream(connection.getOutputStream());
        outputStreamWriter.writeObject(data);
        connection.connect();
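
The snippet above stops before the response is read, so it does not show whether the server accepted the PUT. A minimal sketch of a variant that closes the stream and returns the HTTP status code follows. The class name `RestCachePut`, the method `put`, and its parameters are hypothetical and not part of the original client.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.HttpURLConnection;
import java.net.URI;

final class RestCachePut {

    /** PUTs a Java-serialized value to entryUrl and returns the HTTP status code. */
    static int put(String entryUrl, Serializable data, int readTimeoutMs) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(entryUrl).toURL().openConnection();
        try {
            connection.setRequestMethod("PUT");
            connection.setRequestProperty("Content-Type", "application/x-java-serialized-object");
            connection.setDoOutput(true);
            connection.setReadTimeout(readTimeoutMs);
            // try-with-resources flushes and closes the request body
            try (ObjectOutputStream out = new ObjectOutputStream(connection.getOutputStream())) {
                out.writeObject(data);
            }
            // Reading the status code waits for the server's answer to this PUT
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

A caller could then log or retry on any non-2xx status, for example `int status = RestCachePut.put(entryUrl, data, 5000);`, where `entryUrl` stands for the same `address + cache + key` concatenation used above.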

       

      We are using JGroups with a UDP configuration.

       

      Could you give us a hint about how this inconsistent state can arise? Are there any known bugs or limitations related to data inconsistency?

       

      Thanks

      Marta

        • 1. Re: Data inconsistency
          ajcmartins

          Have you tried disabling the L1 cache? From what I could read here, the invalidation of a stale entry in the L1 cache is done via a multicast message, so it may be possible that for brief moments you are getting the stale entry. Please note that I am not sure about this, and someone more experienced with ISPN may have a better answer. In the meantime, you can test without L1 and see whether you can still reproduce the issue.

           

          Cheers,
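
To check whether the issue still reproduces after a configuration change, one option is to read the same key from every node and compare the raw bytes. The sketch below is only an illustration of that idea. The class `NodeConsistencyCheck` and its methods are hypothetical, and treating a 404 response as "entry missing" is an assumption about how the REST server answers for absent keys.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NodeConsistencyCheck {

    /** GETs one entry from one node; returns null when the node answers 404 (assumed: missing). */
    static byte[] fetch(String entryUrl, int timeoutMs) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(entryUrl).toURL().openConnection();
        try {
            connection.setRequestMethod("GET");
            connection.setConnectTimeout(timeoutMs);
            connection.setReadTimeout(timeoutMs);
            if (connection.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND) {
                return null;
            }
            try (InputStream in = connection.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int read;
                while ((read = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                return buffer.toByteArray();
            }
        } finally {
            connection.disconnect();
        }
    }

    /** Groups node names by the exact bytes each node returned for the same key. */
    static Map<String, List<String>> groupByValue(Map<String, byte[]> valueByNode) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : valueByNode.entrySet()) {
            byte[] value = entry.getValue();
            // '<' never occurs in Base64 output, so the marker cannot collide with a real value
            String fingerprint =
                    value == null ? "<missing>" : Base64.getEncoder().encodeToString(value);
            groups.computeIfAbsent(fingerprint, k -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }

    /** True when every node returned the same bytes (or every node had no entry). */
    static boolean isConsistent(Map<String, byte[]> valueByNode) {
        return groupByValue(valueByNode).size() <= 1;
    }
}
```

Filling the map with one `fetch` result per node and calling `groupByValue` shows which nodes disagree, which would be useful to record alongside the time of the last PUT for that key.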

          • 2. Re: Data inconsistency
            william.burns

            What version are you running? Unfortunately, versions older than 6.0 can have data inconsistencies when L1 is enabled, as ajcmartins mentioned. However, 6.0 has extensive changes around L1 to ensure consistency. Do you have access to any trace logs that show ISPN running when this occurs?

            • 3. Re: Data inconsistency
              maruta.s

              We are using version 5.2.6 and we have the L1 cache enabled. The strange thing to me is that the inconsistent state persists until the "problematic" node is restarted. Could this be caused by the L1 cache?

              We do not suffer from brief moments of inconsistency; the problem persists until a restart.

              I only have logs from JGroups, but at the time when the user performs the GET that returns invalid data, nothing problematic is logged.

              • 4. Re: Data inconsistency
                william.burns

                Yes, my guess is that you are seeing inconsistencies due to L1 then. A restart of the node would clear out the L1 and start from scratch. One thing to keep in mind is that L1 stores each entry for an explicit lifespan. The default is 10 minutes, so the worst case is having the value inconsistent for just under 10 minutes. If the value is frequently accessed in a short time frame, you could reduce the lifespan to something like 1 or 2 minutes instead (which reduces how long the inconsistency lasts). Note that the lifespan has to be in milliseconds.

                 

                <l1 enabled="true" lifespan="120000"/>
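
Since the attribute takes milliseconds, a small helper can avoid unit mistakes when picking a lifespan in minutes. This is just a sketch of the arithmetic: the class `L1Lifespan` and the method `l1Element` are hypothetical names, and the helper only renders the same `<l1>` element shown above.

```java
import java.util.concurrent.TimeUnit;

final class L1Lifespan {

    /** Renders the <l1> element for a lifespan given in minutes. */
    static String l1Element(long minutes) {
        // The lifespan attribute is expressed in milliseconds
        long lifespanMs = TimeUnit.MINUTES.toMillis(minutes);
        return "<l1 enabled=\"true\" lifespan=\"" + lifespanMs + "\"/>";
    }
}
```

With this helper, `L1Lifespan.l1Element(2)` yields the element above with `lifespan="120000"`, and the 10-minute default corresponds to `600000`.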