5 Replies - Latest reply on Sep 2, 2013 8:23 AM by boss_mc

    memcached read/write failures while growing distributed cache

    mike-evans

      We're using Infinispan server 5.3 as a scalable, redundant memcached cluster.  We're running growth tests starting with a single node cluster, then trying to add a second node while the first node is handling a read/write load.  We see all memcached read and write commands failing on the first node, starting approximately one minute after the view change to admit the new node and lasting approximately 5 minutes.  gets commands return END responses (rather than the VALUE responses we were seeing up to that point) and all memcached_add commands fail with the NOT_STORED response.
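
      For reference, the shape of the load is roughly the following, sketched here with the spymemcached Java client purely for illustration (our real test client drives libmemcached rather than Java, and the host name and keys are placeholders):

      import java.net.InetSocketAddress;
      import net.spy.memcached.MemcachedClient;

      public class GrowthTestLoad {
          public static void main(String[] args) throws Exception {
              MemcachedClient client =
                      new MemcachedClient(new InetSocketAddress("node1.example.com", 11211));
              for (int i = 0; i < 100000; i++) {
                  String key = "key-" + i;
                  // During the failure window every add reports false (NOT_STORED on the wire)...
                  boolean added = client.add(key, 0, "value-" + i).get();
                  // ...and gets returns null (a bare END response), even for keys written
                  // successfully before the view change.
                  Object stored = client.gets(key);
                  System.out.printf("add=%b gets=%s%n", added, stored);
              }
              client.shutdown();
          }
      }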

       

      Is this a scenario we should expect to work?  We tested these growth scenarios with a test application on 5.2.1 and we didn't see the problems with that level of code, so either this is a recently introduced issue or there is some subtle difference between the behaviour of our test application and the behaviour of our real system.

        • 1. Re: memcached read/write failures while growing distributed cache
          mike-evans

          We've been doing some more investigation of this problem, including dropping back to 5.2.1.  It looks like the problem still exists in 5.2.1 - we can't provoke it with our test application driving the libmemcached API directly, but it is reproducible when running the full Clearwater system at high load.  We're not yet sure whether the difference is the transaction load, the mix of reads and writes in the transactions or the amount of data in the cache.

           

          As I said above, the symptom is that all reads and writes start failing shortly after the view change.  With debug logging enabled, we see the following log repeated by JGroups on both the new and old nodes:

           

          2013-08-28 06:44:03,183 WARN  (OOB-1,infinispan-cluster,ec2-54-211-174-135-48245) [org.jgroups.protocols.UNICAST] ec2-54-211-174-135-48245: sender window for ec2-54-227-89-77-36548 not found

           

          We'd really appreciate some input on this, specifically whether this scenario is expected to work or whether this is a fundamental limitation of Infinispan. 

          • 2. Re: memcached read/write failures while growing distributed cache
            mike-evans

            Still no closer to a solution.  We have tried enabling more tracing (in JGroups and parts of Infinispan itself) but found no smoking gun.  The failures seem to happen as soon as JGroups sends up a view change event.  The behaviour in 5.2.1 seems worse than in 5.3 in that the problem takes a lot longer to clear: in 5.3 it clears after ~5 minutes every time, whereas in 5.2.1 we haven't been able to establish a consistent period after which the problem clears.

             

            At the risk of repeating myself, can anyone tell me whether or not this scenario should work in Infinispan: should it continue to provide read/write access while the cluster grows, or does everyone take their application down (and flush the cache) when adding nodes?

            • 3. Re: memcached read/write failures while growing distributed cache
              nadirx

              Sorry for not replying sooner. Yes, the scenario is supposed to work. Can we have a look at the configuration?

              • 4. Re: memcached read/write failures while growing distributed cache
                mike-evans

                 The 5.2.1 cache configuration is as follows:

                 

                 <?xml version="1.0" encoding="UTF-8"?>
                 <infinispan
                       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                       xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
                       xmlns="urn:infinispan:config:5.2">

                    <global>
                       <transport clusterName="infinispan-cluster"
                                  distributedSyncTimeout="50000"
                                  transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
                          <properties>
                             <property name="configurationFile" value="${CFGPath}/jgroups-tcpping.xml"/>
                          </properties>
                       </transport>
                       <globalJmxStatistics enabled="true"/>
                    </global>

                    <default>
                       <jmxStatistics enabled="true"/>
                       <clustering mode="distribution">
                          <l1 enabled="false" />
                          <hash numOwners="2" />
                          <async/>
                       </clustering>
                    </default>
                 </infinispan>

                 

                 The JGroups configuration is:

                 

                 <config xmlns="urn:org:jgroups"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
                    <TCP bind_port="7800" />
                    <TCPPING timeout="3000"
                             initial_hosts="ec2-54-211-174-135.compute-1.amazonaws.com[7800],10.139.25.87[7800]"
                             port_range="0"
                             num_initial_members="2"/>
                    <VERIFY_SUSPECT timeout="1500" />
                    <pbcast.NAKACK use_mcast_xmit="false"
                                   retransmit_timeout="300,600,1200,2400,4800"
                                   discard_delivered_msgs="true"/>
                    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                                   max_bytes="400000"/>
                    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                                view_bundling="true"/>
                    <MERGE2 max_interval="30000"
                            min_interval="10000"/>
                    <FD_SOCK/>
                    <FD_ALL/>
                    <BARRIER />
                    <UNICAST />
                    <UFC max_credits="2M"
                         min_threshold="0.4"/>
                    <MFC max_credits="2M"
                         min_threshold="0.4"/>
                    <FRAG2 frag_size="60K" />
                    <pbcast.STATE_TRANSFER />
                 </config>

                 

                 We've tried with the L1 cache both enabled and disabled, but the behaviour is the same.

                • 5. Re: memcached read/write failures while growing distributed cache
                  boss_mc

                  Hi all,

                   

                  I've taken over looking at this issue from Mike and I think I've made some progress on the problems we've been seeing.

                   

                  The issue with the "sender window for ec2-54-227-89-77-36548 not found" warning was caused by a rogue instance of Infinispan running on one of the nodes, and was cleared up by killing that instance (allowing the real instances to cluster properly).

                   

                  Having cleared that up, I've been testing with version 5.2.7 and I can consistently replicate the issue with the following steps (a client-side sketch of the same sequence follows the list):

                  • Start one node and let it start up completely (clustering with itself), with async mode turned on.
                  • Start the second node, let the two nodes find each other and split the hash buckets between themselves, again with async mode turned on.
                  • Telnet into each node on port 11211 (memcached interface).
                  • "add" a key on the original node (succeeds)
                  • "gets" the key on either node (succeeds)
                  • "cas" the key on the new node (fails with EXISTS)
                  • At this point a "gets" on either node returns the new value (so the CAS actually succeeded despite the response).
                  • Using "replace" on the second node also results in a failure message, but actually succeeds.

                   

                  The logs indicate that this failure only occurs when the memcached request goes to a node that does not own the given key; in that case I see "I'm not the primary owner, so sending the command to the primary owner(ec2-54-224-166-225-53039) in order to be forwarded" in the log file.  Looking at https://github.com/infinispan/infinispan/blob/5.2.7.Final/core/src/main/java/org/infinispan/interceptors/distribution/No… I see that the behaviour in this case is to send the request to the key owner to process, but also to perform the change locally and return the result of the local change to the user (even the comment says this will sometimes be the wrong result).  In practice, the local return code is `false`, leading to the failures we've been seeing.
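
                  To make that flow concrete, here is a deliberately simplified toy model of the non-owner path as I read it; none of these names are real Infinispan APIs, they just illustrate where the wrong return value comes from:

                  // Toy model only: performLocalWrite/forwardToPrimary are illustrative
                  // stand-ins, not Infinispan methods.
                  class NonOwnerWritePath {
                      boolean handleConditionalWrite(String key, String value) {
                          // The command is executed locally on the non-owner, where the
                          // conditional check cannot succeed authoritatively...
                          boolean localResult = performLocalWrite(key, value);
                          // ...and is also forwarded to the primary owner, whose (correct)
                          // result is not used for the client response.
                          forwardToPrimary(key, value);
                          // The client is given the local result, hence EXISTS/NOT_STORED
                          // even though the primary owner actually applied the change.
                          return localResult;
                      }

                      private boolean performLocalWrite(String key, String value) { return false; }
                      private boolean forwardToPrimary(String key, String value) { return true; }
                  }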

                   

                  I think the behaviour here is just plain wrong: if we're going to talk to the primary owner, we should use its result (as it is the only node qualified to give a result for a given key), so we should always return getResponseFromPrimaryOwner(primaryOwner, addressResponseMap).  I've tried changing the code to do this, but when we send our request to the primary we don't specify that we're interested in the response, so we receive a `null` response, which is treated as false and leads to the failure return code again.

                   

                  I think the correct behaviour here is to always wait for the lock owner's response when we're proxying the request, and to ask the lock owner to actually tell us its success/failure so that we can correctly inform the client.  Does this sound sensible?  We don't want to disable async generally, since the performance seems to be massively impacted (at least a 20x reduction).
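
                  In the same toy terms as above, the change I have in mind looks roughly like this (again the helper name is illustrative, not a real Infinispan call; the real change would be around getResponseFromPrimaryOwner and the response mode of the forwarded command):

                  // Toy sketch of the proposed behaviour: forward synchronously, wait for the
                  // primary owner's answer, and return that answer to the memcached layer.
                  // forwardToPrimaryAndWait is an illustrative stand-in, not an Infinispan method.
                  class NonOwnerWritePathProposed {
                      boolean handleConditionalWrite(String key, String value) {
                          // Ask the primary owner for its response so we do not get back null
                          // and misreport the operation as failed.
                          Boolean primaryResult = forwardToPrimaryAndWait(key, value);
                          // Only the primary owner can decide success/failure for the key,
                          // so its answer is what the client should see.
                          return primaryResult != null && primaryResult;
                      }

                      private Boolean forwardToPrimaryAndWait(String key, String value) { return Boolean.TRUE; }
                  }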

                   

                  Thanks in advance for any help you can give on this.

                   

                  Andy