
    Weighing Replicated Mode Against Distributed Mode

    thartwell

      Currently we have an application that is using Infinispan in replicated mode. We started with replicated mode mainly for the ease of setup, but I'm not sure this mode is best for our layout.

       

      Primary Node

      Application 1 using

      Cache A - read/write

      Cache B - read/write

      Cache C - read/write

      |

      Gossip Router - All communication from Application 1 to Application 2 happens over the gossip router

      |

      Secondary Node 1 (SN1)

      Application 2 using

      Cache A - read only

       

      SN2

      Application 2 using

      Cache B - read only

       

      SN3

      Application 2 using

      Cache C - read only

       

      ...

       

      SN80

      Application 2 using

      Cache Z - read only

       

      We have fewer than 100 secondary nodes. Each secondary node only needs one cache's worth of data to be present. The cache on the secondary node is the only mechanism used to access the data: if the data is not in the cache on the secondary node, it is not just a cache miss, it is an error in the application. This lets the secondary node's application code use Infinispan alone for data access; secondary nodes never call into the primary node's application code for data. This is important, because Application 2 cannot afford to make remote calls for this information when it is needed at lookup time.

       

      Secondary nodes are across a WAN (customer on-premise application) relative to the Primary Node. We've had issues with this Infinispan setup that are hard to debug. For example, Application 1 will fail to start up due to what seems to be state transfer. We were under the impression that the Primary Node, since it does all the writes, would always be the primary data owner, even when the Primary Node's application is not running (e.g. during a restart).

       

      The goal of this post is to gain a better understanding of the implications of this setup, and to then understand if we should migrate to a distributed mode setup.

       

      Concerns with replicated mode

      1. Is it possible, or even guaranteed, that the Primary Node running Application 1 will not always be the primary data owner? If so, does that mean that when Application 1 starts up on the Primary Node, it will attempt a state transfer?

      2. Is it possible for a state transfer to fail entirely? We are currently seeing Application 1 get into a state where Cache A, for example, is empty when it should not be, and it's not clear what the cause of the empty cache might be.

      3. The documentation is a bit vague, but it states that with more than 10 nodes, replicated mode is possibly not the best choice. Given that we are using the gossip router and TCP between Application 1 and all of the Application 2 instances, I suspect replicated mode may not be the best fit for our use case.

      4. If we move to distributed mode, Application 2 on the secondary nodes would be reaching across a WAN to talk to a dedicated Infinispan server cluster. Does the client/server model support the secondary nodes being clients while still holding a local copy of the cache data, so that lookups come from memory rather than from a remote machine? We need all of the data to be resident on the secondary node, as all lookups must be in memory for speed.

      5. When performing network tests on the current setup, we noticed that a secondary node actually receives traffic on the wire for caches it is not concerned with. Is this by design? If so, with 80 secondary nodes, each one would experience roughly 80x the required traffic, since a secondary node would receive traffic for every cache on the Primary Node, not just the single cache it cares about.

       

      Thanks in advance for any insight,

      Tom

        • 1. Re: Weighing Replicated Mode Against Distributed Mode
          rvansa

          Replicated mode is very close to distributed mode; in fact it just scales numOwners up to the number of cache members, plus a couple of optimizations that probably don't matter much when each of your caches has only 2 members. The concept of primary ownership determines the node that orders updates to a particular entry, and the segments are distributed evenly (or rather, according to capacity factors) among the members.
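          In embedded configuration terms, the difference is roughly this (a sketch using the embedded ConfigurationBuilder API; the numOwners value is just an example):

          ```java
          import org.infinispan.configuration.cache.CacheMode;
          import org.infinispan.configuration.cache.ConfigurationBuilder;

          // Replicated: every member of the cache owns every entry.
          ConfigurationBuilder repl = new ConfigurationBuilder();
          repl.clustering().cacheMode(CacheMode.REPL_SYNC);

          // Distributed: a fixed number of copies, regardless of cluster size.
          ConfigurationBuilder dist = new ConfigurationBuilder();
          dist.clustering().cacheMode(CacheMode.DIST_SYNC)
              .hash().numOwners(2);
          ```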

           

          1. Yes, but you have to write your own ConsistentHashFactory implementation (this is a very advanced customization, though). You need to use the programmatic configuration to set this up; see the sketch after this list. Any time a node joins the cache as a member, it will do the state transfer unless you've disabled it.

          2. Transfer should not fail unless there's a bug, or the cluster undergoes a split and partition handling is not enabled. It may take a while. The log is your friend.

          3. Infinispan supports 'asymmetric caches': a cache started on some nodes of the (JGroups) cluster but not on others. If you start all caches on all secondary nodes, replicated mode slows down writes a lot, because a write waits until it reaches the primary owner and the change is then confirmed on all other nodes. Distributed mode would be better, but your secondary node of choice likely wouldn't be an owner (therefore a read would cause an RPC) unless you tweak that in your ConsistentHashFactory. Since you use a WAN, a viable alternative to asymmetric caches might be cross-site replication.

          4. Why do you mix the client/server model in here? As far as I understand, you have been using embedded mode (= peer-to-peer). But yes, Hot Rod clients support near caching.

          5. As explained above, if a node is not concerned with a cache, it should not start it locally (cacheManager.getCache(name) starts the cache on the calling node).
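          As a rough illustration of the programmatic setup from point 1 (PrimaryPinningConsistentHashFactory is a hypothetical name for your own implementation; consistentHashFactory() is the hook in the embedded hash configuration):

          ```java
          import org.infinispan.configuration.cache.CacheMode;
          import org.infinispan.configuration.cache.ConfigurationBuilder;

          ConfigurationBuilder builder = new ConfigurationBuilder();
          builder.clustering()
                 .cacheMode(CacheMode.REPL_SYNC)
                 .hash()
                    // Hypothetical custom factory that always assigns primary
                    // ownership to the primary node's address:
                    .consistentHashFactory(new PrimaryPinningConsistentHashFactory());
          ```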

          • 2. Re: Weighing Replicated Mode Against Distributed Mode
            thartwell

            Radim, thank you for the response. Apologies on my delayed response, I was on vacation.

             

            1. Regarding state transfer: we are OK with state transfer happening. Is this a partial transfer of data? I ask because in our use case, Application 1 has its cache seeded from data in the database, and Application 2 then relies on that populated cache being replicated to the secondary node it runs on; it cannot populate the cache itself. In our case it doesn't make sense for Application 1 to do a state transfer from Application 2, because on start-up it could populate its cache either from the DB or from its local cache filestore. Does that make sense? Is it possible to enforce behavior like this? Regarding the exception we are experiencing in production, it looks as though the CacheException during state transfer may be due to the default 4-minute timeout being hit (see the timeout sketch after this list). Is there any way to see whether the state transfer is doing any work or whether it is just hanging? Our caches are relatively small (KBs to MBs), so the state transfer failure doesn't seem to be caused by failing to move the data in time.

            2. Is the best way to troubleshoot this to use TRACE level logging in the org.infinispan and org.jgroups packages? Are there other things we can do to debug this problem?

            3. On Application 1, we build essentially one EmbeddedCacheManager per secondary node, so approximately 50-100 managers. Each secondary node has only one EmbeddedCacheManager. We were under the impression that doing it this way would isolate a state transfer to 2 nodes only. Do you know if that is the case?

            4. Sorry for the confusion in mixing in client/server; I was looking ahead to what would apply IF we moved to distributed mode. Yes, we are currently using replicated mode only. Thank you for the near caching link; that concept will be instrumental if we move to distributed mode.

            5. We are doing what you describe and only starting a cache if the application needs information from that cache. I'm confused by the wire traffic we see. Application 1 starts caches A and B. Application 2 on node #1 starts cache A, and Application 2 on node #2 starts cache B. When the primary node (Application 1) puts into cache A, we see wire traffic over the gossip router on both secondary nodes, #1 and #2. Does the gossip router have no concept of asymmetric caches, so that it spams all nodes with traffic over the wire? That is what we observe very consistently in our test environment.
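            For reference on the timeout mentioned in point 1, the state transfer timeout is configurable (a sketch using the embedded state-transfer builder; the 10-minute value is only an example):

            ```java
            import java.util.concurrent.TimeUnit;
            import org.infinispan.configuration.cache.ConfigurationBuilder;

            ConfigurationBuilder builder = new ConfigurationBuilder();
            builder.clustering().stateTransfer()
                   .timeout(10, TimeUnit.MINUTES)  // default is 4 minutes
                   .awaitInitialTransfer(true);    // block cache start until the initial transfer finishes
            ```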

             

            Thank you for your expertise,

            Tom

            • 3. Re: Weighing Replicated Mode Against Distributed Mode
              rvansa

              1. Do I understand correctly that you want to make sure that if a secondary node starts before the primary node, it won't do any state transfer? I don't think it's possible to disable state transfer in one direction only, but if you don't write data on the secondary node, it will stay empty...

              AFAIK it's not possible to observe the state transfer in any convenient way (other than calling advancedCache.withFlags(CACHE_MODE_LOCAL).size() or advancedCache.getDataContainer().size() to find out the number of entries stored locally; see the sketch after this list). There are means to hook into the process (we use that in the testsuite), related to CacheTopologyControlCommand, StateRequestCommand and StateResponseCommand, but this requires a more thorough understanding of the internals. If there is a bug (what version do you use?), the best way is to set up trace-level logging and dig.

              2. That's the best way. Regrettably, even developers don't have magic crystal balls at hand.

              3. Uuh, that's far from ideal. There are heavy resources per cache manager (JGroups and all the threads), so I wouldn't recommend such a setup. You should have one cache manager on each node, and simply not start the unneeded caches on the nodes that don't access them.

              4. You got it wrong. Distributed mode is not client/server, and it does not involve near caching. Distributed mode means you'll have a fixed number of copies of the data no matter how big the cluster is. The terminology here is

              Embedded = (replicated | distributed | invalidation | local)

              Remote = Client/server = (Hot Rod | REST | Memcached) + servers running in (replicated | distributed | invalidation | local) mode.
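              To make the local-size checks from point 1 concrete (a sketch; cache stands for any started Cache instance):

              ```java
              import org.infinispan.AdvancedCache;
              import org.infinispan.context.Flag;

              AdvancedCache<?, ?> ac = cache.getAdvancedCache();
              // Size computed from locally stored entries only; no remote calls are made:
              int localSize = ac.withFlags(Flag.CACHE_MODE_LOCAL).size();
              // Or inspect the local data container directly:
              int containerSize = ac.getDataContainer().size();
              ```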

              • 4. Re: Weighing Replicated Mode Against Distributed Mode
                rvansa

                5. Could you call AdvancedCache.getDistributionManager().getCacheTopology().getMembers() on the primary node for all caches? Also, enable DEBUG-level logging on a secondary node and you should see a log line in the format Started cache %s on %s.
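                For example (a sketch; cacheManager is the primary node's EmbeddedCacheManager):

                ```java
                import org.infinispan.AdvancedCache;

                for (String name : cacheManager.getCacheNames()) {
                    AdvancedCache<?, ?> ac = cacheManager.getCache(name).getAdvancedCache();
                    // Prints the list of nodes that are members of each cache:
                    System.out.println(name + " -> members: "
                            + ac.getDistributionManager().getCacheTopology().getMembers());
                }
                ```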

                • 5. Re: Weighing Replicated Mode Against Distributed Mode
                  thartwell

                  Regarding State Transfer

                  It's not that we want to prevent state transfer. We would like the secondary nodes (read-only) to settle into a state where they mirror the primary node (read/write), but we are seeing caches that are partially populated or not populated at all. Perhaps we are hitting the timeout, which leaves the secondary node in a state where it never finishes state transfer, and our application doesn't attempt any retries. If the state transfer does not complete before the default StateTransferConfiguration.TIMEOUT, is our application responsible for retrying the operation, or is there a retry internal to Infinispan?

                   

                  Regarding the EmbeddedCacheManager

                  I misstated our current cache layout. Application 1 has only one EmbeddedCacheManager, with multiple caches, somewhere between 50 and 100. On the secondary nodes, Application 2 only ever cares about 1 of these caches, so each of those nodes has a single EmbeddedCacheManager with one cache. Is this what you would describe as an asymmetric cache? Essentially, only 2 nodes in the cluster hold any one cache: the primary node has one copy and one secondary node has the second copy.
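                  To illustrate that layout in code (a sketch; the cache names and the replicated configuration are placeholders): each node runs one clustered cache manager, but a node only becomes a member of the caches it actually starts.

                  ```java
                  import org.infinispan.Cache;
                  import org.infinispan.configuration.cache.CacheMode;
                  import org.infinispan.configuration.cache.Configuration;
                  import org.infinispan.configuration.cache.ConfigurationBuilder;
                  import org.infinispan.configuration.global.GlobalConfigurationBuilder;
                  import org.infinispan.manager.DefaultCacheManager;
                  import org.infinispan.manager.EmbeddedCacheManager;

                  // One cache manager per node, clustered over JGroups:
                  EmbeddedCacheManager cm = new DefaultCacheManager(
                          GlobalConfigurationBuilder.defaultClusteredBuilder().build());

                  Configuration repl = new ConfigurationBuilder()
                          .clustering().cacheMode(CacheMode.REPL_SYNC).build();

                  // Every node may define all cache configurations...
                  cm.defineConfiguration("cacheA", repl);
                  cm.defineConfiguration("cacheB", repl);

                  // ...but only the caches a node starts make it a member. A secondary
                  // node that needs cache A calls getCache for that cache alone:
                  Cache<String, Object> cacheA = cm.getCache("cacheA");
                  ```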

                   

                  Regarding Terminology

                  I believe I'm clear on the terms now. We are trying to determine whether we need to move away from Embedded and use a Remote setup instead.
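                  If we do go Remote, the near caching Radim mentioned would look roughly like this (a sketch, assuming the Hot Rod client API circa Infinispan 8/9; host, port, and size are placeholders):

                  ```java
                  import org.infinispan.client.hotrod.RemoteCache;
                  import org.infinispan.client.hotrod.RemoteCacheManager;
                  import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;
                  import org.infinispan.client.hotrod.configuration.NearCacheMode;

                  ConfigurationBuilder cb = new ConfigurationBuilder();
                  cb.addServer().host("primary.example.com").port(11222);
                  // Keep a bounded local copy of entries, invalidated when they change remotely:
                  cb.nearCache().mode(NearCacheMode.INVALIDATED).maxEntries(10_000);

                  RemoteCacheManager rcm = new RemoteCacheManager(cb.build());
                  RemoteCache<String, Object> cacheA = rcm.getCache("cacheA"); // repeated reads hit the near cache
                  ```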

                   

                  Regarding JGroups and Gossip Router

                  Do you know whether data (e.g. in Cache A) will be sent to all secondary nodes over the wire? We are observing this behavior. For example, even if secondary node #2 only starts Cache B, it still receives data on the wire for all the other caches. I _believe_ node #2 does nothing with this data and it is simply dropped, but it's concerning that all secondary nodes would receive all traffic from the primary node: if an update occurs on every cache on the primary node, secondary node #2 receives 50x-100x more wire traffic than we would like.

                   

                  Next Steps for Me

                  I'm going to run a large testbed scenario where I can set up a simple cache application using Infinispan in the manner described in my original post. I'll add trace-level logging and look to reproduce the exception we experience in production to gain better insight. You pointed out early on that cross-site replication might be more feasible, given that the data we want on our secondary nodes is across a WAN; I will look into that.

                  • 6. Re: Weighing Replicated Mode Against Distributed Mode
                    rvansa

                    ST: I think I understand what you're trying to achieve. If the cache is kBs/MBs, not finishing state transfer within the timeout is fishy, and it probably signals some other problem. You shouldn't retry; there's no point in that, as state transfer should just do its job. In fact, after the initial state transfer there's no limit on the further ones, just some message timeouts. I wonder if you could create a reproducer, because starting a cluster is the most basic feature and is tested all the time.

                     

                    ECM: Okay, in that case it's fine, one cache manager per node. And yes - we call the cache an 'asymmetric cache' whenever it is not started on all nodes in the JGroups view.

                     

                    Remote client/server will always be more limited in options, and slower, than embedded.

                     

                    Actually, it might be a bug in the implementation, where a replicated cache sends data to 'all owners' and that is (incorrectly) translated into a JGroups broadcast to 'all nodes in the cluster'. The other nodes, where the cache is not started, should then at least respond with a CacheNotFoundException, and that would cause an exception. I'll try to create a reproducer early next week. A similar, though even less likely, problem could possibly affect the state transfer too.

                    • 7. Re: Weighing Replicated Mode Against Distributed Mode
                      rvansa

                      I've tried the described case (a cluster with more nodes, but the cache running on only 2 of them) and issued a simple write, but the message is sent through JGroups as a unicast, to a single node only. Could you grab some trace logs to let me see what happened?