7 Replies Latest reply on Aug 17, 2010 9:36 AM by galder.zamarreno

Async Options

shane_dev Jul 21, 2010 12:17 PM

What is the difference between the various async options in Infinispan? FYI: we are using Infinispan in replicated mode, so that is what I am referring to below.

Here is what I found so far...

Put Async

CacheDelegate.putAsync
InterceptorChain.invoke
PutKeyValueCommand.acceptVisitor
ReplicationInterceptor.visitPutKeyValueCommand
ReplicationInterceptor.handleCrudMethod
RpcManagerImpl.broadcastRpcCommandInFuture
RpcManagerImpl.invokeRemotelyInFuture (Create Callable, Submit to ExecutorService)
RETURN

Async Marshalling

CacheDelegate.put
InterceptorChain.invoke
PutKeyValueCommand.acceptVisitor
ReplicationInterceptor.visitPutKeyValueCommand
ReplicationInterceptor.handleCrudMethod
RpcManagerImpl.broadcastRpcCommand
RpcManagerImpl.invokeRemotely
JGroupsTransport.invokeRemotely
CommandAwareRpcDispatcher.invokeRemoteCommands (Create Replication Task, Submit to ExecutorService)
RETURN

Replication Queue

CacheDelegate.put
InterceptorChain.invoke
PutKeyValueCommand.acceptVisitor
ReplicationInterceptor.visitPutKeyValueCommand
ReplicationInterceptor.handleCrudMethod
RpcManagerImpl.broadcastRpcCommand
ReplicationQueue.add (Flush via Add or ScheduledExecutorService)
RETURN

The only practical difference I have seen is that putAsync returns a Future whereas the others do not. However, the performance difference between it and the others looks negligible.

While putAsync terminates with RpcManagerImpl, async marshalling terminates with CommandAwareRpcDispatcher. However, there seems to be very little going on between the rpc, transport, and dispatcher calls. Also, it doesn't seem to have anything to do with marshalling? The marshalling seems to take place in ReplicationTask.marshallCall via (ReplicationTask.call). Regardless of whether async marshalling is enabled or not, this call doesn't take place until the task is called/executed.

The replication queue seems to be the fastest in our testing. It actually returns sooner than putAsync. This seems to be the best option. Is there any reason to use async marshalling over the replication queue seeing as they are both async?

I noticed that putAsync begins passing along the sync flag with a value of true while the async marshalling path does not. Does this mean that you can use a replication queue with async marshalling? I'm pretty sure you wouldn't want to, but I'm curious whether the code would actually let you do that.

Finally, it looks entirely too dangerous to use more than one thread with the putAsync/async marshalling executors. So much so that it is never practical to do so seeing as the ordering may be lost, and in our test was. Is this something that can be looked at in the future? It would be nice to do some sort of locking at the key level so as to ensure ordering of sequential operations on a particular key while being able to process operations for multiple keys in parallel.

1. Re: Async Options

mircea.markus Jul 21, 2010 9:32 PM (in response to shane_dev)

Generally speaking, async operations trade consistency for speed.
E.g. putAsync will return immediately, and the actual put, i.e. copying the data to the rest of the cluster, will happen at a later point in time. There's no guarantee that this will be successful.
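To make the trade-off concrete, here is a minimal, self-contained Java sketch modelled on the putAsync path traced in the original post: the local update happens in the caller's thread, the remote step is submitted to an executor, and the caller gets a Future back immediately. `FakeReplicatedCache` and its methods are hypothetical stand-ins, not the Infinispan API; the point is only that a replication failure does not show up at the call site, but only when someone inspects the Future.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AsyncPutSketch {

    // Hypothetical stand-in for a replicated cache; this is NOT the Infinispan API.
    static final class FakeReplicatedCache {
        private final Map<String, String> local = new ConcurrentHashMap<>();
        private final ExecutorService replicationPool = Executors.newSingleThreadExecutor();
        private final boolean clusterReachable;

        FakeReplicatedCache(boolean clusterReachable) {
            this.clusterReachable = clusterReachable;
        }

        // Local update in the caller's thread, remote part submitted to a pool,
        // and a Future handed back to the caller at once.
        Future<String> putAsync(String key, String value) {
            String previous = local.put(key, value);
            return replicationPool.submit(() -> {
                if (!clusterReachable) {
                    throw new IllegalStateException("replication of " + key + " failed");
                }
                return previous;
            });
        }

        void shutdown() {
            replicationPool.shutdown();
        }
    }

    // True if the call itself returned normally and the failure only showed up via Future.get().
    static boolean failureSurfacesOnlyThroughFuture() {
        FakeReplicatedCache cache = new FakeReplicatedCache(false);
        try {
            Future<String> pending = cache.putAsync("k", "v"); // no exception thrown here
            pending.get(5, TimeUnit.SECONDS);
            return false; // replication unexpectedly succeeded
        } catch (ExecutionException e) {
            return e.getCause() instanceof IllegalStateException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (TimeoutException e) {
            return false;
        } finally {
            cache.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureSurfacesOnlyThroughFuture());
    }
}
```

A caller that never calls `get()` on the Future simply never learns that replication failed, which is the consistency cost described above.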

The replication queue seems to be the fastest in our testing. It actually returns sooner than putAsync. This seems to be the best option.
Strange: the repl queue should have approximately the same performance as an async call. What is the difference?

Is there any reason to use async marshalling over the replication queue seeing as they are both async?
The object serialization would be performed async, and not in the calling thread which would return sooner. Note that if asyncMarshalling is used operations might be re-ordered.
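A minimal sketch of what marshalling async means for the calling thread, assuming plain Java serialization as a stand-in for Infinispan's marshaller and a list of byte arrays as a fake wire (both hypothetical). With the flag on, serialization and the send run on a pool thread; with it off, they run in the caller's thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncMarshallingSketch {

    // Plain Java serialization, standing in for Infinispan's own marshaller.
    static byte[] marshall(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Marshals `value` and appends it to a fake wire. With asyncMarshalling the work runs on a
    // pool thread (the caller could return immediately); otherwise it runs in the caller's thread.
    // Returns the name of the thread that did the marshalling.
    static String send(boolean asyncMarshalling, Serializable value,
                       List<byte[]> wire, ExecutorService pool) throws Exception {
        Callable<String> task = () -> {
            wire.add(marshall(value));
            return Thread.currentThread().getName();
        };
        if (asyncMarshalling) {
            Future<String> handedOff = pool.submit(task);
            return handedOff.get(); // waiting here only so the demo can report the thread
        }
        return task.call();
    }

    // Returns {caller thread, thread used with async marshalling, thread used without it}.
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        List<byte[]> wire = Collections.synchronizedList(new ArrayList<>());
        try {
            String caller = Thread.currentThread().getName();
            String asyncThread = send(true, "v1", wire, pool);
            String syncThread = send(false, "v2", wire, pool);
            return new String[] {caller, asyncThread, syncThread};
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] threads = demo();
        System.out.println("caller=" + threads[0] + ", async=" + threads[1] + ", sync=" + threads[2]);
    }
}
```

Because each async send becomes an independent pool task, nothing ties the order of marshalled messages to the order of the calls once the pool has more than one thread, which is where the reordering caveat comes from.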

Finally, it looks entirely too dangerous to use more than one thread with the putAsync/async marshalling executors. So much so that it is never practical to do so seeing as the ordering may be lost, and in our test was. Is this something that can be looked at in the future?
Afaik no.

2. Re: Async Options

manik Jul 22, 2010 5:44 AM (in response to mircea.markus)
The replication queue seems to be the fastest in our testing. It actually returns sooner than putAsync. This seems to be the best option.
Strange: the repl queue should have approximately the same performance as an async call. What is the difference?

No, the repl queue should be faster since a single RPC call is broadcast with a batch of commands, rather than one command each time. Fewer envelopes, fewer RPCs to execute and respond to, etc. I'm not surprised that the repl queue is quicker.
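The batching effect can be sketched as follows. `BatchingQueue` is a hypothetical, simplified model of the idea, not Infinispan's `ReplicationQueue`: commands are buffered, and each flush (either when `maxElements` is reached or on a fixed interval) broadcasts the whole buffer as one message. With `maxElements` set to 4, ten puts cost three batch "RPCs" instead of ten.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ReplQueueSketch {

    // Hypothetical, simplified replication queue (not Infinispan's ReplicationQueue class):
    // commands are buffered and each flush sends the whole buffer as ONE batch "RPC".
    static final class BatchingQueue {
        private final int maxElements;
        private final Consumer<List<String>> broadcast; // stands in for a single cluster RPC
        private final List<String> pending = new ArrayList<>();
        private ScheduledExecutorService flusher;

        BatchingQueue(int maxElements, Consumer<List<String>> broadcast) {
            this.maxElements = maxElements;
            this.broadcast = broadcast;
        }

        // Runs in the caller's thread: enqueue, and flush early once the batch is full.
        synchronized void add(String command) {
            pending.add(command);
            if (pending.size() >= maxElements) {
                flush();
            }
        }

        // Holding the lock while sending keeps batches (and the commands inside them) in order.
        synchronized void flush() {
            if (pending.isEmpty()) {
                return;
            }
            broadcast.accept(new ArrayList<>(pending));
            pending.clear();
        }

        // Periodic flushing on a single scheduler thread, like a replication-queue interval.
        void start(long intervalMillis) {
            flusher = Executors.newSingleThreadScheduledExecutor();
            flusher.scheduleWithFixedDelay(this::flush, intervalMillis, intervalMillis,
                    TimeUnit.MILLISECONDS);
        }

        void stop() {
            if (flusher != null) {
                flusher.shutdown();
            }
            flush(); // send whatever is still buffered
        }
    }

    // Number of batch RPCs needed for `puts` commands (interval flushing left off for determinism).
    static int rpcCount(int puts, int maxElements) {
        List<List<String>> sent = new ArrayList<>();
        BatchingQueue queue = new BatchingQueue(maxElements, sent::add);
        for (int i = 0; i < puts; i++) {
            queue.add("put k" + i);
        }
        queue.stop();
        return sent.size();
    }

    public static void main(String[] args) {
        System.out.println("one RPC per put: " + rpcCount(10, 1));
        System.out.println("batched by 4:    " + rpcCount(10, 4));
    }
}
```

The price of batching is latency: a command can sit in the buffer for up to one flush interval before any other node sees it.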

Is there any reason to use async marshalling over the replication queue seeing as they are both async?
The object serialization would be performed async, and not in the calling thread which would return sooner. Note that if asyncMarshalling is used operations might be re-ordered.
In both cases marshalling would then be async.

The only difference is, in the case of async marshalling, the message is placed on the wire immediately, in the caller's thread. It's just that the caller's thread does not wait for a response. In the case of a repl queue, the call is placed in a queue, and is flushed periodically (say, every 2 minutes). It means the latter may take longer for the replication to be realised across the cluster. That's the only real tradeoff.

Finally, it looks entirely too dangerous to use more than one thread with the putAsync/async marshalling executors. So much so that it is never practical to do so seeing as the ordering may be lost, and in our test was. Is this something that can be looked at in the future?
Afaik no.

Actually there is something to consider here. The risk of reordering only exists in the case of (1) putAsync() and (2) async repl with async marshalling, since in both these cases the entire communication is handed off to a threadpool. (Actually if you just had 1 thread in this threadpool, there would be no risk, but that defeats the purpose!)
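The reordering risk can be reproduced with nothing but a thread pool. This sketch (hypothetical names, no Infinispan involved) submits numbered commands in order and records the order in which a simulated receiver sees them. With a single thread the order is always preserved; with several threads it may not be.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolOrderingSketch {

    // Hands each command to a pool, as putAsync and async marshalling do, and records the order
    // in which the simulated remote side sees them. With more than one thread, tasks submitted
    // in order 0, 1, 2, ... may run concurrently and complete out of order.
    static List<Integer> receivedOrder(int poolThreads, int commands) {
        ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
        List<Integer> received = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < commands; i++) {
            final int seq = i;
            pool.execute(() -> received.add(seq));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(received);
    }

    // True if the receiver saw 0, 1, 2, ... in exactly that order.
    static boolean inOrder(List<Integer> seen) {
        for (int i = 0; i < seen.size(); i++) {
            if (seen.get(i) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("1 thread in order:  " + inOrder(receivedOrder(1, 10_000)));
        // May print true or false depending on scheduling; that uncertainty is the point.
        System.out.println("8 threads in order: " + inOrder(receivedOrder(8, 10_000)));
    }
}
```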

There are 2 other asynchronous mechanisms where there is no risk of reordering:

The repl queue. As its name suggests, this really is a queue which maintains order of operations.
Use async replication, but disable async marshalling for the call. This would mean that marshalling happens in the caller's thread (under a lock), as does the handoff to the network layer, but the caller's thread doesn't wait for an ACK. Now JGroups, which is our network layer, maintains message order, even for async messages. So again, no risk of reordering.
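The second option can be modelled as follows, assuming a `LinkedBlockingQueue` as a stand-in for JGroups' ordered delivery (this is not the JGroups API, and the class and method names are hypothetical). Marshalling and the handoff happen in the caller's thread under a lock, and the caller does not wait for an acknowledgement, yet every caller's commands still arrive in the order it sent them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OrderedAsyncReplSketch {

    private final Object sendLock = new Object();
    // FIFO "network": a stand-in for JGroups' ordered delivery, not the JGroups API.
    private final BlockingQueue<String> wire = new LinkedBlockingQueue<>();

    // Async replication WITHOUT async marshalling: marshal and hand off in the caller's thread,
    // under a lock, then return without waiting for any acknowledgement.
    void replicate(String sender, int seq) {
        synchronized (sendLock) {
            String marshalled = sender + ":" + seq; // placeholder for real marshalling
            wire.add(marshalled);
        }
    }

    // Several caller threads replicate concurrently; checks each sender's commands arrive in order.
    static boolean perSenderOrderPreserved(int senders, int perSender) {
        OrderedAsyncReplSketch repl = new OrderedAsyncReplSketch();
        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < senders; s++) {
            String name = "caller" + s;
            Thread t = new Thread(() -> {
                for (int i = 0; i < perSender; i++) {
                    repl.replicate(name, i);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        Map<String, Integer> lastSeen = new HashMap<>();
        for (String msg : repl.wire) {
            String[] parts = msg.split(":");
            int seq = Integer.parseInt(parts[1]);
            if (seq != lastSeen.getOrDefault(parts[0], -1) + 1) {
                return false;
            }
            lastSeen.put(parts[0], seq);
        }
        return repl.wire.size() == senders * perSender;
    }

    public static void main(String[] args) {
        System.out.println(perSenderOrderPreserved(4, 1000));
    }
}
```

Unlike the thread-pool case, no work is handed to other threads here, so program order within each caller is exactly the order in which its messages reach the FIFO channel.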

By the way, this is an excellent discussion - Shane, I would encourage you to do a writeup and comparison of the various async options, their pros and cons, on the Infinispan wiki at the end of your analysis. You'd make a lot of people in the community very happy.
3. Re: Async Options

cbo_ Jul 22, 2010 9:31 AM (in response to manik)
Manik,

There are 2 other asynchronous mechanisms where there is no risk of reordering:

The repl queue. As its name suggests, this really is a queue which maintains order of operations.

Use async replication, but disable async marshalling for the call. This would mean that marshalling happens in the caller's thread (under a lock), as does the handoff to the network layer, but the caller's thread doesn't wait for an ACK. Now JGroups, which is our network layer, maintains message order, even for async messages. So again, no risk of reordering.

Indeed, we have done enough experimentation/discovery to settle on just that combination. We get off-threadness via the replication queue. We maintain order through the queue itself, by avoiding async marshalling, and by having the replication queue use only a single flush thread. There are still one or two areas of improvement I plan to make to ReplicationQueue, but the overall performance is actually good. However, if we want to improve on where things are now while maintaining order, I think it can be done by making processing key-aware. For example, the receiving end of a replicated cache could be multi-threaded, as long as we guarantee that a given key is always serviced by only one queue/thread. The same concept can be applied to other bottleneck areas. The idea is that as long as order is maintained for each key, we should be able to say that ordering for the entire cache is guaranteed.
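On the receiving side, the key-aware idea could look roughly like this. `KeyStripedExecutor` is a hypothetical helper, not something Infinispan provides: it keeps N single-threaded lanes and always routes a given key to the same lane, so commands for one key are applied in order while different keys are processed in parallel. This guarantees per-key order only; operations on different keys may still be applied in a different relative order than they were sent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class KeyStripedSketch {

    // Hypothetical helper (not part of Infinispan): N single-threaded lanes. A key always maps
    // to the same lane, so commands for one key run in arrival order while other keys run in parallel.
    static final class KeyStripedExecutor {
        private final ExecutorService[] lanes;

        KeyStripedExecutor(int stripes) {
            lanes = new ExecutorService[stripes];
            for (int i = 0; i < stripes; i++) {
                lanes[i] = Executors.newSingleThreadExecutor();
            }
        }

        void execute(Object key, Runnable command) {
            lanes[Math.floorMod(key.hashCode(), lanes.length)].execute(command);
        }

        void shutdownAndWait() throws InterruptedException {
            for (ExecutorService lane : lanes) {
                lane.shutdown();
            }
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        }
    }

    // Simulated receiving side: interleaved updates for several keys arrive in version order.
    // Returns true if every key applied all of its updates in that order.
    static boolean perKeyOrderPreserved(int keys, int updatesPerKey, int stripes) {
        KeyStripedExecutor executor = new KeyStripedExecutor(stripes);
        Map<String, List<Integer>> applied = new ConcurrentHashMap<>();
        for (int version = 0; version < updatesPerKey; version++) {
            for (int k = 0; k < keys; k++) {
                String key = "key" + k;
                int v = version;
                executor.execute(key, () -> applied
                        .computeIfAbsent(key, unused -> Collections.synchronizedList(new ArrayList<>()))
                        .add(v));
            }
        }
        try {
            executor.shutdownAndWait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (applied.size() != keys) {
            return false;
        }
        for (List<Integer> versions : applied.values()) {
            if (versions.size() != updatesPerKey) {
                return false;
            }
            for (int i = 0; i < updatesPerKey; i++) {
                if (versions.get(i) != i) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(perKeyOrderPreserved(16, 500, 4));
    }
}
```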

Shane and/or I will indeed be making contributions to the wiki.
4. Re: Async Options

shane_dev Jul 22, 2010 10:29 AM (in response to manik)

Hi Manik,

I was hoping this would lead to a more formal write up. I'd be more than happy to do so. If others are like me, it helps to tie the concepts with the code.

I don't happen to have the code with me at the moment, but I thought that the replication queue passes a 'multi' command at some point that ends up doing a loop such that the commands are still processed one at a time. Perhaps I missed something here?

Thanks for confirming my thoughts on async marshalling. That helps quite a bit.

I figured the repl queue may be faster seeing that it hands off to the thread pool fairly early in the process: much sooner than async marshalling, but at a similar point to putAsync.

As you mentioned, I had considered using thread pools with just 1 thread, since even if it is not the optimal solution, at least it is async and order is guaranteed. However, we are seeing great performance with the replication queue. The only adjustment I had to make was to reduce the flush interval to about 1.5 seconds. Our particular interval was simply too fast prior to that.
5. Re: Async Options

manik Jul 22, 2010 1:55 PM (in response to shane_dev)

I don't happen to have the code with me at the moment, but I thought that the replication queue passes a 'multi' command at some point that ends up doing a loop such that the commands are still processed one at a time. Perhaps I missed something here?
The queue is replicated as a single, multi-rpc command and the recipients then iterate over the contents. So in terms of RPC, you just get a single (large) RPC message. It still does save a lot though, in terms of message envelopes, acks, etc.
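A minimal model of the single multi-command RPC described above, with hypothetical `Put` and `BatchCommand` classes standing in for Infinispan's real command types: one message carries the queued commands, and the receiver loops over them in order, so a later put to the same key wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchApplySketch {

    // Hypothetical stand-in for a single put command.
    static final class Put {
        final String key;
        final String value;

        Put(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Hypothetical batch command: ONE message on the wire carrying many commands.
    static final class BatchCommand {
        final List<Put> commands;

        BatchCommand(List<Put> commands) {
            this.commands = Collections.unmodifiableList(new ArrayList<>(commands));
        }

        // Receiver side: iterate over the contents and apply each command in queued order.
        void applyTo(Map<String, String> cache) {
            for (Put put : commands) {
                cache.put(put.key, put.value);
            }
        }
    }

    // Applies one batch of three puts; returns "valueOfA,valueOfB,commandsInBatch".
    static String receive() {
        BatchCommand batch = new BatchCommand(Arrays.asList(
                new Put("a", "1"), new Put("b", "2"), new Put("a", "3")));
        Map<String, String> cache = new HashMap<>();
        batch.applyTo(cache); // one RPC, three commands, applied one at a time
        return cache.get("a") + "," + cache.get("b") + "," + batch.commands.size();
    }

    public static void main(String[] args) {
        System.out.println(receive()); // prints 3,2,3
    }
}
```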

However, we are seeing great performance with the replication queue. The only adjustment I had to make was to reduce the flush interval to about 1.5 seconds. Our particular interval was simply too fast prior to that.
Excellent!
6. Re: Async Options

mircea.markus Jul 23, 2010 9:46 AM (in response to manik)

No, the repl queue should be faster since a single RPC call is broadcast with a batch of commands, rather than one command each time.
In both situations (async or repl queue) the caller thread should return "instantly", in a time close to that of a call to a local cache, as the RPC call is performed at a later point and should not influence the call's performance. The overall throughput should indeed be better with the replication queue, as fewer envelopes are created. Do you happen to have any figures for how long an async put takes vs. a put with the repl queue?
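To gather such figures, a rough caller-side timing harness might look like the sketch below. It only measures how quickly control returns to the calling thread, not when replication completes. The two operations in `main` are placeholders (an executor handoff and a queue add) to be swapped for real async and repl-queue puts; JIT warm-up and GC make single runs noisy, so any numbers it prints are indicative only.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntConsumer;

public class CallerLatencySketch {

    // Average time (microseconds) the CALLING thread spends per invocation of `op`, i.e. how soon
    // control returns to the caller. It says nothing about when replication actually completes.
    static double averageCallerMicros(int warmupCalls, int measuredCalls, IntConsumer op) {
        for (int i = 0; i < warmupCalls; i++) {
            op.accept(i);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredCalls; i++) {
            op.accept(i);
        }
        return (System.nanoTime() - start) / 1_000.0 / measuredCalls;
    }

    public static void main(String[] args) {
        // Placeholder operations; swap in real cache puts (async vs. repl queue) to compare them.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        double handOff = averageCallerMicros(10_000, 100_000, i -> pool.execute(() -> { }));
        double enqueue = averageCallerMicros(10_000, 100_000, i -> queue.add(i));
        System.out.printf("pool handoff: %.3f us/call, queue add: %.3f us/call%n", handOff, enqueue);
        pool.shutdown();
    }
}
```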
7. Re: Async Options

galder.zamarreno Aug 17, 2010 9:36 AM (in response to mircea.markus)

I've compiled some of the information in this thread and http://community.jboss.org/message/557641#557641 into http://community.jboss.org/docs/DOC-15725. Any extra information that can be added based on further testing would be of great help.