-
1. Re: Bulk Get from Remote Cache
galder.zamarreno Dec 13, 2011 8:14 AM (in response to jaadi)There's no bulk get, but you can call RemoteCache.getAsync(key), which returns an org.infinispan.util.concurrent.NotifyingFuture (extends java.util.concurrent.Future). So, if you want to retrieve multiple keys in parallel, just call getAsync() multiple times and, when you need the values, call Future.get(), or attach a FutureListener to the NotifyingFuture to be notified when the value is ready.
See http://docs.jboss.org/infinispan/5.1/apidocs/org/infinispan/client/hotrod/RemoteCache.html and
http://docs.jboss.org/infinispan/5.1/apidocs/org/infinispan/util/concurrent/NotifyingFuture.html
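For illustration, a minimal sketch of that pattern (the server address, cache and key names are placeholder assumptions; the calls themselves are from the 5.1 javadocs linked above):

   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.Future;

   import org.infinispan.client.hotrod.RemoteCache;
   import org.infinispan.client.hotrod.RemoteCacheManager;
   import org.infinispan.util.concurrent.FutureListener;
   import org.infinispan.util.concurrent.NotifyingFuture;

   public class ParallelGets {
      public static void main(String[] args) throws Exception {
         // Placeholder address: point this at your own Hot Rod server.
         RemoteCacheManager cm = new RemoteCacheManager("localhost", 11222);
         RemoteCache<String, String> cache = cm.getCache();

         // Fire off all gets up front; none of these calls blocks.
         List<String> keys = Arrays.asList("k1", "k2", "k3");
         Map<String, NotifyingFuture<String>> futures =
               new HashMap<String, NotifyingFuture<String>>();
         for (String key : keys) {
            futures.put(key, cache.getAsync(key));
         }

         // Option 1: block only when each value is actually needed.
         for (Map.Entry<String, NotifyingFuture<String>> e : futures.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue().get());
         }

         // Option 2: attach a listener to be notified when a value is ready.
         cache.getAsync("k1").attachListener(new FutureListener<String>() {
            public void futureDone(Future<String> f) {
               try {
                  System.out.println("ready: " + f.get());
               } catch (Exception ex) {
                  ex.printStackTrace();
               }
            }
         });

         cm.stop();
      }
   }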
-
2. Re: Bulk Get from Remote Cache
galder.zamarreno Dec 13, 2011 8:18 AM (in response to galder.zamarreno)Added an FAQ entry for this: https://docs.jboss.org/author/pages/viewpage.action?pageId=15532052
-
3. Re: Bulk Get from Remote Cache
jaadi Dec 13, 2011 10:58 AM (in response to galder.zamarreno)OK, but don't you think the cost of doing multiple gets is higher than doing a single get(List)? We could avoid network latency.
-
4. Re: Bulk Get from Remote Cache
sannegrinovero Dec 13, 2011 11:18 AM (in response to jaadi)Since you won't wait for the first get to complete before sending the next one, you're already avoiding network latency.
Consider that at the network level, messages sent close together to the same node are likely to be aggregated by the transport (at the JGroups level or even lower on the stack), and also that each get() operation might need to be sent to a different address.
-
5. Re: Bulk Get from Remote Cache
galder.zamarreno Dec 13, 2011 12:11 PM (in response to sannegrinovero)Sanne, JGroups has nothing to do with it here. It's down to the TCP layer in the Hot Rod client, which would fire these in parallel.
-
6. Re: Bulk Get from Remote Cache
jaadi Dec 13, 2011 1:16 PM (in response to galder.zamarreno)Does that mean that when we fire off multiple async gets to the Hot Rod server, the server will somehow figure out how to package them into one or more units and send the results back to the Hot Rod client? Or are we assuming that the TCP layer at the OS level aggregates these packets?
-
7. Re: Bulk Get from Remote Cache
galder.zamarreno Dec 14, 2011 4:45 AM (in response to jaadi)Messages are not aggregated on the client, and the server will deal with them independently.
Whether the TCP layer on the client or server aggregates the packets comes down to the tcp_nodelay setting. Both the Hot Rod server and client enable tcp_nodelay by default, so they'll send packets as soon as they're available. If tcp_nodelay is disabled, TCP packets are batched, but response latency increases.
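By way of example, the client-side setting could be flipped like this, assuming the standard Hot Rod client property names (infinispan.client.hotrod.tcp_no_delay and infinispan.client.hotrod.server_list; verify them against your client version):

   import java.util.Properties;

   import org.infinispan.client.hotrod.RemoteCacheManager;

   public class NoDelayConfig {
      public static void main(String[] args) {
         Properties props = new Properties();
         // Placeholder address: adjust for your deployment.
         props.put("infinispan.client.hotrod.server_list", "localhost:11222");
         // tcp_no_delay defaults to true; setting it to false lets the OS
         // batch small packets (Nagle's algorithm) at the cost of latency.
         props.put("infinispan.client.hotrod.tcp_no_delay", "false");
         RemoteCacheManager cm = new RemoteCacheManager(props);
         // ... use cm.getCache() as usual ...
         cm.stop();
      }
   }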
-
8. Re: Bulk Get from Remote Cache
jaadi Dec 14, 2011 10:31 AM (in response to galder.zamarreno)We could probably see some performance improvement if the Infinispan server packed up the results of a bulk get() and sent them back to the client as a Map or a List. This could be more reliable and would offer better control than relying on the underlying TCP layer.
I am thinking of something like this:
List<V> results = cache.get(listOfKeys); // hypothetical bulk API
Any thoughts on this?
-
9. Re: Bulk Get from Remote Cache
galder.zamarreno Dec 15, 2011 4:34 AM (in response to jaadi)I'm not sure that's necessarily a performance improvement. Maybe on a single node it would give a bit more performance, but in a distributed Infinispan cluster each key might be located on a different node, so it's simply better to issue the gets in parallel and let the hashing logic in the client direct the individual get operations to the nodes that own the data.
We're open to contributions though, so if this is something that's important to you, have a go, it's open source. Then you can give us some numbers to see when/how it makes sense to use get(K...) as opposed to parallelising with getAsync().
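As a rough sketch of that approach (a hypothetical helper, not an existing RemoteCache API), a client-side bulk get can be built on getAsync() so each request is routed by the client's hashing logic:

   import java.util.HashMap;
   import java.util.Map;
   import java.util.concurrent.Future;

   import org.infinispan.client.hotrod.RemoteCache;

   public class BulkGetHelper {
      // Hypothetical helper: issues one getAsync() per key so the client
      // can route each request to the node that owns it, then collects
      // the results into a Map.
      public static <K, V> Map<K, V> getAll(RemoteCache<K, V> cache,
            Iterable<K> keys) throws Exception {
         Map<K, Future<V>> futures = new HashMap<K, Future<V>>();
         for (K key : keys) {
            futures.put(key, cache.getAsync(key)); // all requests in flight
         }
         Map<K, V> results = new HashMap<K, V>();
         for (Map.Entry<K, Future<V>> e : futures.entrySet()) {
            V value = e.getValue().get(); // waits, but the gets ran in parallel
            if (value != null) {
               results.put(e.getKey(), value);
            }
         }
         return results;
      }
   }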
-
10. Re: Bulk Get from Remote Cache
jaadi Dec 15, 2011 11:55 AM (in response to galder.zamarreno)True, for a 2GB x 100-node cluster this may not translate into a performance improvement. I was thinking more along the lines of 100GB x 2 (or 4) nodes backed by some off-heap container spilling over to high-performance PCIe SSD storage. When each business operation involves thousands of gets, maybe we could see some gains. I will give this a go and post some numbers.