4 Replies Latest reply on Jan 31, 2011 12:59 PM by galder.zamarreno

    Increasing performance in bulk data inserts

    havell

      We are testing the Infinispan platform as a substitute for Coherence (from Oracle). One of our apps consumes data directly from the cache, and it requires about 14,000,000 rows. In our production environment we manage about 4,000 inserts/s, so the full load takes roughly one hour. I have seen some Infinispan benchmarks close to that value.

       

      We opted for a 4-node topology with the distribution mode (4 virtual machines running on ESXi servers on a private LAN), L1 cache disabled, and the Hot Rod protocol:

       

      <namedCache name="test">

            <clustering mode="distribution">

               <sync/>

               <hash rehashEnabled="false" rehashRpcTimeout="60000" numOwners="2"/>

               <l1 enabled="false"/>

            </clustering>

        </namedCache>

       

      However, in our tests we could only achieve about 400 inserts/s (for us, a poor value).

      Maybe our approach is mistaken. We are inserting the rows from the Hot Rod client (we tested both the "put" and "putAll" methods). Perhaps there is a way to load all the data directly from a file (or a database) into the cache that would be quicker.
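One pattern that often helps over Hot Rod is to batch the rows and send each batch with a single putAll call, amortizing the network round trips. The sketch below is written against the java.util.Map putAll contract (which Infinispan's RemoteCache also exposes); a ConcurrentHashMap stands in for the remote cache so the example is self-contained, and the batch size of 10,000 is only an illustrative starting point to tune.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BatchLoader {

    // Send rows in fixed-size batches: one putAll per batch instead of one put per row.
    static void loadInBatches(Map<String, String> cache,
                              Iterable<Map.Entry<String, String>> rows,
                              int batchSize) {
        Map<String, String> batch = new HashMap<>(batchSize * 2);
        for (Map.Entry<String, String> row : rows) {
            batch.put(row.getKey(), row.getValue());
            if (batch.size() >= batchSize) {
                cache.putAll(batch);   // over Hot Rod, one bulk operation for the whole batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            cache.putAll(batch);       // flush the remainder
        }
    }

    public static void main(String[] args) {
        // Stand-in for a RemoteCache obtained from a RemoteCacheManager.
        Map<String, String> cache = new ConcurrentHashMap<>();
        Map<String, String> rows = new HashMap<>();
        for (int i = 0; i < 25000; i++) {
            rows.put("key-" + i, "value-" + i);
        }
        loadInBatches(cache, rows.entrySet(), 10000);
        System.out.println(cache.size());  // 25000
    }
}
```

Whether large batches beat many small puts depends on payload size and network latency, so it is worth measuring a few batch sizes.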

       

      We reviewed all the documentation and didn't find any clue. Could you help us?

       

      Thanks

        • 1. Increasing performance in bulk data inserts
          mircea.markus

          Do you have any contention on the keys? If not (or only slight contention), I suggest running the inserts in parallel on the same Hot Rod client. Or even better, try running the inserts on two clients, with each client having clusterSize threads doing the puts.
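The parallel-insert suggestion above can be sketched as follows: split the key space across a fixed pool of worker threads (e.g. one per cluster node) and let each thread issue its own puts over a disjoint slice of keys, so the workers never contend on the same key. A ConcurrentHashMap stands in here for the thread-safe remote cache; the thread and row counts are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelLoader {

    // Insert totalRows entries using 'threads' workers, each owning a disjoint key slice.
    static void loadInParallel(Map<String, String> cache, int totalRows, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int slice = t;
            pool.submit(() -> {
                // Worker 'slice' handles keys where (i % threads == slice),
                // so no two workers ever write the same key.
                for (int i = slice; i < totalRows; i += threads) {
                    cache.put("key-" + i, "value-" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> cache = new ConcurrentHashMap<>();
        loadInParallel(cache, 100000, 4);  // e.g. 4 threads for a 4-node cluster
        System.out.println(cache.size());  // 100000
    }
}
```

The same loop body could just as easily call putAll on per-thread batches to combine both optimizations.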

          • 2. Increasing performance in bulk data inserts
            manik

            Also, is your loader code in the same JVM as your Infinispan instance?  If so, you could use Infinispan's Cache API directly instead of the Hot Rod remote client.

             

            However, if your loader is a separate JVM (or even a separate physical machine), then Hot Rod is your best option, and as Mircea suggested you should make your loader multi-threaded so it inserts entries in parallel.
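For the co-located case, the embedded route is straightforward because Infinispan's Cache interface extends java.util.concurrent.ConcurrentMap, so a loader can be written against that interface and handed either an embedded cache or any other map. In the sketch below a ConcurrentHashMap stands in for what cacheManager.getCache("test") would return, and the "key,value" row format is an illustrative assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EmbeddedLoader {

    // Written against ConcurrentMap: in a co-located deployment this argument would be
    // the embedded cache (e.g. new DefaultCacheManager("config.xml").getCache("test")),
    // avoiding Hot Rod serialization and network hops entirely.
    static int loadLines(ConcurrentMap<String, String> cache, Iterable<String> lines) {
        int loaded = 0;
        for (String line : lines) {
            // Illustrative row format: "key,value"
            int comma = line.indexOf(',');
            if (comma < 0) continue;  // skip malformed rows
            cache.put(line.substring(0, comma), line.substring(comma + 1));
            loaded++;
        }
        return loaded;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        int n = loadLines(cache, java.util.Arrays.asList("a,1", "b,2", "bad", "c,3"));
        System.out.println(n + " " + cache.get("b"));  // 3 2
    }
}
```

Note the caveat Mircea raises in the next reply: data loaded this way is not, at this point, readable through Hot Rod, so this only fits if the consumers are also embedded.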

            • 3. Increasing performance in bulk data inserts
              mircea.markus

              Manik wrote: "Also, is your loader code in the same JVM as your Infinispan instance? If so, you could use Infinispan's Cache API directly and not use the Hot Rod remote client."

               

              If you co-locate the loader with the ISPN instance and load data that way, you'll need to have your client in the same JVM as well: at the moment you cannot read through Hot Rod what was written to the cache directly (i.e. not through Hot Rod), and the other way around.

              • 4. Increasing performance in bulk data inserts
                galder.zamarreno

                It would also be interesting to find out which Coherence API you used to load the data, so that we can compare the two approaches.