4 Replies Latest reply on Jan 31, 2011 12:59 PM by galder.zamarreno

    Increasing performance in bulk data inserts

    havell

      We are testing the Infinispan platform as a substitute for Coherence (from Oracle). One of our apps consumes data directly from the cache, and it requires about 14,000,000 rows. In our production environment we manage about 4,000 inserts/s, so the full load takes roughly one hour. I have seen some Infinispan benchmarks close to that value.

       

      We opted for a 4-node topology with the distribution mode (4 virtual machines running on ESXi servers on a private LAN), L1 cache disabled, and the Hot Rod protocol:

       

      <namedCache name="test">

            <clustering mode="distribution">

               <sync/>

               <hash rehashEnabled="false" rehashRpcTimeout="60000" numOwners="2"/>

               <l1 enabled="false"/>

            </clustering>

        </namedCache>

       

      However, in our tests we could only achieve about 400 inserts/s (for us, a poor value).

      Maybe our approach is mistaken. We are inserting the rows from the Hot Rod client (we tested both the "put" and "putAll" methods). Perhaps there is a way to load all the data directly from a file (or a database) into the cache that would be quicker.
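One pattern that often helps over Hot Rod is to batch the rows and send each batch with a single putAll call, amortizing the network round trips. The sketch below is written against the java.util.Map putAll contract (which Infinispan's RemoteCache also exposes); a ConcurrentHashMap stands in for the remote cache so the example is self-contained, and the batch size of 10,000 is only an illustrative starting point to tune.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BatchLoader {

    // Send rows in fixed-size batches: one putAll per batch instead of one put per row.
    static void loadInBatches(Map<String, String> cache,
                              Iterable<Map.Entry<String, String>> rows,
                              int batchSize) {
        Map<String, String> batch = new HashMap<>(batchSize * 2);
        for (Map.Entry<String, String> row : rows) {
            batch.put(row.getKey(), row.getValue());
            if (batch.size() >= batchSize) {
                cache.putAll(batch);   // over Hot Rod, one bulk operation for the whole batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            cache.putAll(batch);       // flush the remainder
        }
    }

    public static void main(String[] args) {
        // Stand-in for a RemoteCache obtained from a RemoteCacheManager.
        Map<String, String> cache = new ConcurrentHashMap<>();
        Map<String, String> rows = new HashMap<>();
        for (int i = 0; i < 25000; i++) {
            rows.put("key-" + i, "value-" + i);
        }
        loadInBatches(cache, rows.entrySet(), 10000);
        System.out.println(cache.size());  // 25000
    }
}
```

Whether large batches beat many small puts depends on payload size and network latency, so it is worth measuring a few batch sizes.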

       

      We reviewed all the documentation and didn't find any clue. Could you help us?

       

      Thanks

        • 1. Increasing performance in bulk data inserts
          mircea.markus

          Do you have any contention on the keys? If not (or only slight contention), I suggest running the inserts in parallel on the same Hot Rod client. Or even better, try running the inserts on two clients, with each client having clusterSize threads doing the puts.
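The parallel-insert suggestion above can be sketched as follows: split the key space across a fixed pool of worker threads (e.g. one per cluster node) and let each thread issue its own puts over a disjoint slice of keys, so the workers never contend on the same key. A ConcurrentHashMap stands in here for the thread-safe remote cache; the thread and row counts are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelLoader {

    // Insert totalRows entries using 'threads' workers, each owning a disjoint key slice.
    static void loadInParallel(Map<String, String> cache, int totalRows, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int slice = t;
            pool.submit(() -> {
                // Worker 'slice' handles keys where (i % threads == slice),
                // so no two workers ever write the same key.
                for (int i = slice; i < totalRows; i += threads) {
                    cache.put("key-" + i, "value-" + i);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> cache = new ConcurrentHashMap<>();
        loadInParallel(cache, 100000, 4);  // e.g. 4 threads for a 4-node cluster
        System.out.println(cache.size());  // 100000
    }
}
```

The same loop body could just as easily call putAll on per-thread batches to combine both optimizations.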

          • 2. Increasing performance in bulk data inserts
            manik

            Also, is your loader code in the same JVM as your Infinispan instance?  If so, you could use Infinispan's Cache API directly instead of the Hot Rod remote client.

             

            However, if your loader is a separate JVM (or even a separate physical machine), then Hot Rod is your best option, and as Mircea suggested you should make your loader multi-threaded so it inserts entries in parallel.
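For the co-located case, the embedded route is straightforward because Infinispan's Cache interface extends java.util.concurrent.ConcurrentMap, so a loader can be written against that interface and handed either an embedded cache or any other map. In the sketch below a ConcurrentHashMap stands in for what cacheManager.getCache("test") would return, and the "key,value" row format is an illustrative assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EmbeddedLoader {

    // Written against ConcurrentMap: in a co-located deployment this argument would be
    // the embedded cache (e.g. new DefaultCacheManager("config.xml").getCache("test")),
    // avoiding Hot Rod serialization and network hops entirely.
    static int loadLines(ConcurrentMap<String, String> cache, Iterable<String> lines) {
        int loaded = 0;
        for (String line : lines) {
            // Illustrative row format: "key,value"
            int comma = line.indexOf(',');
            if (comma < 0) continue;  // skip malformed rows
            cache.put(line.substring(0, comma), line.substring(comma + 1));
            loaded++;
        }
        return loaded;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        int n = loadLines(cache, java.util.Arrays.asList("a,1", "b,2", "bad", "c,3"));
        System.out.println(n + " " + cache.get("b"));  // 3 2
    }
}
```

Note the caveat Mircea raises in the next reply: data loaded this way is not, at this point, readable through Hot Rod, so this only fits if the consumers are also embedded.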

            • 3. Increasing performance in bulk data inserts
              mircea.markus

              Manik wrote: "Also, is your loader code in the same JVM as your Infinispan instance? If so, you could use Infinispan's Cache API directly and not use the Hot Rod remote client."

               

              If you co-locate the loader with the ISPN instance and load data that way, you'll need to have your client in the same JVM as well: at the moment you cannot read through Hot Rod what was written to the cache directly (i.e. not through Hot Rod), and the other way around.

              • 4. Increasing performance in bulk data inserts
                galder.zamarreno

                It would also be interesting to find out which Coherence API you used to load the data, so that we can compare the two approaches.