6 Replies Latest reply on Mar 21, 2014 4:00 PM by nikhil_joshi

    Inserting millions of records in remote cache

    nikhil_joshi

      Hi,

       

We are using the Hot Rod client to connect to a remote Infinispan server. We want to upload more than 3 million records into the cache.

      Can anyone suggest a performant approach or best practices commonly used for this kind of scenario?

       

      Also, just curious about one thing - we have a MySQL table which already holds all these records. Is there any way to use that table as a backing store to populate the cache with the preload option?

       

      Thanks , Nikhil

        • 1. Re: Inserting millions of records in remote cache
          wdfink

          Because entries are distributed across the cache and the store according to the consistent hash, you need to add them via the Hot Rod client. Whether you benefit from inserting in chunks from different threads depends on the power of your server.
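A minimal sketch of the chunked, multi-threaded insertion suggested above. Since `RemoteCache` implements `java.util.Map`, the helper below is written against `Map` so it works with a cache obtained from `RemoteCacheManager`; the class and method names (`BulkLoader`, `bulkLoad`) are illustrative, not part of any Infinispan API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BulkLoader {

    // Split 'entries' into chunks and insert each chunk from a worker thread.
    // 'target' can be a RemoteCache, since RemoteCache implements java.util.Map.
    public static <K, V> void bulkLoad(Map<K, V> target,
                                       List<Map.Entry<K, V>> entries,
                                       int chunkSize,
                                       int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int start = 0; start < entries.size(); start += chunkSize) {
            final List<Map.Entry<K, V>> chunk =
                entries.subList(start, Math.min(start + chunkSize, entries.size()));
            pool.submit(() -> {
                for (Map.Entry<K, V> e : chunk) {
                    target.put(e.getKey(), e.getValue());
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The chunk size and thread count are the tuning knobs here; as noted above, the sweet spot depends on the server.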

           

          What do you mean by "mysql table which has all these records"? Is that an existing table with the keys/values? AFAIK there is no possibility, as ISPN stores additional information with the key, so it is not possible to use a simple key/value table - but this is subject to change.

          • 2. Re: Inserting millions of records in remote cache
            rpelisse

            Hi,

             

            If your data model allows it, you could try to leverage the "concurrent nature" of Infinispan and simply fire several clients, then try to find the sweet spot (i.e. the number of concurrent clients that leads to the best performance). Also, if your data lives in a datastore (SQL or otherwise), you could try to load it using the CacheStore API; it might just be more efficient.

             

            Off the top of my head, I'm not sure if the Batch API is supported remotely, but if so, batching the inserts (in chunks of 1,000, 10,000 or more) can also help performance.
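Even without the Batch API, a similar effect can be had over Hot Rod by grouping puts into `putAll` calls (`putAll` is a `java.util.Map` method that `RemoteCache` also implements). A sketch, with illustrative names:

```java
import java.util.HashMap;
import java.util.Map;

public class ChunkedPut {

    // Flush accumulated entries to the cache every 'chunkSize' puts, so the
    // client sends bounded bulk operations instead of one call per entry.
    public static <K, V> void putInChunks(Map<K, V> cache, Map<K, V> source, int chunkSize) {
        Map<K, V> buffer = new HashMap<>();
        for (Map.Entry<K, V> e : source.entrySet()) {
            buffer.put(e.getKey(), e.getValue());
            if (buffer.size() >= chunkSize) {
                cache.putAll(buffer);   // one bulk operation per chunk
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            cache.putAll(buffer);       // flush the remainder
        }
    }
}
```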

             

            Other than that, the usual JVM "tricks" may help overall performance (large heap, GC collector settings, large pages, and so on). I can't put my hands on it right now, but there is a nice blog entry from Shane Johnson about those.

            • 3. Re: Inserting millions of records in remote cache
              nikhil_joshi

              What do you mean by "mysql table which has all these records"? Is that an existing table with the keys/values? AFAIK there is no possibility, as ISPN stores additional information with the key, so it is not possible to use a simple key/value table - but this is subject to change.

              Actually the way I imagined this was

               

              1) "string-keyed-jdbc-store" would store the key/values in the table as human-readable strings, not in binary or encrypted form; the case would be different if I used a binary store.

               

              2) If I explicitly need to map the id, data and timestamp to database columns in the configuration, then Infinispan will not bother about other columns present in that table.

               

                      <string-keyed-jdbc-store datasource="java:jboss/datasources/dbs" passivation="false" preload="true" purge="false">
                          <string-keyed-table prefix="JDG">
                              <id-column name="CACHE_KEY" type="VARCHAR(255)"/>
                              <data-column name="CACHE_DATA" type="VARCHAR(255)"/>
                              <timestamp-column name="CACHE_ENTRY_TIME" type="BIGINT"/>
                          </string-keyed-table>
                      </string-keyed-jdbc-store>

               

                BTW, it looks like the table name should be the same as the cache name with the configured "prefix".

               

                Essentially we are looking for something like overriding Infinispan's mechanism to preload the cache from a store (in our case our own database instead of the standard JDBC cache store).

              • 4. Re: Inserting millions of records in remote cache
                nikhil_joshi
                • 5. Re: Inserting millions of records in remote cache
                  wdfink

                  Yes you need to implement a custom store.

                   

                  The StringKeyed... store does not store simple readable key/values. The difference between it and the binary-keyed store is an implementation detail; the string-keyed variant is better for concurrent access.

                  • 6. Re: Inserting millions of records in remote cache
                    nikhil_joshi

                    Using putAsync() and overriding the async executor factory, I am able to insert records into the grid considerably faster, in batches.
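For anyone following the same route, the pattern can be sketched roughly like this. To keep it generic, the helper takes the async put as a function (in practice you would pass something like `remoteCache::putAsync`; note that which future type `putAsync` returns depends on the client version, so adapting it to `CompletableFuture` may be needed). The class name `AsyncBatchLoader` and the `batchSize` parameter are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

public class AsyncBatchLoader {

    // Fire asynchronous puts, but wait for completion every 'batchSize'
    // entries so an unbounded number of in-flight operations never piles up.
    public static <K, V> void load(Map<K, V> source,
                                   BiFunction<K, V, CompletableFuture<?>> asyncPut,
                                   int batchSize) {
        List<CompletableFuture<?>> inFlight = new ArrayList<>();
        for (Map.Entry<K, V> e : source.entrySet()) {
            inFlight.add(asyncPut.apply(e.getKey(), e.getValue()));
            if (inFlight.size() >= batchSize) {
                // Block until the current batch has been acknowledged.
                CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
                inFlight.clear();
            }
        }
        // Wait for the final partial batch.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
    }
}
```

The batch size bounds client-side memory and lets you tune how much work is outstanding against the server at once.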