3 Replies Latest reply on Jan 9, 2014 12:14 PM by javadevmtl

    Couple thousands of puts/reads on 8 indexes NRT vs Ram Directory?

    javadevmtl

      Hi, using:


      Infinispan-6.0.0

      Java 1.7_17

      OpenSuse 11

      Server is 16 Core 128GB

      Network is 1Gigabit. All machines plugged to same switch.


      Reading through the Lucene docs, NRT seems to be a wrapper around RAMDirectory, but for some reason enabling NRT is faster than just the RAM directory.


      I.e:


      <namedCache name="myCache">
              <jmxStatistics enabled="true"/>
              <clustering mode="distribution">
                      <async/>
                      <hash numOwners="1"/>
              </clustering>
              <storeAsBinary enabled="true"/>
              <indexing enabled="true" indexLocalOnly="true">
                      <properties>
                              <property name="hibernate.search.default.indexmanager" value="near-real-time" />
                              <property name="hibernate.search.default.directory_provider" value="filesystem" />
                              <property name="hibernate.search.default.indexwriter.merge_factor" value="1024" />
                              <property name="hibernate.search.default.indexwriter.ram_buffer_size" value="128" />
                              <property name="hibernate.search.default.sharding_strategy.nbr_of_shards" value="8" />
                      </properties>
              </indexing>
      </namedCache>


      is faster than...


       

      <namedCache name="myCache">
              <jmxStatistics enabled="true"/>
              <clustering mode="distribution">
                      <async/>
                      <hash numOwners="1"/>
              </clustering>
              <storeAsBinary enabled="true"/>
              <indexing enabled="true" indexLocalOnly="true">
                      <properties>
                              <property name="hibernate.search.default.directory_provider" value="ram" />
                              <property name="hibernate.search.default.indexwriter.merge_factor" value="1024" />
                              <property name="hibernate.search.default.indexwriter.ram_buffer_size" value="128" />
                              <property name="hibernate.search.default.sharding_strategy.nbr_of_shards" value="8" />
                      </properties>
              </indexing>
      </namedCache>


      Note: even though it's set up in distribution mode, I'm only starting a single node, to rule out the network as the problem.


      And why is NRT faster than RAM? Way faster!


      In fact, with NRT enabled I can do 16,000 puts per second, while with RAM I max out at 750 puts per second. I checked the JMX stats and averageWriteTime with NRT is 0ms, while with RAM it's 25ms or higher.

       

      This is my model...

       

      @Indexed
      @ProvidedId
      @SerializeWith(...)
      public class MyModel
      {
        @DocumentId
        Integer id;
        //@Field
        Integer customerId;
        //@Field
        Integer code;
        @Field(analyze = Analyze.NO)
        String name;
        @Field(analyze = Analyze.NO)
        String acctHash;
        @Field(analyze = Analyze.NO)
        String address;
        @Field(analyze = Analyze.NO)
        String phone;
        @Field(analyze = Analyze.NO)
        String email;
        @Field(analyze = Analyze.NO)
        String shipTo;
        @Field(analyze = Analyze.NO)
        Long ip;
        long lease;

        // Getters, setters and the marshallers here...
      }

       

       

      Also, query performance is about 40 queries per second, which is really slow. I know the query is a heavy one, but I would still expect better performance. I tested the same query using CQEngine, which is a Java collections indexing API, and it was very fast. But I'm not here to compare, because Infinispan has all the extra goodies I need, like distribution. So I want to figure out how to tune it all right.

       

      The query...

       

      Query query = qf.from(MyModel.class)
      .maxResults(20000)
      .having("acctHash").eq(trxRequest.getAcctHash())
      .or().having("phone").eq(trxRequest.getPhone())
      .or().having("email").eq(trxRequest.getEmail())
      .or().having("ip").eq(trxRequest.getIp())
      .or().having("name").eq(trxRequest.getName())
      .or().having("address").eq(trxRequest.getAddress())
      .or().having("shipTo").eq(trxRequest.getShipTo())
      .toBuilder().build();

       

      How I tested it all...

       

      Basically I embedded Infinispan in my vertx.io web application; for each HTTP POST it does the following...

       

      1- Receive POST

      2- Parse POST params

      3- Cache PUT

      4- Build Query

      5- Execute query

      6- Return response.

       

      I got the numbers mentioned above in 2 ways...

      1- Looking at the JMX stats

      2- Visually looking at JMeter reports

       

      The test is setup as follows...

       

      JMeter (200 users) ----> Vertx ---> Infinispan embedded in vertx app.

      1- Puts only (NRT): 16,000 requests/sec (JMeter reports), averageWriteLatency 0ms (in JMX)

      2- Puts only (RAM): 750 requests/sec (JMeter reports), averageWriteLatency 25+ms (in JMX)

      3- Puts + Query (either NRT or RAM): 40 requests/sec (JMeter reports); didn't check JMX because I figured I'd try to tune indexing properly first and hope it works out later...

       

      Here are the snapshots...

       

      https://dl.dropboxusercontent.com/u/27413499/NRT.nps

       

      https://www.dropbox.com/s/116lna3dksfflzd/RAMDirectory.nps

        • 1. Re: Couple thousands of puts/reads on 8 indexes NRT vs Ram Directory?
          sannegrinovero

          Hi,

          nice to see someone testing performance ;-)

           

          I see I need to clarify some things.

          • The RAM based Directory is configured as a directory_provider and represents the "storage" of the index.
          • The NRT backend is configured as an indexmanager. It is a strategy in which the writes are buffered before getting flushed to permanent storage.

          So they are two different things, and you could also use them combined: use NRT on an in-memory Directory.

          However, that combination doesn't make much sense because you would be buffering before writing to memory.
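
          For illustration only, that combination would mean pointing both knobs at the same index, something like this (a sketch reusing the property names from the original post; as noted, buffering writes before flushing them to memory buys little):

          ```xml
          <indexing enabled="true" indexLocalOnly="true">
                  <properties>
                          <!-- NRT backend: buffers writes before flushing -->
                          <property name="hibernate.search.default.indexmanager" value="near-real-time" />
                          <!-- in-memory index storage -->
                          <property name="hibernate.search.default.directory_provider" value="ram" />
                  </properties>
          </indexing>
          ```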


          So NRT has some drawbacks: since it's buffering the writes and not flushing them immediately, you have to consider that

          1. in case the writing node crashes, you might lose some indexing operations.
          2. if you were to use the Infinispan IndexManager (to share the index state in real time with other nodes), the writing node would be able to "see" changes immediately (if you happen to run a query on that node), but other nodes will "see" the changes only after a flush.

          I guess for this scenario it might get interesting to be able to control the flush trigger manually, but that's not the case today as you're not expected to run NRT on a non-local index (it's not a widely tested setup).


          In case you're wondering WHY the RAM-based directory is so much slower than the NRT approach: the RAMDirectory is meant for unit tests, not for production performance.

          The Infinispan Directory, which stores the index in an Infinispan cache, is generally a better choice than RAMDirectory. It's of course a bit slower when the underlying cache is configured in a way that needs synchronous network RPCs, but if you configure your caches as local only, it's generally faster than the RAMDirectory (and I'd venture that anything else would be an unfair comparison). The RAMDirectory makes heavy use of synchronization, while the Infinispan implementation uses more modern concurrency patterns ;-) Of course, for small tests the simplicity of RAMDirectory is great.
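
          As a sketch of that suggestion (assuming the same hibernate.search.default.* property convention used in the original post; the backing index caches would still need their own local-only tuning):

          ```xml
          <indexing enabled="true" indexLocalOnly="true">
                  <properties>
                          <!-- store the index in an Infinispan cache instead of RAMDirectory -->
                          <property name="hibernate.search.default.directory_provider" value="infinispan" />
                  </properties>
          </indexing>
          ```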


          40 queries/second is quite poor indeed. Are you running the queries at the same time as the put operations? If so, it would be interesting to compare with the performance of queries when no writes are happening: most of the performance cost of a query is the need to re-open a fresh index reader to guarantee up-to-date results after a write. Also, have a look at enabling FieldCache usage and at using custom cacheable Filter implementations.

          Hibernate Search Documentation

           

          BTW, interesting that you use JMeter: I suspect it's an all-right strategy to verify that the configuration is OK, but it's going to be hard to actually measure peak performance of a well-tuned system, as you would need lots of clients running on different machines to properly stress Infinispan's query engine.

          • 2. Re: Couple thousands of puts/reads on 8 indexes NRT vs Ram Directory?
            javadevmtl

            Hi thanks for the reply!

             

            Ok, I will take a look at using Infinispan as the directory. I want the fastest possible setup, right? I want to be able to do thousands of writes per second at about 1ms latency.

             

            The application works as follows...

            1- Jmeter (200) users: Generate random POST params and send HTTP POST

            2- Server (Vertx): Receives POST

            1. Write to cache Then
            2. Query cache Then
            3. Do business logic

            3- Server: Return response to Jmeter

             

            For now let's make sure writes are working properly before moving on to reads.

             

            By the way, I have tested CQEngine: http://code.google.com/p/cqengine/ at 18,000 writes + reads a second at 7ms response time (over 30,000,000 records on one box), including network time and JMeter (data generation). Of course it doesn't have all the fail-over goodies, but I'm looking for those kinds of numbers. Which brings me to JMeter...

             

            As for JMeter, you would be surprised what an 8-core desktop can dish out hehe! But in reality the final latency doesn't concern me as much. An additional 4ms of latency is good enough for me...

             

            The above application in a Hello World scenario: the server receives an HTTP POST and returns a "Hello World!" response; the average latency is 2ms (visually inspected in JMeter) at 40,000 requests/sec.

            When adding POST parameters (JMeter generates them using the built-in scripts), the average latency is 7ms, which includes the network time and data generation, at 25,000 requests/sec.

            I'm expecting Infinispan to add maybe another 5ms on top of that. So if my average latency is about 15ms total, which includes JMeter, Infinispan, network etc., then I'm a happy camper. The problem is that with the current configuration the average latency of Infinispan is 200ms+ and gets worse as more data is added.

             

            If we can figure out these issues then it would be amazing. I also explored building a map-reduce solution with Vertx.io and CQEngine, and on 7 machines I have tested 35,000,000 records at 15ms response times. But I'd rather NOT go down that route lol

            • 3. Re: Couple thousands of puts/reads on 8 indexes NRT vs Ram Directory?
              javadevmtl

              I tried....

               

               

               

                      <namedCache name="myMap">
                              <clustering mode="distribution">
                                      <async/>
                                      <hash numOwners="2"/>
                              </clustering>

                              <!-- <eviction strategy="LIRS" maxEntries="30000000" /> -->
                              <indexing enabled="true" indexLocalOnly="true">
                                      <properties>
                                              <property name="default.directory_provider" value="infinispan" />
                                      </properties>
                              </indexing>
                      </namedCache>

               

              This is still too slow: it takes 8ms to write,

               

              just doing a simple cache.put based on the model in the original post.