26 Replies · Latest reply on Jul 3, 2011 9:03 PM by kodadma

    That is just too much overhead

    kodadma

  Hi, I am working on integrating our Cache Store with ISPN 5. It took a couple of days to get it up and running, but ... our code (CacheStoreImpl.store(key, value) and CacheLoaderImpl.load(key)) accounts for only 9% of the overall Cache.put/Cache.get call time. Most of the overhead can be attributed to an inefficient implementation of the LRU eviction algorithm and excessive locking. Guys, according to the YourKit profiler, 66% of the time is spent inside the org.infinispan.util.concurrent.BoundedConcurrentHashMap$LRU.execute() method and 47% inside ReentrantLock.lock(). At the moment there is not much we can do about it, so we will probably move forward without ISPN integration unless this issue gets resolved somehow. I do not know how yet. This is not a question per se, just some facts FYI.


        • 1. Re: That is just too much overhead
          brackxm

          Maybe using LIRS would help.
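
          For illustration, switching to LIRS is a one-line change in the fluent eviction configuration (a sketch only, assuming the same settings Vladimir posts later in this thread):

              Configuration config = new Configuration().fluent()
                .eviction()
                  .maxEntries(20).strategy(EvictionStrategy.LIRS)  // LIRS instead of LRU
                  .wakeUpInterval(5000L)
                .expiration()
                  .maxIdle(120000L)
                .build();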

          • 2. Re: That is just too much overhead
            dan.berindei

            Vladimir, could you tell us more about the setup of your test? Infinispan configuration, number of threads, access pattern... Working code would be best, so we can replicate your exact conditions and profile it ourselves.

             

            47% inside ReentrantLock.lock() would suggest a large number of writers with a very small concurrency level, but it could also be a genuine problem in Infinispan.
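
            As a sketch of what raising the concurrency level looks like (assuming the fluent locking() group in 5.x exposes concurrencyLevel(), as the legacy configuration does; 512 is just an example value to tune to the number of concurrent writers):

                Configuration config = new Configuration().fluent()
                  .locking()
                    .concurrencyLevel(512)  // more lock stripes -> less contention in ReentrantLock.lock()
                  .eviction()
                    .maxEntries(20).strategy(EvictionStrategy.LRU)
                  .build();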

            • 3. Re: That is just too much overhead
              mircea.markus

              Vladimir, you might want to consider a different integration approach: given that your store is in-memory, it might be better to integrate through the DataContainer API rather than as a cache loader. The DataContainer is the place where ISPN keeps all its in-memory data. The calls you saw in org.infinispan.util.concurrent.BoundedConcurrentHashMap are actually work done by the current DataContainer implementation. In other words, the 66% of time spent there would be something you can control/improve.

              This doesn't mean we shouldn't look at the issues you raised; as Dan mentioned, we are looking forward to getting more input from you in that direction.
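
              As a very rough sketch of the wiring (assuming the fluent dataContainer() configuration group is available in this version; OffHeapDataContainer is a hypothetical class implementing org.infinispan.container.DataContainer, not something that exists):

                  // Hypothetical custom container replacing DefaultDataContainer
                  DataContainer myContainer = new OffHeapDataContainer();

                  Configuration config = new Configuration().fluent()
                    .dataContainer()
                      .dataContainer(myContainer)  // assumed setter; check the FluentConfiguration javadoc
                    .expiration()
                      .maxIdle(120000L)
                    .build();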

              • 4. Re: That is just too much overhead
                kodadma

                This is my config

                 

                        EmbeddedCacheManager manager = new DefaultCacheManager();

                        Configuration config = new Configuration().fluent()
                          .eviction()
                            .maxEntries(20).strategy(EvictionStrategy.LRU)
                            .wakeUpInterval(5000L)
                          .expiration()
                            .maxIdle(120000L)
                          .loaders()
                            .shared(false).passivation(false).preload(false)
                            .addCacheLoader(cfg)
                          .build();

                        manager.defineConfiguration("name", config);
                        ispnCache = manager.getCache("name");

                 

                 

                The test is multithreaded get/put (90% reads / 10% writes (inserts)) and very eviction-intensive. Keys and values are byte arrays. Key size is ~18 bytes; value size is random between 500 and 1000 bytes.
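
                For anyone wanting to reproduce this, here is a minimal sketch of what one worker thread's loop might look like under those parameters (90/10 read/write mix, ~18-byte keys, 500-1000 byte values). How keys are shared between puts and gets is not specified in the thread, so this just generates random ones; it is not the actual test code:

                    import java.util.Random;
                    import org.infinispan.Cache;

                    class Worker implements Runnable {
                        private final Cache<byte[], byte[]> cache;
                        private final Random rnd = new Random();

                        Worker(Cache<byte[], byte[]> cache) { this.cache = cache; }

                        public void run() {
                            while (!Thread.currentThread().isInterrupted()) {
                                byte[] key = new byte[18];                            // ~18-byte key
                                rnd.nextBytes(key);
                                if (rnd.nextInt(100) < 90) {
                                    cache.get(key);                                   // 90% reads
                                } else {
                                    byte[] value = new byte[500 + rnd.nextInt(501)];  // 500-1000 byte value
                                    rnd.nextBytes(value);
                                    cache.put(key, value);                            // 10% writes (inserts)
                                }
                            }
                        }
                    }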

                • 5. Re: That is just too much overhead
                  kodadma

                  Thanks for the advice. I will take a look at DataContainer.

                  • 6. Re: That is just too much overhead
                    kodadma

                    Some observations:

                     

                    GETs:

                     

                    56% is spent inside DefaultDataContainer.put(), which is called from LockingInterceptor.commitEntry().

                     

                    Can somebody explain why put() is called during a get operation?

                    • 7. Re: That is just too much overhead
                      kodadma

                      I just ran our performance test with vanilla ISPN

                       

                      Cache configuration:

                       

                            Configuration config = new Configuration().fluent()
                              .eviction()
                                .maxEntries(1000000).strategy(EvictionStrategy.LRU)
                                .wakeUpInterval(5000L)
                              .expiration()
                                .maxIdle(120000L)
                              .build();

                       

                      Yes, 1M entries

                       

                      OS: Mac OSX 10.6

                      RAM : 4GB

                      CPU: Intel Core Duo 2.5GHz

                      Java: 1.6 (latest)

                       

                      -server -Xms1500M -Xmx1500M -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewSize=64M -XX:SurvivorRatio=16 -XX:+CMSIncrementalMode

                       

                      Test description: 2 threads do GET/PUT (90/10) with Key = 18 bytes and Value = [500, 1000] bytes (byte arrays) continuously.

                       

                       

                      These are the numbers after an approx. 10-minute run:

                       

                      RPS=57336.240379784125

                      MAX=1.4162271E7

                      AVG=35.0263816094324

                      MEDIAN=0.7612096526397748

                      99%=1.9409821753803937

                      99.9%=11.276986989380372

                      99.99%=105793.00100340837

                      SIZE=N/A

                      ITEMS=538805

                       

                      Deciphered:

                       

                      ~57K requests per sec (only!!!)
                      Max latency = 14 sec (!!!) - I have no idea how it is possible with only a 256M heap (during the whole run I had ~600M of free memory)
                      Avg request latency = 35 microseconds
                      Median latency = 0.7 microseconds
                      99% latency = 1.9 microseconds
                      99.9% = 11 microseconds
                      99.99% = 105 milliseconds

                       

                      To give you some numbers for comparison:

                       

                      Ehcache with a 1M item limit on the same test (with LRU eviction enabled) gives:

                       

                      RPS=530324.4058968113

                      MAX=172420.0

                      AVG=3.7638782342298684

                      MEDIAN=1.21882038338227

                      99%=24.80277869707409

                      99.9%=35.70542538684415

                      99.99%=1211.0108030386218

                      SIZE=N/A

                      ITEMS=1000000

                       

                      An order of magnitude better.

                       

                      Possibly I am not running ISPN with an optimal configuration, but Ehcache uses the default one (out of the box). My preliminary observations on ISPN:

                       

                      1. Absolutely inefficient eviction algorithm implementation - very slow. (This is inside the DataContainer and can probably be fixed.)
                      2. Very high pressure on the GC (it seems there are memory leaks as well - I need to verify this).
                      3. Some very strange code inside LockingInterceptor which calls put during get operations (see my previous comment).

                      Unfortunately, in its current version (5.0 CR3) ISPN does not seem to be a production-ready product and needs additional cycles of tuning and QA testing.

                      • 8. Re: That is just too much overhead
                        manik

                        Could you try with 5.0.0.CR6?  A lot has changed since CR3, specifically around eviction, etc.  Also if you have profiler snapshots you can share with us, we'd appreciate that...

                        • 9. Re: That is just too much overhead
                          manik

                          This could happen if the get() triggers a load() from the CacheStore since the key wasn't in memory but did exist in the store.  You are using a cache store, correct?

                           

                          Your configuration has some very aggressive eviction (just 20 entries in memory?); this would cause a lot of thrashing as entries are paged to disk and back into memory again - not dissimilar to an operating system's virtual memory thrashing. Is this a mistake (maxEntries = 20) or is it intentional?

                          • 10. Re: That is just too much overhead
                            dan.berindei

                            Manik, I think the aggressive eviction was intentional, as he basically wants to keep all the data in his off-heap cache store implementation.

                            In a later post he did change his configuration to 1,000,000 entries though, and our LRU still performed pretty badly.

                             

                            Vladimir, did you make any progress implementing your own DataContainer? I haven't had a chance to run your test yet, but if you still want to integrate with ISPN as a cache store I would suggest using the UNORDERED strategy instead of LRU - since you're also storing things in memory, it shouldn't matter exactly which elements get passivated.
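
                            In the configuration posted earlier that is just a different strategy constant (a sketch; everything else unchanged, cfg being the cache loader config from that post):

                                Configuration config = new Configuration().fluent()
                                  .eviction()
                                    .maxEntries(20).strategy(EvictionStrategy.UNORDERED)  // no LRU bookkeeping on access
                                    .wakeUpInterval(5000L)
                                  .expiration()
                                    .maxIdle(120000L)
                                  .loaders()
                                    .shared(false).passivation(false).preload(false)
                                    .addCacheLoader(cfg)
                                  .build();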

                            • 11. Re: That is just too much overhead
                              kodadma

                              Hi, Manik

                               

                              I will try CR6 later this week. I just ran some stress tests yesterday on a fairly big server box (8 CPU cores + 32GB memory) to compare different caches with very large heaps (or off-heap storage). The Java heap was set to 28GB for those without off-heap storage. ISPN CR3 failed to give me any meaningful performance numbers. It worked perfectly (3.5M requests per sec) until it reached the item limit (30M), then all threads got stuck (I think because eviction kicked in) indefinitely. I waited for 10 minutes and then shut it down. Is this the LRU eviction again? Another observation - only 2 of the 16 threads were active at that time.

                               

                              Integrating the off-heap storage as a DataContainer will make sense either when eviction is fixed or when you give me an easy route to bypass ISPN eviction entirely, but as far as I understand the internals we cannot turn it off.

                               

                              How about a FastLRU? Sample the entries and keep only 20-30 random entries in an eviction candidate list. This is a pretty lightweight algorithm.
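
                              To illustrate the sampling idea (a standalone sketch, not Infinispan code, and not fully thread-safe): pick a small random sample of entries and evict the least recently used one from that sample only, so no global ordering ever has to be maintained.

                                  import java.util.Map;
                                  import java.util.Random;
                                  import java.util.concurrent.ConcurrentHashMap;

                                  class SampledLruCache<K, V> {
                                      private static final int SAMPLE_SIZE = 20;   // 20-30 candidates, as suggested above

                                      private static final class Entry<V> {
                                          final V value;
                                          volatile long lastAccess = System.nanoTime();
                                          Entry(V value) { this.value = value; }
                                      }

                                      private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<K, Entry<V>>();
                                      private final Random rnd = new Random();
                                      private final int maxEntries;

                                      SampledLruCache(int maxEntries) { this.maxEntries = maxEntries; }

                                      public V get(K key) {
                                          Entry<V> e = map.get(key);
                                          if (e == null) return null;
                                          e.lastAccess = System.nanoTime();         // touch; no list reordering, no lock
                                          return e.value;
                                      }

                                      public void put(K key, V value) {
                                          map.put(key, new Entry<V>(value));
                                          if (map.size() > maxEntries) evictOne();
                                      }

                                      // Evict the oldest entry among a small sample instead of a strict global LRU victim.
                                      private void evictOne() {
                                          int skip = rnd.nextInt(Math.max(1, map.size() - SAMPLE_SIZE));
                                          K victim = null;
                                          long oldest = Long.MAX_VALUE;
                                          int sampled = 0;
                                          for (Map.Entry<K, Entry<V>> e : map.entrySet()) {
                                              if (skip-- > 0) continue;             // start the sample at a random offset
                                              if (e.getValue().lastAccess < oldest) {
                                                  oldest = e.getValue().lastAccess;
                                                  victim = e.getKey();
                                              }
                                              if (++sampled >= SAMPLE_SIZE) break;
                                          }
                                          if (victim != null) map.remove(victim);
                                      }
                                  }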

                               

                              I ran all tests with the following HotSpot parameters:

                               

                              -server -Xms28G -Xmx28G -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:SurvivorRatio=16

                              • 12. Re: That is just too much overhead
                                kodadma

                                Dan, I have tried all possible eviction algorithms with the same result.

                                • 13. Re: That is just too much overhead
                                  vblagojevic

                                  Hi Vladimir,

                                   

                                  I get the feeling that you are changing too many test variables without first establishing a baseline we can all agree on. How about we do the following: let's use CR6, and I'll redo the performance tests we have on BCHM to see if there are any regressions. In the meantime, would you please check what the performance impact is when no cache loader is configured in your setup? I would also like to repeat your test setup, so would you post your test files?

                                   

                                  Regards,

                                  Vladimir

                                  • 14. Re: That is just too much overhead
                                    kodadma

                                    Not really, I am testing vanilla ISPN with the cache configuration posted in this thread on June 7th. You can easily repeat this test; just do not forget to replace maxEntries with a number of your choice. I cannot post any files related to the test case right now (I will do it later), but I can describe it:

                                     

                                    1. It is a continuously running multi-threaded (R/W) test. The R and W ratios are configurable; I am running with 90% reads and 10% writes. The number of threads is configurable as well.
                                    2. Keys are byte[] arrays approx. 18 bytes long (the size varies a little).
                                    3. Values are byte[] arrays 500-1000 bytes long (uniformly distributed).

                                     

                                    private static void initCacheInfinispan() {
                                        EmbeddedCacheManager manager = new DefaultCacheManager();
                                        Configuration config = new Configuration().fluent()
                                          .eviction()
                                            .maxEntries((int) sCacheItemsLimit).strategy(EvictionStrategy.LRU)
                                            .wakeUpInterval(5000L)
                                          .expiration()
                                            .maxIdle(120000L)
                                          .build();

                                        manager.defineConfiguration("name", config);
                                        ispnCache = manager.getCache("name");
                                    }

                                     

                                    In an inner loop, you first decide whether the op is a read or a write, then create a key or a key-value pair and perform the op.

                                     

                                     

                                    I need some time to remove our specific code from the test, then I will post the sources.
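
                                    Until the sources are posted, here is a bare-bones sketch of the measurement side only (how numbers like RPS and the percentile latencies above might be computed; the names are made up and this is not the actual harness - the op passed in would be the read/write step described above):

                                        import java.util.Arrays;

                                        class LatencyRecorder {
                                            private final long[] samplesNanos;
                                            private int count;

                                            LatencyRecorder(int capacity) { samplesNanos = new long[capacity]; }

                                            // Time one operation and keep the sample (use one recorder per worker thread).
                                            void record(Runnable op) {
                                                long start = System.nanoTime();
                                                op.run();
                                                if (count < samplesNanos.length) samplesNanos[count++] = System.nanoTime() - start;
                                            }

                                            // Throughput over a wall-clock interval plus latency percentiles, in microseconds.
                                            void report(double elapsedSeconds) {
                                                if (count == 0) return;
                                                long[] sorted = Arrays.copyOf(samplesNanos, count);
                                                Arrays.sort(sorted);
                                                System.out.println("RPS=" + count / elapsedSeconds);
                                                System.out.println("MEDIAN=" + sorted[(int) (count * 0.50)] / 1000.0);
                                                System.out.println("99%=" + sorted[(int) (count * 0.99)] / 1000.0);
                                                System.out.println("99.9%=" + sorted[(int) (count * 0.999)] / 1000.0);
                                                System.out.println("MAX=" + sorted[count - 1] / 1000.0);
                                            }
                                        }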
