26 Replies · Latest reply on Jul 3, 2011 9:03 PM by kodadma

    That is just too much overhead

    kodadma

  Hi, I am working on integrating our Cache Store with ISPN 5. It took a couple of days to get it up and running, but ... our code (CacheStoreImpl.store(key, value) and CacheLoaderImpl.load(key)) accounts for only 9% of the overall Cache.put/Cache.get call time. Most of the overhead can be attributed to an inefficient implementation of the LRU eviction algorithm and excessive locking. Guys, according to the YourKit profiler, 66% of the time is spent inside the org.infinispan.util.concurrent.BoundedConcurrentHashMap$LRU.execute() method and 47% inside ReentrantLock.lock(). At the moment there is not much we can do about it, so we will probably move forward without ISPN integration unless this issue gets resolved somehow. I do not know how yet. This is not a question per se, just some facts FYI.


        • 1. Re: That is just too much overhead
          brackxm

          Maybe using LIRS would help.
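
          For illustration, switching to LIRS is a one-line change in the fluent eviction configuration (a sketch only, assuming the same settings Vladimir posts later in this thread):

              Configuration config = new Configuration().fluent()
                .eviction()
                  .maxEntries(20).strategy(EvictionStrategy.LIRS)  // LIRS instead of LRU
                  .wakeUpInterval(5000L)
                .expiration()
                  .maxIdle(120000L)
                .build();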

          • 2. Re: That is just too much overhead
            dan.berindei

            Vladimir, could you tell us more about the setup of your test? Infinispan configuration, number of threads, access pattern... Working code would be best, so we can replicate your exact conditions and profile it ourselves.

             

            47% inside ReentrantLock.lock() would suggest a large number of writers with a very small concurrency level, but it could also be a genuine problem in Infinispan.
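
            As a sketch of what raising the concurrency level looks like (assuming the fluent locking() group in 5.x exposes concurrencyLevel(), as the legacy configuration does; 512 is just an example value to tune to the number of concurrent writers):

                Configuration config = new Configuration().fluent()
                  .locking()
                    .concurrencyLevel(512)  // more lock stripes -> less contention in ReentrantLock.lock()
                  .eviction()
                    .maxEntries(20).strategy(EvictionStrategy.LRU)
                  .build();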

            • 3. Re: That is just too much overhead
              mircea.markus

              Vladimir, you might want to consider a different integration approach: given that your store is in-memory, it might be better to integrate through the DataContainer API rather than as a cache loader. The DataContainer is the place where ISPN keeps all its in-memory data. The calls you saw in org.infinispan.util.concurrent.BoundedConcurrentHashMap are actually work done by the current DataContainer implementation. In other words, the 66% of time spent there would be something you can control/improve.

              This doesn't mean we shouldn't look at the issues you raised; as Dan mentioned, we are looking forward to getting more input from you in that direction.
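
              As a very rough sketch of the wiring (assuming the fluent dataContainer() configuration group is available in this version; OffHeapDataContainer is a hypothetical class implementing org.infinispan.container.DataContainer, not something that exists):

                  // Hypothetical custom container replacing DefaultDataContainer
                  DataContainer myContainer = new OffHeapDataContainer();

                  Configuration config = new Configuration().fluent()
                    .dataContainer()
                      .dataContainer(myContainer)  // assumed setter; check the FluentConfiguration javadoc
                    .expiration()
                      .maxIdle(120000L)
                    .build();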

              • 4. Re: That is just too much overhead
                kodadma

                This is my config

                 

                        EmbeddedCacheManager manager = new DefaultCacheManager();

                        Configuration config = new Configuration().fluent()
                          .eviction()
                            .maxEntries(20).strategy(EvictionStrategy.LRU)
                            .wakeUpInterval(5000L)
                          .expiration()
                            .maxIdle(120000L)
                          .loaders()
                            .shared(false).passivation(false).preload(false)
                            .addCacheLoader(cfg)
                          .build();

                        manager.defineConfiguration("name", config);
                        ispnCache = manager.getCache("name");

                 

                 

                The test is multithreaded get/put (90% reads / 10% writes (inserts)) and very eviction-intensive. Keys and values are byte arrays. Key size is ~18 bytes; value size is random between 500 and 1000 bytes.
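
                For anyone wanting to reproduce this, here is a minimal sketch of what one worker thread's loop might look like under those parameters (90/10 read/write mix, ~18-byte keys, 500-1000 byte values). How keys are shared between puts and gets is not specified in the thread, so this just generates random ones; it is not the actual test code:

                    import java.util.Random;
                    import org.infinispan.Cache;

                    class Worker implements Runnable {
                        private final Cache<byte[], byte[]> cache;
                        private final Random rnd = new Random();

                        Worker(Cache<byte[], byte[]> cache) { this.cache = cache; }

                        public void run() {
                            while (!Thread.currentThread().isInterrupted()) {
                                byte[] key = new byte[18];                            // ~18-byte key
                                rnd.nextBytes(key);
                                if (rnd.nextInt(100) < 90) {
                                    cache.get(key);                                   // 90% reads
                                } else {
                                    byte[] value = new byte[500 + rnd.nextInt(501)];  // 500-1000 byte value
                                    rnd.nextBytes(value);
                                    cache.put(key, value);                            // 10% writes (inserts)
                                }
                            }
                        }
                    }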

                • 5. Re: That is just too much overhead
                  kodadma

                  Thanks for the advice. I will take a look at DataContainer.

                  • 6. Re: That is just too much overhead
                    kodadma

                    Some observations:

                     

                    GETs:

                     

                    56% is spent inside DefaultDataContainer.put(), which is called from LockingInterceptor.commitEntry().

                     

                    Can somebody explain why put() is called during a get operation?

                    • 7. Re: That is just too much overhead
                      kodadma

                      I just ran our performance test with vanilla ISPN

                       

                      Cache configuration:

                       

                            Configuration config = new Configuration().fluent()
                              .eviction()
                                .maxEntries(1000000).strategy(EvictionStrategy.LRU)
                                .wakeUpInterval(5000L)
                              .expiration()
                                .maxIdle(120000L)
                              .build();

                       

                      Yes, 1M entries

                       

                      OS: Mac OSX 10.6

                      RAM : 4GB

                      CPU: Intel Core Duo 2.5GHz

                      Java: 1.6 (latest)

                       

                      -server -Xms1500M -Xmx1500M -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewSize=64M -XX:SurvivorRatio=16 -XX:+CMSIncrementalMode

                       

                      Test description: 2 threads do GET/PUT (90/10) with Key = 18 bytes and Value = [500, 1000] bytes (byte arrays) continuously.

                       

                       

                      These are the numbers after an approx. 10-minute run:

                       

                      RPS=57336.240379784125

                      MAX=1.4162271E7

                      AVG=35.0263816094324

                      MEDIAN=0.7612096526397748

                      99%=1.9409821753803937

                      99.9%=11.276986989380372

                      99.99%=105793.00100340837

                      SIZE=N/A

                      ITEMS=538805

                       

                      Deciphered:

                       

                      ~57K requests per sec (only!!!)
                      Max latency = 14 sec (!!!) - I have no idea how it is possible with only a 256M heap (during the whole run I had ~600M of free memory)
                      Avg request latency = 35 microseconds
                      Median latency = 0.7 microseconds
                      99% latency = 1.9 microseconds
                      99.9% = 11 microseconds
                      99.99% = 105 milliseconds

                       

                      To give you some numbers for comparison:

                       

                      Ehcache with a 1M item limit on the same test (with LRU eviction enabled) gives:

                       

                      RPS=530324.4058968113

                      MAX=172420.0

                      AVG=3.7638782342298684

                      MEDIAN=1.21882038338227

                      99%=24.80277869707409

                      99.9%=35.70542538684415

                      99.99%=1211.0108030386218

                      SIZE=N/A

                      ITEMS=1000000

                       

                      An order of magnitude better.

                       

                      Possibly I am not running ISPN with an optimal configuration, but Ehcache uses the default one (out of the box). My preliminary observations on ISPN:

                       

                      1. Absolutely inefficient eviction algorithm implementation - very slow. (This is inside the DataContainer and can probably be fixed.)
                      2. Very high pressure on the GC (it seems there are memory leaks as well - I need to verify this).
                      3. Some very strange code inside LockingInterceptor which calls put during get operations (see my previous comment).

                      Unfortunately, in its current version (5.0 CR3) ISPN does not seem to be a production-ready product and needs additional cycles of tuning and QA testing.

                      • 8. Re: That is just too much overhead
                        manik

                        Could you try with 5.0.0.CR6?  A lot has changed since CR3, specifically around eviction, etc.  Also if you have profiler snapshots you can share with us, we'd appreciate that...

                        • 9. Re: That is just too much overhead
                          manik

                          This could happen if the get() triggers a load() from the CacheStore since the key wasn't in memory but did exist in the store.  You are using a cache store, correct?

                           

                          Your configuration has some very aggressive eviction (just 20 entries in memory?); this would cause a lot of thrashing as entries are paged to disk and back into memory again - not dissimilar to an operating system's virtual memory thrashing. Is this a mistake (maxEntries = 20) or is it intentional?

                          • 10. Re: That is just too much overhead
                            dan.berindei

                            Manik, I think the aggressive eviction was intentional, as he basically wants to keep all the data in his off-heap cache store implementation.

                            In a later post he did change his configuration to 1,000,000 entries though, and our LRU still performed pretty badly.

                             

                            Vladimir, did you make any progress implementing your own DataContainer? I haven't had a chance to run your test yet, but if you still want to integrate with ISPN as a cache store I would suggest using the UNORDERED strategy instead of LRU - since you're also storing things in memory, it shouldn't matter exactly which elements get passivated.
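
                            In the configuration posted earlier that is just a different strategy constant (a sketch; everything else unchanged, cfg being the cache loader config from that post):

                                Configuration config = new Configuration().fluent()
                                  .eviction()
                                    .maxEntries(20).strategy(EvictionStrategy.UNORDERED)  // no LRU bookkeeping on access
                                    .wakeUpInterval(5000L)
                                  .expiration()
                                    .maxIdle(120000L)
                                  .loaders()
                                    .shared(false).passivation(false).preload(false)
                                    .addCacheLoader(cfg)
                                  .build();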

                            • 11. Re: That is just too much overhead
                              kodadma

                              Hi, Manik

                               

                              I will try CR6 later this week. I just ran some stress tests yesterday on a fairly big server box (8 CPU cores + 32GB memory) to compare different caches with very large heaps (or off-heap storage). The Java heap was set to 28GB for those without off-heap storage. ISPN CR3 failed to give me any meaningful performance numbers. It worked perfectly (3.5M requests per sec) until it reached the item limit (30M), then all threads got stuck (I think because eviction kicked in) indefinitely. I waited for 10 minutes and then shut it down. Is this the LRU eviction again? Another observation - only 2 of the 16 threads were active at that time.

                               

                              Integrating the off-heap storage as a DataContainer will make sense either when eviction is fixed or when you give me an easy route to bypass ISPN eviction entirely, but as far as I understand the internals we cannot turn it off.

                               

                              How about a FastLRU? Sample the entries and keep only 20-30 random entries in an eviction candidate list. This is a pretty lightweight algorithm.
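
                              To illustrate the sampling idea (a standalone sketch, not Infinispan code, and not fully thread-safe): pick a small random sample of entries and evict the least recently used one from that sample only, so no global ordering ever has to be maintained.

                                  import java.util.Map;
                                  import java.util.Random;
                                  import java.util.concurrent.ConcurrentHashMap;

                                  class SampledLruCache<K, V> {
                                      private static final int SAMPLE_SIZE = 20;   // 20-30 candidates, as suggested above

                                      private static final class Entry<V> {
                                          final V value;
                                          volatile long lastAccess = System.nanoTime();
                                          Entry(V value) { this.value = value; }
                                      }

                                      private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<K, Entry<V>>();
                                      private final Random rnd = new Random();
                                      private final int maxEntries;

                                      SampledLruCache(int maxEntries) { this.maxEntries = maxEntries; }

                                      public V get(K key) {
                                          Entry<V> e = map.get(key);
                                          if (e == null) return null;
                                          e.lastAccess = System.nanoTime();         // touch; no list reordering, no lock
                                          return e.value;
                                      }

                                      public void put(K key, V value) {
                                          map.put(key, new Entry<V>(value));
                                          if (map.size() > maxEntries) evictOne();
                                      }

                                      // Evict the oldest entry among a small sample instead of a strict global LRU victim.
                                      private void evictOne() {
                                          int skip = rnd.nextInt(Math.max(1, map.size() - SAMPLE_SIZE));
                                          K victim = null;
                                          long oldest = Long.MAX_VALUE;
                                          int sampled = 0;
                                          for (Map.Entry<K, Entry<V>> e : map.entrySet()) {
                                              if (skip-- > 0) continue;             // start the sample at a random offset
                                              if (e.getValue().lastAccess < oldest) {
                                                  oldest = e.getValue().lastAccess;
                                                  victim = e.getKey();
                                              }
                                              if (++sampled >= SAMPLE_SIZE) break;
                                          }
                                          if (victim != null) map.remove(victim);
                                      }
                                  }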

                               

                              I ran all tests with the following HotSpot parameters:

                               

                              -server -Xms28G -Xmx28G -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:SurvivorRatio=16

                              • 12. Re: That is just too much overhead
                                kodadma

                                Dan, I have tried all possible eviction algorithms with the same result.

                                • 13. Re: That is just too much overhead
                                  vblagojevic

                                  Hi Vladimir,

                                   

                                  I get the feeling that you are changing too many test variables without first establishing a baseline we can all agree on. How about we do the following: let's use CR6, and I'll redo the performance tests we have on BCHM to see if there are any regressions. In the meantime, would you please check what the performance impact is when no cache loader is configured in your setup? I would also like to repeat your test setup, so would you post your test files?

                                   

                                  Regards,

                                  Vladimir

                                  • 14. Re: That is just too much overhead
                                    kodadma

                                    Not really, I am testing vanilla ISPN with the cache configuration posted in this thread on June 7th. You can easily repeat this test; just do not forget to replace maxEntries with a number of your choice. I cannot post any files related to the test case right now (I will do it later), but I can describe it:

                                     

                                    1. It is a continuously running multi-threaded (R/W) test. The R and W ratios are configurable; I am running with 90% reads and 10% writes. The number of threads is configurable as well.
                                    2. Keys are byte[] arrays approx. 18 bytes long (the size varies a little).
                                    3. Values are byte[] arrays 500-1000 bytes long (uniformly distributed).

                                     

                                    private static void initCacheInfinispan() {
                                        EmbeddedCacheManager manager = new DefaultCacheManager();
                                        Configuration config = new Configuration().fluent()
                                          .eviction()
                                            .maxEntries((int) sCacheItemsLimit).strategy(EvictionStrategy.LRU)
                                            .wakeUpInterval(5000L)
                                          .expiration()
                                            .maxIdle(120000L)
                                          .build();

                                        manager.defineConfiguration("name", config);
                                        ispnCache = manager.getCache("name");
                                    }

                                     

                                    In an inner loop, you first decide whether the op is a read or a write, then create a key or a key-value pair and perform the op.

                                     

                                     

                                    I need some time to remove our specific code from the test, then I will post the sources.
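
                                    Until the sources are posted, here is a bare-bones sketch of the measurement side only (how numbers like RPS and the percentile latencies above might be computed; the names are made up and this is not the actual harness - the op passed in would be the read/write step described above):

                                        import java.util.Arrays;

                                        class LatencyRecorder {
                                            private final long[] samplesNanos;
                                            private int count;

                                            LatencyRecorder(int capacity) { samplesNanos = new long[capacity]; }

                                            // Time one operation and keep the sample (use one recorder per worker thread).
                                            void record(Runnable op) {
                                                long start = System.nanoTime();
                                                op.run();
                                                if (count < samplesNanos.length) samplesNanos[count++] = System.nanoTime() - start;
                                            }

                                            // Throughput over a wall-clock interval plus latency percentiles, in microseconds.
                                            void report(double elapsedSeconds) {
                                                if (count == 0) return;
                                                long[] sorted = Arrays.copyOf(samplesNanos, count);
                                                Arrays.sort(sorted);
                                                System.out.println("RPS=" + count / elapsedSeconds);
                                                System.out.println("MEDIAN=" + sorted[(int) (count * 0.50)] / 1000.0);
                                                System.out.println("99%=" + sorted[(int) (count * 0.99)] / 1000.0);
                                                System.out.println("99.9%=" + sorted[(int) (count * 0.999)] / 1000.0);
                                                System.out.println("MAX=" + sorted[count - 1] / 1000.0);
                                            }
                                        }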
