19 Replies. Latest reply on Jul 26, 2008 6:41 PM by phpguy99

    JBoss Cache performance looks pretty poor :(

    phpguy99

      Hi,
      I'm evaluating various Java-based distributed caching solutions: JBoss Cache (version 2.1.1 GA), EHCache, and Terracotta.
      My data set is 10 million objects, and I'm using 3 nodes, each with a 10 GB heap.
      So far EHCache has proven to be very fast and reliable (it never went down across many test iterations running for hours), and it has a small memory footprint. I can do 40,000 puts/second on a 3-node cluster.

      OK, I'm here to ask about JBoss Cache.
      When putting 1 million objects (small ones, each consisting of 3-4 Strings), the rate is about 2,000/second, and as soon as I start the second node, it dies right away with this error:
      org.jboss.cache.CacheException: Unable to fetch state on startup
      .....

      The memory usage is way too high: 1 million objects require 2 GB of heap (measured in JConsole after forcing a GC, of course).

      The way I'm using the tree cache: I create *all* 1 million objects on their own node/Fqn, so I have ROOT/Object-1, ROOT/Object-2 ... ROOT/Object-1000000.

      My configuration file:
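      <?xml version="1.0" encoding="UTF-8"?>
      <server>
         <mbean code="org.jboss.cache.jmx.CacheJmxWrapper" name="jboss.cache:service=Cache">
            <depends>jboss:service=Naming</depends>
            <depends>jboss:service=TransactionManager</depends>
            <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.GenericTransactionManagerLookup</attribute>
            <attribute name="IsolationLevel">READ_COMMITTED</attribute>
            <attribute name="LockParentForChildInsertRemove">false</attribute>
            <attribute name="CacheMode">REPL_SYNC</attribute>
            <attribute name="ClusterName">JBossCache-Cluster</attribute>
            <attribute name="ClusterConfig">
               <config>
                  <UDP mcast_addr="228.1.2.3" mcast_port="48866"
                       ip_ttl="64" ip_mcast="true"
                       mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                       ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                       loopback="false"/>
                  <PING timeout="2000" num_initial_members="3"/>
                  <MERGE2 min_interval="10000" max_interval="20000"/>
                  <FD_SOCK/>
                  <VERIFY_SUSPECT timeout="1500"/>
                  <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"/>
                  <pbcast.STABLE desired_avg_gossip="400000"/>
                  <FC max_credits="2000000" min_threshold="0.10"/>
                  <FRAG2 frag_size="8192"/>
                  <pbcast.GMS join_timeout="5000" shun="true" print_local_addr="true"/>
                  <pbcast.STATE_TRANSFER/>
               </config>
            </attribute>
            <attribute name="StateRetrievalTimeout">20000</attribute>
            <attribute name="SyncReplTimeout">20000</attribute>
            <attribute name="LockAcquisitionTimeout">15000</attribute>
         </mbean>
      </server>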

      I believe I'm missing something here, so it would be great if anybody could help.
      Thanks a lot.

        • 1. Re: JBoss Cache performance looks pretty poor :(
          phpguy99

          My XML configuration didn't get posted correctly. :(

          • 2. Re: JBoss Cache performance looks pretty poor :(
            jason.greene

            You have to wrap XML in a [code] BBCode tag for it to display correctly.

            You probably need to increase StateRetrievalTimeout if you have that much state. Could you run jmap -histo on the process to see what's taking up all that space?

            Thanks
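To act on the jmap -histo suggestion, a small helper can total the `#bytes` column by package, which makes it easy to see whether the cache's own bookkeeping or your objects dominate the heap. This is a minimal Java sketch: `HistoByPackage` is a hypothetical name, and it assumes the JDK 6 row layout `num: #instances #bytes class-name` seen in the histograms posted in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: sums the #bytes column of `jmap -histo` output per package. */
class HistoByPackage {

    static Map<String, Long> bytesByPackage(List<String> rows) {
        Map<String, Long> totals = new TreeMap<>();
        for (String row : rows) {
            String[] cols = row.trim().split("\\s+");
            // Data rows look like "1: 6058302 290798496 java.util...NonfairSync".
            // Skip the header, the dashed separator, and rows whose class name is missing.
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue;
            }
            long bytes = Long.parseLong(cols[2]);
            totals.merge(packageOf(cols[3]), bytes, Long::sum);
        }
        return totals;
    }

    /** "[Ljava.util.HashMap$Entry;" -> "java.util"; names without a dot (e.g. "[C") go to one bucket. */
    static String packageOf(String className) {
        String name = className;
        while (name.startsWith("[")) {
            name = name.substring(1);                    // strip array dimensions
        }
        if (name.startsWith("L") && name.endsWith(";")) {
            name = name.substring(1, name.length() - 1); // object-array element type
        }
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "(primitive arrays)" : name.substring(0, dot);
    }
}
```

To use it, redirect `jmap -histo <pid>` into a file and pass `Files.readAllLines(Paths.get("histo.txt"))` to `bytesByPackage`.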

            • 3. Re: JBoss Cache performance looks pretty poor :(
              jason.greene

              Also, could you tell us about your EHCache config? Were you using asynchronous replication, for example?

              • 4. Re: JBoss Cache performance looks pretty poor :(
                phpguy99

                Reposting my config, now with the code tag:

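                <?xml version="1.0" encoding="UTF-8"?>
                <server>
                   <mbean code="org.jboss.cache.jmx.CacheJmxWrapper" name="jboss.cache:service=Cache">
                      <depends>jboss:service=Naming</depends>
                      <depends>jboss:service=TransactionManager</depends>
                      <attribute name="TransactionManagerLookupClass">org.jboss.cache.transaction.GenericTransactionManagerLookup</attribute>
                      <attribute name="IsolationLevel">READ_COMMITTED</attribute>
                      <attribute name="LockParentForChildInsertRemove">false</attribute>
                      <attribute name="CacheMode">REPL_SYNC</attribute>
                      <attribute name="ClusterName">JBossCache-Cluster</attribute>
                      <attribute name="ClusterConfig">
                         <config>
                            <UDP mcast_addr="228.1.2.3" mcast_port="48866"
                                 ip_ttl="64" ip_mcast="true"
                                 mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
                                 ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
                                 loopback="false"/>
                            <PING timeout="2000" num_initial_members="3"/>
                            <MERGE2 min_interval="10000" max_interval="20000"/>
                            <FD_SOCK/>
                            <VERIFY_SUSPECT timeout="1500"/>
                            <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"/>
                            <pbcast.STABLE desired_avg_gossip="400000"/>
                            <FC max_credits="2000000" min_threshold="0.10"/>
                            <FRAG2 frag_size="8192"/>
                            <pbcast.GMS join_timeout="5000" shun="true" print_local_addr="true"/>
                            <pbcast.STATE_TRANSFER/>
                         </config>
                      </attribute>
                      <attribute name="StateRetrievalTimeout">20000</attribute>
                      <attribute name="SyncReplTimeout">20000</attribute>
                      <attribute name="LockAcquisitionTimeout">15000</attribute>
                   </mbean>
                </server>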


                $ jmap -histo 14359    ## for 200,000 of my objects in the cache

                num #instances #bytes class name
                ----------------------------------------------
                1: 6058302 290798496 java.util.concurrent.locks.ReentrantLock$NonfairSync
                2: 6058260 290796480 java.util.concurrent.ConcurrentHashMap$Segment
                3: 6058260 198060352 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
                4: 379178 57632968 [Ljava.util.HashMap$Entry;
                5: 378642 57553488 [Ljava.util.concurrent.ConcurrentHashMap$Segment;
                6: 765551 57436024 [C
                7: 378601 33316888 org.jboss.cache.UnversionedNode
                8: 763508 30540320 java.lang.String
                9: 378601 30288080 org.jboss.cache.invocation.NodeInvocationDelegate
                10: 378642 27262224 java.util.concurrent.ConcurrentHashMap
                11: 378600 27259200 org.jboss.cache.lock.NonBlockingWriterLock
                12: 379058 24259712 java.util.HashMap
                13: 383949 18429552 java.util.concurrent.ConcurrentHashMap$HashEntry
                14: 379889 18234672 java.util.HashMap$Entry
                15: 378600 18172800 org.jboss.cache.lock.IdentityLock
                16: 396538 15881688 [Ljava.lang.Object;
                17: 384436 15377440 java.util.ArrayList
                18: 381480 15259200 org.jboss.cache.Fqn
                19: 378599 15143960 com.ssn.jbosscache.Meter
                20: 378600 12115200 org.jboss.cache.lock.LockMap
                21: 378600 9086400 org.jboss.cache.lock.ReadWriteLockWithUpgrade$WriterLock
                22: 378600 9086400 org.jboss.cache.lock.ReadWriteLockWithUpgrade$ReaderLock
                23: 378600 9086400 org.jboss.cache.util.concurrent.ConcurrentHashSet
                24: 378600 9086400 org.jboss.cache.lock.LockStrategyReadCommitted
                25: 21274 2804184
                26: 21274 2560560
                27: 1838 2036776
                28: 33111 1620128
                29: 1838 1356816
                30: 1612 1315584
                31: 2919 552944 [I
                32: 2123 544336 [B
                33: 2989 454328 java.lang.reflect.Method
                34: 2008 369472 java.lang.Class


                • 5. Re: JBoss Cache performance looks pretty poor :(
                  phpguy99

                  I'm using:
                  CacheMode=REPL_SYNC
                  IsolationLevel=READ_COMMITTED
                  LockParentForChildInsertRemove=false
                  StateRetrievalTimeout=20000
                  SyncReplTimeout=20000
                  LockAcquisitionTimeout=15000

                  I put each of my objects into its own Fqn, so ROOT has 1,000,000 direct children.

                  JDK: 1.6.0_07
                  OS: RHEL 5.1 on an 8-core Xeon
                  All my nodes are on the same subnet (and the same switch).

                  • 6. Re: JBoss Cache performance looks pretty poor :(
                    manik

                    Is all your data placed under the same Fqn? All locking and replication granularity happens on a per-Node (Fqn) basis. I would recommend making better use of the tree structure of the cache and spreading your state around a bit better.

                    Also, if you don't need the atomicity guarantees of REPL_SYNC (and there is not much you can do with them if you don't use transactions anyway), you're better off using REPL_ASYNC.

                    Cheers,
                    Manik
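The difference Manik describes can be pictured with a toy model in Java. This is not JBoss Cache code; `ReplicationModes` and its methods are hypothetical, each `Map` stands in for a remote cache node, and the `Executor` stands in for the network. In the REPL_SYNC-like path the caller waits until every replica has applied the write, bounded by a timeout that loosely plays the role of SyncReplTimeout; the REPL_ASYNC-like path returns as soon as the writes are handed off.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

/** Toy model of the two replication modes (hypothetical, not JBoss Cache code). */
class ReplicationModes {
    private final List<Map<String, Object>> replicas; // stand-ins for remote cache nodes
    private final Executor network;                   // stand-in for the transport

    ReplicationModes(List<Map<String, Object>> replicas, Executor network) {
        this.replicas = replicas;
        this.network = network;
    }

    /** REPL_SYNC-like: block until every replica applied the write, or fail after timeoutMillis. */
    void putSync(String key, Object value, long timeoutMillis) {
        CompletableFuture<?>[] acks = replicas.stream()
                .map(r -> CompletableFuture.runAsync(() -> r.put(key, value), network))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(acks)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS) // Java 9+
                .join(); // throws CompletionException on timeout or failure
    }

    /** REPL_ASYNC-like: hand the writes to the transport and return immediately. */
    void putAsync(String key, Object value) {
        for (Map<String, Object> r : replicas) {
            network.execute(() -> r.put(key, value));
        }
    }
}
```

The synchronous path pays a full round trip on every put, which is why per-put latency, rather than CPU, tends to cap single-threaded REPL_SYNC throughput.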

                    • 7. Re: JBoss Cache performance looks pretty poor :(
                      phpguy99

                      That is why I place each of my objects in its own Fqn. I have 1,000,000 objects (Meters), and each of them is added using:

                      Fqn fqn = new Fqn("root", "MeterID" + i);
                      Node<Object,Object> node = rootNode.addChild(fqn);
                      node.put("data", meter);

                      Does that mean I'm *overdoing* it? Instead of using a flat structure (1 level), should I structure my objects to fit into something like a radix tree?

                      The reason I'm using REPL_SYNC is to be fair to EHCache, which I'm also running in SYNC mode, and to Terracotta (write lock).

                      Thanks for the quick response :)

                      • 8. Re: JBoss Cache performance looks pretty poor :(
                        jason.greene

                        You have to use square brackets around code, since it's BBCode; see:
                        http://en.wikipedia.org/wiki/BBCode

                        BTW, in addition to Manik's suggestions, you are also hitting this bug (it will be fixed in the next 2.2 release), which is why your memory usage is so high:

                        http://jira.jboss.org/jira/browse/JBCACHE-1383
                        http://www.jboss.com/index.html?module=bb&op=viewtopic&t=138338

                        -Jason

                        • 9. Re: JBoss Cache performance looks pretty poor :(
                          manik

                          Spreading your data across the tree structure will help. The node structure is maintained using one CHM (ConcurrentHashMap) per node to hold references to its children, and these CHMs are tuned for a lower-than-normal memory footprint, so having lots of children per node will hurt concurrency. I'd recommend not putting more than 50 children under any node and going as deep as you have to.
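One way to follow the 50-children guideline without any active balancing is to derive the path from the ID's base-50 digits. This Java sketch assumes non-negative numeric meter IDs below the capacity computed by `levelsFor`; `FanOutPath`, `levelsFor`, and `pathFor` are hypothetical names. Because the path is a pure function of the ID, a `get` can recompute it without an extra lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: spreads numeric IDs so no node gets more than fanOut children. */
class FanOutPath {

    /** Smallest number of bucket levels such that fanOut^(levels + 1) >= maxIds. */
    static int levelsFor(long maxIds, int fanOut) {
        int levels = 0;
        long capacity = fanOut;        // leaves that fit under a single bottom bucket
        while (capacity < maxIds) {
            capacity *= fanOut;
            levels++;
        }
        return levels;
    }

    /**
     * Bucket names are the base-fanOut digits of the ID, most significant first,
     * excluding the lowest digit (which only distinguishes siblings at the leaf level).
     * Assumes id < fanOut^(levels + 1); larger IDs would overfill the buckets.
     * Example: pathFor(999_999, 50, 3) -> [7, 49, 49, MeterID999999]
     */
    static List<String> pathFor(long id, int fanOut, int levels) {
        long divisor = 1;
        for (int i = 0; i < levels; i++) {
            divisor *= fanOut;         // fanOut^levels
        }
        List<String> path = new ArrayList<>();
        for (int i = 0; i < levels; i++) {
            path.add(Long.toString((id / divisor) % fanOut));
            divisor /= fanOut;
        }
        path.add("MeterID" + id);      // leaf name as in the original post
        return path;
    }
}
```

For 1,000,000 IDs, `levelsFor(1_000_000, 50)` returns 3, which gives at most 8 children under the top node and at most 50 under every other node; the returned strings would become the elements of each object's Fqn.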

                          Also, re: your state retrieval, 20000 ms (20 seconds) is pretty low; if you have a lot of state, there is no way you will be able to transfer it all in 20 seconds! :-)
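A quick sanity check for StateRetrievalTimeout is to divide the state size by the throughput you can actually sustain between nodes. Both inputs below are illustrative assumptions: about 2 GB of state (the heap figure from the first post, which only approximates the serialized size) and an assumed 50 MB/s effective transfer rate. `StateTransferBudget` is a hypothetical helper.

```java
/** Hypothetical helper for a back-of-the-envelope StateRetrievalTimeout check. */
class StateTransferBudget {

    /** Lower bound in ms: stateBytes / bytesPerSecond, rounded up; ignores serialization and GC. */
    static long minTransferMillis(long stateBytes, long bytesPerSecond) {
        return (stateBytes * 1000L + bytesPerSecond - 1) / bytesPerSecond;
    }

    /** The lower bound multiplied by a safety factor to leave head-room. */
    static long suggestedTimeoutMillis(long stateBytes, long bytesPerSecond, int safetyFactor) {
        return minTransferMillis(stateBytes, bytesPerSecond) * safetyFactor;
    }
}
```

With those assumed inputs, `minTransferMillis(2_000_000_000L, 50_000_000L)` is 40,000 ms, already twice the configured 20,000 ms before serialization or GC pauses are counted; a safety factor of 3 would suggest 120,000 ms.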

                          • 10. Re: JBoss Cache performance looks pretty poor :(
                            phpguy99

                            That is the kind of advice I like to hear. I may have missed it, but I don't recall reading about the proper way to spread objects across the tree for maximum performance.
                            Thanks. I'll modify the code and the configuration, download the 2.2 beta, and test it again.

                            • 11. Re: JBoss Cache performance looks pretty poor :(
                              manik

                              2.2 is at CR6 and very close to a GA release. :-)

                              If you want to have some more fun, check out 3.0.0.ALPHA, which I recently released. Early benchmarks show that it is *much* faster.

                              http://jbosscache.blogspot.com/2008/07/jboss-cache-300-naga-first-alpha-now.html

                              • 12. Re: JBoss Cache performance looks pretty poor :(
                                phpguy99

                                Really would like to have that fun but I'm evaluating it for production use in the next 3-4 months.
                                I downloaded 2.2 CR6 and changed:
                                StateRetrievalTimeout=600000 (10 minutes)
                                pbcast.GMS join_timeout="60000"

                                I haven't changed my code to spread objects further down the tree.
                                BTW, this seems odd, or I may have missed something, but shouldn't the cache system do this spreading behind the scenes? Depending on the keys of the objects being cached, it could be difficult to balance the tree, and I'd need to know the "path" before I can do a "get". A straight "key" lookup is much simpler. (just my 2 cents)

                                Back to performance and memory.
                                It's stable now with 2 nodes. My insert rate increased from 2000/s to 4000/s. That is one insert at a time with SYNC replication, i.e. 0.25 ms/operation, which is very good, and multithreading should increase it a lot (I hope).
                                But the memory consumption is still very high (maybe because I put everything right under "root"): 4 GB for my 1M objects.

                                 num #instances #bytes class name
                                ----------------------------------------------
                                 1: 16000897 768043056 java.util.concurrent.locks.ReentrantLock$NonfairSync
                                 2: 16000820 768039360 java.util.concurrent.ConcurrentHashMap$Segment
                                 3: 16000820 528806520 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
                                 4: 43792 323829848 [I
                                 5: 2021461 153453344 [C
                                 6: 1002432 152294120 [Ljava.util.HashMap$Entry;
                                 7: 1000052 152007808 [Ljava.util.concurrent.ConcurrentHashMap$Segment;
                                 8: 1000002 88000176 org.jboss.cache.UnversionedNode
                                 9: 2016545 80661800 java.lang.String
                                 10: 1000002 80000160 org.jboss.cache.invocation.NodeInvocationDelegate
                                 11: 1000052 72003744 java.util.concurrent.ConcurrentHashMap
                                 12: 1000002 72000144 org.jboss.cache.lock.NonBlockingWriterLock
                                 13: 1002113 64135232 java.util.HashMap
                                 14: 1001773 48085104 java.util.HashMap$Entry
                                 15: 1000281 48013488 java.util.concurrent.ConcurrentHashMap$HashEntry
                                 16: 1000002 48000096 org.jboss.cache.lock.IdentityLock
                                 17: 1008529 40619728 [Ljava.lang.Object;
                                 18: 1000771 40030840 java.util.ArrayList
                                 19: 1000004 40000160 org.jboss.cache.Fqn
                                 20: 1000002 40000080 java.util.RegularEnumSet
                                 21: 1000000 40000000 com.ssn.jbosscache.Meter (my objects)
                                
                                I constantly see:
                                
                                2008-07-23 10:09:47,929 [Incoming,JBossCache-Cluster,10.57.132.54:38174] WARN org.jgroups.protocols.pbcast.NAKACK.handleMessage - 10.57.132.54:38174] discarded message from non-member 10.57.132.53:33187, my view is [10.57.132.54:38174|0] [10.57.132.54:38174]
                                
                                




                                • 13. Re: JBoss Cache performance looks pretty poor :(
                                  manik

                                   

                                  "phpguy99" wrote:
                                  Really would like to have that fun but I'm evaluating it for production use in the next 3-4 months.


                                  I would still recommend trying it out. I may push out 3.0.0 fairly quickly (within the next 2 months); the major bits are ready, and a lot of people are keen to start using it.

                                  Either way, it should be a painless upgrade path from 2.2.0.

                                  "phpguy99" wrote:

                                  I haven't changed my code to spread objects further down the tree.
                                  BTW, this seems odd, or I may have missed something, but shouldn't the cache system do this spreading behind the scenes? Depending on the keys of the objects being cached, it could be difficult to balance the tree, and I'd need to know the "path" before I can do a "get". A straight "key" lookup is much simpler. (just my 2 cents)


                                  I agree, but there are always two sides to that argument. Some people want more direct control, some don't. It is on our roadmap as an option, and we have a contributed implementation that may even make it into 3.0.0.

                                  See https://jira.jboss.org/jira/browse/JBCACHE-67
                                  and
                                  https://jira.jboss.org/jira/browse/JBCACHE-941.

                                  Cheers
                                  Manik



                                  • 14. Re: JBoss Cache performance looks pretty poor :(
                                    jason.greene

                                     

                                    "phpguy99" wrote:

                                    I haven't changed my code to spread objects further down the tree.
                                    BTW, this seems odd, or I may have missed something, but shouldn't the cache system do this spreading behind the scenes? Depending on the keys of the objects being cached, it could be difficult to balance the tree, and I'd need to know the "path" before I can do a "get". A straight "key" lookup is much simpler. (just my 2 cents)


                                    Could you tell us about your access patterns? How often do you insert? How many simultaneous writers will you have in the same server/process? A node currently has 4 segments, so it's tuned to allow 4 simultaneous child node inserts. That only becomes a bottleneck if you have more than 4 concurrent threads (on more than 4 CPU cores) all inserting children into the same node at the same time.

                                    If this is the case and you do need to spread, you don't need active balancing; a simple modulus-based spread would work fine, like:
                                    x = ID % 10000
                                    fqn = /x/ID
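The same modulus spread in Java, as a minimal sketch (`ModulusSpread` is a hypothetical name; `Math.floorMod` keeps the bucket non-negative should an ID ever be negative). With 1,000,000 consecutive IDs this yields 10,000 bucket nodes with 100 children each, and the path for a `get` is recomputed from the ID alone.

```java
/** Hypothetical helper implementing x = ID % 10000, fqn = /x/ID. */
class ModulusSpread {
    static final long BUCKETS = 10_000L;

    /** Path elements {x, ID}; floorMod keeps x in [0, BUCKETS) even for negative IDs. */
    static String[] pathFor(long id) {
        long x = Math.floorMod(id, BUCKETS);
        return new String[] { Long.toString(x), Long.toString(id) };
    }
}
```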
                                    


                                    We do plan to make this concurrency level configurable in 3.0, so you won't need to spread things out if you don't need to.
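Jason's question about simultaneous writers can be answered empirically by timing the same number of inserts with different thread counts. A minimal Java harness, assuming you wrap your own cache write in the `put` callback; here a `ConcurrentHashMap` stands in so the sketch runs on its own, and `InsertBenchmark` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

/** Hypothetical harness: puts/second achieved by `threads` concurrent writers. */
class InsertBenchmark {

    static double putsPerSecond(int threads, int putsPerThread, BiConsumer<String, Object> put) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> writers = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                final int base = t * putsPerThread;       // disjoint key range per thread
                writers.add(CompletableFuture.runAsync(() -> {
                    for (int i = 0; i < putsPerThread; i++) {
                        put.accept("MeterID" + (base + i), i);
                    }
                }, pool));
            }
            // Wait for every writer; join() rethrows any failure unchecked.
            CompletableFuture.allOf(writers.toArray(new CompletableFuture[0])).join();
            double seconds = (System.nanoTime() - start) / 1e9;
            return threads * (double) putsPerThread / seconds;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running it with 1, 4, and 8 threads against the real cache would show whether inserts under a single parent stop scaling beyond the segment count Jason mentions. A single short run includes JIT warm-up, so repeat it a few times before trusting the numbers.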


                                    Back to performance and memory.
                                    It's stable now with 2 nodes. My insert rate increased from 2000/s to 4000/s. That is one insert at a time with SYNC replication, i.e. 0.25 ms/operation, which is very good, and multithreading should increase it a lot (I hope).
                                    But the memory consumption is still very high (maybe because I put everything right under "root"): 4 GB for my 1M objects.


                                    You are still experiencing JBCACHE-1383, which causes 16 CHM segments and locks to be created per node. The fix reduces that to 4. It is not yet in a release, although you can build the latest 2.2.x branch if you want. The MVCC locking mode in 3.0 eliminates the overhead of those 4 CHM segments and locks completely, so you might want to give that a try.
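The 1M-object histogram posted earlier in this thread bears this out, and the arithmetic is easy to redo. The Java sketch below copies its figures verbatim; `SegmentOverhead` is a hypothetical name, and the 4-segment projection is a rough assumption that segment-related bytes scale linearly with the segment count. For context, 16 is also the default concurrency level (and therefore segment count) of the JDK 6 `ConcurrentHashMap`, whose `Segment` class extends `ReentrantLock`.

```java
/**
 * Hypothetical helper redoing the arithmetic on the 1M-object jmap -histo
 * output from this thread (figures copied from that histogram).
 */
class SegmentOverhead {
    static final long CHM_INSTANCES     = 1_000_052L;    // java.util.concurrent.ConcurrentHashMap
    static final long SEGMENT_INSTANCES = 16_000_820L;   // ConcurrentHashMap$Segment
    static final long SEGMENT_BYTES     = 768_039_360L;  // ConcurrentHashMap$Segment
    static final long LOCK_BYTES        = 768_043_056L;  // ReentrantLock$NonfairSync (essentially one per segment)
    static final long TABLE_BYTES       = 528_806_520L;  // HashEntry[] bucket arrays (one per segment)

    static long segmentsPerMap() {
        return Math.round((double) SEGMENT_INSTANCES / CHM_INSTANCES);
    }

    static long segmentRelatedBytes() {
        return SEGMENT_BYTES + LOCK_BYTES + TABLE_BYTES;
    }

    /** Rough projection, assuming these bytes scale linearly with segments per map. */
    static long estimatedBytesWith(int segments) {
        return segmentRelatedBytes() * segments / segmentsPerMap();
    }

    /** JDK 6 ConcurrentHashMap: concurrencyLevel rounded up to a power of two (65,536 cap ignored). */
    static int jdk6SegmentCount(int concurrencyLevel) {
        int segments = 1;
        while (segments < concurrencyLevel) {
            segments <<= 1;
        }
        return segments;
    }
}
```

That gives 16 segments per map and 2,064,888,936 bytes (about 2 KB per cache node) spent on segments, their locks, and their bucket arrays; with 4 segments the same rough estimate drops to about 516 MB.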


                                    I constantly see:
                                    2008-07-23 10:09:47,929 [Incoming,JBossCache-Cluster,10.57.132.54:38174] WARN org.jgroups.protocols.pbcast.NAKACK.handleMessage - 10.57.132.54:38174] discarded message from non-member 10.57.132.53:33187, my view is [10.57.132.54:38174|0] [10.57.132.54:38174]


                                    This could indicate you have other traffic on the same multicast address. You might want to make sure the nodes in your cluster are the only ones using that address.
