9 Replies Latest reply on Dec 1, 2005 7:40 PM by jlukar

    putObject slows gradually with high number of objects in cache

    jlukar

      Hi, sorry for the double post, but the first one resulted in the forum software croaking for some reason.


      I am trying to do some performance benchmark testing and a POC to see if I can use JBossCache for a high-performance stock order management system that has some semi-realtime performance requirements.

      I have business objects that I push into the cache, and I am timing the putObject() calls. I am seeing a gradual slowdown of these puts as the number of objects in the cache increases.

      Is this expected behaviour?

      The first fill-up loop, which inserted 10000 objects into the cache, had the following stats: Average PUT time for 10000 objects: 12 milliseconds, Smallest PUT time: 8 milliseconds, Largest PUT time: 1718 milliseconds, Total PUT time: 121295 milliseconds.
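      For what it's worth, the timing logic boils down to a harness like the hypothetical sketch below (plain Java; the cache call is replaced by a Runnable stand-in so it compiles without JBossCache — in the real test each op would be a treeCacheAop.putObject("/orders/" + i, order) call):

```java
/**
 * Hypothetical sketch of the put-timing harness (plain Java). The cache call
 * is a Runnable stand-in so this compiles without JBossCache; in the real
 * test each op would be treeCacheAop.putObject("/orders/" + i, order).
 */
public class PutTimingHarness {

    /** Times each op and returns {average, min, max, total} in milliseconds. */
    static long[] time(Runnable[] ops) {
        long total = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (Runnable op : ops) {
            long start = System.currentTimeMillis();
            op.run();                                  // the putObject() under test
            long elapsed = System.currentTimeMillis() - start;
            total += elapsed;
            if (elapsed < min) min = elapsed;
            if (elapsed > max) max = elapsed;
        }
        return new long[] { total / ops.length, min, max, total };
    }

    public static void main(String[] args) {
        Runnable noop = new Runnable() { public void run() { } };
        Runnable[] ops = new Runnable[10000];
        for (int i = 0; i < ops.length; i++) ops[i] = noop;
        long[] stats = time(ops);
        System.out.println("avg=" + stats[0] + "ms min=" + stats[1]
                + "ms max=" + stats[2] + "ms total=" + stats[3] + "ms");
    }
}
```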


      Further down in the same test case, I tried another fill-up loop into the same cache for 100 business objects. The stats: Average PUT time for 100 objects: 10 milliseconds, Smallest PUT time: 8 milliseconds, Largest PUT time: 42 milliseconds, Total PUT time: 1027 milliseconds.



      The getObject() calls were very fast, typically under 1 millisecond, which is good.


      So what I am wondering is whether the gradual putObject() slowdown is expected behaviour or not. If so, a 2-second put time for a business object is not acceptable in any application, regardless of real-time performance requirements. Can someone shed some light on this?

      I have included the full output below to see if anyone can spot any missteps in my simple benchmarking. Maybe someone can suggest improvements.




      JUNIT OUTPUT:
      [11:34:59.227] [INFO ] [cache.PropertyConfigurator][] - Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@1236cd5
      [11:34:59.243] [INFO ] [cache.PropertyConfigurator][configure] - configure(): attribute size: 14
      [11:34:59.248] [INFO ] [cache.TreeCache ][setClusterConfig] - setting cluster properties from xml to: UDP(ip_mcast=true;ip_ttl=64;loopback=false;mcast_addr=228.1.2.3;mcast_port=48866;mcast_recv_buf_size=80000;mcast_send_buf_size=150000;ucast_recv_buf_size=80000;ucast_send_buf_size=150000):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE2(max_interval=20000;min_interval=10000):FD_SOCK:VERIFY_SUSPECT(down_thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(down_thread=false;gc_lag=50;max_xmit_size=8192;retransmit_timeout=600,1200,2400,4800;up_thread=false):UNICAST(down_thread=false;min_threshold=10;timeout=600,1200,2400;window_size=100):pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):FRAG(down_thread=false;frag_size=8192;up_thread=false):pbcast.GMS(join_retry_timeout=2000;join_timeout=5000;print_local_addr=true;shun=true):pbcast.STATE_TRANSFER(down_thread=true;up_thread=true)
      [11:34:59.253] [INFO ] [cache.TreeCache ][createPessimisticInterceptorChain] - interceptor chain is:
      class org.jboss.cache.interceptors.CallInterceptor
      class org.jboss.cache.interceptors.PessimisticLockInterceptor
      class org.jboss.cache.interceptors.UnlockInterceptor
      class org.jboss.cache.interceptors.ReplicationInterceptor
      [11:34:59.255] [INFO ] [cache.TreeCache ][_createService] - cache mode is REPL_SYNC
      [11:34:59.270] [INFO ] [protocols.UDP ][createSockets] - sockets will use interface 10.58.8.75
      [11:34:59.274] [INFO ] [protocols.UDP ][createSockets] - socket information:
      local_addr=10.58.8.75:32883, mcast_addr=228.1.2.3:48866, bind_addr=/10.58.8.75, ttl=64
      sock: bound to 10.58.8.75:32883, receive buffer size=80000, send buffer size=131071
      mcast_recv_sock: bound to 10.58.8.75:48866, send buffer size=131071, receive buffer size=80000
      mcast_send_sock: bound to 10.58.8.75:32884, send buffer size=131071, receive buffer size=80000

      -------------------------------------------------------
      GMS: address is 10.58.8.75:32883
      -------------------------------------------------------
      [11:35:01.285] [INFO ] [cache.TreeCache ][startService] - my local address is 10.58.8.75:32883
      [11:35:01.286] [INFO ] [cache.TreeCache ][viewAccepted] - viewAccepted(): [10.58.8.75:32883|0] [10.58.8.75:32883]
      [11:35:01.288] [INFO ] [cache.TreeCache ][setState] - new cache is null (may be first member in cluster)
      [11:35:01.288] [INFO ] [cache.TreeCache ][fetchStateOnStartup] - state could not be retrieved (must be first member in group)
      [11:35:01.290] [INFO ] [cache.TreeCache ][startService] - Cache is started!!
      [11:35:01.293] [INFO ] [cache.PropertyConfigurator][] - Found existing property editor for org.w3c.dom.Element: org.jboss.util.propertyeditor.ElementEditor@be49e0
      [11:35:01.328] [INFO ] [cache.PropertyConfigurator][configure] - configure(): attribute size: 14
      [11:35:01.345] [INFO ] [cache.TreeCache ][setClusterConfig] - setting cluster properties from xml to: UDP(ip_mcast=true;ip_ttl=64;loopback=false;mcast_addr=228.1.2.3;mcast_port=48866;mcast_recv_buf_size=80000;mcast_send_buf_size=150000;ucast_recv_buf_size=80000;ucast_send_buf_size=150000):PING(down_thread=false;num_initial_members=3;timeout=2000;up_thread=false):MERGE2(max_interval=20000;min_interval=10000):FD_SOCK:VERIFY_SUSPECT(down_thread=false;timeout=1500;up_thread=false):pbcast.NAKACK(down_thread=false;gc_lag=50;max_xmit_size=8192;retransmit_timeout=600,1200,2400,4800;up_thread=false):UNICAST(down_thread=false;min_threshold=10;timeout=600,1200,2400;window_size=100):pbcast.STABLE(desired_avg_gossip=20000;down_thread=false;up_thread=false):FRAG(down_thread=false;frag_size=8192;up_thread=false):pbcast.GMS(join_retry_timeout=2000;join_timeout=5000;print_local_addr=true;shun=true):pbcast.STATE_TRANSFER(down_thread=true;up_thread=true)
      [11:35:01.354] [INFO ] [cache.TreeCache ][createPessimisticInterceptorChain] - interceptor chain is:
      class org.jboss.cache.interceptors.CallInterceptor
      class org.jboss.cache.interceptors.PessimisticLockInterceptor
      class org.jboss.cache.interceptors.UnlockInterceptor
      class org.jboss.cache.interceptors.ReplicationInterceptor
      [11:35:01.357] [INFO ] [cache.TreeCache ][_createService] - cache mode is REPL_SYNC
      [11:35:01.392] [INFO ] [protocols.UDP ][createSockets] - sockets will use interface 10.58.8.75
      [11:35:01.395] [INFO ] [protocols.UDP ][createSockets] - socket information:
      local_addr=10.58.8.75:32885, mcast_addr=228.1.2.3:48866, bind_addr=/10.58.8.75, ttl=64
      sock: bound to 10.58.8.75:32885, receive buffer size=80000, send buffer size=131071
      mcast_recv_sock: bound to 10.58.8.75:48866, send buffer size=131071, receive buffer size=80000
      mcast_send_sock: bound to 10.58.8.75:32886, send buffer size=131071, receive buffer size=80000

      -------------------------------------------------------
      GMS: address is 10.58.8.75:32885
      -------------------------------------------------------
      [11:35:03.407] [INFO ] [cache.TreeCache ][startService] - my local address is 10.58.8.75:32885
      [11:35:03.409] [INFO ] [cache.TreeCache ][viewAccepted] - viewAccepted(): [10.58.8.75:32883|1] [10.58.8.75:32883, 10.58.8.75:32885]
      [11:35:03.411] [INFO ] [cache.TreeCache ][viewAccepted] - viewAccepted(): [10.58.8.75:32883|1] [10.58.8.75:32883, 10.58.8.75:32885]
      [11:35:03.414] [INFO ] [cache.TreeCache ][_getState] - locking the / subtree to return the in-memory (transient) state
      [11:35:03.416] [INFO ] [statetransfer.StateTransferGenerator_124][generateStateTransfer] - returning the in-memory state (97 bytes)
      [11:35:03.417] [INFO ] [statetransfer.StateTransferGenerator_124][generateStateTransfer] - returning the associated state (4 bytes)
      [11:35:03.420] [INFO ] [cache.TreeCache ][_setState] - received the state (size=167 bytes)
      [11:35:03.422] [INFO ] [cache.TreeCache ][fetchStateOnStartup] - state was retrieved successfully (in 13 milliseconds)
      [11:35:03.423] [INFO ] [cache.TreeCache ][startService] - Cache is started!!
      [11:35:03.427] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Creating 10000 Order objects!
      [11:35:03.699] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Finished Creating Order Objects. Took 270 milliseconds to create 10000 Orders!
      [11:35:03.700] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Filling up cache with 10000 Orders!
      [11:37:05.052] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Finished filling up cache with 10000 Orders. Took 121 Seconds to fill the cache
      [11:37:05.055] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Average PUT time for 10000 objects: 12 millseconds, Smallest PUT time: 8 milliseconds, Largest PUT time: 1718 milliseconds, Total PUT time: 121295 millisconds.
      [11:37:05.058] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Largest put time occured at Order Id: 9018
      [11:37:05.059] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Trying another Batch Insert for 100 records!
      [11:37:06.095] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Average PUT time for 100 objects: 10 millseconds, Smallest PUT time: 8 milliseconds, Largest PUT time: 42 milliseconds, Total PUT time: 1027 millisconds.
      [11:37:06.309] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Size of Cache now: 10100
      [11:37:06.555] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Average ACCESS time: 0Smallest ACCESS time: 0Largest ACCESS time: 77Total ACCESS time: 228
      [11:37:06.923] [INFO ] [cache.TreeCache ][stopService] - stopService(): closing the channel
      [11:37:06.930] [INFO ] [cache.TreeCache ][stopService] - stopService(): stopping the dispatcher
      [11:37:06.932] [INFO ] [cache.TreeCache ][stopService] - stopService(): closing the channel
      [11:37:06.937] [INFO ] [cache.TreeCache ][viewAccepted] - viewAccepted(): [10.58.8.75:32883|2] [10.58.8.75:32885]
      [11:37:06.945] [INFO ] [cache.TreeCache ][stopService] - stopService(): stopping the dispatcher


        • 1. Re: putObject slows gradually with high number of objects in cache
          jlukar



          I should also note:

          the benchmark simply uses the same configuration as the PropagationManager example from Ben's article.

          So there is no persistence.

          • 2. Re: putObject slows gradually with high number of objects in cache
            jlukar



            Forgive me, there might be persistence, but whatever replSync-service.xml uses for the PropagationManager example is what my benchmark uses.

            The provided output shows the exact configuration.

            I missed the hardware config as well:

            - Pentium 4
            - Linux 2.6, Debian install
            - 2 gigs RAM (5400 rpm hard drive; this is a laptop workstation)
            - JVM startup params for this unit test are -Xms200m -Xmx300m

            I know this is not the most scientific benchmark, but it might just confirm that putObject time does increase gradually as the number of objects increases.

            At some point I would expect hundreds of thousands of objects in a clustered, managed cache, updated regularly at a rate of 500/second.









            k.

            • 3. Re: putObject slows gradually with high number of objects in cache
              belaban

              Can you create a JIRA issue and attach your test to it so we can take a look?
              This needs to be a self-sufficient test (no dependent libs); please also include the configuration used and a description of how to run the test.

              • 4. Re: putObject slows gradually with high number of objects in cache

                When you do the putObject test, do you use the same POJO over and over again? If so, it should be (almost) a no-op underneath, so it should actually be very fast.

                -Ben

                • 5. Re: putObject slows gradually with high number of objects in cache
                  brian.stansberry

                  When you run this, it would also be good to create a garbage collection log (-Xloggc:somefile). I'm wondering if your 1718 ms result is due to a full GC.
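                  For example, something along these lines (flag names as in Sun's JDK of this era; the class name and heap sizes are just illustrative):

```shell
# -Xloggc writes one line per collection; -XX:+PrintGCDetails adds heap detail
java -Xms200m -Xmx300m -Xloggc:gc.log -XX:+PrintGCDetails \
     junit.textui.TestRunner PrimaryCacheTest
```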

                  • 6. Re: putObject slows gradually with high number of objects in cache
                    jlukar

                    Bela,
                    I use a different object every time. To ensure that object creation does not factor into the benchmark, I create all 10000 objects beforehand and keep them in an array, from which I then pluck objects and put them into the cache.


                    Ben,
                    I ran the GC stats, and sure enough there are a bunch of full GCs going on:

                    65.557: [Full GC 191655K->135047K(204416K), 0.9030130 secs]
                    92.674: [Full GC 228937K->228937K(244088K), 0.9122410 secs]
                    105.148: [Full GC 283698K->267961K(304896K), 1.0661370 secs]
                    107.074: [Full GC 286969K->260993K(304896K), 1.6936240 secs]
                    111.455: [Full GC 286582K->270856K(304896K), 0.9772830 secs]
                    113.319: [Full GC 289864K->274151K(304896K), 1.0449530 secs]
                    115.512: [Full GC 293159K->149095K(304896K), 1.2714720 secs]
                    157.541: [Full GC 284661K->71844K(304896K), 0.7851200 secs]
                    221.847: [Full GC 283623K->79095K(304896K), 0.4558300 secs]

                    the rest were all minor gc collections like:
                    257.283: [GC 214653K->198459K(302776K), 0.0237150 secs]
                    258.099: [GC 217339K->201325K(302776K), 0.0243710 secs]
                    etc.....


                    I was very cognizant of GC issues when I ran the test; that is why I started it with -Xms200m -Xmx300m. Apparently this wasn't enough. I bumped the memory params up to -Xms500m and -Xmx600m, and that got rid of the full GCs.

                    Watching the process stats in "top", I noticed that resident memory crept up into the high 400M range, so the extra headroom helped. It had the side effect of making the incremental GC collections (vs. full GCs) take longer, so definitely more fine-tuning is required.

                    Now the stats for 10000 objects are:
                    [13:37:43.658] [INFO ] [PrimaryCacheTest ][testPopulatingCache] - Average PUT time for 10000 objects: 13 millseconds, Smallest PUT time: 9 milliseconds, Largest PUT time: 211 milliseconds, Total PUT time: 135350 millisconds.

                    You can see the total hasn't shrunk, as the delays are now spread more evenly. This is much better, but it still won't match the speed of caching object references. I am going to turn off REPL_SYNC, use LOCAL mode, and see how things go there.



                    • 7. Re: putObject slows gradually with high number of objects in cache
                      belaban

                      I'd suggest REPL_ASYNC (not REPL_SYNC), and fc-fast-minimalthreads.xml from the JGroups distro. That may make a difference if you replicate many modifications.
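                      For reference, the cache mode is a one-line change in the TreeCache service XML (attribute name as used in the 1.x example config files):

```xml
<!-- in replSync-service.xml: replicate puts asynchronously instead of blocking on each one -->
<attribute name="CacheMode">REPL_ASYNC</attribute>
```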

                      • 8. Re: putObject slows gradually with high number of objects in cache

                        GC was suggested by Brian. :-)

                        Since the major cost of the POJO cache (TreeCacheAop) is in putObject (it has to map the object recursively into primitive fields), I'd say benchmarking putObject with a different POJO every time is probably not a good idea. It is probably not a realistic use case anyway.

                        The ideal case is to putObject once and then mostly use the POJO get/set methods.
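                        To illustrate why that first putObject is the expensive step, here is a toy simulation of the recursive field mapping (plain reflection, not JBossCache code; the Order class is hypothetical):

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy simulation of the recursive field mapping a POJO cache performs on
 * putObject(). Not JBossCache code: it just shows that the initial put walks
 * the whole object graph, while a later setter touches a single entry.
 * (No cycle detection -- this is a sketch for acyclic object graphs only.)
 */
public class PojoMappingSketch {

    /** Recursively flattens a POJO's fields into path -> value entries. */
    static void map(String path, Object pojo, Map<String, Object> flat)
            throws IllegalAccessException {
        for (Field f : pojo.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            Object value = f.get(pojo);
            String key = path + "/" + f.getName();
            if (value == null || isPrimitiveLike(value)) {
                flat.put(key, value);      // leaf: store the field value
            } else {
                map(key, value, flat);     // recurse into nested objects
            }
        }
    }

    static boolean isPrimitiveLike(Object v) {
        return v instanceof Number || v instanceof String || v instanceof Boolean;
    }

    /** Hypothetical order POJO for the demo. */
    static class Order { String symbol = "IBM"; int qty = 100; }

    public static void main(String[] args) throws Exception {
        Map<String, Object> flat = new HashMap<String, Object>();
        map("/orders/1", new Order(), flat);   // "putObject": whole graph walked
        System.out.println(flat);
        flat.put("/orders/1/qty", 200);        // later setter: one entry updated
    }
}
```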

                        If you are interested in cooking up a more complete test case, let me know.

                        -Ben

                        • 9. Re: putObject slows gradually with high number of objects in cache
                          jlukar


                          Sorry about that, Brian. No injustice intended.

                          Ben, I actually was wondering about the test case. I hardly see my own application doing things this way, but I was trying to find the Achilles heel before jumping into adopting it.

                          YES, I'd like a better, more complete test case for benchmarking purposes. Please let me know what would be better.

                          ...keeping in mind the performance needs of the near real-time trading application that will use this.

                          k.