20 replies. Latest reply on Mar 11, 2005 6:04 AM by belaban

    Memory problem with JBossCache

    henriknyberg

      Hello,
      we are currently evaluating JBossCache 1.2 and have run into serious, reproducible memory problems when stress testing the cache with a professional load tool (PureLoad). Something in the replication module is consuming memory at an incredibly fast pace. Within minutes of starting the test, garbage collection kicks in, and after about 15 minutes a full garbage collection is performed. During the full GC the entire JVM stops responding, and in some cases the Oracle Process Manager restarts it automatically.

      When testing with a legacy local cache solution, the JVM uses about 150-200 MB of memory. With JBossCache in asynchronous replication mode, the JVM's memory usage climbs steadily to about 500 MB, at which point a full GC is invoked (the JVM is allocated 512 MB). If replication is turned off and JBossCache runs only as a local cache, memory usage is back to the normal level of 150-200 MB.

      Question: is there a known memory leak or memory management problem with JBossCache, and in that case what can be done to stabilize the memory consumption?


      Settings for the cache are:
      * Asynchronous replication (replAsync-service.xml out of the box, with the suggested modification for Windows, i.e. loopback = true).
      * no transactions (IsolationLevel.NONE)
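For reference, a minimal sketch of how the two cache settings above would typically be expressed as TreeCache MBean attributes in a replAsync-service.xml-style file. It assumes the JBossCache 1.x attribute names `CacheMode` and `IsolationLevel`; the enclosing `<mbean>` element and all other attributes are omitted.

```xml
<!-- Sketch only: assumes JBossCache 1.x TreeCache MBean attribute names.
     These lines sit inside the TreeCache <mbean> element. -->
<!-- Asynchronous replication: the caller does not wait for other members -->
<attribute name="CacheMode">REPL_ASYNC</attribute>
<!-- No locking isolation; matches IsolationLevel.NONE in the settings above -->
<attribute name="IsolationLevel">NONE</attribute>
```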

      Settings for the test environment are:
      * Windows cluster with two W2K servers, 2 JVMs per server
      * JDK 1.4.2_05
      * Application server is Oracle 9.0.3 AS
      * one database server (Oracle)

        • 1. Re: Memory problem with JBossCache
          slaboure

          Hello,

          Just out of curiosity, could you please post your JGroups stack configuration? Also, could you please try a TCP-based configuration? You can find one in the conf folder of the JGroups distribution.

          Thanks. Cheers,


          sacha

          • 2. Re: Memory problem with JBossCache
            henriknyberg

            Is this the configuration you are referring to? (straight from our replAsync-service.xml file).

            I will look into the tcp configuration for JGroups.


            <attribute name="ClusterConfig">
             <config>
             <!-- UDP: if you have a multihomed machine,
             set the bind_addr attribute to the appropriate NIC IP address -->
             <!-- UDP: On Windows machines, because of the media sense feature
             being broken with multicast (even after disabling media sense)
             set the loopback attribute to true -->
             <UDP mcast_addr="228.1.2.3" mcast_port="48866"
             ip_ttl="64" ip_mcast="true"
             mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
             ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
             loopback="true"/>
             <PING timeout="2000" num_initial_members="3"
             up_thread="false" down_thread="false"/>
             <MERGE2 min_interval="10000" max_interval="20000"/>
             <!-- <FD shun="true" up_thread="true" down_thread="true" />-->
             <FD_SOCK/>
             <VERIFY_SUSPECT timeout="1500"
             up_thread="false" down_thread="false"/>
             <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800"
             max_xmit_size="8192" up_thread="false" down_thread="false"/>
             <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10"
             down_thread="false"/>
             <pbcast.STABLE desired_avg_gossip="20000"
             up_thread="false" down_thread="false"/>
             <FRAG frag_size="8192"
             down_thread="false" up_thread="false"/>
             <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
             shun="true" print_local_addr="true"/>
             <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
             </config>
             </attribute>


            • 3. Re: Memory problem with JBossCache
              belaban

               

              "Henrik Nyberg" wrote:


              Settings for the cache are:
              * Asynchronous replication (replAsync-service.xml out of the box, with the suggested modification for Windows, i.e. loopback = true).
              * no transactions (IsolationLevel.NONE)


              Having no transactions is very bad, as the cache replicates after *every single* modification.

              If you cannot use transactions, at least enable queueing, so that modifications are replicated in batches.
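A minimal sketch of the transactional alternative mentioned above: giving TreeCache a transaction manager so that several modifications made inside one transaction are replicated together at commit time, rather than one message per modification. This assumes the JBossCache 1.x `TransactionManagerLookupClass` attribute; `DummyTransactionManagerLookup` is the standalone/test lookup shipped with JBossCache, and on Oracle 9iAS a lookup for that server's own TransactionManager would likely be needed instead (an assumption not covered in this thread).

```xml
<!-- Sketch only: assumes the JBossCache 1.x TreeCache MBean attribute name.
     DummyTransactionManagerLookup is intended for standalone testing; an
     application server deployment would use a lookup for its own TM. -->
<attribute name="TransactionManagerLookupClass">org.jboss.cache.DummyTransactionManagerLookup</attribute>
```

The application code would then wrap its puts in a transaction obtained from that manager; only the configuration side is shown here.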

              Also, if you stress-test the cache, you might be better off using the fc-fast config from JGroups/conf.

              If this doesn't solve your problems, please open a bug report on JIRA (http://jira.jboss.com/jira/browse/JBCACHE).

              • 4. Re: Memory problem with JBossCache
                slaboure

                Yes, that is the one I am talking about. Please try the TCP one; I would be very interested in the difference in behaviour you see, from both a memory and a performance standpoint.

                cheers,


                sacha

                • 5. Re: Memory problem with JBossCache
                  belaban

                  Either TCP or fc-fast, here's tcp:


                  <TCP bind_addr="127.0.0.1" start_port="7800" loopback="true"/>
                  <TCPPING timeout="20000" num_ping_requests="5"
                  initial_hosts="127.0.0.1[7800]" port_range="2"
                  num_initial_members="2" />
                  <!-- MERGE2 min_interval="5000" max_interval="10000" / -->
                  <FD timeout="1000" max_tries="3"/>
                  <VERIFY_SUSPECT timeout="5500" down_thread="false" up_thread="false"/>
                  <pbcast.NAKACK gc_lag="100" retransmit_timeout="600,1200,2400,4800"/>
                  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000"
                  down_thread="false" max_bytes="100000" up_thread="false"/>
                  <pbcast.GMS print_local_addr="true" join_timeout="5000"
                  join_retry_timeout="2000" shun="true"/>
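The TCP stack above binds to 127.0.0.1 and lists only 127.0.0.1 in `initial_hosts`, so all members must run on one machine. For the two-server setup described at the top of the thread, a sketch of the same stack would list both hosts; the addresses 192.168.0.10 and 192.168.0.11 are hypothetical placeholders, and each host would use its own address as `bind_addr`. All other protocols are copied unchanged.

```xml
<!-- Sketch for two hosts; 192.168.0.10 / 192.168.0.11 are hypothetical.
     On the second host, set bind_addr to that host's own address. -->
<TCP bind_addr="192.168.0.10" start_port="7800" loopback="true"/>
<!-- initial_hosts uses the host[port] form, comma-separated.
     port_range (kept from the original) lets TCPPING probe more than one
     port per listed host, which matters with two JVMs per host. -->
<TCPPING timeout="20000" num_ping_requests="5"
         initial_hosts="192.168.0.10[7800],192.168.0.11[7800]" port_range="2"
         num_initial_members="2"/>
<FD timeout="1000" max_tries="3"/>
<VERIFY_SUSPECT timeout="5500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK gc_lag="100" retransmit_timeout="600,1200,2400,4800"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000"
               down_thread="false" max_bytes="100000" up_thread="false"/>
<pbcast.GMS print_local_addr="true" join_timeout="5000"
            join_retry_timeout="2000" shun="true"/>
```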



                  and here's fc-fast:


                  <UDP mcast_send_buf_size="10000000" mcast_port="45566" ucast_recv_buf_size="10000000" mcast_addr="228.8.8.8" loopback="false" mcast_recv_buf_size="10000000" max_bundle_size="64000" max_bundle_timeout="30" use_incoming_packet_handler="false" use_outgoing_packet_handler="true" ucast_send_buf_size="10000000" ip_ttl="32" enable_bundling="true"/>
                  <PING timeout="2000" down_thread="false" num_initial_members="3"/>
                  <MERGE2 max_interval="10000" down_thread="false" min_interval="5000"/>
                  <FD_SOCK down_thread="false"/>
                  <VERIFY_SUSPECT timeout="1500" down_thread="false"/>
                  <pbcast.NAKACK max_xmit_size="60000" down_thread="false" use_mcast_xmit="true" gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"/>
                  <UNICAST timeout="300,600,1200,2400,3600" down_thread="false"/>
                  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="5000" down_thread="false" max_bytes="250000"/>
                  <pbcast.GMS print_local_addr="true" join_timeout="3000" down_thread="false" join_retry_timeout="2000" shun="true"/>
                  <FC max_credits="1000000" down_thread="false" min_threshold="0.10"/>
                  <FRAG frag_size="60000" down_thread="false" up_thread="true"/>
                  <COMPRESS down_thread="false" min_size="500" compression_level="3" up_thread="true"/>
                  <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>

                  • 6. Re: Memory problem with JBossCache
                    henriknyberg

                    Hello,
                    we have done some more testing (using the standard UDP configuration). It seems like the memory problem is partly explained by a periodic PUT on the same key (and node) in the TreeCache. Every 1000 GETs we PUT a cache status object into the cache to let all other caches know the status of this TreeCache. We use the same node and the same key for one unique instance of a TreeCache (one per JVM). Somehow, this repeated PUT into the same node/key causes the cache to consume memory.

                    Question: if a PUT is repeatedly made on the same node/key in a replicated TreeCache, is it necessary to do a REMOVE on the same node/key before the PUT is made?

                    • 7. Re: Memory problem with JBossCache
                      belaban

                      No, absolutely not !
                      If you have a sample program that reproduces this problem, please submit it attached to a JIRA bug report, and I'll take a look.

                      • 8. Re: Memory problem with JBossCache
                        henriknyberg

                        Ok,
                        now I am pretty sure what the problem is. JBossCache does not properly garbage collect objects that are replicated over the network and that replace an already existing object in the local cache. The existing local object that is replaced by the remote PUT is not garbage collected.

                        We have four caches in our setup: A, B, C and D. We describe the status of cache A after the first iteration with an object sA1. After every iteration we PUT the current status object into A under the same key ksA.

                        Iteration 1:
                        A updates status -> A(ksA,sA1) replicated to => B(ksA, sA1), C(ksA, sA1), D(ksA, sA1)

                        Iteration 2:
                        A updates status -> A(ksA,sA2) replicated to => B(ksA, sA2), C(ksA, sA2), D(ksA, sA2)

                        After iteration 2 the local object A(ksA,sA1) is probably GC'ed correctly, since the cache works reliably when we run this test in local mode. The remote objects B(ksA, sA1), C(ksA, sA1) and D(ksA, sA1), however, do not seem to be GC'ed, since the cache constantly consumes memory when running in replicated mode.

                        If we turn off the status replication, i.e. the repeated PUT of the status object A(ksA, sAN), the cache works like a charm. The only difference between the two test cases is the PUT.


                        (BTW, I am not using all of the jars in the distribution, only the ones that are needed to actually run the cache. I don't think it should have any bearing on this but nevertheless I'll list the ones that I am using:)

                        jboss-cache.jar
                        jboss-system.jar
                        jboss-jmx.jar
                        jgroups.jar
                        jboss-common.jar
                        jboss-j2ee.jar
                        jboss-remoting.jar
                        log4j.jar
                        commons-logging.jar
                        concurrent.jar

                        • 9. Re: Memory problem with JBossCache
                          belaban


                          Are these assumptions? Or did you actually see those objects not being GC'ed in a profiler?

                          All we do in async replication without transactions and with isolation level NONE is send the Method to B, C and D (wrapped in a MethodCall) and execute it there. So the behavior you get on A is what you get on B, C and D as well.

                          What's the frequency of your updates ?

                          You haven't answered my questions:
                          - Your config: did fc-fast or tcp.xml help?
                          - Can you submit a simple program that reproduces this behavior?

                          • 10. Re: Memory problem with JBossCache
                            henriknyberg

                            Hello,
                            we have tried both TCP and fc-fast. TCP was significantly slower, and as a result the replication ratio (the share of objects in a cache that were received from other caches rather than from local PUTs) went down, which decreases the usefulness of the cache. fc-fast provided the fastest replication and is a good candidate for a production environment. However, I do not think that the network configuration has any bearing on the memory problem.

                            Regarding updates, each cache updates the cluster about 20 times per second (once every 1000 GETs).
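Back-of-the-envelope figures derived only from numbers stated in this thread (four caches A to D, one status PUT per 1000 GETs, about 20 status updates per second per cache), assuming no transactions and no replication queue, so that each PUT is sent as its own replication message:

```latex
\begin{aligned}
\text{GETs per cache} &\approx 20\ \tfrac{\text{updates}}{\text{s}} \times 1000\ \tfrac{\text{GETs}}{\text{update}} = 20\,000\ \tfrac{\text{GETs}}{\text{s}} \\
\text{remote status PUTs applied per cache} &\approx (4 - 1) \times 20\ \tfrac{\text{updates}}{\text{s}} = 60\ \tfrac{\text{PUTs}}{\text{s}} \\
\text{status messages sent cluster-wide} &\approx 4 \times 20\ \tfrac{\text{updates}}{\text{s}} = 80\ \tfrac{\text{messages}}{\text{s}}
\end{aligned}
```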

                            I'll put together a simplified testbench (including our JBossCache wrapper class and the status class) for you. Can you point me to a location where I can submit the code package?

                            • 11. Re: Memory problem with JBossCache
                              belaban

                              Okay, create a bug report in JIRA (http://jira.jboss.com/jira/browse/JBCACHE) and attach the 2 files. The case will then automatically be assigned to me.

                              If I can reproduce this, I will hold the 1.2.1 release until I have fixed this.

                              • 12. Re: Memory problem with JBossCache
                                belaban

                                Apart from the bug report: have you considered enabling a replication queue? Say you replicate when there are 100 elements in the queue or 1 second has elapsed, whichever occurs first.
                                Since you have 20 updates/sec, each replication message would bundle about 20 updates.
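A sketch of the replication-queue settings suggested above (flush at 100 queued modifications or every 1000 ms, whichever comes first), assuming the JBossCache 1.x attribute names found in the shipped *-service.xml files. As far as we know the queue only applies to asynchronous replication (REPL_ASYNC), which is the mode used here.

```xml
<!-- Sketch only: assumes JBossCache 1.x TreeCache MBean attribute names. -->
<!-- Batch async replication messages instead of sending one per PUT -->
<attribute name="UseReplQueue">true</attribute>
<!-- Flush the queue every 1000 ms (1 second) -->
<attribute name="ReplQueueInterval">1000</attribute>
<!-- ...or as soon as 100 modifications are queued -->
<attribute name="ReplQueueMaxElements">100</attribute>
```

At about 20 updates per second the 100-element limit would rarely be hit, so the 1-second interval would usually trigger the flush.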

                                • 13. Re: Memory problem with JBossCache
                                  slaboure

                                  Bela, do you understand why TCP would be significantly slower in this scenario, with so few nodes?

                                  • 14. Re: Memory problem with JBossCache
                                    belaban

                                    No, not at all. But he created a JIRA bug report with a test case attached, and I'll look at it as soon as I've released JBossCache 1.2.1.
