1 Reply Latest reply on Aug 29, 2007 9:33 AM by brian.stansberry

    State transfer out of Memory.

    fdifonzo

      I'm using jboss-cache1.4.1-SPA (TreeCache) and jgroups 1.4.1, which it ships with, in a cluster environment.
      My application runs with a heap size of 2G. When the cache size is
      372529026 bytes (JBoss Cache prints this value in my log), I get an
      OutOfMemoryError from JGroups while the slave node is fetching state. Here's my log:

      2007-08-23 12:41:59,164 INFO [ztc.cache.JBossCachePool] FREE MEM: 1209736176
      2007-08-23 12:41:59,164 INFO [ztc.cache.JBossCachePool] TOTAL MEM: 2029518848
      2007-08-23 12:42:04,071 INFO [ztc.tftpd.TFTPServerWrapper] processRequest(), RRQ by 127.0.0.1
      2007-08-23 12:42:04,073 INFO [PROFILE] configure:1 msecs
      2007-08-23 12:42:04,073 INFO [ztc.tftpd.TFTPServerWrapper] File "/000000000000" not found for client 127.0.0.1.34616
      2007-08-23 12:42:14,081 INFO [ztc.tftpd.TFTPServerWrapper] processRequest(), RRQ by 127.0.0.1
      2007-08-23 12:42:14,083 INFO [PROFILE] configure:1 msecs
      2007-08-23 12:42:14,083 INFO [ztc.tftpd.TFTPServerWrapper] File "/000000000000" not found for client 127.0.0.1.34616
      2007-08-23 12:42:24,007 INFO [org.jboss.cache.TreeCache] viewAccepted(): [192.168.1.249:34224|3] [192.168.1.249:34224, 192.168.1.250:32789]
      2007-08-23 12:42:24,115 INFO [org.jboss.cache.TreeCache] locking the subtree at / to transfer state
      2007-08-23 12:42:29,457 INFO [org.jboss.cache.statetransfer.StateTransferGenerator_140] returning the state for tree rooted in /(372529026 bytes)
      2007-08-23 12:42:34,514 ERROR [org.jgroups.stack.DownHandler] DownHandler (FRAG) caught exception
      java.lang.OutOfMemoryError
      2007-08-23 12:42:34,514 INFO [ztc.tftpd.TFTPServerWrapper] processRequest(), RRQ by 127.0.0.1
      2007-08-23 12:42:34,538 INFO [PROFILE] configure:23 msecs

      Note that just before the error, my memory checker thread reports nearly
      1.2G of free memory!!!

      With a smaller cache, everything works fine.
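      To put the numbers from the log side by side, here is a rough back-of-the-envelope sketch. It only does arithmetic on the values above (FREE MEM, the reported state size, and frag_size from the config below). The class and method names are made up for illustration. Whether the transfer path really holds several full copies of the state at once is an assumption, not something the log confirms.

```java
public class StateTransferMemory {

    // Number of complete copies of the state that fit in the free heap.
    static long copiesThatFit(long freeBytes, long stateBytes) {
        return freeBytes / stateBytes;
    }

    // Number of fragments needed for a payload of the given size, rounding up.
    static long fragmentCount(long payloadBytes, long fragSize) {
        return (payloadBytes + fragSize - 1) / fragSize;
    }

    public static void main(String[] args) {
        long freeBytes = 1209736176L;  // "FREE MEM" from the log
        long stateBytes = 372529026L;  // state size reported by StateTransferGenerator_140
        long fragSize = 8192L;         // frag_size from the FRAG element

        System.out.println("Full copies of the state that fit in free heap: "
                + copiesThatFit(freeBytes, stateBytes));
        System.out.println("Fragments needed at frag_size=" + fragSize + ": "
                + fragmentCount(stateBytes, fragSize));
    }
}
```

      If the state exists as one byte array and is then copied again while being wrapped into a message and fragmented (again, an assumption), only a few such copies would be enough to use up 1.2G, even though the checker reports plenty of free memory right before the transfer.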

      Debugging your code, I found that the slave hangs in the following JGroups call:

      boolean rc = channel.getState(null, state_fetch_timeout);

      So, at first glance, it seems to me that JGroups has a memory leak, but it may be a protocol problem.
      In my configuration file, the part related to JGroups looks like this:



      <UDP mcast_addr="229.1.2.4" mcast_port="45555"
      ip_ttl="64" ip_mcast="true"
      bind_addr="192.168.1.250"
      mcast_send_buf_size="150000" mcast_recv_buf_size="80000"
      ucast_send_buf_size="150000" ucast_recv_buf_size="80000"
      loopback="false" />
      <PING timeout="2000" num_initial_members="3"
      up_thread="true" down_thread="true" />
      <MERGE2 min_interval="5000" max_interval="10000" />
      <FD_SOCK/>
      <VERIFY_SUSPECT timeout="3000" num_msgs="3"
      up_thread="true" down_thread="true" />
      <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
      up_thread="true" down_thread="true" />
      <pbcast.STABLE desired_avg_gossip="20000"
      up_thread="true" down_thread="true" />
      <UNICAST timeout="5000" window_size="100" min_threshold="10"
      down_thread="true" />
      <FRAG frag_size="8192"
      down_thread="true" up_thread="true" />
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
      shun="true" print_local_addr="true" />
      <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>



      I read in the JGroups user guide that pbcast.STATE_TRANSFER consumes a lot of memory, so pbcast.STREAMING_STATE_TRANSFER is better for big caches.
      I replaced <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/> with <pbcast.STREAMING_STATE_TRANSFER down_thread="false" up_thread="false"/>,
      but the slave hangs and I see no attempt to transfer state in the master log. (Note that the cache is empty at this point, so with STATE_TRANSFER everything works fine.)
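      For reference, this is how I understand the general idea behind a streaming transfer: the state is pushed through a stream in fixed-size chunks, so the sender only needs one small buffer instead of a second full copy of the state. The sketch below uses plain java.io streams and a hypothetical ChunkedCopy class. It is not how JGroups implements STREAMING_STATE_TRANSFER, only an illustration of the memory pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class ChunkedCopy {

    // Copies everything from 'in' to 'out' using a single reusable buffer of
    // chunkSize bytes, and returns the total number of bytes copied.
    static long copyInChunks(InputStream in, OutputStream out, int chunkSize) {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] state = new byte[100000];  // stand-in for serialized cache state
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyInChunks(new ByteArrayInputStream(state), sink, 8192);
        System.out.println("Copied " + copied + " bytes through an 8192-byte buffer");
    }
}
```

      The in-memory streams here are only stand-ins so the example runs on its own. In a real transfer the source would be the cache serializer and the sink a network stream.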

      Do you have any suggestions?

      Many thanks, Fabrizio