
    Random 1.6GB object allocation attempt when using TCPPING

    youngm

      We are using JBoss Cache 1.4.1.SP8 with JGroups 2.4.1.SP4 on WebSphere 6.1 (IBM JDK on AIX).

      This is our config string:

      <config>
       <TCP start_port="58000" sock_conn_timeout="500" send_buf_size="150000"
            recv_buf_size="80000" loopback="false" use_send_queues="false" />
       <TCPPING timeout="2000" down_thread="false" up_thread="false"
            initial_hosts="host1[58000],host2[58000]" port_range="100" num_initial_members="1" />
       <MERGE2 min_interval="10000" max_interval="20000" />
       <FD_SOCK />
       <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
       <pbcast.NAKACK gc_lag="50" retransmit_timeout="600,1200,2400,4800" max_xmit_size="8192"
            up_thread="false" down_thread="false" />
       <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
       <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
       <FRAG frag_size="8192" down_thread="false" up_thread="false" />
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
      </config>
      


      Our current configuration contains only 2 nodes.

      We are seeing a problem where, after about a week of normal operation, random 1.6GB byte[] allocation attempts occur. One of our apps attempted to allocate 33 of these 1.6GB byte[]s, which hung the app. We took a thread dump and GC dump from one of our frozen applications and noticed the following.

      1. We had 33 JGroups ConnectionTable threads hung. 32 of these threads had the following stack trace:

      3XMTHREADINFO "ConnectionTable.Connection.Sender [10.98.111.61:58001 - 10.98.111.61:58001]" (TID:0x36E07400, sys_thread_t:0x37379850, state:CW, native ID:0x001D20B5) prio=5
      4XESTACKTRACE at java/lang/Object.wait(Native Method)
      4XESTACKTRACE at java/lang/Object.wait(Object.java:199(Compiled Code))
      4XESTACKTRACE at org/jgroups/util/Queue.remove(Queue.java:257(Compiled Code))
      4XESTACKTRACE at org/jgroups/blocks/BasicConnectionTable$Connection$Sender.run(BasicConnectionTable.java:686(Compiled Code))
      4XESTACKTRACE at java/lang/Thread.run(Thread.java:810(Compiled Code))
      


      The other thread was:

      3XMTHREADINFO "ConnectionTable.Connection.Receiver [10.98.111.61:58000 - 10.98.111.62:52906]" (TID:0x36CEAA00, sys_thread_t:0x36D12D08, state:R, native ID:0x00125049) prio=5
      4XESTACKTRACE at java/net/SocketInputStream.socketRead0(Native Method)
      4XESTACKTRACE at java/net/SocketInputStream.read(SocketInputStream.java:155(Compiled Code))
      4XESTACKTRACE at java/io/BufferedInputStream.fill(BufferedInputStream.java:229(Compiled Code))
      4XESTACKTRACE at java/io/BufferedInputStream.read1(BufferedInputStream.java:267(Compiled Code))
      4XESTACKTRACE at java/io/BufferedInputStream.read(BufferedInputStream.java:324(Compiled Code))
      4XESTACKTRACE at java/io/DataInputStream.readFully(DataInputStream.java:202(Compiled Code))
      4XESTACKTRACE at java/io/DataInputStream.readInt(DataInputStream.java:380(Compiled Code))
      4XESTACKTRACE at org/jgroups/blocks/BasicConnectionTable$Connection.run(BasicConnectionTable.java:575)
      4XESTACKTRACE at java/lang/Thread.run(Thread.java:810)
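      For what it's worth, the receiver stack trace above ends in DataInputStream.readInt(), which we assume is where the connection reads a length prefix off the wire and then allocates a buffer of that size. Below is a rough sketch of how we read that receive loop; the class and structure are illustrative only, not the actual BasicConnectionTable source:

       // Illustrative sketch of a length-prefixed receive loop, based on the
       // readInt() frame in the trace above; NOT the actual JGroups code.
       import java.io.BufferedInputStream;
       import java.io.DataInputStream;
       import java.io.IOException;
       import java.net.Socket;

       class ReceiveLoopSketch {
           static void receiveLoop(Socket sock) throws IOException {
               DataInputStream in = new DataInputStream(
                       new BufferedInputStream(sock.getInputStream()));
               while (true) {
                   int len = in.readInt();      // length prefix read off the wire
                   byte[] buf = new byte[len];  // a garbage length (e.g. ~1.6GB) makes this
                                                // single allocation hang or exhaust the heap
                   in.readFully(buf, 0, len);
                   // dispatching the payload up the protocol stack would happen here
               }
           }
       }

      If something other than a real JGroups peer connects to one of those ports, or the stream somehow gets out of sync, then whatever bytes arrive first would be interpreted as the length, which might explain the huge allocation attempts we are seeing.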
      


      The 32 sender threads are associated with sequential ports 58001-58033, so it appears JGroups is scanning those ports to determine whether there are any new nodes in the cluster?
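      If TCPPING is indeed probing ports, then our port_range="100" gives it a lot of ports to try per host. A minimal sketch of how we understand initial_hosts plus port_range expands into the candidate list that gets pinged (again illustrative, not the actual TCPPING source):

       // Illustrative expansion of one initial host with a port_range; the real
       // TCPPING implementation may differ, this is just how we read the setting.
       import java.util.ArrayList;
       import java.util.List;

       class PortRangeSketch {
           static List<String> candidates(String host, int startPort, int portRange) {
               List<String> result = new ArrayList<String>();
               // the configured port plus port_range additional ports are probed
               for (int port = startPort; port <= startPort + portRange; port++) {
                   result.add(host + "[" + port + "]");
               }
               return result;
           }

           public static void main(String[] args) {
               // With port_range="100", host1[58000] expands to 101 candidates,
               // host1[58000] .. host1[58100]; a smaller port_range shrinks this list.
               System.out.println(candidates("host1", 58000, 100));
           }
       }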

      We have not been able to duplicate this problem when using MPING instead of TCPPING for member discovery; however, we are not allowed to use multicast in our production environment.

      We are going to try changing our port_range to a smaller number to see if that helps. Does anyone on the board have any other ideas?

      Mike