0 Replies Latest reply on Nov 8, 2007 5:05 PM by ges

    JBoss Cache Syncing issues

    ges

      Hi,

      I am using the JBoss Cache that comes with JBoss 4.0.4GA. I have 2 nodes (on 2 different physical machines) that were set up to replicate asynchronously with each other. I am using the TreeCache.

      I had no issues at all until one morning the nodes stopped syncing with each other. I set the JGroups logging to TRACE, and I can see that one of the nodes drops packets from the other node. I keep seeing the message "discarded message from non-member" in my log files. I also see messages saying that the requested packets are not in the current sequence number window.
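
To get a feel for how often the discards happen and which peer they refer to, the log can be scanned for the quoted phrase. The following is only a minimal sketch: it assumes nothing about the log line layout beyond the phrase itself, and it treats whatever text follows the phrase as a grouping key (which may or may not be a member address). The function name `count_discards` is made up for this example.

```python
from collections import Counter

# Phrase quoted in the post; the rest of the log line format is not assumed.
NEEDLE = "discarded message from non-member"

def count_discards(lines, needle=NEEDLE):
    """Return (total matches, Counter keyed by the text that follows the phrase)."""
    total = 0
    by_tail = Counter()
    for line in lines:
        idx = line.find(needle)
        if idx == -1:
            continue
        total += 1
        # Assumption: anything after the phrase identifies the sender in some way.
        tail = line[idx + len(needle):].strip() or "<unknown>"
        by_tail[tail] += 1
    return total, by_tail
```

Passing an open log file (for example `open("server.log", errors="replace")`, where the file name is a placeholder) as `lines` gives the total count and a per-sender breakdown.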

      The following is my config:

      <UDP mcast_addr="239.255.2.9" mcast_port="45566" ip_ttl="64" ip_mcast="true"
      mcast_send_buf_size="150000" mcast_recv_buf_size="150000" ucast_send_buf_size="150000"
      ucast_recv_buf_size="150000" loopback="true" />
      <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
      <MERGE2 min_interval="10000" max_interval="20000" />
      <FD shun="true" up_thread="true" down_thread="true" />
      <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
      <pbcast.NAKACK gc_lag="50" max_xmit_size="50000" retransmit_timeout="600,1200,2400,4800" up_thread="false"
      down_thread="false" />
      <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
      <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
      <FRAG frag_size="50000" down_thread="false" up_thread="false" />
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
      <pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />
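
Since the data size is the one thing that may have changed, it can help to list the size-related settings of the stack side by side. The sketch below is an illustration only: it copies three protocols from the config above, wraps them in a dummy `<stack>` root (the fragment has no single root element of its own), and collects every attribute whose name ends in `_size`. The helper name `size_attributes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Protocols copied from the config above, limited to those with size settings.
STACK = """
<UDP mcast_addr="239.255.2.9" mcast_port="45566" ip_ttl="64" ip_mcast="true"
     mcast_send_buf_size="150000" mcast_recv_buf_size="150000"
     ucast_send_buf_size="150000" ucast_recv_buf_size="150000" loopback="true" />
<pbcast.NAKACK gc_lag="50" max_xmit_size="50000" retransmit_timeout="600,1200,2400,4800"
     up_thread="false" down_thread="false" />
<FRAG frag_size="50000" down_thread="false" up_thread="false" />
"""

def size_attributes(fragment):
    """Map 'PROTOCOL.attribute' to its integer value for every *_size attribute."""
    # Dummy root element so the multi-element fragment parses as one document.
    root = ET.fromstring("<stack>" + fragment + "</stack>")
    sizes = {}
    for proto in root:
        for name, value in proto.attrib.items():
            if name.endswith("_size"):
                sizes[proto.tag + "." + name] = int(value)
    return sizes

for key, value in sorted(size_attributes(STACK).items()):
    print(key, value)
```

This only reads the values back; it does not say which of them, if any, is related to the problem.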


      Currently, I can run both nodes on the same physical machine and they seem to work fine, although I need to bring one node up into a consistent state before I start the other. If I do not, the cache never syncs up. Even here I intermittently see messages that say "discarded message from non-member". Is there any reason for this?

      The multicast addresses I am using work fine: the same config used to work before, and I also tested a multicast stream using the VLC media player.
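
A multicast check that does not depend on VLC could look roughly like the sketch below, using the group and port from the config above. It is a plain UDP probe, not JGroups traffic, so it is best run while the cache nodes are stopped (or with a different port) to avoid feeding foreign packets to the cluster. The function names are invented for this example, and the receiver joins the group on the default interface, which is an assumption that may not hold on multi-homed machines.

```python
import socket
import struct

GROUP, PORT = "239.255.2.9", 45566  # mcast_addr and mcast_port from the config

def membership_request(group, iface="0.0.0.0"):
    # ip_mreq structure: 4-byte group address followed by 4-byte local interface.
    return socket.inet_aton(group) + socket.inet_aton(iface)

def open_receiver(group=GROUP, port=PORT):
    """Run on one machine: returns a socket joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock

def send_probe(payload=b"probe", group=GROUP, port=PORT, ttl=64):
    """Run on the other machine: sends one datagram to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # ttl=64 mirrors ip_ttl in the config so the probe crosses the same hops.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                    struct.pack("b", ttl))
    try:
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

On the receiving machine, `open_receiver().recvfrom(1024)` blocks until a probe arrives; running the pair in both directions shows whether multicast works each way between the two hosts.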

      The amount of data that the cache holds is pretty big: more than 200 MB. The only thing that I can imagine has changed is the data size, but I see no log messages saying that the messages are too long.

      Is there something that I should be looking for specifically that would help me understand what is going on? Any leads would be useful.

      Thanks
      Gesly