JBoss Cache Syncing issues

ges Nov 8, 2007 5:05 PM

Hi,

I am using the JBoss Cache that comes with JBoss 4.0.4GA. I have 2 nodes (on 2 different physical machines) that were set up to replicate asynchronously with each other. I am using the TreeCache.

I had no issues at all until one morning the nodes stopped syncing with each other. I set the JGroups log level to TRACE, and I can see that one of the nodes drops packets from the other node. I keep seeing the message "discarded message from non-member" in my log files. I also see messages saying that the requested packets are not in the current sequence number window.
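To get a rough idea of how often the drop happens, the log can be scanned for the quoted phrase. The following is only a minimal sketch: the class name `DiscardCounter` is made up for illustration, the log path is passed as an argument, and the only thing it assumes about the log format is that the phrase appears verbatim on one line.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of JBoss Cache or JGroups.
class DiscardCounter {
    // Phrase copied verbatim from the log message quoted in the post.
    static final String MARKER = "discarded message from non-member";

    // Counts the lines that contain the marker phrase.
    static long countDiscards(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (line.contains(MARKER)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to a server log file (assumed to be UTF-8 text).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(countDiscards(lines) + " line(s) contain \"" + MARKER + "\"");
    }
}
```

Running it against the log of each node separately would show whether only one side is discarding.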

The following is my config:

<UDP mcast_addr="239.255.2.9" mcast_port="45566" ip_ttl="64" ip_mcast="true"
mcast_send_buf_size="150000" mcast_recv_buf_size="150000" ucast_send_buf_size="150000"
ucast_recv_buf_size="150000" loopback="true" />
<PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false" />
<MERGE2 min_interval="10000" max_interval="20000" />
<FD shun="true" up_thread="true" down_thread="true" />
<VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false" />
<pbcast.NAKACK gc_lag="50" max_xmit_size="50000" retransmit_timeout="600,1200,2400,4800" up_thread="false"
down_thread="false" />
<UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false" />
<pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" />
<FRAG frag_size="50000" down_thread="false" up_thread="false" />
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true" />
<pbcast.STATE_TRANSFER up_thread="false" down_thread="false" />

Currently, I am able to run both nodes on the same physical machine, and they seem to work fine, although I need to bring one node up into a consistent state before I can start the other. If I do not, the cache never syncs up. Even in this setup I intermittently see messages that say "discarded message from non-member". Is there any reason for this?

The multicast addresses I am using work fine (since the config used to work before and I also tested a multicast stream using the VLC media player).
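For a check that does not depend on VLC, one datagram could be sent and received on the same group and port as in the UDP element above. This is a minimal sketch using the standard `java.net.MulticastSocket` class; the class name `McastCheck`, the `send` argument, and the 10-second timeout are assumptions made for illustration. If the cache nodes are running, the receiver may also pick up JGroups traffic on this port.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Hypothetical test utility, not part of JBoss Cache or JGroups.
class McastCheck {
    static final String GROUP = "239.255.2.9"; // mcast_addr from the config
    static final int PORT = 45566;             // mcast_port from the config

    // Returns true if the literal address is in the IP multicast range.
    static boolean isMulticastGroup(String addr) {
        try {
            return InetAddress.getByName(addr).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName(GROUP);
        if (args.length > 0 && args[0].equals("send")) {
            // Sender: run this on one machine.
            try (MulticastSocket socket = new MulticastSocket()) {
                socket.setTimeToLive(64); // matches ip_ttl in the config
                byte[] data = "ping".getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(data, data.length, group, PORT));
            }
        } else {
            // Receiver: start this on the other machine first.
            try (MulticastSocket socket = new MulticastSocket(PORT)) {
                // null lets the OS pick the default network interface.
                socket.joinGroup(new InetSocketAddress(group, PORT), null);
                socket.setSoTimeout(10000);
                byte[] buf = new byte[1500];
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(packet);
                    System.out.println("received " + packet.getLength()
                            + " bytes from " + packet.getAddress());
                } catch (SocketTimeoutException e) {
                    System.out.println("nothing received within 10 s");
                }
            }
        }
    }
}
```

Start the receiver with no arguments on one machine, then run it with `send` on the other, and repeat in the opposite direction, since multicast can work one way only.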

The amount of data that the cache holds is fairly large: more than 200 MB. The only thing that I can imagine has changed is the data size, but I see no log messages saying that the messages are too large.

Is there something that I should be looking for specifically that would help me understand what is going on? Any leads would be useful.

Thanks
Gesly