I have two JBoss 3.2.5 instances running in cluster mode under Red Hat Enterprise Server R3 (kernel 2.4.21), on two machines in the same subnet (192.168.124.xxx).
Everything works fine until I activate multipathing on both machines. It looks like the UDP communication fails: both JBoss instances start without any errors, but they don't see each other. If I deactivate multipathing, everything works fine again.
The config looks like this:
<!-- The JGroups protocol configuration -->
<attribute name="PartitionConfig">
<Config>
<!-- UDP: if you have a multihomed machine,
set the bind_addr attribute to the appropriate NIC IP address -->
<!-- UDP: On Windows machines, because of the media sense feature
being broken with multicast (even after disabling media sense)
set the loopback attribute to true -->
<UDP mcast_addr="228.1.2.3" mcast_port="35566"
ip_ttl="32" ip_mcast="true"
mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
loopback="false" />
<PING timeout="2000" num_initial_members="3"
up_thread="true" down_thread="true" />
<MERGE2 min_interval="10000" max_interval="20000" />
<FD shun="true" up_thread="true" down_thread="true"
timeout="2500" max_tries="5" />
<VERIFY_SUSPECT timeout="3000" num_msgs="3"
up_thread="true" down_thread="true" />
<pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
max_xmit_size="8192"
up_thread="true" down_thread="true" />
<UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10"
down_thread="true" />
<pbcast.STABLE desired_avg_gossip="20000"
up_thread="true" down_thread="true" />
<FRAG frag_size="8192"
down_thread="true" up_thread="true" />
<pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
shun="true" print_local_addr="true" />
<pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
</Config>
</attribute>
This is a known problem. If you google for "multipath jgroups", you'll find a JIRA issue.
It may be the same for IP bonding on Linux. Speaking of which, I didn't know Linux had IP Multipathing. I thought this was a Solaris feature?
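One thing that may be worth trying, following the comment at the top of the posted config: on a multihomed machine, pin JGroups to one specific address with the bind_addr attribute on the UDP element. Below is a rough sketch of that change only. The address 192.168.124.10 is a made-up placeholder for whatever address the multipath interface actually carries on each node, and I can't promise this resolves the multipathing issue.

```xml
<!-- Sketch only: bind_addr value is a hypothetical placeholder.
     Use the real address of the NIC that should carry cluster traffic,
     and set it separately on each node. All other attributes are
     unchanged from the posted config. -->
<UDP bind_addr="192.168.124.10"
     mcast_addr="228.1.2.3" mcast_port="35566"
     ip_ttl="32" ip_mcast="true"
     mcast_send_buf_size="800000" mcast_recv_buf_size="150000"
     ucast_send_buf_size="800000" ucast_recv_buf_size="150000"
     loopback="false" />
```

With print_local_addr="true" already set on pbcast.GMS, the address each node actually bound to should show up at startup, which makes it easy to check whether the setting took effect.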