7 Replies Latest reply on Nov 12, 2006 4:53 PM by belaban

jgroups tcp_nio configuration

mjtodd Nov 8, 2006 6:23 AM

I have a JGroups configuration working successfully over TCP. I am trying to switch it to TCP_NIO, as I understand this will give better performance on large clusters. I am testing the configuration with the JGroups Draw demo program. If I start my three nodes one at a time, everything works fine. However, if I start node 1 and then start nodes 2 and 3 in parallel, only node 2 works. Node 3 stays isolated, does not see the other nodes, and logs the following message:

org.jgroups.protocols.pbcast.ClientGmsImpl join
WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying

Here is the configuration for one of my nodes:

<config>
    <TCP_NIO
        bind_addr="192.158.70.200"
        recv_buf_size="20000000"
        send_buf_size="640000"
        loopback="false"
        discard_incompatible_packets="true"
        max_bundle_size="64000"
        max_bundle_timeout="30"
        use_incoming_packet_handler="true"
        use_outgoing_packet_handler="true"
        down_thread="false" up_thread="false"
        enable_bundling="true"
        start_port="7800"
        end_port="7800"
        use_send_queues="false"
        sock_conn_timeout="300" skip_suspected_members="true"/>
    <MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
        bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>
    <MERGE2 max_interval="100000"
        down_thread="false" up_thread="false" min_interval="20000"/>
    <FD_SOCK down_thread="false" up_thread="false"/>
    <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
    <pbcast.NAKACK max_xmit_size="60000"
        use_mcast_xmit="false" gc_lag="0"
        retransmit_timeout="300,600,1200,2400,4800"
        down_thread="true" up_thread="true"
        discard_delivered_msgs="true"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
        down_thread="false" up_thread="false"
        max_bytes="400000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
        down_thread="true" up_thread="true"
        join_retry_timeout="2000" shun="true"
        view_bundling="true"/>
    <!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
    <pbcast.STATE_TRANSFER/>
    <!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->
</config>

Nodes 2 and 3 have the same configuration, except for the port they bind to.
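For illustration, node 3's transport would then differ from node 1's only in its port. The join warning above shows node 3 at 192.158.70.200:7802, so this sketch assumes node 3 binds to 7802; node 2's port is not stated in the post. Only the changed TCP_NIO attributes are shown, and every other attribute and protocol is assumed to match node 1's config.

```xml
<!-- Node 3 (assumed from the join warning): same stack as node 1; only the
     TCP_NIO port range changes. The remaining TCP_NIO attributes are omitted
     here for brevity and would be copied unchanged from node 1's config. -->
<TCP_NIO
    bind_addr="192.158.70.200"
    start_port="7802"
    end_port="7802"/>
```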

Any help would be appreciated.

1. Re: jgroups tcp_nio configuration

belaban Nov 8, 2006 8:02 AM (in response to mjtodd)

We will look into this (I've created some JIRA tasks), but the slightly modified config below does work for me. I removed FLUSH and replaced STREAMING_STATE_TRANSFER with STATE_TRANSFER:

<TCP_NIO
    bind_addr="127.0.0.1"
    recv_buf_size="20000000"
    send_buf_size="640000"
    loopback="false"
    discard_incompatible_packets="true"
    max_bundle_size="64000"
    max_bundle_timeout="30"
    use_incoming_packet_handler="true"
    use_outgoing_packet_handler="true"
    down_thread="false" up_thread="false"
    enable_bundling="true"
    start_port="7800"
    end_port="7805"
    use_send_queues="false"
    sock_conn_timeout="300" skip_suspected_members="true"/>
<MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
    bind_addr="127.0.0.1" down_thread="false" up_thread="false"/>
<MERGE2 max_interval="100000"
    down_thread="false" up_thread="false" min_interval="20000"/>
<FD_SOCK down_thread="false" up_thread="false"/>
<VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
<pbcast.NAKACK max_xmit_size="60000"
    use_mcast_xmit="false" gc_lag="0"
    retransmit_timeout="300,600,1200,2400,4800"
    down_thread="true" up_thread="true"
    discard_delivered_msgs="true"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
    down_thread="false" up_thread="false"
    max_bytes="400000"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
    down_thread="true" up_thread="true"
    join_retry_timeout="2000" shun="true"
    view_bundling="true"/>
<pbcast.STATE_TRANSFER/>
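Besides the state transfer change, this version differs from the original post in two visible ways: it binds to the loopback address 127.0.0.1, and TCP_NIO uses the port range 7800 to 7805 instead of a single port per node. As I understand start_port/end_port (an assumption, not confirmed in this thread), each node tries start_port first and moves up to the next free port until end_port, so one file can serve all three nodes on the same host. To try it on the host from the original post, only the two bind_addr attributes would change; a sketch of just those elements:

```xml
<!-- Same TCP_NIO/MPING settings as the config above, with bind_addr switched
     from loopback to the host address used in the original post.
     Unlisted TCP_NIO attributes are unchanged from the config above. -->
<TCP_NIO
    bind_addr="192.158.70.200"
    start_port="7800"
    end_port="7805"/>
<MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
    bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>
```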

2. Re: jgroups tcp_nio configuration

mjtodd Nov 8, 2006 9:07 AM (in response to mjtodd)

I have just tried your configuration, but unfortunately I still get the same behavior. The only difference I have noticed is that on node 1, shortly after node 3 reports the join timeout, I get the following message:

SEVERE: exception is java.lang.reflect.InvocationTargetException
3. Re: jgroups tcp_nio configuration

belaban Nov 8, 2006 9:15 AM (in response to mjtodd)

That might be an issue in JBoss; try JGroups standalone: http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss
4. Re: jgroups tcp_nio configuration

mjtodd Nov 8, 2006 9:45 AM (in response to mjtodd)

I have tried JGroups standalone and still get the same issue when starting nodes 2 and 3 at the same time: node 3 gets the join timeout and does not become part of the view. Everything works fine if I use TCP instead of TCP_NIO. Is there any other logging I can enable that would be useful?
5. Re: jgroups tcp_nio configuration

belaban Nov 8, 2006 12:16 PM (in response to mjtodd)

Did you remove FLUSH and set TCP_NIO.enable_bundling to false?
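Applied to the config from the first post, the two changes would look roughly like this: enable_bundling flipped to false on TCP_NIO, and no FLUSH element anywhere in the stack (in the config as posted, FLUSH is already commented out). This is a sketch; the elided attributes and protocols are assumed to stay exactly as posted.

```xml
<!-- TCP_NIO with bundling disabled; all other attributes as in the original config -->
<TCP_NIO
    bind_addr="192.158.70.200"
    enable_bundling="false"/>
<!-- MPING through pbcast.GMS unchanged from the original config -->
<pbcast.STATE_TRANSFER/>
<!-- no pbcast.FLUSH element, not even a commented-out one -->
```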
6. Re: jgroups tcp_nio configuration

mjtodd Nov 9, 2006 3:55 AM (in response to mjtodd)

Yes, FLUSH was removed and TCP_NIO.enable_bundling was set to false.
7. Re: jgroups tcp_nio configuration

belaban Nov 12, 2006 4:53 PM (in response to mjtodd)

Can you create a JIRA issue and attach your config and an exact description of how to reproduce this? The JIRA for JGroups is at
http://jira.jboss.com/jira/browse/JGRP