7 Replies Latest reply on Nov 12, 2006 4:53 PM by belaban

    jgroups tcp_nio configuration

    mjtodd

      I have a JGroups configuration working successfully over TCP. I am trying to switch it to TCP_NIO, as I understand this gives better performance on large clusters. I am testing the configuration with the JGroups Draw demo program. If I start my 3 nodes one by one, everything works fine. However, if I start node 1 and then start nodes 2 and 3 in parallel, only node 2 joins. Node 3 is isolated, does not see the other nodes, and logs the following message:

      org.jgroups.protocols.pbcast.ClientGmsImpl join
      WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying
      


      Here is the configuration for one of my nodes:

      <config>
       <TCP_NIO
       bind_addr="192.158.70.200"
       recv_buf_size="20000000"
       send_buf_size="640000"
       loopback="false"
       discard_incompatible_packets="true"
       max_bundle_size="64000"
       max_bundle_timeout="30"
       use_incoming_packet_handler="true"
       use_outgoing_packet_handler="true"
       down_thread="false" up_thread="false"
       enable_bundling="true"
       start_port="7800"
       end_port="7800"
       use_send_queues="false"
       sock_conn_timeout="300" skip_suspected_members="true"/>
      
       <MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
       bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>
      
       <MERGE2 max_interval="100000"
       down_thread="false" up_thread="false" min_interval="20000"/>
       <FD_SOCK down_thread="false" up_thread="false"/>
      
       <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
       <pbcast.NAKACK max_xmit_size="60000"
       use_mcast_xmit="false" gc_lag="0"
       retransmit_timeout="300,600,1200,2400,4800"
       down_thread="true" up_thread="true"
       discard_delivered_msgs="true"/>
       <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
       down_thread="false" up_thread="false"
       max_bytes="400000"/>
       <pbcast.GMS print_local_addr="true" join_timeout="3000"
       down_thread="true" up_thread="true"
       join_retry_timeout="2000" shun="true"
       view_bundling="true"/>
       <!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
       min_threshold="0.10"/>
       <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
      <pbcast.STATE_TRANSFER/>
      <!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->
      </config>
      

      Nodes 2 and 3 have the same configuration, except that the port they bind to is different.

      Any help would be appreciated.
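
      As an illustration only, the transport element for node 2 might look like the sketch below. The port 7801 is an assumption (the log above shows only 7800 and 7802 in use); every other attribute is copied unchanged from the config above.

      ```xml
      <!-- Hypothetical node 2 transport: only start_port/end_port differ from node 1. -->
      <!-- 7801 is an assumed value; the thread does not state node 2's port. -->
      <TCP_NIO
       bind_addr="192.158.70.200"
       recv_buf_size="20000000"
       send_buf_size="640000"
       loopback="false"
       discard_incompatible_packets="true"
       max_bundle_size="64000"
       max_bundle_timeout="30"
       use_incoming_packet_handler="true"
       use_outgoing_packet_handler="true"
       down_thread="false" up_thread="false"
       enable_bundling="true"
       start_port="7801"
       end_port="7801"
       use_send_queues="false"
       sock_conn_timeout="300" skip_suspected_members="true"/>
      ```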

        • 1. Re: jgroups tcp_nio configuration
          belaban

          We will look into this (I've created some JIRA tasks), but the slightly modified config below does work for me (I removed FLUSH and replaced STREAMING_STATE_TRANSFER with STATE_TRANSFER):


          <TCP_NIO
          bind_addr="127.0.0.1"
          recv_buf_size="20000000"
          send_buf_size="640000"
          loopback="false"
          discard_incompatible_packets="true"
          max_bundle_size="64000"
          max_bundle_timeout="30"
          use_incoming_packet_handler="true"
          use_outgoing_packet_handler="true"
          down_thread="false" up_thread="false"
          enable_bundling="true"
          start_port="7800"
          end_port="7805"
          use_send_queues="false"
          sock_conn_timeout="300" skip_suspected_members="true"/>

          <MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
          bind_addr="127.0.0.1" down_thread="false" up_thread="false"/>

          <MERGE2 max_interval="100000"
          down_thread="false" up_thread="false" min_interval="20000"/>
          <FD_SOCK down_thread="false" up_thread="false"/>

          <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
          <pbcast.NAKACK max_xmit_size="60000"
          use_mcast_xmit="false" gc_lag="0"
          retransmit_timeout="300,600,1200,2400,4800"
          down_thread="true" up_thread="true"
          discard_delivered_msgs="true"/>
          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
          down_thread="false" up_thread="false"
          max_bytes="400000"/>
          <pbcast.GMS print_local_addr="true" join_timeout="3000"
          down_thread="true" up_thread="true"
          join_retry_timeout="2000" shun="true"
          view_bundling="true"/>
          <!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
          min_threshold="0.10"/>
          <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
          <pbcast.STATE_TRANSFER/>
          <!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->

          • 2. Re: jgroups tcp_nio configuration
            mjtodd

            I have just tried your configuration and unfortunately I still get the same behavior. The only change I have noticed is that on node 1, shortly after the join timeout on node 3, I get the following message:

            SEVERE: exception is java.lang.reflect.InvocationTargetException

            • 3. Re: jgroups tcp_nio configuration
              belaban

              That might be an issue in JBoss. Try with JGroups standalone: http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss

              • 4. Re: jgroups tcp_nio configuration
                mjtodd

                I have tried JGroups standalone and still get the same issue when starting nodes 2 and 3 at the same time. Node 3 gets the join timeout and does not become part of the view. Everything is fine if I use TCP instead of TCP_NIO. Is there any other logging I can enable that would be useful?

                • 5. Re: jgroups tcp_nio configuration
                  belaban

                  Did you remove FLUSH and set TCP_NIO.enable_bundling to false?

                  • 6. Re: jgroups tcp_nio configuration
                    mjtodd

                    Yes, FLUSH was removed and TCP_NIO.enable_bundling was set to false.
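
                    For reference, a sketch of what that transport element would look like with bundling disabled. It assumes the attributes from the original post as the base; the only change is enable_bundling. The exact file used in this test was not posted, so the other values are assumptions.

                    ```xml
                    <!-- Sketch only: TCP_NIO from the original post with enable_bundling switched to false. -->
                    <!-- max_bundle_size and max_bundle_timeout are left in place as in the original. -->
                    <TCP_NIO
                     bind_addr="192.158.70.200"
                     recv_buf_size="20000000"
                     send_buf_size="640000"
                     loopback="false"
                     discard_incompatible_packets="true"
                     max_bundle_size="64000"
                     max_bundle_timeout="30"
                     use_incoming_packet_handler="true"
                     use_outgoing_packet_handler="true"
                     down_thread="false" up_thread="false"
                     enable_bundling="false"
                     start_port="7800"
                     end_port="7800"
                     use_send_queues="false"
                     sock_conn_timeout="300" skip_suspected_members="true"/>
                    ```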

                    • 7. Re: jgroups tcp_nio configuration
                      belaban

                      Can you create a JIRA issue and attach your config and an exact description of how to reproduce this?
                      The JIRA for JGroups is at
                      http://jira.jboss.com/jira/browse/JGRP