7 Replies Latest reply on Nov 12, 2006 4:53 PM by Bela Ban

    jgroups tcp_nio configuration

    Matthew Todd Newbie

      I have a JGroups configuration working successfully over TCP. I am trying to switch it to TCP_NIO, as I understand this gives better performance on large clusters. I am testing the configuration with the JGroups Draw demo program. If I start my 3 nodes one by one, everything works fine. However, if I start node 1 and then start nodes 2 and 3 in parallel, only node 2 joins. Node 3 is isolated, does not see the other nodes, and logs the following message:

      org.jgroups.protocols.pbcast.ClientGmsImpl join
      WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying
      


      Here is the configuration for one of my nodes:

      <config>
       <TCP_NIO
       bind_addr="192.158.70.200"
       recv_buf_size="20000000"
       send_buf_size="640000"
       loopback="false"
       discard_incompatible_packets="true"
       max_bundle_size="64000"
       max_bundle_timeout="30"
       use_incoming_packet_handler="true"
       use_outgoing_packet_handler="true"
       down_thread="false" up_thread="false"
       enable_bundling="true"
       start_port="7800"
       end_port="7800"
       use_send_queues="false"
       sock_conn_timeout="300" skip_suspected_members="true"/>
      
       <MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
       bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>
      
       <MERGE2 max_interval="100000"
       down_thread="false" up_thread="false" min_interval="20000"/>
       <FD_SOCK down_thread="false" up_thread="false"/>
      
       <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
       <pbcast.NAKACK max_xmit_size="60000"
       use_mcast_xmit="false" gc_lag="0"
       retransmit_timeout="300,600,1200,2400,4800"
       down_thread="true" up_thread="true"
       discard_delivered_msgs="true"/>
       <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
       down_thread="false" up_thread="false"
       max_bytes="400000"/>
       <pbcast.GMS print_local_addr="true" join_timeout="3000"
       down_thread="true" up_thread="true"
       join_retry_timeout="2000" shun="true"
       view_bundling="true"/>
       <!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
       min_threshold="0.10"/>
       <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
      <pbcast.STATE_TRANSFER/>
      <!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->
      </config>
      

      Nodes 2 and 3 use the same configuration, except that the port they bind to is different.
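
      For illustration, the transport element on node 2 would differ from the one above only in its port. This is a sketch rather than the exact file: port 7801 is an assumption (the log above only shows node 1 on 7800 and node 3 on 7802), and every other attribute is copied unchanged from the node 1 configuration.

      ```xml
      <!-- Sketch of node 2's transport element. The value 7801 is assumed, not taken from the post. -->
      <TCP_NIO
          bind_addr="192.158.70.200"
          recv_buf_size="20000000"
          send_buf_size="640000"
          loopback="false"
          discard_incompatible_packets="true"
          max_bundle_size="64000"
          max_bundle_timeout="30"
          use_incoming_packet_handler="true"
          use_outgoing_packet_handler="true"
          down_thread="false" up_thread="false"
          enable_bundling="true"
          start_port="7801"
          end_port="7801"
          use_send_queues="false"
          sock_conn_timeout="300" skip_suspected_members="true"/>
      ```

      Because start_port and end_port are equal, each node is pinned to a single port instead of searching a range.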

      Any help would be appreciated.

        • 1. Re: jgroups tcp_nio configuration
          Bela Ban Master

          We will look into this (I've created some JIRA tasks), but the slightly modified config below does work for me (I removed FLUSH and replaced STREAMING_STATE_TRANSFER with STATE_TRANSFER):


          <TCP_NIO
          bind_addr="127.0.0.1"
          recv_buf_size="20000000"
          send_buf_size="640000"
          loopback="false"
          discard_incompatible_packets="true"
          max_bundle_size="64000"
          max_bundle_timeout="30"
          use_incoming_packet_handler="true"
          use_outgoing_packet_handler="true"
          down_thread="false" up_thread="false"
          enable_bundling="true"
          start_port="7800"
          end_port="7805"
          use_send_queues="false"
          sock_conn_timeout="300" skip_suspected_members="true"/>

          <MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
          bind_addr="127.0.0.1" down_thread="false" up_thread="false"/>

          <MERGE2 max_interval="100000"
          down_thread="false" up_thread="false" min_interval="20000"/>
          <FD_SOCK down_thread="false" up_thread="false"/>

          <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
          <pbcast.NAKACK max_xmit_size="60000"
          use_mcast_xmit="false" gc_lag="0"
          retransmit_timeout="300,600,1200,2400,4800"
          down_thread="true" up_thread="true"
          discard_delivered_msgs="true"/>
          <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
          down_thread="false" up_thread="false"
          max_bytes="400000"/>
          <pbcast.GMS print_local_addr="true" join_timeout="3000"
          down_thread="true" up_thread="true"
          join_retry_timeout="2000" shun="true"
          view_bundling="true"/>
          <!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
          min_threshold="0.10"/>
          <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
          <pbcast.STATE_TRANSFER/>
          <!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->

          • 2. Re: jgroups tcp_nio configuration
            Matthew Todd Newbie

            I have just tried your configuration and unfortunately I still get the same behavior. The only change I have noticed is that on node 1, shortly after node 3 reports the join timeout, I get the following message:

            SEVERE: exception is java.lang.reflect.InvocationTargetException

            • 3. Re: jgroups tcp_nio configuration
              Bela Ban Master

              That might be an issue in JBoss. Try with JGroups standalone: http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss

              • 4. Re: jgroups tcp_nio configuration
                Matthew Todd Newbie

                I have tried JGroups standalone and still get the same issue when starting nodes 2 and 3 at the same time. Node 3 gets the join timeout and does not become part of the view. Everything works fine if I use TCP instead of TCP_NIO. Is there any other logging I can enable that would be useful?

                • 5. Re: jgroups tcp_nio configuration
                  Bela Ban Master

                  Did you remove FLUSH and set TCP_NIO.enable_bundling to false?

                  • 6. Re: jgroups tcp_nio configuration
                    Matthew Todd Newbie

                    Yes, FLUSH was removed and TCP_NIO.enable_bundling was set to false.
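
                    In terms of the configuration from the first post, that change would look roughly like the sketch below. Only enable_bundling is changed. The bind_addr and port values are carried over from the first post as an assumption, and the FLUSH element is left out of the stack entirely.

                    ```xml
                    <!-- Sketch only: transport element with bundling switched off. -->
                    <!-- bind_addr and ports are assumed to match the first post. -->
                    <TCP_NIO
                        bind_addr="192.158.70.200"
                        recv_buf_size="20000000"
                        send_buf_size="640000"
                        loopback="false"
                        discard_incompatible_packets="true"
                        max_bundle_size="64000"
                        max_bundle_timeout="30"
                        use_incoming_packet_handler="true"
                        use_outgoing_packet_handler="true"
                        down_thread="false" up_thread="false"
                        enable_bundling="false"
                        start_port="7800"
                        end_port="7800"
                        use_send_queues="false"
                        sock_conn_timeout="300" skip_suspected_members="true"/>
                    ```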

                    • 7. Re: jgroups tcp_nio configuration
                      Bela Ban Master

                      Can you create a JIRA issue and attach your config and an exact description of how to reproduce this?
                      The JIRA for JGroups is at
                      http://jira.jboss.com/jira/browse/JGRP