0 Replies Latest reply on Aug 8, 2016 11:08 PM by MOHIL KHARE

    (Wildfly HA) FD_SOCK issue: Unable to reconnect and form tcp ring after firewall snaps tcp connection

    MOHIL KHARE Newbie



      I am on WildFly 9, running a cluster of 3 nodes. We have the following JGroups config:


                 <stack name="tcp">

                          <transport socket-binding="jgroups-tcp" type="TCP"/>

                          <protocol type="TCPPING">

                              <property name="initial_hosts">


                              <property name="port_range">




                          <protocol type="MERGE2"/>

                          <protocol socket-binding="jgroups-tcp-fd" type="FD_SOCK"/>

                          <protocol type="FD"/>

                          <protocol type="VERIFY_SUSPECT"/>

                          <protocol type="pbcast.NAKACK2"/>

                          <protocol type="UNICAST3"/>

                          <protocol type="pbcast.STABLE"/>

                          <protocol type="pbcast.GMS">

                              <property name="join_timeout">




                          <protocol type="MFC"/>

                          <protocol type="FRAG2"/>

                          <protocol type="RSVP"/>




              <interface name="management">

                  <inet-address value="${jboss.bind.address.management:}"/>


              <interface name="public">

                  <inet-address value="${jboss.bind.address:}"/>


              <interface name="unsecure">

                  <inet-address value="${jboss.bind.address.unsecure:}"/>


              <interface name="jgroup-tcp-interface">

                  <inet-address value=""/>





          <socket-binding-group default-interface="public" name="standard-sockets" port-offset="${jboss.socket.binding.port-offset:0}">

              <socket-binding interface="management" name="management-http" port="${jboss.management.http.port:9990}"/>

              <socket-binding interface="management" name="management-https" port="${jboss.management.https.port:9993}"/>

              <socket-binding name="ajp" port="${jboss.ajp.port:8009}"/>

              <socket-binding name="http" port="${jboss.http.port:8080}"/>

              <socket-binding name="https" port="${jboss.https.port:8443}"/>

              <socket-binding interface="jgroup-tcp-interface" name="jgroups-tcp" port="7600"/>

              <socket-binding interface="jgroup-tcp-interface" name="jgroups-tcp-fd" port="57600"/>

              <socket-binding name="txn-recovery-environment" port="4712"/>

              <socket-binding name="txn-status-manager" port="4713"/>

              <outbound-socket-binding name="mail-smtp">

                  <remote-destination host="localhost" port="25"/>




      Our kernel's TCP keep-alive is 2 hours. We deployed our cluster in an environment where there is a firewall between two cluster nodes. Since the TCP connection to port 57600 is used only by FD_SOCK and stays idle most of the time, a firewall rule broke that connection, thereby disrupting the TCP socket ring. After the keep-alive interval elapsed I was expecting the sockets to reconnect and re-establish the ring; instead I ended up with a linear chain of sockets, i.e.


      Before the firewall broke the incoming and outgoing connections of A:


      A <----B <---C --



      After the firewall broke the connection (before the keep-alive interval elapsed and before "Received new cluster view" messages were seen because of MERGE):


      A    B<---C          


      After the firewall broke the connection (after the keep-alive interval elapsed and after "Received new cluster view" messages were seen because of MERGE):


      This looks like a bug. Am I missing something here?
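
      For reference, the failure-detection part of the stack described above can be written with its tunable properties spelled out. This is only a sketch, assuming the WildFly 9 jgroups subsystem syntax used in the config above. `keep_alive` is an FD_SOCK attribute and `timeout` / `max_tries` are FD attributes in JGroups 3.x; the values shown are illustrative assumptions, not settings from this cluster, and whether they restore the ring in this scenario is not confirmed here.

```xml
<!-- Sketch only: property values are assumed examples, not taken from the post -->
<stack name="tcp">
    <transport socket-binding="jgroups-tcp" type="TCP"/>
    <!-- ... discovery and merge protocols as in the config above ... -->
    <protocol socket-binding="jgroups-tcp-fd" type="FD_SOCK">
        <!-- Enables SO_KEEPALIVE on the FD_SOCK ping socket (JGroups default is true).
             The probe interval itself comes from the OS, not from JGroups. -->
        <property name="keep_alive">true</property>
    </protocol>
    <protocol type="FD">
        <!-- Heartbeat-based detection: a member is suspected after
             max_tries missed heartbeats, each waited for timeout ms -->
        <property name="timeout">3000</property>
        <property name="max_tries">5</property>
    </protocol>
    <!-- ... remaining protocols unchanged ... -->
</stack>
```

      Note that SO_KEEPALIVE sends its first probe only after the kernel's idle interval (on Linux, `net.ipv4.tcp_keepalive_time`, 7200 seconds by default, which matches the 2 hours mentioned above). A firewall whose idle timeout is shorter than that will drop the FD_SOCK connection before any probe is sent.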