3 Replies Latest reply on Mar 25, 2008 12:37 PM by brian.stansberry

    Channel.AUTO_RECONNECT

    brian.stansberry

      I noticed that GroupMember does not make these calls on its channels:

      controlChannel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
      dataChannel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);
      


      This means if these channels detect they've been shunned (kicked out of the group) they won't automatically disconnect() themselves and then connect() themselves to rejoin the group. Is this intentional?

      One reason I ask is that http://jira.jboss.com/jira/browse/JGRP-651 changed the default behavior of a channel from auto_reconnect=false to auto_reconnect=true, for both 2.4.2 and 2.6.2. So if you didn't want channels to auto-reconnect, they now will. In that case, it's better to set channel.setOpt(Channel.AUTO_RECONNECT, Boolean.FALSE); Either way, it's probably better to set the option explicitly rather than rely on a default.

      That same JIRA inadvertently changed the default for auto_getstate to true as well for 2.4.2 (not for 2.6.2). Having that true will cause the channel to not only reconnect, but also attempt a state transfer. Don't think you want that for your channel that doesn't use STATE_TRANSFER. Don't know about the other one. The JIRA to revert the changed 2.4.2 default is http://jira.jboss.com/jira/browse/JGRP-720.

      To explicitly set auto_getstate you call channel.setOpt(Channel.AUTO_GETSTATE, Boolean.XXX); Also a good idea to do this explicitly.

      Because of these issues I'm recommending that a requested upgrade to JGroups 2.4.2 not be made for EAP 4.3.0.GA_CP01.
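
To illustrate why explicit settings are safer than defaults, here is a minimal, self-contained sketch. It is a toy model, not JGroups code: the class `ChannelDefaultsSketch`, its `Options` type, and the release labels are hypothetical names, and the table only encodes the defaults as described in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (not JGroups code) of how release defaults and explicit settings combine. */
public class ChannelDefaultsSketch {

    /** Effective values of the two options discussed in this thread. */
    public static final class Options {
        public final boolean autoReconnect;
        public final boolean autoGetState;

        public Options(boolean autoReconnect, boolean autoGetState) {
            this.autoReconnect = autoReconnect;
            this.autoGetState = autoGetState;
        }

        @Override
        public String toString() {
            return "auto_reconnect=" + autoReconnect + ", auto_getstate=" + autoGetState;
        }
    }

    /** Defaults per release as described in the post; the labels are illustrative. */
    static final Map<String, Options> DEFAULTS = new LinkedHashMap<>();
    static {
        DEFAULTS.put("before JGRP-651", new Options(false, false));
        DEFAULTS.put("2.4.2", new Options(true, true));
        DEFAULTS.put("2.6.2", new Options(true, false));
    }

    /** A null argument means "not set explicitly", so the release default applies. */
    public static Options effective(String release, Boolean autoReconnect, Boolean autoGetState) {
        Options d = DEFAULTS.get(release);
        if (d == null) {
            throw new IllegalArgumentException("unknown release: " + release);
        }
        return new Options(
                autoReconnect != null ? autoReconnect : d.autoReconnect,
                autoGetState != null ? autoGetState : d.autoGetState);
    }

    public static void main(String[] args) {
        for (String release : DEFAULTS.keySet()) {
            // Relying on defaults: the result depends on the release.
            System.out.println(release + " defaults -> " + effective(release, null, null));
            // Explicit settings: the result is the same on every release.
            System.out.println(release + " explicit -> " + effective(release, true, false));
        }
    }
}
```

With nothing set explicitly, the effective behavior changes from release to release; with both options set, it does not. In real code the explicit settings would be the channel.setOpt(...) calls shown above.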

        • 1. Re: Channel.AUTO_RECONNECT
          clebert.suconic

           

          This means if these channels detect they've been shunned (kicked out of the group) they won't automatically disconnect() themselves and then connect() themselves to rejoin the group. Is this intentional?



          Brian, in what situations can they be kicked out?

          • 2. Re: Channel.AUTO_RECONNECT
            brian.stansberry

            The failure detection protocol on another node considers the node to have failed [1] and the VERIFY_SUSPECT protocol on the coordinator node concurs [2]. The coordinator then publishes a new view that doesn't contain the suspect node. But the node hasn't entirely failed; it begins working again and tries to send messages to the group, of which it is no longer a member.

            Typically this would be a case of something happening on the node that prevents it responding to FD heartbeat requests for a period. Maybe a 100% CPU situation or something blocking all the threads that pass messages up from the transport protocol. (In JG 2.4 there is just one such thread, so if it gets blocked for a while in the app, the node can be suspected.)


            [1] Details on the most common failure detection protocols:
            http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsFD
            http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsFD_SOCK
            http://wiki.jboss.org/wiki/Wiki.jsp?page=FDVersusFD_SOCK

            [2] http://wiki.jboss.org/wiki/Wiki.jsp?page=JGroupsVERIFY_SUSPECT
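
The sequence described in this reply can be sketched as a small simulation. This is a toy model under stated assumptions, not JGroups code: the class `ShunSketch`, the node names A, B and C, and the event strings are all hypothetical, and the protocols are reduced to two boolean inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not JGroups code) of a member being suspected, excluded, and optionally rejoining. */
public class ShunSketch {

    /**
     * @param verifySuspectConcurs whether the coordinator's verification agrees with the suspicion
     * @param autoReconnect        whether the excluded node rejoins on its own after being shunned
     * @return the ordered list of events in this simplified scenario
     */
    public static List<String> simulate(boolean verifySuspectConcurs, boolean autoReconnect) {
        List<String> events = new ArrayList<>();
        // Current view of the group; in this sketch A plays the coordinator.
        Set<String> view = new TreeSet<>(Arrays.asList("A", "B", "C"));

        // 1. C stops answering heartbeats for a while, so B's failure detection suspects it.
        events.add("B suspects C");

        // 2. The coordinator double-checks; C is excluded only if it concurs.
        if (!verifySuspectConcurs) {
            events.add("suspicion not confirmed, view unchanged " + view);
            return events;
        }
        view.remove("C");
        events.add("new view " + view);

        // 3. C starts working again and tries to send to a group it no longer belongs to.
        events.add("C shunned");

        // 4. With auto-reconnect, C disconnects and connects again to rejoin the group.
        if (autoReconnect) {
            view.add("C");
            events.add("C rejoined, view " + view);
        } else {
            events.add("C stays outside the group");
        }
        return events;
    }

    public static void main(String[] args) {
        simulate(true, true).forEach(System.out::println);
    }
}
```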

            • 3. Re: Channel.AUTO_RECONNECT
              brian.stansberry

              Some more comments on this:

              The current default for JG 2.6.3 and the proposed default for JG 2.4.3 is auto_reconnect=true and auto_getstate=false.

              If you have a channel that uses state transfer and accepts those defaults, if the channel gets shunned and reconnects, a state transfer will not happen after the reconnect. As a result, the app's state on that node will be out of sync with the rest of the cluster. Probably not a good thing.
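
A minimal sketch of that out-of-sync case, assuming the application state is a simple map. This is a toy model, not JGroups code: `RejoinStateSketch`, `rejoin` and the "counter" key are hypothetical, and a state transfer is reduced to copying the cluster's map.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not JGroups code) of a node rejoining with or without a state transfer. */
public class RejoinStateSketch {

    /**
     * Returns the rejoining node's state after it reconnects.
     * With autoGetState the node fetches the cluster's current state;
     * without it, the node keeps whatever it had before it was shunned.
     */
    public static Map<String, Integer> rejoin(Map<String, Integer> staleLocal,
                                              Map<String, Integer> clusterState,
                                              boolean autoGetState) {
        return new HashMap<>(autoGetState ? clusterState : staleLocal);
    }

    public static void main(String[] args) {
        Map<String, Integer> local = new HashMap<>();
        local.put("counter", 1);   // value the node held when it was shunned

        Map<String, Integer> cluster = new HashMap<>();
        cluster.put("counter", 5); // the rest of the cluster kept updating

        System.out.println("in sync without state transfer: "
                + rejoin(local, cluster, false).equals(cluster));
        System.out.println("in sync with state transfer: "
                + rejoin(local, cluster, true).equals(cluster));
    }
}
```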

              Recommend JBM never just accept the defaults. Always specifically configure what you want via channel.setOpt(...). If you guys are doing a CP release for EAP 4.3 CP01 and can add that in, then you are set no matter what the defaults are in a particular JGroups release.