15 Replies Latest reply on Aug 29, 2006 3:54 PM by brian.stansberry

    Does clustering require traffic over 1098 as well as 11800?

    annecotter

      I apologize if this is a basic question, but I cannot find a clear answer in any of the documentation. I have JGroups configured to use TCP. I notice that when a new member joins the cluster, there seems to be traffic generated from port 1098, as well as the expected JGroups traffic on 11800. Is there an RMI call made when a new member joins the cluster? I'm suspecting this is how the new member registers itself with the existing member's HA partition?

      Any info would be much appreciated, thank you in advance.

      regards,
      Anne

        • 1. Re: Does clustering require traffic over 1098 as well as 118
          marcreis

          Hi, I don't know if you have seen it already, but this page comes in handy:

          http://wiki.jboss.org/wiki/Wiki.jsp?page=UsingJBossBehindAFirewall.

          The port you mention belongs to the naming service, which is defined in /conf/jboss-service.xml:

          ....
          <!-- The port of the RMI naming service, 0 == anonymous -->
          <attribute name="RmiPort">1098</attribute>
          ...
          

          Sincerely
          Marc

          • 2. Re: Does clustering require traffic over 1098 as well as 118
            annecotter

            Hi Marc, thank you I have seen that document. I wonder if I haven't phrased my question well, but what I'm trying to find out is when a new member joins a cluster, does it make an RMI call to the existing member? So do the members of a cluster use RMI for any communication with each other?

            thanks
            Anne

            • 3. Re: Does clustering require traffic over 1098 as well as 118
              brian.stansberry

              The cluster code itself doesn't, but perhaps some component you've deployed does?

              • 4. Re: Does clustering require traffic over 1098 as well as 118

                The naming service is what uses port 1098. This port is used even if you start the "default" server (i.e., no cluster) so it's not associated with cluster communications.

                • 5. Re: Does clustering require traffic over 1098 as well as 118
                  annecotter


                  Thanks for the answer Brian.

                  I have two machines with multiple interfaces. Clustering traffic is configured to go over bge2, and all other traffic (JNDI/RMI/JMS) is over bge1. If the two machines don't have IP connectivity over bge1 I get the following messages when a new member joins the cluster:

                  2006.07.21 12:51:20 WARN [org.jgroups.protocols.pbcast.NAKACK] [dino2:11800 (additional data: 17 bytes)] discarded message from non-member 10.10.10.10:11800

                  When the two machines can see each other over bge1 (as well as bge2), clustering works fine. In this case when a new member joins the cluster, I see traffic over 1098 at the beginning of the join. This is why I was suspecting RMI calls between the servers. Am I missing any obvious reasons why the first case doesn't work? What causes my second machine (10.10.10.10) to go from being a non-member to being considered a member, so that the first machine won't discard its messages?

                  Thanks
                  Anne

                  • 6. Re: Does clustering require traffic over 1098 as well as 118
                    brian.stansberry

                    Can you post your JGroups protocol stack config? (IIRC you're using TCP).

                    • 7. Re: Does clustering require traffic over 1098 as well as 118
                      annecotter

                      JGroups config:

                      <mbean code="org.jboss.ha.framework.server.ClusterPartition" name="jboss:service=${jboss.partition.name:5620SAMPartition_SAM_4_0_B2_16}">
                        <!-- Name of the partition being built -->
                        <attribute name="PartitionName">${jboss.partition.name:5620SAMPartition_SAM_4_0_B2_16}</attribute>
                        <!-- The address used to determine the node name -->
                        <attribute name="NodeAddress">${jboss.bind.address}</attribute>
                        <!-- Determine if deadlock detection is enabled -->
                        <attribute name="DeadlockDetection">False</attribute>
                        <!-- Max time (in ms) to wait for state transfer to complete. Increase for large states -->
                        <attribute name="StateTransferTimeout">30000</attribute>
                        <!-- The JGroups protocol configuration -->
                        <attribute name="PartitionConfig">
                          <!--
                          The default UDP stack:
                          - If you have a multihomed machine, set the UDP protocol's bind_addr attribute to the
                            appropriate NIC IP address, e.g bind_addr="192.168.0.2".
                          - On Windows machines, because of the media sense feature being broken with multicast
                            (even after disabling media sense) set the UDP protocol's loopback attribute to true
                          -->
                          <Config>
                            <TCP bind_addr="172.16.22.11" start_port="11800" loopback="true" />
                            <TCPPING initial_hosts="10.10.10.10[11800]" port_range="1" timeout="3000" num_initial_members="3" up_thread="true" down_thread="true" />
                            <MERGE2 min_interval="5000" max_interval="10000" />
                            <FD shun="false" up_thread="true" down_thread="true" timeout="5000" max_tries="9" />
                            <VERIFY_SUSPECT timeout="15000" up_thread="true" down_thread="true" />
                            <pbcast.NAKACK gc_lag="100" retransmit_timeout="3000" up_thread="true" down_thread="true" />
                            <pbcast.STABLE desired_avg_gossip="20000" up_thread="true" down_thread="true" />
                            <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false" print_local_addr="true"
                                        up_thread="true" down_thread="true" />
                            <pbcast.STATE_TRANSFER up_thread="true" down_thread="true" />
                          </Config>
                        </attribute>
                      </mbean>
                      


                      • 8. Re: Does clustering require traffic over 1098 as well as 118
                        brian.stansberry

                        A support case for this issue has been opened, so the discussion is continuing there.

                        • 9. Re: Does clustering require traffic over 1098 as well as 118
                          abosch

                          hi!

                          I'm interested in this topic as I'm facing the same problem.

                          Where can I find more info?

                          thanks

                          • 10. Re: Does clustering require traffic over 1098 as well as 118
                            brian.stansberry

                            Thanks for asking; we resolved this on our support portal and I forgot to post the result here.

                            The issue is the RMI stub for the HA-JNDI service. Clients that connect to HA-JNDI use RMI as the transport, so each server running HA-JNDI creates an RMI stub for use in communicating with that server. When a client contacts the HA-JNDI service and downloads a proxy for that service, the proxy includes the RMI stubs for *all* the servers in the cluster. The presence of these stubs is what allows the proxy to fail over from one server to another if needed.

                            So, say you have a 3 server cluster, A, B and C. A client contacts C (e.g. at port 1100) and downloads an HA-JNDI proxy. The proxy includes separate RMI stubs for A, B and C.

                            How does C have a stub for A and B? When C joins the cluster, it fetches A and B's stubs, and sends its stub to A and B.

                            The traffic Anne was seeing was the normal traffic of an RMI stub communicating back to its origin server for the purpose of maintaining a "lease" on the RMI server object. This "lease" is used by RMI's distributed garbage collection; without it the RMI server object could be gc'd. By default, this communication happens when the stub is first deserialized, and every 5 minutes thereafter. This communication is part of standard RMI stub behavior; it's not something JBoss wrote.

                            The communication occurred on 1098 because that's the RMI port they had configured for HA-JNDI. By default in current versions of the AS, this port is 1101.
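
                            For reference, here is a minimal sketch of where these two ports are set. It assumes the stock HA-JNDI mbean in cluster-service.xml from a 4.0.x-era AS: the mbean code and name values are assumptions based on that default file, all unrelated attributes are left out, and the exact layout may differ in your version.

                            ```xml
                            <server>
                              <!-- Assumed stock HA-JNDI service definition; unrelated attributes omitted -->
                              <mbean code="org.jboss.ha.jndi.HANamingService"
                                     name="jboss:service=HAJNDI">
                                <!-- Port that clients contact to download the HA-JNDI proxy -->
                                <attribute name="Port">1100</attribute>
                                <!-- Port the RMI stubs talk back to, including the DGC lease renewals -->
                                <attribute name="RmiPort">1101</attribute>
                              </mbean>
                            </server>
                            ```

                            Keeping RmiPort at a fixed value, rather than letting it be chosen automatically, should make the lease traffic easier to account for in firewall rules.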

                            • 11. Re: Does clustering require traffic over 1098 as well as 118
                              abosch

                              Thanks for the response, but I don't really get it.

                              In a multi-Ethernet scenario where all clustering traffic is supposed to go through the second (internal) interface, why do I also need to open communication over the first (external) interface?

                              Is this by design, or is it configurable somewhere?

                              My goal is to separate all cluster-related traffic from external traffic.

                              Maybe I'm not approaching the problem correctly, so I'd be glad to hear your comments.

                              • 12. Re: Does clustering require traffic over 1098 as well as 118
                                brian.stansberry

                                (Reposted with a typo fixed)

                                The traffic will go over whatever interface the HA-JNDI service is configured to use (which is typically an external interface, as HA-JNDI is used by clients).

                                I wouldn't say this was by design; it's more a side effect of using RMI. To make it go away you would need to:

                                1) Configure HA-JNDI to use the internal interface (set the BindAddress attribute in the HA-JNDI section of cluster-service.xml.) Obviously this is only an option if you don't have external clients that need HA-JNDI.

                                2) Prevent exchange of external interface RMI stubs for clustered EJBs:

                                a) Use the PooledInvokerHA instead of the JRMPInvokerHA for clustered EJBs. The simplest way is to edit conf/standardjboss.xml, looking for occurrences of

                                <invoker-mbean>jboss:service=invoker,type=jrmpha</invoker-mbean>


                                and replacing them with
                                <invoker-mbean>jboss:service=invoker,type=pooledha</invoker-mbean>


                                OR b) Configure the JRMPInvokerHA (in cluster-service.xml) to use the internal address (set the "ServerAddress" attribute.) Again, this is only an option if you don't have external clients that need the EJBs.
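
                                Options 1 and 2b could look roughly like the following in cluster-service.xml. This is a sketch only: 172.16.22.11 stands in for the internal interface address (it is borrowed from the JGroups bind_addr posted earlier in this thread), the mbean code values and the HA-JNDI mbean name are assumed from a stock 4.0.x file, and all other attributes are omitted.

                                ```xml
                                <server>
                                  <!-- Option 1: bind HA-JNDI to the internal interface (address is an assumed example) -->
                                  <mbean code="org.jboss.ha.jndi.HANamingService"
                                         name="jboss:service=HAJNDI">
                                    <attribute name="BindAddress">172.16.22.11</attribute>
                                  </mbean>

                                  <!-- Option 2b: have the JRMPInvokerHA put the internal address in its RMI stubs -->
                                  <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
                                         name="jboss:service=invoker,type=jrmpha">
                                    <attribute name="ServerAddress">172.16.22.11</attribute>
                                  </mbean>
                                </server>
                                ```

                                Each node would use its own internal address here, so the value differs from server to server.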


                                There is a JIRA for 5.0 to convert HA-JNDI to use Remoting, which will remove the RMI issue for that service. For 5.0 clustered EJBs already use Remoting.

                                • 13. Re: Does clustering require traffic over 1098 as well as 118
                                  abosch

                                  Thanks a lot Brian, your post is very insightful.

                                  I'll try that and post the results.

                                  • 14. Re: Does clustering require traffic over 1098 as well as 118
                                    abosch

                                    hi again!

                                    I've changed BindAddress in HA-JNDI and ServerAddress in JRMPInvokerHA.

                                    With iptables disabled (traffic can flow freely) everything starts smoothly, but when I restrict the external interface to accept only requests from outside, I get some errors.

                                    The servers do get clustered, but I see a worrying message on the second server (the first starts with no problem):


                                    [org.jboss.ha.framework.interfaces.HAPartition.SiapPartition] Fetching state (will wait for 60000 milliseconds):


                                    Here it freezes for some time and then continues. Should I receive some "state was retrieved successfully" message?

                                    Can I ignore this issue, or must I dig more?

                                    If you want further info, please just ask.
