11 Replies Latest reply on Nov 2, 2006 1:00 PM by somejunk

    TCP Clustering problem

    somejunk

      Hi

      I have two instances of JBoss 4.0.2 running on two different machines in a cluster. With the UDP stack, each node detects the other automatically. With the TCP stack, they do not communicate, and I frequently get the following messages:

      2006-11-01 16:41:04,975 INFO [org.jboss.system.server.Server] JBoss (MX MicroKernel) [4.0.2 (build: CVSTag=JBoss_4_0_2 date=200505022023)] Started in 31s:439ms
      2006-11-01 16:41:31,413 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)
      2006-11-01 16:41:37,476 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat=sb1585). Sender was APP2:7810
      2006-11-01 16:41:42,757 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)
      2006-11-01 16:41:47,789 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat=sb1585). Sender was APP2:7810
      2006-11-01 16:41:54,523 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)
      2006-11-01 16:41:59,664 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat=sb1585). Sender was APP2:7810
      2006-11-01 16:42:04,492 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)


      I have changed UDP to TCP in both cluster-service.xml and tc5-cluster-service.xml.

      Here is the TCP stack for node 1 and node 2 in cluster-service.xml:


      <TCP bind_addr="100.10.2.80" start_port="7800" loopback="false"/>
      <TCPPING initial_hosts="100.10.2.80[7800],100.10.2.90[7800]" port_range="3" timeout="3500"
      num_initial_members="3" up_thread="true" down_thread="true"/>
      <MERGE2 min_interval="5000" max_interval="10000"/>
      <FD shun="true" timeout="2500" max_tries="5" up_thread="true" down_thread="true" />
      <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false" />
      <pbcast.NAKACK down_thread="true" up_thread="true" gc_lag="100"
      retransmit_timeout="3000"/>
      <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false" />
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
      print_local_addr="true" down_thread="true" up_thread="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>


      I am using port 7810 for the TCP stack in tc5-cluster-service.xml. Please help me resolve this. Thanks.

        • 1. Re: TCP Clustering problem
          belaban

          The config looks okay. Can you try this with the JGroups standalone program? http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss

          It appears that you get traffic from 2 different clusters: "sb1585" and "tomcat=sb1585". Which name does your cluster have?

          • 2. Re: TCP Clustering problem
            somejunk

            Thanks Bela for your quick response.

            In tc5-cluster-service.xml, I have this:
            ${tomcat.partition.name:TomcatDefaultPartition}


            Also, I set an environment variable called JBOSS_CLUSTER_NAME with value sb1585.

            • 3. Re: TCP Clustering problem
              somejunk

              I tried the procedure given in the link. On node1, which I started first, I get
              GMS: address is App2:7800
              ** New View: [App2:7800|0] [App2:7800]

              When node2 on the other machine is started, I get ChannelException: unable to setup protocol stack.

              Please help me further with this. Thanks.

              • 4. Re: TCP Clustering problem
                somejunk

                The JGroups standalone test is working now. I get the messages described in the link.
                I still keep getting the warning messages mentioned in my earlier post.
                Please help me further with this. Thanks.

                • 5. Re: TCP Clustering problem
                  belaban

                   

                  "somejunk" wrote:
                  I tried the procedure given in the link. On node1, which I started first, I get
                  GMS: address is App2:7800
                  ** New View: [App2:7800|0] [App2:7800]

                  When node2 on the other machine is started, I get ChannelException: unable to setup protocol stack.

                  Please help me further with this. Thanks.


                  There should be a stack trace on node2. Can you post it? Also, if you enable logging at the trace level for org.jgroups, you will see what the problem is. I suspect an incorrect bind_addr. Any chance you're using IPv6? If so, disable it as detailed in http://wiki.jboss.org/wiki/Wiki.jsp?page=IPv6
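
                  A minimal sketch of the trace-level logging mentioned above, assuming the stock conf/log4j.xml of a JBoss 4.0.x server configuration. Those releases express TRACE through the org.jboss.logging.XLevel class. Whether an appender threshold also has to be lowered depends on your file, so treat this as a starting point rather than a complete recipe.

                  ```xml
                  <!-- Add inside <log4j:configuration> in conf/log4j.xml (file location assumed) -->
                  <category name="org.jgroups">
                     <priority value="TRACE" class="org.jboss.logging.XLevel"/>
                  </category>
                  ```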

                  • 6. Re: TCP Clustering problem
                    somejunk

                    This is what I am getting on node2.

                    2006-11-02 11:56:55,294 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:01,138 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)
                    2006-11-02 11:57:06,669 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:12,528 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)
                    2006-11-02 11:57:19,325 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:22,529 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)
                    2006-11-02 11:57:29,873 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:32,185 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)

                    TCP stack in cluster-service.xml is:


                    <TCP bind_addr="192.168.224.123" start_port="7800" loopback="true"/>
                    <TCPPING initial_hosts="192.168.224.123[7800],192.168.224.122[7800]" port_range="3" timeout="3500"
                    num_initial_members="3" up_thread="true" down_thread="true"/>
                    <MERGE2 min_interval="5000" max_interval="10000"/>
                    <FD shun="true" timeout="2500" max_tries="5" up_thread="true" down_thread="true" />
                    <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false" />
                    <pbcast.NAKACK down_thread="true" up_thread="true" gc_lag="100"
                    retransmit_timeout="3000"/>
                    <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false" />
                    <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
                    print_local_addr="true" down_thread="true" up_thread="true"/>
                    <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>


                    In tc5-cluster-service.xml, I am using port 7810.

                    • 7. Re: TCP Clustering problem
                      belaban

                      Your JBOSS_CLUSTER_NAME variable doesn't expand correctly in (tomcat-%JBOSS_CLUSTER_NAME%).
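
                      For context, a sketch of where the group name usually comes from, assuming the stock JBoss 4.0.x cluster-service.xml. The ${name:default} syntax is resolved from Java system properties when the descriptor is deployed, whereas %JBOSS_CLUSTER_NAME% is Windows shell syntax that the start script has to expand before the value reaches the JVM. If that expansion does not happen, the literal text ends up as the group name, which appears to be what the warnings above show.

                      ```xml
                      <!-- cluster-service.xml, ClusterPartition MBean (stock layout assumed) -->
                      <!-- Takes the system property jboss.partition.name, else falls back to DefaultPartition -->
                      <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
                      ```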

                      • 8. Re: TCP Clustering problem
                        somejunk

                        Also, in boot.log I see different partition names.

                        node 1

                        jboss.partition.name: %JBOSS_CLUSTER_HOME%

                        node 2

                        tomcat.partition.name: tomcat-sb1585

                        I have the same cluster name in tc5-cluster-service.xml on both nodes.
                        I also have the same environment variable and value on both nodes.
                        So why is it taking different partition names? Could this be the reason for the warning messages that I am getting?

                        • 9. Re: TCP Clustering problem
                          somejunk

                          If the environment variable is not getting expanded correctly, how should I proceed?

                          • 10. Re: TCP Clustering problem
                            belaban

                            Well, make sure you use the same scripts to start JBoss, and verify that the partition names are the same.
                            In the worst case, hard-code them to see if that helps.
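
                            A sketch of what hard-coding could look like, using the names that appear in this thread (sb1585 and tomcat-sb1585). The attribute names assume the stock JBoss 4.0.x descriptors (PartitionName on the ClusterPartition MBean, ClusterName on the Tomcat clustering cache MBean), so check them against your own files.

                            ```xml
                            <!-- cluster-service.xml: identical on both nodes -->
                            <attribute name="PartitionName">sb1585</attribute>

                            <!-- tc5-cluster-service.xml: identical on both nodes -->
                            <attribute name="ClusterName">tomcat-sb1585</attribute>
                            ```

                            If the warnings stop with fixed names, the problem is in how the start scripts pass the name to the JVM rather than in the TCP stack itself.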

                            • 11. Re: TCP Clustering problem
                              somejunk

                              I finally got it working. Thanks, Bela, for your help! I really appreciate it!

                              It was picking up different partition names for the nodes because of the JBOSS_CLUSTER_NAME environment variable. I changed the environment variable, and now it picks up the correct partition name.