11 Replies Latest reply on Nov 2, 2006 1:00 PM by somu junkee

    TCP Clustering problem

    somu junkee Newbie

      Hi

      I have two instances of JBoss 4.0.2 running on two different machines in a cluster. With the UDP stack, each node detects the other automatically. With the TCP stack, the nodes do not communicate, and I frequently get the following messages.

      2006-11-01 16:41:04,975 INFO [org.jboss.system.server.Server] JBoss (MX MicroKernel) [4.0.2 (build: CVSTag=JBoss_4_0_2 date=200505022023)] Started in 31s:439ms
      2006-11-01 16:41:31,413 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)
      2006-11-01 16:41:37,476 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat=sb1585). Sender was APP2:7810
      2006-11-01 16:41:42,757 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)
      2006-11-01 16:41:47,789 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat=sb1585). Sender was APP2:7810
      2006-11-01 16:41:54,523 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)
      2006-11-01 16:41:59,664 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat=sb1585). Sender was APP2:7810
      2006-11-01 16:42:04,492 WARN [org.jgroups.protocols.TCP] discarded message from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)


      I have changed UDP to TCP in both cluster-service.xml and tc5-cluster-service.xml.

      Here is the TCP stack for node1 and node2 in cluster-service.xml:


      <TCP bind_addr="100.10.2.80" start_port="7800" loopback="false"/>
      <TCPPING initial_hosts="100.10.2.80[7800],100.10.2.90[7800]" port_range="3" timeout="3500"
      num_initial_members="3" up_thread="true" down_thread="true"/>
      <MERGE2 min_interval="5000" max_interval="10000"/>
      <FD shun="true" timeout="2500" max_tries="5" up_thread="true" down_thread="true" />
      <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false" />
      <pbcast.NAKACK down_thread="true" up_thread="true" gc_lag="100"
      retransmit_timeout="3000"/>
      <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false" />
      <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
      print_local_addr="true" down_thread="true" up_thread="true"/>
      <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>


      I am using port 7810 for the TCP stack in tc5-cluster-service.xml. Please help me resolve this. Thanks.

        • 1. Re: TCP Clustering problem
          Bela Ban Master

          The config looks okay. Can you try this with the JGroups standalone program? http://wiki.jboss.org/wiki/Wiki.jsp?page=TestingJBoss

          It appears that you get traffic from 2 different clusters: "sb1585" and "tomcat=sb1585". Which name does your cluster have?
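
To see which group names the discarded messages carry, the warnings can be tallied by group and sender. This is a minimal Python sketch, assuming the log line format quoted in the first post; the name `foreign_groups` is made up for illustration and is not part of JBoss or JGroups.

```python
import re
from collections import Counter

# Matches the JGroups warning quoted in this thread, for example:
# "... discarded message from different group (sb1585). Sender was APP2:7800"
WARNING = re.compile(
    r"discarded message from different group \((?P<group>[^)]*)\)\. "
    r"Sender was (?P<sender>\S+)"
)


def foreign_groups(lines):
    """Count (group, sender) pairs found in 'different group' warnings."""
    counts = Counter()
    for line in lines:
        match = WARNING.search(line)
        if match:
            counts[(match.group("group"), match.group("sender"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2006-11-01 16:41:31,413 WARN [org.jgroups.protocols.TCP] discarded message "
        "from different group (sb1585). Sender was APP2:7800 (additional data: 20 bytes)",
        "2006-11-01 16:41:37,476 WARN [org.jgroups.protocols.TCP] discarded message "
        "from different group (tomcat=sb1585). Sender was APP2:7810",
    ]
    for (group, sender), count in sorted(foreign_groups(sample).items()):
        print(group, sender, count)
```

Each distinct group name in the output is a cluster name that the local node does not share with the sender.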

          • 2. Re: TCP Clustering problem
            somu junkee Newbie

            Thanks Bela for your quick response.

            In tc5-cluster-service.xml, I have this:
            ${tomcat.partition.name:TomcatDefaultPartition}


            Also, I set an environment variable called JBOSS_CLUSTER_NAME with value sb1585.
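
For context on the `${name:default}` form above: as far as I know, JBoss resolves such references against Java system properties (for example a value passed with `-D` at startup) and falls back to the text after the colon, so an OS environment variable only matters if the start script forwards it. The Python sketch below merely imitates that lookup under this assumption; `resolve` and the `system_properties` dictionary are illustrative names, not JBoss APIs.

```python
import re

# ${key} or ${key:default}
REFERENCE = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(text, system_properties):
    """Replace ${key:default} references using the given property map."""

    def substitute(match):
        key, default = match.group(1), match.group(2)
        if key in system_properties:
            return system_properties[key]
        if default is not None:
            return default
        # No value and no default: leave the reference as written.
        return match.group(0)

    return REFERENCE.sub(substitute, text)


if __name__ == "__main__":
    template = "${tomcat.partition.name:TomcatDefaultPartition}"
    # Nothing forwarded to the JVM: the default is used.
    print(resolve(template, {}))
    # Property forwarded by the start script: the forwarded value is used.
    print(resolve(template, {"tomcat.partition.name": "tomcat-sb1585"}))
```

If the two nodes forward different values, or one forwards nothing, they end up with different partition names even though the XML is identical.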

            • 3. Re: TCP Clustering problem
              somu junkee Newbie

              I tried the procedure given in the link. On node1, which I started first, I get:
              GMS: address is App2:7800
              ** New View: [App2:7800|0] [App2:7800]

              When node2 is started on the other machine, I get ChannelException: unable to setup protocol stack.

              Please help me on this further. Thanks

              • 4. Re: TCP Clustering problem
                somu junkee Newbie

                My JGroups test is working now. I am getting the messages described in the link.
                I still keep getting the warning messages mentioned in my previous post.
                Please help me further with this. Thanks

                • 5. Re: TCP Clustering problem
                  Bela Ban Master

                   

                  "somejunk" wrote:
                  I tried procedure given in the link. On node1 which I started first I get
                  GMS: address is App2:7800
                  ** New View: [App2:7800|0] [App2:7800]

                  When node2 on other machine is started, I get ChannelException: unable to setup protocol stack.

                  Please help me on this further. Thanks


                  There should be a stack trace on node2; can you post it? Also, if you enable logging at the trace level for org.jgroups, you will see what the problem is. I suspect an incorrect bind_addr. Any chance you're using IPv6? If so, disable it as detailed in http://wiki.jboss.org/wiki/Wiki.jsp?page=IPv6

                  • 6. Re: TCP Clustering problem
                    somu junkee Newbie

                    This is what I am getting on node2.

                    2006-11-02 11:56:55,294 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:01,138 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)
                    2006-11-02 11:57:06,669 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:12,528 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)
                    2006-11-02 11:57:19,325 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:22,529 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)
                    2006-11-02 11:57:29,873 WARN [org.jgroups.protocols.TCP] discarded message from different group (tomcat-%JBOSS_CLUSTER_NAME%). Sender was APP1:7810
                    2006-11-02 11:57:32,185 WARN [org.jgroups.protocols.TCP] discarded message from different group (%JBOSS_CLUSTER_NAME%). Sender was APP1:7800 (additional data: 20 bytes)

                    TCP stack in cluster-service.xml is:


                    <TCP bind_addr="192.168.224.123" start_port="7800" loopback="true"/>
                    <TCPPING initial_hosts="192.168.224.123[7800],192.168.224.122[7800]" port_range="3" timeout="3500"
                    num_initial_members="3" up_thread="true" down_thread="true"/>
                    <MERGE2 min_interval="5000" max_interval="10000"/>
                    <FD shun="true" timeout="2500" max_tries="5" up_thread="true" down_thread="true" />
                    <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false" />
                    <pbcast.NAKACK down_thread="true" up_thread="true" gc_lag="100"
                    retransmit_timeout="3000"/>
                    <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false" />
                    <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
                    print_local_addr="true" down_thread="true" up_thread="true"/>
                    <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>


                    In tc5-cluster-service.xml, I am using port 7810.

                    • 7. Re: TCP Clustering problem
                      Bela Ban Master

                      Your JBOSS_CLUSTER_NAME variable doesn't expand correctly in (tomcat-%JBOSS_CLUSTER_NAME%).
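
A quick way to catch this kind of problem is to scan a group or partition name for placeholder syntax that should have been replaced before the name reached JGroups. This is a small Python sketch; the two placeholder styles it checks (`%NAME%` and `${name}`) are the ones that appear in this thread, and `unexpanded_placeholders` is a hypothetical helper name.

```python
import re

# %NAME% (Windows batch style) or ${name} / ${name:default} (property style)
PLACEHOLDER = re.compile(r"%[A-Za-z_][A-Za-z0-9_]*%|\$\{[^}]*\}")


def unexpanded_placeholders(name):
    """Return every placeholder that is still present literally in name."""
    return PLACEHOLDER.findall(name)


if __name__ == "__main__":
    for candidate in ("tomcat-%JBOSS_CLUSTER_NAME%", "tomcat-sb1585"):
        leftovers = unexpanded_placeholders(candidate)
        status = "unexpanded: " + ", ".join(leftovers) if leftovers else "ok"
        print(candidate, "->", status)
```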

                      • 8. Re: TCP Clustering problem
                        somu junkee Newbie

                        Also, in boot.log, I see different partition names.

                        node 1

                        jboss.partition.name: %JBOSS_CLUSTER_HOME%

                        node 2

                        tomcat.partition.name: tomcat-sb1585

                        I have the same cluster name in tc5-cluster-service.xml on both nodes.
                        I also have the same environment variable and value on both nodes.
                        So why are the nodes taking different partition names? Could this be the reason for the warning messages that I am getting?

                        • 9. Re: TCP Clustering problem
                          somu junkee Newbie

                          If the environment variable is not being expanded correctly, how should I proceed?

                          • 10. Re: TCP Clustering problem
                            Bela Ban Master

                            Well, make sure you use the same scripts to start JBoss, and verify that the partition names are the same.
                            In the worst case, hard-code them to see if that helps.
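
The comparison can be done by pulling the partition-name lines out of each node's boot.log and diffing them. This is a rough Python sketch, assuming the `key: value` layout shown in the boot.log excerpts earlier in the thread; `partition_names` and `mismatches` are illustrative names.

```python
import re

# Matches entries such as "jboss.partition.name: sb1585", with or without
# a timestamp and logger prefix in front of them.
ENTRY = re.compile(r"(?P<key>[\w.]*partition\.name)\s*:\s*(?P<value>\S+)")


def partition_names(boot_log_lines):
    """Collect {property: value} for every *partition.name entry."""
    names = {}
    for line in boot_log_lines:
        match = ENTRY.search(line)
        if match:
            names[match.group("key")] = match.group("value")
    return names


def mismatches(node1, node2):
    """Return {property: (node1_value, node2_value)} where the nodes differ."""
    keys = sorted(set(node1) | set(node2))
    return {
        key: (node1.get(key), node2.get(key))
        for key in keys
        if node1.get(key) != node2.get(key)
    }


if __name__ == "__main__":
    node1 = partition_names(["jboss.partition.name: %JBOSS_CLUSTER_HOME%"])
    node2 = partition_names(["tomcat.partition.name: tomcat-sb1585"])
    for key, (first, second) in mismatches(node1, node2).items():
        print(key, "node1 =", first, "node2 =", second)
```

An empty result from `mismatches` means both nodes report the same partition names; any entry it returns is a name to fix or hard-code.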

                            • 11. Re: TCP Clustering problem
                              somu junkee Newbie

                              I finally got it working. Thanks, Bela, for your help! I really appreciate it!

                              Previously the nodes were taking different partition names because I have JBOSS_CLUSTER_NAME set as an environment variable. I changed the environment variable, and now the correct partition name is picked up.