9 Replies Latest reply on Mar 13, 2007 6:17 PM by brian.stansberry

    WAN Cluster setup

      Hi all,

      Can somebody please give me some information on setting up a JBoss cluster with nodes in a WAN topology? I think I have to use the JGroups TUNNEL protocol to get through firewalls...

      Any pointers would help.

      Johan

        • 2. Re: WAN Cluster setup

          Do you have an example of cluster-service.xml for JBoss AS?

          I keep getting errors like this:

          15:11:47,464 INFO [STDOUT]
          -------------------------------------------------------
          GMS: address is AWS00581:3395
          -------------------------------------------------------
          15:11:49,741 INFO [TreeCache] TreeCache local address is AWS00581:3395
          15:11:49,751 INFO [TreeCache] viewAccepted(): [AWS00581:3366|1] [AWS00581:3366, AWS00581:3395]
          15:11:49,843 INFO [TreeCache] received the state (size=1024 bytes)
          15:11:49,884 INFO [TreeCache] state was retrieved successfully (in 143 milliseconds)
          15:11:49,884 INFO [TreeCache] parseConfig(): PojoCacheConfig is empty
          15:11:51,140 INFO [DefaultPartition] Initializing
          15:11:52,263 INFO [STDOUT]
          -------------------------------------------------------
          GMS: address is localhost:3400
          -------------------------------------------------------
          15:11:55,888 INFO [DefaultPartition] Number of cluster members: 2
          15:11:55,888 INFO [DefaultPartition] Other members: 1
          15:11:55,888 WARN [DefaultPartition] No additional information has been found in the JavaGroup address: make sure you are running with a correct version of JGroups and that the protocol you are using supports the 'additionalData' behaviour
          15:11:55,898 ERROR [RouterStub] receive(): java.net.SocketException: Software caused connection abort: recv failed
           at java.net.SocketInputStream.socketRead0(Native Method)
           at java.net.SocketInputStream.read(SocketInputStream.java:129)
           at java.net.SocketInputStream.read(SocketInputStream.java:182)


          • 3. Re: WAN Cluster setup

            The JGroups config in cluster-service.xml, identical on both nodes (which run on the same machine), is as follows:

            <Config>
             <TUNNEL router_port="12001" router_host="localhost"/>
             <PING timeout="3000" gossip_refresh="10000" num_initial_members="2" gossip_host="localhost" gossip_port="12001"/>
             <MERGE2 max_interval="10000" min_interval="5000"/>
             <FD timeout="3000"/>
             <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
             <pbcast.NAKACK gc_lag="100" retransmit_timeout="600,1200,2400,4800"/>
             <pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000" down_thread="false" max_bytes="0" up_thread="false"/>
             <pbcast.GMS print_local_addr="true" join_timeout="5000" join_retry_timeout="2000" shun="true"/>
             <pbcast.STATE_TRANSFER down_thread="false" up_thread="false"/>
            </Config>


            This is part of the log from the first node:
            -------------------------------------------------------
            GMS: address is AWS00581:4770
            -------------------------------------------------------
            16:26:00,889 INFO [TreeCache] viewAccepted(): [AWS00581:4770|0] [AWS00581:4770]
            16:26:00,929 INFO [TreeCache] TreeCache local address is AWS00581:4770
            16:26:00,939 INFO [TreeCache] State could not be retrieved (we are the first member in group)
            16:26:00,939 INFO [TreeCache] parseConfig(): PojoCacheConfig is empty
            16:26:01,820 INFO [DefaultPartition] Initializing
            16:26:03,001 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is localhost:4772
            -------------------------------------------------------
            16:26:06,555 INFO [DefaultPartition] Number of cluster members: 1
            16:26:06,565 INFO [DefaultPartition] Other members: 0
            16:26:06,565 WARN [DefaultPartition] No additional information has been found in the JavaGroup address: make sure you are running with a correct version of JGroups and that the protocol you are using supports the 'additionalData' behaviour
            16:26:06,565 INFO [DefaultPartition] Fetching state (will wait for 30000 milliseconds):
            16:26:06,565 INFO [DefaultPartition] State could not be retrieved (we are the first member in group)
            16:26:06,685 INFO [HANamingService] Started ha-jndi bootstrap jnpPort=1200, backlog=50, bindAddress=/0.0.0.0
            16:26:06,795 INFO [DetachedHANamingService$AutomaticDiscovery] Listening on /0.0.0.0:1102, group=230.0.0.4, HA-JNDI address=10.16.180.47:1200


            Second node :
            -------------------------------------------------------
            GMS: address is AWS00581:4801
            -------------------------------------------------------
            16:27:57,821 INFO [TreeCache] TreeCache local address is AWS00581:4801
            16:27:57,831 INFO [TreeCache] viewAccepted(): [AWS00581:4770|1] [AWS00581:4770, AWS00581:4801]
            16:27:57,921 INFO [TreeCache] received the state (size=1024 bytes)
            16:27:57,951 INFO [TreeCache] state was retrieved successfully (in 120 milliseconds)
            16:27:57,951 INFO [TreeCache] parseConfig(): PojoCacheConfig is empty
            16:28:00,403 INFO [DefaultPartition] Initializing
            16:28:01,494 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is localhost:4806
            -------------------------------------------------------
            16:28:05,028 INFO [DefaultPartition] Number of cluster members: 2
            16:28:05,028 INFO [DefaultPartition] Other members: 1
            16:28:05,028 WARN [DefaultPartition] No additional information has been found in the JavaGroup address: make sure you are running with a correct version of JGroups and that the protocol you are using supports the 'additionalData' behaviour
            16:28:05,048 INFO [DefaultPartition] New cluster view for partition DefaultPartition: 1 ([127.0.0.1:4772, 127.0.0.1:4806] delta: 0)
            16:28:05,088 ERROR [RouterStub] receive(): java.net.SocketException: Software caused connection abort: recv failed
             at java.net.SocketInputStream.socketRead0(Native Method)
             at java.net.SocketInputStream.read(SocketInputStream.java:129)
             at java.net.SocketInputStream.read(SocketInputStream.java:182)
             at java.io.DataInputStream.readInt(DataInputStream.java:353)
             at org.jgroups.stack.RouterStub.receive(RouterStub.java:289)
             at org.jgroups.protocols.TUNNEL.run(TUNNEL.java:182)
             at java.lang.Thread.run(Thread.java:595)
            
            16:28:05,108 INFO [STDOUT]
            -------------------------------------------------------
            GMS: address is null
            -------------------------------------------------------
            16:28:05,118 WARN [ServiceController] Problem starting service jboss:service=DefaultPartition
            java.lang.NullPointerException
             at org.jboss.ha.framework.server.HAPartitionImpl.verifyNodeIsUnique(HAPartitionImpl.java:1161)
             at org.jboss.ha.framework.server.HAPartitionImpl.startPartition(HAPartitionImpl.java:272)
             at org.jboss.ha.framework.server.ClusterPartition.startService(ClusterPartition.java:341)
             at org.jboss.system.ServiceMBeanSupport.jbossInternalStart(ServiceMBeanSupport.java:289)
             at org.jboss.system.ServiceMBeanSupport.jbossInternalLifecycle(ServiceMBeanSupport.java:245)
             at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
             at java.lang.reflect.Method.invoke(Method.java:585)
             at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
             at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
             at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
             at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
             at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
             at org.jboss.system.ServiceController$ServiceProxy.invoke(ServiceController.java:978)
             at $Proxy0.start(Unknown Source)
             at org.jboss.system.ServiceController.start(ServiceController.java:417)
             at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
             at java.lang.reflect.Method.invoke(Method.java:585)
             at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
             at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
             at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
             at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)


            Output in the GossipRouter console:
            d:\software\jboss-4.0.5.GA\server\clustertest-node1\lib>java -cp commons-logging.jar;jgroups.jar org.jgroups.stack.GossipRouter -port 12001
            GossipRouter is starting...
            GossipRouter started at Mon Mar 12 16:25:17 CET 2007
            Listening on port 12001 bound on address 0.0.0.0/0.0.0.0
            <START STARTUP node1>
            <END STARTUP node1>
            
            <START STARTUP node2>
            12-mrt-2007 16:28:05 org.jgroups.stack.GossipRouter$SocketThread run
            SEVERE: exception=java.net.SocketException: Connection reset
            12-mrt-2007 16:28:07 org.jgroups.stack.GossipRouter route
            SEVERE: failed sending message to localhost:4806: Socket closed
            12-mrt-2007 16:28:19 org.jgroups.stack.GossipRouter route
            SEVERE: cannot find address localhost:4806 in the routing table
            12-mrt-2007 16:28:28 org.jgroups.stack.GossipRouter route
            SEVERE: cannot find address localhost:4806 in the routing table
            12-mrt-2007 16:28:39 org.jgroups.stack.GossipRouter route
            SEVERE: cannot find address localhost:4806 in the routing table
            12-mrt-2007 16:28:47 org.jgroups.stack.GossipRouter sweep
            INFO: Removed member addr=localhost:4806, timestamp=1173713281504 from group DefaultPartition(46365 msecs old)
            12-mrt-2007 16:28:47 org.jgroups.stack.GossipRouter sweep
            INFO: done (removed 1 entries)
            <END STARTUP node2>


            • 4. Re: WAN Cluster setup

              Hi,

              I dug into the app server code. This is the method from org.jboss.ha.framework.server.HAPartitionImpl which is crashing:

              protected void verifyNodeIsUnique (Vector javaGroupIpAddresses) throws Exception
              {
                  byte[] localUniqueName = this.localJGAddress.getAdditionalData();
                  if (localUniqueName == null)
                      log.warn("No additional information has been found in the JavaGroup address: " +
                               "make sure you are running with a correct version of JGroups and that the protocol " +
                               " you are using supports the 'additionalData' behaviour");

                  for (int i = 0; i < javaGroupIpAddresses.size(); i++)
                  {
                      IpAddress address = (IpAddress) javaGroupIpAddresses.elementAt(i);
                      if (!address.equals(this.localJGAddress))
                      {
                          if (localUniqueName.equals(address.getAdditionalData()))
                              throw new Exception ("Local node removed from cluster (" + this.localJGAddress + "): another node (" + address + ") publicizing the same name was already there");
                      }
                  }
              }


              The line that calls localUniqueName.equals(address.getAdditionalData()) (bold in the original post) is the one causing the NPE. I suppose the localUniqueName variable is null. What could be wrong in my setup?

              Please help me. I need to be sure this setup works by next Friday.

              Johan

              • 5. Re: WAN Cluster setup
                brian.stansberry

                What AS version and JGroups release?

                • 6. Re: WAN Cluster setup
                  brian.stansberry

                  For the NPE, see http://jira.jboss.com/jira/browse/JBAS-4202. Thanks for pointing it out.

                  I haven't completely verified it, but I'm pretty sure the TUNNEL protocol does not properly support the additional_data feature, which means you'll get this NPE. I think your only short term workaround is to patch HAPartitionImpl to return after the WARN if localUniqueName == null.

                  • 7. Re: WAN Cluster setup
                    brian.stansberry

                    Instead of patching the NPE, try upgrading server/all/lib/jgroups.jar to JGroups version 2.2.8 or later. See http://labs.jboss.com/portal/jbosscache/compatibility/index.html for version compatibility info.

                    In 2.2.7, the default version in most AS releases over the last couple years, TUNNEL has no support for additional_data.
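Before and after swapping the jar, it can help to confirm which JGroups version the server classpath actually picks up. The following is a minimal sketch, not something from this thread: it assumes Java 8 or later and assumes that org.jgroups.Version exposes a public static String field named description (check your jgroups.jar if in doubt). The class name JGroupsVersionCheck and the helper readStaticString are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.Optional;

final class JGroupsVersionCheck {

    /**
     * Reads a public static String field by reflection.
     * Returns empty if the class or field is missing, is not static,
     * or does not hold a String.
     */
    static Optional<String> readStaticString(String className, String fieldName) {
        try {
            Field field = Class.forName(className).getField(fieldName);
            Object value = field.get(null); // null target: only valid for static fields
            return value instanceof String ? Optional.of((String) value) : Optional.empty();
        } catch (ReflectiveOperationException | LinkageError | NullPointerException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumption: org.jgroups.Version has a static String field "description".
        Optional<String> version = readStaticString("org.jgroups.Version", "description");
        System.out.println(version
                .map(v -> "JGroups version on classpath: " + v)
                .orElse("JGroups version could not be determined"));
    }
}
```

Run it with the same jgroups.jar on the classpath that the server uses. Reflection is used so the sketch compiles even when no JGroups jar is present.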

                    • 8. Re: WAN Cluster setup

                      Thank you very much for replying. I patched the method as follows:

                      protected void verifyNodeIsUnique (Vector javaGroupIpAddresses) throws Exception
                      {
                          byte[] localUniqueName = this.localJGAddress.getAdditionalData();
                          if (localUniqueName == null) {
                              log.warn("No additional information has been found in the JavaGroup address: " +
                                       "make sure you are running with a correct version of JGroups and that the protocol " +
                                       " you are using supports the 'additionalData' behaviour");
                              // patch for issue JBAS-4202 - http://jira.jboss.com/jira/browse/JBAS-4202
                              return;
                          } else {
                              for (int i = 0; i < javaGroupIpAddresses.size(); i++)
                              {
                                  IpAddress address = (IpAddress) javaGroupIpAddresses.elementAt(i);
                                  if (!address.equals(this.localJGAddress))
                                  {
                                      if (localUniqueName.equals(address.getAdditionalData()))
                                          throw new Exception ("Local node removed from cluster (" + this.localJGAddress + "): another node (" + address + ") publicizing the same name was already there");
                                  }
                              }
                          }
                      }


                      This is working fine. When may we expect the real fix from you guys?

                      Just another question: in my firewalls, I should only need to open the GossipRouter port... correct?
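With TUNNEL, each node makes an outbound TCP connection to the GossipRouter, so one practical check is whether every node can reach the router's host and port through the firewall. The sketch below only tests TCP reachability. It says nothing about other JBoss services, which use their own ports (the log above shows HA-JNDI on 1200 and 1102, for example). The class name RouterPortCheck is hypothetical, and the defaults localhost and 12001 are simply the values from the config posted earlier.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class RouterPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unresolved host, or blocked by a firewall.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror router_host/router_port from the posted config.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 12001;
        boolean reachable = canConnect(host, port, 3000);
        System.out.println(host + ":" + port + (reachable ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it from each cluster node, passing the router's public host name and port as arguments.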

                      Thanks for helping me out.

                      Johan

                      • 9. Re: WAN Cluster setup
                        brian.stansberry

                        The JIRA is fixed in Branch_4_2, so it will go into the 4.2.0.GA release that's due out in April. It's also fixed in the 3.2, 4.0 and trunk branches, but it's not clear when the next releases off those lines will be.

                        Your fix is essentially what I did.

                        Please try a later JGroups though. With that the NPE may not even happen. The localUniqueName really shouldn't be null; that's why we log the WARN.
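For readers who want to see the null handling in isolation, here is a minimal standalone sketch of a null-safe uniqueness check. It is not the actual JBoss fix: the class UniqueNameCheck and the method hasDuplicateName are hypothetical, and plain byte arrays stand in for the additional data carried by the JGroups addresses. It uses Arrays.equals, which compares array contents, whereas calling equals directly on a byte[] only compares references.

```java
import java.util.Arrays;
import java.util.List;

final class UniqueNameCheck {

    /**
     * Returns true if any other member advertises the same name as the local node.
     * A null local name means there is nothing to compare, so the result is false.
     */
    static boolean hasDuplicateName(byte[] localName, List<byte[]> otherNames) {
        if (localName == null) {
            return false; // mirrors the "return after the WARN" workaround
        }
        for (byte[] other : otherNames) {
            // Arrays.equals returns false when exactly one argument is null.
            if (Arrays.equals(localName, other)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would log the warning and skip the check when the local name is null, and reject the node only when hasDuplicateName returns true.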