4 Replies Latest reply on Nov 12, 2007 4:58 AM by rajeshchande

    JBoss Cluster on Multihomed zone

      Hello,

      Scenario 1:

      JBoss: 4.0.3sp1
      JDK: 1.5.0_12
      Sun OS: SunOS 5.10 Generic_118822-18 sun4u sparc SUNW,Sun-Fire-V240
      SINGLE HOMED

      I have set up a JBoss cluster between 2 physically separate servers, and it runs successfully. Farm deployment works fine. Session replication is fine. In this configuration I believe the bind_addr defaults to 0.0.0.0 (correct me if I am wrong here), as I have not specified anything in cluster-service.xml.
      Multicasting test is successful: http://www.jgroups.org/javagroupsnew/docs/manual/html/ch02.html#ItDoesntWork

      Note: Scenario 1 is just to demonstrate that I have successfully implemented a cluster with load balancing, failover and session replication.

      Scenario 2 :

      JBoss: 4.0.3sp1
      JDK: 1.5.0_12
      SunOS 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200
      Remarks: "Zones" concept is used
      MULTI HOMED - 3 interfaces

      ifconfig -a
      lo0:3: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
      inet 127.0.0.1 netmask ff000000
      e1000g0:3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
      inet 3.183.188.111 netmask ffffff00 broadcast 3.183.188.255
      e1000g1:3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
      inet 3.183.158.73 netmask ffffff00 broadcast 3.183.158.255
      e1000g2:3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
      inet 10.50.1.7 netmask ffffff00 broadcast 10.50.1.255

      I am trying to set up a JBoss cluster between 2 Solaris zones, and they DO NOT seem to form a cluster. If I specify bind_addr="10.50.1.7" in cluster-service.xml, the cluster is not formed at all. But if I specify bind_addr="3.183.158.73" and bind_addr="3.183.158.80" (one per zone), then the cluster is FORMED but session replication does not work.

      The zones are in the same vlan and same subnet.

      Multicasting test is UNSUCCESSFUL: http://www.jgroups.org/javagroupsnew/docs/manual/html/ch02.html#ItDoesntWork
      The following is successful on 3.183.158: http://www.jgroups.org/javagroupsnew/docs/manual/html/ch02.html#d0e512

      Queries:

        • 1. Re: JBoss Cluster on Multihomed zone

          Queries:
          - Is any special configuration needed for zones on Solaris 10?
          - Is specifying bind_addr in cluster-service.xml MANDATORY on a MULTI HOMED machine?
          - Every time, it uses ONLY the 3.183.158 interface to communicate. Why not other interfaces such as 10.50.1?
          - Is any special configuration required to bind the JBoss instances to the 10.50.1 NIC and get the cluster running? The requirement is to have the cluster working on 10.50.1; what needs to be done in that case?
          - Even when the cluster forms on 3.183.158, why does session replication not work?

          Thanks & Regards,
          Rajesh.

          • 2. Re: JBoss Cluster on Multihomed zone
            brian.stansberry

            A few key points:

            1) If you don't tell JGroups what address to bind to, it doesn't bind to all interfaces, it picks one. So, on a multihomed machine, it's always best to tell it which one to use.

            2) There are multiple JGroups channels in an AS (in 4.0.3.SP1, there are two). HTTP session replication uses a different channel from the one configured in cluster-service.xml. Its channel is configured in tc5-cluster-service.xml.

            3) If you don't want to manually edit those files to set bind_addr, it can be done globally by starting JBoss with the -b switch:

            ./run.sh -b 3.183.158.1 -c all

            Doing that sets a global override.

            Why it works on 3.183.158 and not 10.50.1 is likely the result of some OS/network layer configuration on your machine or on the networking hardware between machines.
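
To see which addresses are candidates for bind_addr on a multihomed host, a small standalone check along these lines may help. This is only a sketch: the class name BindAddressCandidates is made up for illustration and is not part of JBoss or JGroups, and isUp() and supportsMulticast() need Java 6 or later (the JDK 1.5 mentioned above does not have them). An interface reporting multicast=true does not prove that multicast traffic actually reaches the other zone.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of JBoss or JGroups.
class BindAddressCandidates {

    // One line per address: "<interface> <address> up=<bool> multicast=<bool>"
    static List<String> describe() {
        List<String> lines = new ArrayList<String>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return lines; // some JDKs return null when no interface is found
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    lines.add(nic.getName() + " " + addr.getHostAddress()
                            + " up=" + nic.isUp()
                            + " multicast=" + nic.supportsMulticast());
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException("could not query network interfaces", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```

Any address printed here can be passed to -b or written into bind_addr; whether the cluster then forms still depends on the OS and network configuration mentioned above.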

            • 3. Re: JBoss Cluster on Multihomed zone

              I have set the bind_addr to 3.183.158 and the cluster works fine (the instances form a cluster at startup).

              I followed http://docs.hp.com/en/5991-5569/ar01s07.html and got session replication working in scenario 1.

              Session replication FAILS in scenario 2. What specific changes are required in tc5-cluster-service.xml, and why? Why is there a difference between scenarios 1 and 2?

              In scenario 2:

              When accessing the counter application:

              instance 1 log is:

              ##############

              2007/11/12 04:24:37 | 04:24:37,268 INFO [Server] JBoss (MX MicroKernel) [4.0.3SP1 (build: CVSTag=JBoss_4_0_3_SP1 date=200510231054)] Started in 1m:5s:486ms
              2007/11/12 05:22:17 | 05:22:17,583 INFO [TreeCache] viewAccepted(): new members: [int-app-devlc1-01:60788, int-app-devlc1-02:61593]
              2007/11/12 06:00:54 | 06:00:54,828 INFO [FarmMemberService] farmDeployment(), deploy locally: farm/counter.war
              2007/11/12 06:00:55 | 06:00:55,802 INFO [TomcatDeployer] deploy, ctxPath=/counter, warUrl=.../tmp/deploy/tmp698counter.war/
              2007/11/12 06:00:56 | 06:00:56,360 INFO [JBossCacheManager] init(): replicationGranularity_ is 1 and invaldateSessionPolicy is 2
              2007/11/12 06:00:56 | 06:00:56,481 INFO [JBossCacheManager] Starting JBossManager
              2007/11/12 06:00:56 | 06:00:56,493 INFO [JBossCacheManager] We are using mod_jk(2) for load-balancing. Will add JvmRouteValve.
              2007/11/12 06:02:52 | 06:02:52,146 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker2, refresh count:1
              2007/11/12 06:03:01 | 06:03:01,552 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker2, refresh count:2
              2007/11/12 06:03:03 | 06:03:03,249 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker2, refresh count:3
              2007/11/12 06:03:04 | 06:03:04,308 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker2, refresh count:4
              2007/11/12 06:03:33 | TERM trapped. Shutting down.
              2007/11/12 06:03:34 | 06:03:34,487 INFO [Server] Runtime shutdown hook called, forceHalt: true

              ##############

              After shutting instance 1 down, instance 2 logs the following on accessing the counter application:

              ##############
              2007/11/12 05:41:36 | 05:41:36,061 INFO [TreeCache] viewAccepted(): new members: [int-app-devlc1-01:60788]
              2007/11/12 05:41:41 | 05:41:41,652 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker2, refresh count:1
              2007/11/12 05:41:44 | 05:41:44,902 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker1, refresh count:2

              ##############

              The count should BEGIN from "5"
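
The reasoning behind that expectation can be sketched in code: if the session had been replicated, the first request served after failover would carry the same session id and a count one higher than the last count seen on instance 1. The class CounterLogCheck and its methods are hypothetical and only parse the "[STDOUT] ... refresh count:N" lines shown in the logs above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the counter application's log lines.
class CounterLogCheck {

    // Captures session id, jvmRoute suffix and counter value.
    private static final Pattern LINE =
            Pattern.compile("\\[STDOUT\\]\\s+(\\S+)\\.(\\w+), refresh count:(\\d+)");

    // Returns {sessionId, jvmRoute, count}, or null if the line is not a counter line.
    static String[] parse(String logLine) {
        Matcher m = LINE.matcher(logLine);
        return m.find() ? new String[] { m.group(1), m.group(2), m.group(3) } : null;
    }

    // True only if both lines belong to the same session and the count continued.
    static boolean replicated(String lastBeforeFailover, String firstAfterFailover) {
        String[] before = parse(lastBeforeFailover);
        String[] after = parse(firstAfterFailover);
        if (before == null || after == null) {
            return false;
        }
        return before[0].equals(after[0])
                && Integer.parseInt(after[2]) == Integer.parseInt(before[2]) + 1;
    }

    public static void main(String[] args) {
        // Last counter line on instance 1 and first counter line on instance 2, copied from the logs above.
        String last = "2007/11/12 06:03:04 | 06:03:04,308 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker2, refresh count:4";
        String first = "2007/11/12 05:41:41 | 05:41:41,652 INFO [STDOUT] 4Gv6Zk8uMNFykr6fX1vOCw**.worker2, refresh count:1";
        System.out.println(replicated(last, first)); // prints false: count restarted at 1 instead of 5
    }
}
```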

              Any suggestions why session replication is failing?

              Regards,
              Rajesh

              • 4. Re: JBoss Cluster on Multihomed zone

                Hello Bela,

                As suggested, I have changed the tc5-cluster-service.xml as shown below and session replication now works fine.

                ############## ################
                vi /apps/jboss/current/server/devl-cluster-01/deploy/tc5-cluster-service.xml
                ---
                ---
                ---
                 <Config>
                 <UDP mcast_addr="${jboss.partition.udpGroup:228.1.2.3}" mcast_port="45566" bind_addr="3.183.158.73"
                 ip_ttl="8" ip_mcast="true"
                 mcast_send_buf_size="8000000" mcast_recv_buf_size="1500000"
                 ucast_send_buf_size="8000000" ucast_recv_buf_size="1500000"
                 loopback="false"/>
                 <PING timeout="2000" num_initial_members="3"
                 up_thread="true" down_thread="true"/>
                ----
                ----
                ----
                ############## ################
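
Because the AS has two channels, one quick sanity check is to confirm that cluster-service.xml and tc5-cluster-service.xml name the same bind_addr. A minimal sketch, assuming the relevant fragment of each file is available as well-formed XML; the class BindAddrCheck and its methods are hypothetical helpers, and the two inline fragments in main are trimmed-down stand-ins, not the real files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JBoss or JGroups.
class BindAddrCheck {

    // Returns the bind_addr attribute of the first <UDP> element, or null if it is not set.
    static String udpBindAddr(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList udp = doc.getElementsByTagName("UDP");
            if (udp.getLength() == 0) {
                return null;
            }
            Element element = (Element) udp.item(0);
            return element.hasAttribute("bind_addr") ? element.getAttribute("bind_addr") : null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration", e);
        }
    }

    // True only if both configurations set bind_addr explicitly and to the same value.
    static boolean sameBindAddr(String xmlA, String xmlB) {
        String a = udpBindAddr(xmlA);
        String b = udpBindAddr(xmlB);
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        // Stand-in fragments: the first sets bind_addr, the second leaves it unset.
        String withBind = "<Config><UDP ip_mcast=\"true\" bind_addr=\"3.183.158.73\"/></Config>";
        String withoutBind = "<Config><UDP ip_mcast=\"true\"/></Config>";
        System.out.println(udpBindAddr(withBind));
        System.out.println(udpBindAddr(withoutBind));
        System.out.println(sameBindAddr(withBind, withoutBind));
    }
}
```

A false result for the two files would correspond to the situation described earlier in the thread, where one channel has bind_addr set and the session replication channel does not.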


                That was a great input.

                Thanks,

                Rajesh.