5 Replies Latest reply on Jan 11, 2002 8:22 AM by slaboure

    Strange behaviour with 3 nodes

    robster

      Hi
      I'm testing JBoss 3.0.0 alpha clustering on Windows NT/2000 and trying to cluster an entity bean with round-robin (RR) load balancing. I am using Struts in a single web-tier instance, which caches a home interface looked up through HAJNDI.
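For readers following along, a client that caches a home looked up through HA-JNDI might look roughly like the sketch below. It is a minimal sketch under stated assumptions: the JNP factory class `org.jnp.interfaces.NamingContextFactory` is the JBoss naming client, port 1100 is taken from the `hajndiPort=1100` line in the log further down, and the JNDI name and node host names are hypothetical. Later JBoss HA-JNDI documentation describes a comma-separated `host:port` list in the provider URL; whether 3.0.0 alpha already accepted that form is an assumption. A real EJB client would also narrow the result to its home interface.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

class HomeCache {
    // Hypothetical JNDI name; replace with the bean's actual binding.
    static final String HOME_JNDI_NAME = "MyEntityHome";
    private static Object cachedHome;

    // Builds a JNDI environment pointing at HA-JNDI (port 1100) on each node.
    // The comma-separated provider list is an assumption for this JBoss version.
    static Hashtable<String, String> buildEnv(String... nodes) {
        StringBuilder url = new StringBuilder();
        for (String node : nodes) {
            if (url.length() > 0) url.append(',');
            url.append(node).append(":1100");
        }
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "org.jnp.interfaces.NamingContextFactory");
        env.put(Context.PROVIDER_URL, url.toString());
        return env;
    }

    // Looks the home up once and reuses it afterwards, as the web tier does.
    static synchronized Object getHome(String... nodes) throws NamingException {
        if (cachedHome == null) {
            Context ctx = new InitialContext(buildEnv(nodes));
            try {
                cachedHome = ctx.lookup(HOME_JNDI_NAME);
            } finally {
                ctx.close();
            }
        }
        return cachedHome;
    }
}
```

Caching the home means every later call goes through the clustered proxy obtained at lookup time, so the load-balancing behaviour seen by the client depends on the target list that proxy holds.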

      With two nodes, everything works fine and RR behaves as expected. I can take a node out, restore it, and everything keeps working. If the nodes were numbered, I would see a sequence like 1,2,1,2,1,2...

      With three nodes, I get unexpected behaviour. I make sure all three nodes are running before I deploy the client. Only two of the nodes are used; the third never receives a call. One of those two nodes also gets twice as many hits, i.e. a sequence like:
      1,1,2,1,1,2,1,1,2,1,1,2...
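For comparison, the behaviour Rob expects from round robin is a plain cyclic walk over the list of live targets. The sketch below is a generic round-robin selector, not JBoss's own load-balance policy class; it only illustrates that three targets should yield 1,2,3,1,2,3 and two targets 1,2,1,2.

```java
import java.util.ArrayList;
import java.util.List;

// Generic round-robin selector over a fixed list of targets (illustrative only).
class RoundRobin<T> {
    private final List<T> targets;
    private int cursor = 0;

    RoundRobin(List<T> targets) {
        this.targets = new ArrayList<>(targets);
    }

    // Returns the current target and advances the cursor, wrapping at the end.
    synchronized T next() {
        T target = targets.get(cursor);
        cursor = (cursor + 1) % targets.size();
        return target;
    }

    // Draws n consecutive picks, e.g. to compare against an observed call sequence.
    List<T> sequence(int n) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(next());
        }
        return out;
    }
}
```

With `new RoundRobin<>(List.of(1, 2, 3)).sequence(6)` the result is `[1, 2, 3, 1, 2, 3]`, so a pattern in which one node never appears suggests the client's target list does not contain all three nodes.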

      regards

      Rob

        • 1. Re: Strange behaviour with 3 nodes
          slaboure

          Hello,

          Thank you for your report. Could you please provide us with a JBoss log file (in debug mode), or something equivalent, that could help us find out what is going wrong? You could also file this as a bug on SourceForge.

          Thank you! Cheers,


          Sacha

          • 2. same problem
            edwu00

            Only two nodes work on 3.0.0-alpha.
            I read the log files: the third node
            can never join the partition.

            Then, when I try to kill JBoss with Ctrl-C,
            it deadlocks because it is still trying to join the
            partition. I have to use kill -9 to stop it.

            Ed

            ==================
            Log file from the third node, which cannot join the partition:


            [10:04:45,904,AutoDeployer] Auto deploy of file:/home/gcg/jboss-3.0.0alpha/jboss/deploy/cluster-service.xml
            [10:04:45,931,ServiceCreator] About to create the beanJBOSS-SYSTEM:service=DefaultPartition
            [10:04:45,956,ServiceCreator] Created the beanJBOSS-SYSTEM:service=DefaultPartition
            [10:04:45,958,ClusterPartition] SynchronizedMBeans set to [JBOSS-SYSTEM:service=HASessionState, JBOSS-SYSTEM:service=HAJNDI]
            [10:04:45,961,ServiceCreator] About to create the beanJBOSS-SYSTEM:service=HASessionState
            [10:04:45,983,ServiceCreator] Created the beanJBOSS-SYSTEM:service=HASessionState
            [10:04:45,984,HASessionStateService] Starting
            [10:04:45,985,HASessionStateService] Started
            [10:04:45,987,ServiceCreator] About to create the beanJBOSS-SYSTEM:service=HAJNDI
            [10:04:46,010,ServiceCreator] Created the beanJBOSS-SYSTEM:service=HAJNDI
            [10:04:46,011,HANamingService] Starting
            [10:04:46,011,HANamingService] Started
            [10:04:46,012,ClusterPartition] Starting
            [10:04:46,012,ClusterPartition] Creating JavaGroups JChannel
            [10:04:46,383,ClusterPartition] Creating HAPartition...
            [10:04:46,477,ClusterPartition] ...Initing HAPartition...
            [10:04:46,510,HAPartition:DefaultPartition] creating SubcontextHAPartition
            [10:04:46,511,HAPartition:DefaultPartition] done initing..
            [10:04:46,512,ClusterPartition] ...HAPartition initialized.
            [10:04:46,512,ClusterPartition] registering JBOSS-SYSTEM:service=HASessionState
            [10:04:46,554,HASessionState-/HASessionState/Default] creating SubcontextHASessionState
            [10:04:46,555,HASessionState-/HASessionState/Default] ...HAPartition initialized.
            [10:04:46,555,ClusterPartition] registered JBOSS-SYSTEM:service=HASessionState
            [10:04:46,556,ClusterPartition] registering JBOSS-SYSTEM:service=HAJNDI
            [10:04:46,556,HANamingService] Initializing HAJNDI server
            [10:04:46,556,HANamingService] jndi lookup of /HAPartition/DefaultPartition
            [10:04:46,557,HANamingService] Create remote object
            [10:04:46,563,HANamingService] initialize HAJNDI
            [10:04:46,564,HAJNDI] subscribeToStateTransferEvents
            [10:04:46,564,HAJNDI] registerRPCHandler
            [10:04:46,564,ClusterPartition] registered JBOSS-SYSTEM:service=HAJNDI
            [10:04:46,564,ClusterPartition] Starting ClusterPartition: DefaultPartition
            [10:04:46,565,ClusterPartition] Connecting to channel
            [10:04:46,593,Default]
            -------------------------------------------------------
            GMS: address is southcity:32777
            -------------------------------------------------------
            [10:04:48,557,HAPartition:DefaultPartition] Handle: DistributedState._set
            [10:04:49,625,HAPartition:DefaultPartition] new view accepted: 0 ([southcity:32777])
            [10:04:49,625,HAPartition:DefaultPartition] ViewAccepted: initial members set
            [10:04:49,626,ClusterPartition] Starting channel
            [10:04:49,626,HAPartition:DefaultPartition] Num cluster members: 1
            [10:04:49,630,HAPartition:DefaultPartition] SetState called
            [10:04:49,631,HAPartition:DefaultPartition] state is null
            [10:04:49,631,HAPartition:DefaultPartition] State could not be retrieved, (must be first member of group)
            [10:04:49,631,DefaultPartition:ReplicantManager] mergemembers
            [10:04:49,631,DefaultPartition:ReplicantManager] start MergeMembers
            [10:04:49,952,DefaultPartition:ReplicantManager] notifyKeyListeners
            [10:04:49,952,ClusterPartition] registering JBOSS-SYSTEM:service=HASessionState
            [10:04:49,953,HASessionState-/HASessionState/Default] HASessionState node name : southcity:32777
            [10:04:49,958,DefaultPartition:ReplicantManager] notifyKeyListeners
            [10:04:49,959,HASessionState-/HASessionState/Default] A new HASessionState topology needs to be computed by the master node => this node.
            [10:04:49,959,HASessionState-/HASessionState/Default] New nodes: [southcity:32777]
            [10:04:49,962,HASessionState-/HASessionState/Default] Computed topology : {
            SessionState-'/HASessionState/Default'-Group-1:[[southcity:32777]] aka '[]'
            }
            [10:04:49,968,HASessionState-/HASessionState/Default] Starting repartitioning... :{
            SessionState-'/HASessionState/Default'-Group-1:[[southcity:32777]] aka '[]'
            }
            [10:04:49,969,HASessionState-/HASessionState/Default] We were not yet connected. We connect to sub-partition SessionState-'/HASessionState/Default'-Group-1
            [10:04:49,985,HAPartition:SessionState-'|HASessionState|Default'-Group-1] done initing..
            [10:04:49,990,Default]
            -------------------------------------------------------
            GMS: address is southcity:32779
            -------------------------------------------------------
            [10:04:52,998,HAPartition:SessionState-'|HASessionState|Default'-Group-1] new view accepted: 0 ([southcity:32779])
            [10:04:52,999,HAPartition:SessionState-'|HASessionState|Default'-Group-1] ViewAccepted: initial members set
            [10:04:53,000,HAPartition:SessionState-'|HASessionState|Default'-Group-1] Num cluster members: 1
            [10:04:53,000,HAPartition:SessionState-'|HASessionState|Default'-Group-1] SetState called
            [10:04:53,000,HAPartition:SessionState-'|HASessionState|Default'-Group-1] state is null
            [10:04:53,000,HAPartition:SessionState-'|HASessionState|Default'-Group-1] State could not be retrieved, (must be first member of group)
            [10:04:53,000,SessionState-'|HASessionState|Default'-Group-1:ReplicantManager] mergemembers
            [10:04:53,001,SessionState-'|HASessionState|Default'-Group-1:ReplicantManager] start MergeMembers
            [10:04:53,003,SessionState-'|HASessionState|Default'-Group-1:ReplicantManager] notifyKeyListeners
            [10:04:53,003,HASessionState-/HASessionState/Default] Repartitioning done.
            [10:04:53,004,ClusterPartition] registered JBOSS-SYSTEM:service=HASessionState
            [10:04:53,004,ClusterPartition] registering JBOSS-SYSTEM:service=HAJNDI
            [10:04:53,004,HANamingService] Starting HAJNDI server
            [10:04:53,004,HANamingService] Create HARMIServer proxy
            [10:04:53,133,DefaultPartition:ReplicantManager] notifyKeyListeners
            [10:04:53,169,HANamingService] Start listener
            [10:04:53,169,HANamingService] Started hajndiPort=1100
            [10:04:53,175,ClusterPartition] registered JBOSS-SYSTEM:service=HAJNDI
            [10:04:53,176,ClusterPartition] Started ClusterPartition: DefaultPartition
            [10:04:53,176,ClusterPartition] Started



            • 3. Re: same problem
              slaboure

              Could you please provide us with your OS and JDK version?

              3.0.0alpha can work with more than two nodes; this problem has occurred with some JDK/OS combinations. The next release will include a new JavaGroups property string that fits better, but that string is not yet finalized. In the meantime, could you please replace the default partition property string in the clustering configuration file with the following:

              UDP(mcast_addr=224.0.0.35;mcast_port=45566;ip_ttl=64;
              mcast_send_buf_size=80000;mcast_recv_buf_size=80000):
              PING(timeout=2000;num_initial_members=3):
              MERGE2(min_interval=5000;max_interval=10000):
              FD:
              VERIFY_SUSPECT(timeout=1500):
              pbcast.STABLE(desired_avg_gossip=20000):
              pbcast.NAKACK(gc_lag=50;retransmit_timeout=300,600,1200,2400,4800):
              UNICAST(timeout=5000;min_wait_time=2000):
              FRAG(frag_size=4096;down_thread=false;up_thread=false):
              pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;shun=false;print_local_addr=true):
              pbcast.STATE_TRANSFER

              (If you don't use stateful session beans (SFSB), disable the SFSB service in the clustering configuration file.)
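The property string above is a JavaGroups protocol stack: layers are separated by `:` (transport first, `UDP` here), and each layer's parameters sit in parentheses separated by `;`. To make that structure easier to inspect, here is a minimal, hypothetical parser (not part of JavaGroups) that splits such a string into layers and parameter maps. It assumes values never contain `;` or unbalanced parentheses, which holds for the string above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits a JavaGroups-style stack string into layers.
class StackParser {
    static final class Layer {
        final String name;
        final Map<String, String> params;
        Layer(String name, Map<String, String> params) {
            this.name = name;
            this.params = params;
        }
    }

    // Splits on ':' only outside parentheses; whitespace from line wrapping is dropped.
    static List<Layer> parse(String props) {
        String s = props.replaceAll("\\s+", "");
        List<Layer> layers = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i <= s.length(); i++) {
            char c = i < s.length() ? s.charAt(i) : ':'; // sentinel closes the last layer
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ':' && depth == 0) {
                if (i > start) layers.add(parseLayer(s.substring(start, i)));
                start = i + 1;
            }
        }
        return layers;
    }

    // "NAME(k=v;k=v)" becomes a name plus ordered params; a bare "NAME" has none.
    private static Layer parseLayer(String spec) {
        Map<String, String> params = new LinkedHashMap<>();
        int open = spec.indexOf('(');
        if (open < 0) return new Layer(spec, params);
        String body = spec.substring(open + 1, spec.lastIndexOf(')'));
        for (String kv : body.split(";")) {
            int eq = kv.indexOf('=');
            if (eq > 0) params.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
        return new Layer(spec.substring(0, open), params);
    }
}
```

Run against the suggested string, this yields 11 layers; for instance `PING` carries `num_initial_members=3`, the number of discovery responses PING waits for (up to its `timeout`) when a node starts, which is one of the settings relevant to a three-node cluster.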

              • 4. Re: same problem
                vlada

                Sacha,


                This issue is most likely related to InetAddress NIS name resolution happening while there is traffic in JavaGroups (jg). It happens on my home setup behind an ADSL modem unless I map my loopback address 127.0.0.1 to my domain name xisnext.2y.net.

                In the university lab, however, it works fine.
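One way to check Vladimir's hypothesis on a given machine is to time the name lookups the JVM performs for the local host. The sketch below uses only standard `java.net.InetAddress` calls; it does not reproduce what JavaGroups itself resolves, and treating a slow result as the cause of the join failure is an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Times reverse name resolution for an address (diagnostic sketch).
class LookupTimer {
    static final class Result {
        final String name;
        final long millis;
        Result(String name, long millis) {
            this.name = name;
            this.millis = millis;
        }
    }

    // getCanonicalHostName() asks the configured resolver (hosts file, DNS, NIS...)
    // for the address's name; it returns the textual IP if the lookup fails.
    static Result reverse(InetAddress addr) {
        long t0 = System.nanoTime();
        String name = addr.getCanonicalHostName();
        return new Result(name, (System.nanoTime() - t0) / 1_000_000);
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost(); // forward lookup of this host's name
        Result r = reverse(local);
        System.out.println(local.getHostAddress() + " -> " + r.name + " in " + r.millis + " ms");
    }
}
```

A lookup that takes seconds on the home machine but is instant in the lab would be consistent with the hosts-file workaround described above.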

                Best,
                Vladimir

                • 5. Re: same problem
                  slaboure

                  Hello Vladimir,

                  As a JavaGroups core contributor ;) could you please expand a bit on this? What do you suspect the problem is? Some reverse DNS lookup taking place, or something like that?