5 Replies Latest reply on Nov 16, 2010 12:23 PM by wdfink

    member cannot join cluster

    cdesteur

      Hello,

       

      I have two servers in a JBoss cluster configuration.

      Last night, both servers restarted unexpectedly. Since I restarted JBoss, the two instances seem to be completely independent of each other.

      Here is an extract of the server log file for the first server:

       

      2010-11-16 14:49:30,192 INFO  [org.jboss.cache.TreeCache] viewAccepted(): [10.2.98.118:55963|0] [10.2.98.118:55963]
      2010-11-16 14:49:30,212 INFO  [org.jboss.cache.TreeCache] TreeCache local address is 10.2.98.118:55963
      2010-11-16 14:49:30,213 INFO  [org.jboss.cache.TreeCache] State could not be retrieved (we are the first member in group)
      2010-11-16 14:49:30,213 INFO  [org.jboss.cache.TreeCache] parseConfig(): PojoCacheConfig is empty
      2010-11-16 14:49:32,635 INFO  [org.jboss.ws.core.server.ServiceEndpointManager] jbossws-1.2.1.GA (build=200704151756)
      2010-11-16 14:49:33,798 INFO  [org.jboss.jmx.adaptor.snmp.agent.SnmpAgentService] SNMP agent going active
      2010-11-16 14:49:34,278 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Initializing
      2010-11-16 14:49:34,339 INFO  [STDOUT]
      -------------------------------------------------------
      GMS: address is 10.2.98.118:55967
      -------------------------------------------------------
      2010-11-16 14:49:36,415 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Number of cluster members: 1
      2010-11-16 14:49:36,422 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Other members: 0
      2010-11-16 14:49:36,424 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Fetching state (will wait for 30000 milliseconds):
      2010-11-16 14:49:36,463 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] State could not be retrieved (we are the first member in group)
      2010-11-16 14:49:36,673 INFO  [org.jboss.ha.jndi.HANamingService] Started ha-jndi bootstrap jnpPort=1100, backlog=50, bindAddress=/0.0.0.0
      2010-11-16 14:49:36,923 INFO  [org.jboss.ha.jndi.DetachedHANamingService$AutomaticDiscovery] Listening on /0.0.0.0:1102, group=230.0.0.4, HA-JNDI address=10.2.98.118:1100

       

       

      And here is the extract for the second server:

       

      2010-11-16 14:23:44,694 INFO  [org.jboss.cache.TreeCache] viewAccepted(): [10.2.98.117:50679|0] [10.2.98.117:50679]
      2010-11-16 14:23:44,710 INFO  [org.jboss.cache.TreeCache] TreeCache local address is 10.2.98.117:50679
      2010-11-16 14:23:44,710 INFO  [org.jboss.cache.TreeCache] State could not be retrieved (we are the first member in group)
      2010-11-16 14:23:44,710 INFO  [org.jboss.cache.TreeCache] parseConfig(): PojoCacheConfig is empty
      2010-11-16 14:23:45,806 INFO  [org.jboss.ws.core.server.ServiceEndpointManager] jbossws-1.2.1.GA (build=200704151756)
      2010-11-16 14:23:46,225 INFO  [org.jboss.jmx.adaptor.snmp.agent.SnmpAgentService] SNMP agent going active
      2010-11-16 14:23:46,550 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Initializing
      2010-11-16 14:23:46,626 INFO  [STDOUT]
      -------------------------------------------------------
      GMS: address is 10.2.98.117:50683
      -------------------------------------------------------
      2010-11-16 14:23:48,658 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Number of cluster members: 1
      2010-11-16 14:23:48,662 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Other members: 0
      2010-11-16 14:23:48,662 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] Fetching state (will wait for 30000 milliseconds):
      2010-11-16 14:23:48,663 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] State could not be retrieved (we are the first member in group)
      2010-11-16 14:23:48,694 INFO  [org.jboss.ha.jndi.HANamingService] Started ha-jndi bootstrap jnpPort=1100, backlog=50, bindAddress=/0.0.0.0
      2010-11-16 14:23:48,701 INFO  [org.jboss.ha.jndi.DetachedHANamingService$AutomaticDiscovery] Listening on /0.0.0.0:1102, group=230.0.0.4, HA-JNDI address=10.2.98.117:1100
      2010-11-16 14:23:49,161 INFO  [org.jboss.cache.TreeCache] No transaction manager lookup class has been defined. Transactions cannot be used
      2010-11-16 14:23:49,198 INFO  [org.jboss.cache.factories.InterceptorChainFactory] interceptor chain is:

       

      How can I resolve this? Any clues?

       

      Thanks

      Christophe
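To make the symptom in the two extracts explicit: each `viewAccepted()` line prints a view as `[creator|id] [members]`, and both nodes list only themselves. The following is a minimal sketch, assuming that log format; `ViewLogParser` and its methods are hypothetical helper names, not part of JBoss or JGroups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reads a view as printed in the log lines above. */
final class ViewLogParser {

    // Matches "[creator|viewId] [member1, member2, ...]" anywhere in a line.
    private static final Pattern VIEW =
        Pattern.compile("\\[([^|\\]]+)\\|(\\d+)\\]\\s*\\[([^\\]]*)\\]");

    /** One parsed view: who created it, its sequence number, and its members. */
    static final class View {
        final String creator;
        final long id;
        final List<String> members;

        View(String creator, long id, List<String> members) {
            this.creator = creator;
            this.id = id;
            this.members = Collections.unmodifiableList(members);
        }
    }

    /** Parses the first view found in the given log line. */
    static View parse(String line) {
        Matcher m = VIEW.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no view found in: " + line);
        }
        List<String> members = new ArrayList<>();
        for (String part : m.group(3).split(",")) {
            String member = part.trim();
            if (!member.isEmpty()) {
                members.add(member);
            }
        }
        return new View(m.group(1).trim(), Long.parseLong(m.group(2)), members);
    }

    /** True when the two views share no member, i.e. the nodes formed separate groups. */
    static boolean disjoint(View a, View b) {
        return Collections.disjoint(a.members, b.members);
    }

    public static void main(String[] args) {
        View first = parse("viewAccepted(): [10.2.98.118:55963|0] [10.2.98.118:55963]");
        View second = parse("viewAccepted(): [10.2.98.117:50679|0] [10.2.98.117:50679]");
        System.out.println("separate groups: " + disjoint(first, second));
    }
}
```

Applied to the two `viewAccepted()` lines above, the views share no member, which matches the "Number of cluster members: 1" messages on both servers.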

        • 1. Re: member cannot join cluster
          belaban

          - Which JGroups / JBoss version?

          - What's your config?

          - Do you have any firewalls enabled?

          - Do all of the cluster nodes hang off of the same switch?

          - Does the switch have firewall settings enabled?

          • 2. Re: member cannot join cluster
            cdesteur

            I have 2 servers with JBoss 4.2.1 on Windows 2008 SP2.

            No firewall is active on either server, and they are on the same VLAN.

            Until yesterday, it worked fine, and no changes have been made to the configuration.

            • 3. Re: member cannot join cluster
              wdfink

              If you did not change the configuration ...

              - Is it possible that the configuration was changed before the last restart and was not hot-deployed?

              Otherwise, the only explanation I can see is that something in the network configuration was changed.

              Are you able to see and ping the 'other' system?

              You might find good help in the 'Troubleshooting' area of the wiki: http://community.jboss.org/wiki/JGroups
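A successful ping only shows unicast reachability; cluster discovery in a setup like this typically relies on IP multicast, which a switch or VLAN change can block independently. The following is a minimal sketch of a stand-alone multicast check using only the JDK. `McastCheck`, the default group `239.255.0.1`, and the default port `45599` are arbitrary assumptions for illustration; they are not taken from the poster's configuration.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone check; not part of JBoss or JGroups. */
final class McastCheck {

    /** True if the literal IP address lies in the multicast range. */
    static boolean isMulticast(String ip) {
        try {
            return InetAddress.getByName(ip).isMulticastAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ip, e);
        }
    }

    /** Sends one datagram to the multicast group. */
    static void send(InetAddress group, int port, String text) throws Exception {
        try (MulticastSocket socket = new MulticastSocket()) {
            byte[] data = text.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(data, data.length, group, port));
        }
    }

    /** Waits up to timeoutMs for one datagram; returns "sender: payload", or null on timeout. */
    static String receive(InetAddress group, int port, int timeoutMs) throws Exception {
        try (MulticastSocket socket = new MulticastSocket(port)) {
            InetSocketAddress groupAddr = new InetSocketAddress(group, port);
            socket.joinGroup(groupAddr, null); // null = use the default network interface
            socket.setSoTimeout(timeoutMs);
            DatagramPacket packet = new DatagramPacket(new byte[512], 512);
            try {
                socket.receive(packet);
            } catch (SocketTimeoutException e) {
                return null; // nothing arrived within the timeout
            } finally {
                socket.leaveGroup(groupAddr, null);
            }
            return packet.getAddress().getHostAddress() + ": "
                + new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
        }
    }

    // Usage: java McastCheck send|receive [group] [port]
    public static void main(String[] args) throws Exception {
        String mode = args.length > 0 ? args[0] : "receive";
        String groupName = args.length > 1 ? args[1] : "239.255.0.1"; // assumed test group
        int port = args.length > 2 ? Integer.parseInt(args[2]) : 45599; // assumed test port
        if (!isMulticast(groupName)) {
            throw new IllegalArgumentException(groupName + " is not a multicast address");
        }
        InetAddress group = InetAddress.getByName(groupName);
        if (mode.equals("send")) {
            send(group, port, "hello from " + InetAddress.getLocalHost().getHostName());
            System.out.println("sent");
        } else {
            String result = receive(group, port, 30000);
            System.out.println(result == null ? "nothing received" : result);
        }
    }
}
```

One way to use it: start `java McastCheck receive` on one server, then run `java McastCheck send` on the other within 30 seconds. If the receiver prints "nothing received" while ping works, multicast between the hosts is a likely suspect.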

              • 4. Re: member cannot join cluster
                cdesteur

                In fact, I now see this in the log:

                 

                2010-11-16 16:53:36,787 INFO  [org.jboss.ha.framework.interfaces.HAPartition.lc_cluster_prod] New cluster view for partition lc_cluster_prod: 1 ([10.2.98.117:1099, 10.2.98.118:1099] delta: 1)
                2010-11-16 16:53:36,804 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.lc_cluster_prod] Merging partitions...
                2010-11-16 16:53:36,808 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.lc_cluster_prod] Dead members: 0
                2010-11-16 16:53:36,809 INFO  [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.lc_cluster_prod] Originating groups: [[10.2.98.117:50683|0] [10.2.98.117:50683], [10.2.98.118:64473|0] [10.2.98.118:64473]]
                2010-11-16 16:53:46,235 WARN  [org.jgroups.protocols.pbcast.NAKACK] 10.2.98.118:64469] discarded message from non-member 10.2.98.117:50679, my view is [10.2.98.118:64469|0] [10.2.98.118:64469]
                2010-11-16 16:53:46,461 WARN  [org.jgroups.protocols.pbcast.NAKACK] 10.2.98.118:64469] discarded message from non-member 10.2.98.117:50679, my view is [10.2.98.118:64469|0] [10.2.98.118:64469]
                2010-11-16 16:53:52,503 WARN  [com.adobe.idp.scheduler.jobstore.DSCJobStoreTX] This scheduler instance (GRW8ADOBE21289922745819) is still active but was recovered by another instance in the cluster.  This may cause inconsistent behavior.

                It seems that one instance is still active for the cluster.

                How can I reset the state?

                • 5. Re: member cannot join cluster
                  wdfink

                  What is com.adobe.*?

                  It looks like an application or add-on failure.

                  Did you install an Adobe application?

                  I assume you restarted both instances, right?