2 Replies Latest reply on Oct 1, 2005 4:48 AM by belaban

Cluster member rejecting itself

kkoster Sep 21, 2005 7:56 AM

I have a cluster/farm of four Windows Server 2003 machines, each running JBoss 4.0.3RC2. Clustering is configured with the TCP example of the JGroups stack. When I stop and restart a member of the cluster, I see the following:

1) It does not pull the deployment from the farm, even if the farm directory is cleared.
2) It fails to join the cluster until after the deployment cycle is complete.
3) After joining the cluster, it begins reporting messages that seem to indicate it is not a valid member of the cluster.
4) The fourth machine, which is out of multicast range, never joins the group/farm (although I can see the ports bound to the address in netstat).

Also, a quick question: do HA-JNDI clients have to be in multicast range of at least one member of the cluster, or is there a way to point them to a member and still participate in load balancing? Our network group is paranoid about enabling multicast across network segments, and all servers are deployed outside of users' segments.

Any help on any/all of these issues/questions would be appreciated.

These are the messages the server is posting to the console (CTPRODENERGY02 is the server whose log this is):

07:39:30,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:30,880 WARN [NAKACK] [CTPRODENERGY02:2910] discarded message from non-member ctprodenergy03:1321
07:39:32,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:33,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:35,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:36,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:36,802 WARN [NAKACK] [CTPRODENERGY02:2906] discarded message from non-member ctprodenergy05:1236
07:39:38,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:39,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:40,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:39:41,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:42,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:43,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:39:44,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:45,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:46,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:39:47,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:48,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:49,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:39:50,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:51,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:52,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:39:53,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:54,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:55,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:39:56,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:39:57,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:39:58,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:39:58,880 WARN [NAKACK] [CTPRODENERGY02:2910] discarded message from non-member ctprodenergy03:1321
07:39:59,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:40:00,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:40:01,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:40:02,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:40:03,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:40:04,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:40:05,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:40:06,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:40:07,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:40:08,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:40:09,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !
07:40:10,474 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2882 is not a member !
07:40:11,177 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2873 is not a member !
07:40:12,458 ERROR [CoordGmsImpl] mbr CTPRODENERGY02:2867 is not a member !

This is the content of my cluster.xml:

<?xml version="1.0" encoding="UTF-8"?>

<!-- ===================================================================== -->
<!-- -->
<!-- Sample Clustering Service Configuration -->
<!-- -->
<!-- ===================================================================== -->

<server>

 <!-- ==================================================================== -->
 <!-- Cluster Partition: defines cluster -->
 <!-- ==================================================================== -->

 <mbean code="org.jboss.ha.framework.server.ClusterPartition"
 name="jboss:service=${jboss.partition.name:DefaultPartition}">

 <!-- Name of the partition being built -->
 <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>

 <!-- The address used to determine the node name -->
 <attribute name="NodeAddress">${jboss.bind.address}</attribute>

 <!-- Determine if deadlock detection is enabled -->
 <attribute name="DeadlockDetection">False</attribute>

 <!-- Max time (in ms) to wait for state transfer to complete. Increase for large states -->
 <attribute name="StateTransferTimeout">30000</attribute>

 <!-- The JGroups protocol configuration -->
 <attribute name="PartitionConfig">
 <!-- Alternate TCP stack: customize it for your environment, change bind_addr and initial_hosts -->
 <Config>
 <TCP bind_addr="ctprodenergy02" start_port="7800" loopback="true"/>
 <TCPPING initial_hosts="ctprodenergy02[7800],ctprodenergy03[7800],ctprodenergy05[7800],kkoster-2k[7800]" port_range="3" timeout="3500"
 num_initial_members="4" up_thread="true" down_thread="true"/>
 <MERGE2 min_interval="5000" max_interval="10000"/>
 <FD shun="true" timeout="2500" max_tries="5" up_thread="true" down_thread="true" />
 <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false" />
 <pbcast.NAKACK down_thread="true" up_thread="true" gc_lag="100"
 retransmit_timeout="3000"/>
 <pbcast.STABLE desired_avg_gossip="20000" down_thread="false" up_thread="false" />
 <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="false"
 print_local_addr="true" down_thread="true" up_thread="true"/>
 <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
 </Config>
 </attribute>

 </mbean>

 <!-- ==================================================================== -->
 <!-- HA Session State Service for SFSB -->
 <!-- ==================================================================== -->

 <mbean code="org.jboss.ha.hasessionstate.server.HASessionStateService"
 name="jboss:service=HASessionState">
 <depends>jboss:service=${jboss.partition.name:DefaultPartition}</depends>
 <!-- Name of the partition to which the service is linked -->
 <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
 <!-- JNDI name under which the service is bound -->
 <attribute name="JndiName">/HASessionState/Default</attribute>
 <!-- Max delay before cleaning unreclaimed state.
 Defaults to 30*60*1000 => 30 minutes -->
 <attribute name="BeanCleaningDelay">0</attribute>
 </mbean>

 <!-- ==================================================================== -->
 <!-- HA JNDI -->
 <!-- ==================================================================== -->

 <mbean code="org.jboss.ha.jndi.HANamingService"
 name="jboss:service=HAJNDI">
 <depends>jboss:service=${jboss.partition.name:DefaultPartition}</depends>
 <!-- Name of the partition to which the service is linked -->
 <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
 <!-- Bind address of bootstrap and HA-JNDI RMI endpoints -->
 <attribute name="BindAddress">${jboss.bind.address}</attribute>
 <!-- Port on which the HA-JNDI stub is made available -->
 <attribute name="Port">1100</attribute>
 <!-- Accept backlog of the bootstrap socket -->
 <attribute name="Backlog">50</attribute>
 <!-- The thread pool service used to control the bootstrap and
 auto discovery lookups -->
 <depends optional-attribute-name="LookupPool"
 proxy-type="attribute">jboss.system:service=ThreadPool</depends>

 <!-- A flag to disable the auto discovery via multicast -->
 <attribute name="DiscoveryDisabled">false</attribute>
 <!-- Set the auto-discovery bootstrap multicast bind address. If not
 specified and a BindAddress is specified, the BindAddress will be used. -->
 <attribute name="AutoDiscoveryBindAddress">${jboss.bind.address}</attribute>
 <!-- Multicast Address and group port used for auto-discovery -->
 <attribute name="AutoDiscoveryAddress">${jboss.partition.udpGroup:230.0.0.4}</attribute>
 <attribute name="AutoDiscoveryGroup">1102</attribute>
 <!-- The TTL (time-to-live) for autodiscovery IP multicast packets -->
 <attribute name="AutoDiscoveryTTL">16</attribute>

 <!-- RmiPort to be used by the HA-JNDI service once bound. 0 => auto. -->
 <attribute name="RmiPort">0</attribute>
 <!-- Client socket factory to be used for client-server
 RMI invocations during JNDI queries
 <attribute name="ClientSocketFactory">custom</attribute>
 -->
 <!-- Server socket factory to be used for client-server
 RMI invocations during JNDI queries
 <attribute name="ServerSocketFactory">custom</attribute>
 -->
 </mbean>

 <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
 name="jboss:service=invoker,type=jrmpha">
 <attribute name="ServerAddress">${jboss.bind.address}</attribute>
 <!--
 <attribute name="RMIObjectPort">0</attribute>
 <attribute name="RMIClientSocketFactory">custom</attribute>
 <attribute name="RMIServerSocketFactory">custom</attribute>
 -->
 </mbean>

 <!-- the JRMPInvokerHA creates a thread per request. This implementation uses a pool of threads -->
 <mbean code="org.jboss.invocation.pooled.server.PooledInvokerHA"
 name="jboss:service=invoker,type=pooledha">
 <attribute name="NumAcceptThreads">1</attribute>
 <attribute name="MaxPoolSize">300</attribute>
 <attribute name="ClientMaxPoolSize">300</attribute>
 <attribute name="SocketTimeout">60000</attribute>
 <attribute name="ServerBindAddress">${jboss.bind.address}</attribute>
 <attribute name="ServerBindPort">4446</attribute>
 <attribute name="ClientConnectAddress">${jboss.bind.address}</attribute>
 <attribute name="ClientConnectPort">0</attribute>
 <attribute name="EnableTcpNoDelay">false</attribute>
 <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
 </mbean>

 <!-- ==================================================================== -->

 <!-- ==================================================================== -->
 <!-- Distributed cache invalidation -->
 <!-- ==================================================================== -->

 <mbean code="org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge"
 name="jboss.cache:service=InvalidationBridge,type=JavaGroups">
 <depends>jboss:service=${jboss.partition.name:DefaultPartition}</depends>
 <depends>jboss.cache:service=InvalidationManager</depends>
 <attribute name="InvalidationManager">jboss.cache:service=InvalidationManager</attribute>
 <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
 <attribute name="BridgeName">DefaultJGBridge</attribute>
 </mbean>

</server>

1. Re: Cluster member rejecting itself

perfectionist Sep 30, 2005 5:04 PM (in response to kkoster)

We are getting the same type of messages with 4.0.3RC2, and we have done nothing to start up clustering. We are now having problems: we are just developing on an internal network and are starting to see strange behavior where it looks like objects are being serialized between developer machines (we each have our own JBoss server running).

How do you turn this behavior off?

15:54:44,970 ERROR [CoordGmsImpl] mbr 10.10.10.53:4871 is not a member !
15:54:45,549 ERROR [CoordGmsImpl] mbr 10.10.10.53:4876 is not a member !
15:55:08,708 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.56:1717
15:55:31,792 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.51:1131
15:55:34,568 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.56:1717
15:55:47,693 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.56:1717
15:55:53,418 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.51:1131
15:55:59,572 WARN [NAKACK] [192.168.2.1:49480] discarded message from non-member 10.10.10.51:1126
15:56:01,959 WARN [NAKACK] [192.168.2.1:49480] discarded message from non-member 10.10.10.56:1712
15:56:03,756 WARN [NAKACK] [192.168.2.1:49480] discarded message from non-member 10.10.10.51:1126
15:56:08,874 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.51:1131
15:56:11,209 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.56:1717
15:56:18,882 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.56:1717
15:56:19,909 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.51:1131
15:56:25,591 WARN [NAKACK] [192.168.2.1:49480] discarded message from non-member 10.10.10.51:1126
15:56:26,752 WARN [NAKACK] [192.168.2.1:49480] discarded message from non-member 10.10.10.51:1126
15:56:28,237 WARN [NAKACK] [192.168.2.1:49480] discarded message from non-member 10.10.10.56:1712
15:56:41,584 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.51:1131
15:57:22,349 WARN [NAKACK] [192.168.2.1:49485] discarded message from non-member 10.10.10.56:1717

2. Re: Cluster member rejecting itself

belaban Oct 1, 2005 4:48 AM (in response to kkoster)

Make sure you separate your clusters by using different mcast_addr/mcast_port combinations.
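
As a rough illustration of that advice, the fragment below sketches what a per-developer override might look like inside the PartitionConfig attribute, assuming a UDP-based JGroups stack rather than the TCP/TCPPING stack from the original post (which has no mcast_addr). The address 228.1.2.10 and port 45570 are made-up example values, not taken from this thread; each separate cluster would pick its own pair.

```xml
<!-- Sketch only: example values, chosen per cluster (one pair per developer machine) -->
<attribute name="PartitionConfig">
  <Config>
    <!-- Use an mcast_addr/mcast_port pair that no other cluster on the network uses -->
    <UDP mcast_addr="228.1.2.10" mcast_port="45570"/>
    <!-- The remaining protocols of the stack stay as they are in your existing file -->
  </Config>
</attribute>
```

Servers that share the same address and port pair can still receive each other's traffic, which would be consistent with the "discarded message from non-member" warnings in the logs above.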