1 Reply Latest reply on Feb 5, 2007 10:37 AM by davewebb

    JBoss 4.0.5 Clustering Behavior

    davewebb

      I have 2 physical servers running the same configuration:

      JBoss 4.0.5
      J2SDK1.4.2_13
      OpenSuse 10.2
      


      I have clustered an application on the 2 physical servers. Both servers start up fine, but when reviewing the logs I see the following:

      2007-02-03 13:09:24,507 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] New cluster view for partition DefaultPartition: 8 ([192.168.1.73:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099] delta: 1)
      2007-02-03 13:09:24,507 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (192.168.1.74:1099) received membershipChanged event:
      2007-02-03 13:09:24,507 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 0 ([])
      2007-02-03 13:09:24,507 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])
      2007-02-03 13:09:24,507 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 9 ([192.168.1.73:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099])
      2007-02-03 13:09:31,799 INFO [org.quartz.impl.jdbcjobstore.JobStoreTX] ClusterManager: detected 1 failed or restarted instances.
      2007-02-03 13:09:31,800 INFO [org.quartz.impl.jdbcjobstore.JobStoreTX] ClusterManager: Scanning for instance "browning1170525665233"'s failed in-progress jobs.
      2007-02-03 13:09:39,307 INFO [org.quartz.impl.jdbcjobstore.JobStoreTX] ClusterManager: detected 1 failed or restarted instances.
      2007-02-03 13:09:39,307 INFO [org.quartz.impl.jdbcjobstore.JobStoreTX] ClusterManager: Scanning for instance "browning1170525665233"'s failed in-progress jobs.
      2007-02-03 13:09:46,811 INFO [org.quartz.impl.jdbcjobstore.JobStoreTX] ClusterManager: detected 1 failed or restarted instances.
      2007-02-03 13:09:46,811 INFO [org.quartz.impl.jdbcjobstore.JobStoreTX] ClusterManager: Scanning for instance "browning1170525665233"'s failed in-progress jobs.
      2007-02-03 13:09:52,046 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|1])
      2007-02-03 13:09:52,048 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|2])
      2007-02-03 13:09:52,049 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|3])
      2007-02-03 13:09:52,049 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|4])
      2007-02-03 13:09:52,051 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|5])
      2007-02-03 13:09:52,051 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|6])
      2007-02-03 13:09:52,052 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|7])
      2007-02-03 13:09:52,053 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32848 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|8], new vid: [browning:32852 (additional data: 17 bytes)|8])
      


      Followed by

      2007-02-03 13:12:33,063 INFO [org.jboss.ha.framework.interfaces.HAPartition.DefaultPartition] New cluster view for partition DefaultPartition: 11 ([192.168.1.73:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099] delta: 1)
      2007-02-03 13:12:33,063 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] I am (192.168.1.74:1099) received membershipChanged event:
      2007-02-03 13:12:33,063 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] Dead members: 0 ([])
      2007-02-03 13:12:33,063 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] New Members : 0 ([])
      2007-02-03 13:12:33,063 INFO [org.jboss.ha.framework.server.DistributedReplicantManagerImpl.DefaultPartition] All Members : 12 ([192.168.1.73:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099, 192.168.1.74:1099])
      2007-02-03 13:12:37,008 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|1])
      2007-02-03 13:12:37,009 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|2])
      2007-02-03 13:12:37,011 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|3])
      2007-02-03 13:12:37,011 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|4])
      2007-02-03 13:12:37,012 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|5])
      2007-02-03 13:12:37,012 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|6])
      2007-02-03 13:12:37,014 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|7])
      2007-02-03 13:12:37,014 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|8])
      2007-02-03 13:12:37,017 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|9])
      2007-02-03 13:12:37,018 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|10])
      2007-02-03 13:12:37,018 ERROR [org.jgroups.protocols.pbcast.GMS] [mossberg:32863 (additional data: 17 bytes)] received view <= current view; discarding it (current vid: [browning:32852 (additional data: 17 bytes)|11], new vid: [browning:32852 (additional data: 17 bytes)|11])
      


      Although there are only 2 physical servers, with only 1 JVM running on each, the servers keep adding members to the cluster with the same IP/port combination.
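      In case it matters: to rule out a multicast problem between the two hosts (which can make a node look like a stream of brand-new members), here is a quick standalone sanity check I can run on both machines. This is only a sketch using java.net.MulticastSocket with the same group/port as my UDP config; the class name McastCheck is just illustrative, not part of JBoss.

      ```java
      import java.net.DatagramPacket;
      import java.net.InetAddress;
      import java.net.MulticastSocket;

      // Minimal multicast sanity check (illustrative class name, not JBoss code).
      // Run simultaneously on both hosts; each side should print the other's
      // "ping" if multicast traffic actually flows between them.
      public class McastCheck {
          public static void main(String[] args) throws Exception {
              // Same group/port as the UDP element in cluster-service.xml
              InetAddress group = InetAddress.getByName("228.1.69.1");
              MulticastSocket sock = new MulticastSocket(45566);
              sock.joinGroup(group);

              byte[] out = ("ping from " + InetAddress.getLocalHost()).getBytes();
              sock.send(new DatagramPacket(out, out.length, group, 45566));

              // Listen for ~10 seconds and print whatever arrives on the group
              sock.setSoTimeout(10000);
              byte[] buf = new byte[256];
              try {
                  while (true) {
                      DatagramPacket p = new DatagramPacket(buf, buf.length);
                      sock.receive(p);
                      System.out.println(p.getAddress() + " -> "
                              + new String(p.getData(), 0, p.getLength()));
                  }
              } catch (java.net.SocketTimeoutException done) {
                  // no more traffic within the window
              }
              sock.leaveGroup(group);
              sock.close();
          }
      }
      ```

      If each host only ever sees its own ping, the problem is in the network (switch IGMP snooping, firewall, routing), not in JBoss.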

      Here is my cluster-service.xml:

      <?xml version="1.0" encoding="UTF-8"?>
      
      <!-- ===================================================================== -->
      <!-- -->
      <!-- Sample Clustering Service Configuration -->
      <!-- -->
      <!-- ===================================================================== -->
      
      <server>
      
       <!-- ==================================================================== -->
       <!-- Cluster Partition: defines cluster -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.ha.framework.server.ClusterPartition"
       name="jboss:service=${jboss.partition.name:DefaultPartition}">
      
       <!-- Name of the partition being built -->
       <attribute name="PartitionName">${jboss.partition.name:DefaultPartition}</attribute>
      
       <!-- The address used to determine the node name -->
       <attribute name="NodeAddress">${jboss.bind.address}</attribute>
      
       <!-- Determine if deadlock detection is enabled -->
       <attribute name="DeadlockDetection">False</attribute>
      
       <!-- Max time (in ms) to wait for state transfer to complete. Increase for large states -->
       <attribute name="StateTransferTimeout">30000</attribute>
      
       <!-- The JGroups protocol configuration -->
       <attribute name="PartitionConfig">
       <!--
       The default UDP stack:
       - If you have a multihomed machine, set the UDP protocol's bind_addr attribute to the
       appropriate NIC IP address, e.g bind_addr="192.168.0.2".
       - On Windows machines, because of the media sense feature being broken with multicast
       (even after disabling media sense) set the UDP protocol's loopback attribute to true
       -->
       <Config>
       <UDP mcast_addr="${jboss.partition.udpGroup:228.1.69.1}" mcast_port="45566"
       ip_ttl="${jgroups.mcast.ip_ttl:8}" ip_mcast="true"
       mcast_recv_buf_size="2000000" mcast_send_buf_size="640000"
       ucast_recv_buf_size="2000000" ucast_send_buf_size="640000"
       loopback="false"/>
       <PING timeout="2000" num_initial_members="3"
       up_thread="true" down_thread="true"/>
       <MERGE2 min_interval="10000" max_interval="20000"/>
       <FD_SOCK down_thread="false" up_thread="false"/>
       <FD shun="true" up_thread="true" down_thread="true"
       timeout="10000" max_tries="5"/>
       <VERIFY_SUSPECT timeout="3000" num_msgs="3"
       up_thread="true" down_thread="true"/>
       <pbcast.NAKACK gc_lag="50" retransmit_timeout="300,600,1200,2400,4800"
       max_xmit_size="8192"
       up_thread="true" down_thread="true"/>
       <UNICAST timeout="300,600,1200,2400,4800" window_size="100" min_threshold="10"
       down_thread="true"/>
       <pbcast.STABLE desired_avg_gossip="20000" max_bytes="400000"
       up_thread="true" down_thread="true"/>
       <FRAG frag_size="8192"
       down_thread="true" up_thread="true"/>
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000"
       shun="true" print_local_addr="true"/>
       <pbcast.STATE_TRANSFER up_thread="true" down_thread="true"/>
       </Config>
      
       <!-- Alternate TCP stack: customize it for your environment, change bind_addr and initial_hosts -->
       <!--
       <Config>
       <TCP bind_addr="thishost" start_port="7800" loopback="true"
       recv_buf_size="2000000" send_buf_size="640000"
       tcp_nodelay="true" up_thread="false" down_thread="false"/>
       <TCPPING initial_hosts="thishost[7800],otherhost[7800]" port_range="3" timeout="3500"
       num_initial_members="3" up_thread="false" down_thread="false"/>
       <MERGE2 min_interval="5000" max_interval="10000"
       up_thread="false" down_thread="false"/>
       <FD_SOCK down_thread="false" up_thread="false"/>
       <FD shun="true" up_thread="false" down_thread="false"
       timeout="10000" max_tries="5"/>
       <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false" />
       <pbcast.NAKACK up_thread="false" down_thread="false" gc_lag="100"
       retransmit_timeout="300,600,1200,2400,4800"/>
       <pbcast.STABLE desired_avg_gossip="20000" max_bytes="400000"
       down_thread="false" up_thread="false" />
       <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true"
       print_local_addr="true" up_thread="false" down_thread="false"/>
       <FC max_credits="2000000" down_thread="false" up_thread="false"
       min_threshold="0.10"/>
       <FRAG2 frag_size="60000" down_thread="false" up_thread="true"/>
       <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
       </Config>
       -->
       </attribute>
       <depends>jboss:service=Naming</depends>
       </mbean>
      
       <!-- ==================================================================== -->
       <!-- HA Session State Service for SFSB -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.ha.hasessionstate.server.HASessionStateService"
       name="jboss:service=HASessionState">
       <depends>jboss:service=Naming</depends>
       <!-- We now inject the partition into the HAJNDI service instead
       of requiring that the partition name be passed -->
       <depends optional-attribute-name="ClusterPartition"
       proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       <!-- JNDI name under which the service is bound -->
       <attribute name="JndiName">/HASessionState/Default</attribute>
       <!-- Max delay before cleaning unreclaimed state.
       Defaults to 30*60*1000 => 30 minutes -->
       <attribute name="BeanCleaningDelay">0</attribute>
       </mbean>
      
       <!-- ==================================================================== -->
       <!-- HA JNDI -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.ha.jndi.HANamingService"
       name="jboss:service=HAJNDI">
       <!-- We now inject the partition into the HAJNDI service instead
       of requiring that the partition name be passed -->
       <depends optional-attribute-name="ClusterPartition"
       proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       <!-- Bind address of bootstrap and HA-JNDI RMI endpoints -->
       <attribute name="BindAddress">${jboss.bind.address}</attribute>
       <!-- Port on which the HA-JNDI stub is made available -->
       <attribute name="Port">1100</attribute>
       <!-- RmiPort to be used by the HA-JNDI service once bound. 0 => auto. -->
       <attribute name="RmiPort">1101</attribute>
       <!-- Accept backlog of the bootstrap socket -->
       <attribute name="Backlog">50</attribute>
       <!-- The thread pool service used to control the bootstrap and
       auto discovery lookups -->
       <depends optional-attribute-name="LookupPool"
       proxy-type="attribute">jboss.system:service=ThreadPool</depends>
      
       <!-- A flag to disable the auto discovery via multicast -->
       <attribute name="DiscoveryDisabled">false</attribute>
       <!-- Set the auto-discovery bootstrap multicast bind address. If not
       specified and a BindAddress is specified, the BindAddress will be used. -->
       <attribute name="AutoDiscoveryBindAddress">${jboss.bind.address}</attribute>
       <!-- Multicast Address and group port used for auto-discovery -->
       <attribute name="AutoDiscoveryAddress">${jboss.partition.udpGroup:230.0.0.4}</attribute>
       <attribute name="AutoDiscoveryGroup">1102</attribute>
       <!-- The TTL (time-to-live) for autodiscovery IP multicast packets -->
       <attribute name="AutoDiscoveryTTL">16</attribute>
       <!-- The load balancing policy for HA-JNDI -->
       <attribute name="LoadBalancePolicy">org.jboss.ha.framework.interfaces.RoundRobin</attribute>
      
       <!-- Client socket factory to be used for client-server
       RMI invocations during JNDI queries
       <attribute name="ClientSocketFactory">custom</attribute>
       -->
       <!-- Server socket factory to be used for client-server
       RMI invocations during JNDI queries
       <attribute name="ServerSocketFactory">custom</attribute>
       -->
       </mbean>
      
       <mbean code="org.jboss.invocation.jrmp.server.JRMPInvokerHA"
       name="jboss:service=invoker,type=jrmpha">
       <attribute name="ServerAddress">${jboss.bind.address}</attribute>
       <attribute name="RMIObjectPort">4447</attribute>
       <!--
       <attribute name="RMIClientSocketFactory">custom</attribute>
       <attribute name="RMIServerSocketFactory">custom</attribute>
       -->
       <depends>jboss:service=Naming</depends>
       </mbean>
      
       <!-- the JRMPInvokerHA creates a thread per request. This implementation uses a pool of threads -->
       <mbean code="org.jboss.invocation.pooled.server.PooledInvokerHA"
       name="jboss:service=invoker,type=pooledha">
       <attribute name="NumAcceptThreads">1</attribute>
       <attribute name="MaxPoolSize">300</attribute>
       <attribute name="ClientMaxPoolSize">300</attribute>
       <attribute name="SocketTimeout">60000</attribute>
       <attribute name="ServerBindAddress">${jboss.bind.address}</attribute>
       <attribute name="ServerBindPort">4446</attribute>
       <attribute name="ClientConnectAddress">${jboss.bind.address}</attribute>
       <attribute name="ClientConnectPort">0</attribute>
       <attribute name="EnableTcpNoDelay">false</attribute>
       <depends optional-attribute-name="TransactionManagerService">jboss:service=TransactionManager</depends>
       <depends>jboss:service=Naming</depends>
       </mbean>
      
       <!-- ==================================================================== -->
      
       <!-- ==================================================================== -->
       <!-- Distributed cache invalidation -->
       <!-- ==================================================================== -->
      
       <mbean code="org.jboss.cache.invalidation.bridges.JGCacheInvalidationBridge"
       name="jboss.cache:service=InvalidationBridge,type=JavaGroups">
       <!-- We now inject the partition into the HAJNDI service instead
       of requiring that the partition name be passed -->
       <depends optional-attribute-name="ClusterPartition"
       proxy-type="attribute">jboss:service=${jboss.partition.name:DefaultPartition}</depends>
       <depends>jboss.cache:service=InvalidationManager</depends>
       <attribute name="InvalidationManager">jboss.cache:service=InvalidationManager</attribute>
       <attribute name="BridgeName">DefaultJGBridge</attribute>
       </mbean>
      
      </server>
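
      For reference, each node is started along these lines (a sketch; exact flags depend on the run script version, and the IP changes per host):

      ```shell
      # -b sets jboss.bind.address, which the config above uses for
      # NodeAddress, the HA-JNDI binds, and the invoker addresses.
      ./run.sh -c all -b 192.168.1.73

      # If supported by this run.sh version, the partition name and
      # multicast group can also be overridden to isolate the cluster:
      # ./run.sh -c all -b 192.168.1.73 -g MyPartition -u 239.255.1.1
      ```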
      


      Any help is appreciated. Thank you!