
    Buddy replication - initial state transfer after node restart

    triumphthepup

      We are seeing an issue that I'd like to confirm with others before I write it up as a JIRA issue.

      JBoss AS 5.1.0
      JBoss Cache 3.2.1
      Buddy Replication (two nodes, separate machines)

      We're currently using total replication for HTTP session replication, but we're increasing our node count (to three for now) and we'd like to change from total to buddy replication. After a significant amount of testing, buddy replication works very well for us, except in one case.

      The problem:
      Two servers, A and B.
      - Start server A
      - Log into web app, creating an HTTP Session, S1, on server A
      - Start server B. When B starts this first time, the session S1 is correctly replicated from A into B's "_BUDDY_BACKUP_" region for server A
      - Now restart server B. Upon starting the second time, the session S1 from server A is only partially replicated to the "_BUDDY_BACKUP_" tree on server B. In particular, it appears that (at least) the "DistributableSessionMetadata" was not replicated (it should appear under key "2" in the replicated cache entry)
      - Shut down server A. When the user hits server B, JBoss will try to deserialize and use S1, but it will fail because some data, such as the session metadata, is not present in S1.

      Example session S1 that successfully replicated after the initial startup of server B

      --- Cache1 ---
      / {}
       /_BUDDY_BACKUP_ {}
       /192.168.71.60_7600 {}
       /JSESSION {}
       /ROOT_localhost {}
       /TYqa9j30si-QlGaVL9OVvQ__ {0=16, 1=1255098672537, org.jboss.seam.security.rememberMe=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, 2=org.jboss.web.tomcat.service.session.distributedcache.spi.DistributableSessionMetadata@c1e908b, org.jboss.seam.CONVERSATION#1$org.jboss.seam.persistence.persistenceContexts=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.core.conversationEntries=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.international.localeSelector=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.CONVERSATION#1$org.jboss.seam.core.conversation=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.CONVERSATION#1$org.jboss.seam.international.statusMessages=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.security.identity=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, javax.faces.request.charset=UTF-8, org.jboss.seam.CONVERSATION#1$org.jboss.seam.faces.redirect=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.security.credentials=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, pier=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}, org.jboss.seam.web.session=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}}
      ------------
      


      Example session S1 that failed to replicate fully to B after restart (notice the missing "2=..." among many other things)
      --- Cache1 ---
      / {}
       /_BUDDY_BACKUP_ {}
       /192.168.71.60_7600 {}
       /JSESSION {}
       /ROOT_localhost {}
       /TYqa9j30si-QlGaVL9OVvQ__ {0=153, 1=1255097288878, foo=org.jboss.ha.framework.server.SimpleCachableMarshalledValue{raw=nullserialized=true}}
      ------------
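      For reference, here is a rough sketch of how the backup copy can be inspected programmatically, assuming you can get hold of server B's session Cache instance (e.g. from a test harness); the Fqn is taken from the cache dumps above:

      import java.util.Set;

      import org.jboss.cache.Cache;
      import org.jboss.cache.Fqn;
      import org.jboss.cache.Node;

      // Diagnostic sketch only: given a handle to server B's session cache, print the
      // keys that actually made it into the backup copy of S1. The Fqn is copied from
      // the dumps above; the session id is obviously specific to this example.
      public class BackupKeyCheck
      {
         public static void printBackupKeys(Cache<Object, Object> cache)
         {
            Fqn backupFqn = Fqn.fromString(
                  "/_BUDDY_BACKUP_/192.168.71.60_7600/JSESSION/ROOT_localhost/TYqa9j30si-QlGaVL9OVvQ__");

            Node<Object, Object> backup = cache.getNode(backupFqn);
            if (backup == null)
            {
               System.out.println("No backup copy of the session at all");
               return;
            }

            // A healthy backup contains keys 0, 1, 2 and the session attributes; after the
            // broken restart, key 2 (the DistributableSessionMetadata) and most attributes
            // are missing.
            Set<Object> keys = backup.getKeys();
            System.out.println("Keys present in backup copy: " + keys);
         }
      }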



      What appears to be happening is that during the initial startup of the secondary server, server A properly calls the "AssignToBuddyGroupCommand" on server B, passing the initial state. However, on subsequent restarts of server B, that command is never invoked by server A.

      I believe the problem is that server A fails to recognize that server B was shut down, so the BuddyManager never removes it as a buddy (at least not within the time it takes server B to restart). When I look at the BuddyManager on server A in the jmx-console, I can see that its buddy group is never updated when server B restarts. I believe the data that does make it to B after a restart comes only from the standard replicate commands that occur when something changes in session S1 on server A.

      For example, if server B was restarted at 6:50am, server A's buddy information in JMX looks like this (last updated 6:40am, which implies it never processed B leaving):
      BuddyGroup: (dataOwner: 192.168.71.60:7600, groupName: 192.168.71.60_7600, buddies: [192.168.71.62:7600],lastModified: Fri Oct 09 06:40:27 PDT 2009)


      In conjunction with that, in the log output from server A, I see this when server B is restarting:
      2009-10-09 06:50:52,477 DEBUG [org.jboss.cache.buddyreplication.BuddyManager] Nothing has changed; new buddy list is identical to the old one.
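      To watch this from server A's side without digging through the jmx-console, a trivial cache listener can log every view change the session cache observes. This is just a diagnostic sketch, assuming you have some way to register a listener on the session cache (e.g. a test or a small custom service):

      import org.jboss.cache.Cache;
      import org.jboss.cache.notifications.annotation.CacheListener;
      import org.jboss.cache.notifications.annotation.ViewChanged;
      import org.jboss.cache.notifications.event.ViewChangedEvent;

      // Logs every JGroups view change the cache sees, so you can tell whether server A
      // ever observes server B leaving and rejoining around the restart.
      @CacheListener
      public class ViewChangeLogger
      {
         @ViewChanged
         public void viewChanged(ViewChangedEvent event)
         {
            System.out.println("Session cache saw new view: " + event.getNewView());
         }

         // Register on the cache instance, however you obtain it.
         public static void attachTo(Cache<?, ?> sessionCache)
         {
            sessionCache.addCacheListener(new ViewChangeLogger());
         }
      }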
      


      Here are my relevant configurations (let me know if I missed sections):

      jboss-web.xml
       <replication-config>
          <replication-trigger>SET_AND_NON_PRIMITIVE_GET</replication-trigger>
          <replication-granularity>ATTRIBUTE</replication-granularity>
       </replication-config>
      


      <bean name="StandardSessionCacheConfig" class="org.jboss.cache.config.Configuration">

         <!-- Provides batching functionality for caches that don't want to interact with regular JTA Transactions -->
         <property name="transactionManagerLookupClass">org.jboss.cache.transaction.BatchModeTransactionManagerLookup</property>

         <!-- Name of cluster. Needs to be the same for all members -->
         <property name="clusterName">${jboss.partition.name:DefaultPartition}-SessionCache</property>
         <!-- Use a UDP (multicast) based stack. Need JGroups flow control (FC)
              because we are using asynchronous replication. -->
         <property name="multiplexerStack">${jboss.default.jgroups.stack:tcp}</property>
         <property name="fetchInMemoryState">true</property>

         <property name="nodeLockingScheme">PESSIMISTIC</property>
         <property name="isolationLevel">REPEATABLE_READ</property>
         <property name="useLockStriping">false</property>
         <property name="cacheMode">REPL_ASYNC</property>

         <!-- Number of milliseconds to wait until all responses for a
              synchronous call have been received. Make this longer
              than lockAcquisitionTimeout. -->
         <property name="syncReplTimeout">17500</property>
         <!-- Max number of milliseconds to wait for a lock acquisition -->
         <property name="lockAcquisitionTimeout">15000</property>
         <!-- The max amount of time (in milliseconds) we wait until the
              state (ie. the contents of the cache) are retrieved from
              existing members at startup. -->
         <property name="stateRetrievalTimeout">60000</property>

         <!-- Not needed for a web session cache that doesn't use FIELD -->
         <property name="useRegionBasedMarshalling">false</property>
         <!-- Must match the value of "useRegionBasedMarshalling" -->
         <property name="inactiveOnStartup">false</property>

         <!-- Disable asynchronous RPC marshalling/sending -->
         <property name="serializationExecutorPoolSize">0</property>
         <!-- We have no asynchronous notification listeners -->
         <property name="listenerAsyncPoolSize">0</property>

         <property name="exposeManagementStatistics">true</property>

         <property name="buddyReplicationConfig">
            <bean class="org.jboss.cache.config.BuddyReplicationConfig">

               <!-- Just set to true to turn on buddy replication -->
               <property name="enabled">true</property>

               <!-- A way to specify a preferred replication group. We try
                    and pick a buddy who shares the same pool name (falling
                    back to other buddies if not available). -->
               <property name="buddyPoolName">default</property>

               <property name="buddyCommunicationTimeout">17500</property>

               <!-- Do not change these -->
               <property name="autoDataGravitation">false</property>
               <property name="dataGravitationRemoveOnFind">true</property>
               <property name="dataGravitationSearchBackupTrees">true</property>

               <property name="buddyLocatorConfig">
                  <bean class="org.jboss.cache.buddyreplication.NextMemberBuddyLocatorConfig">
                     <!-- The number of backup copies we maintain -->
                     <property name="numBuddies">1</property>
                     <!-- Means that each node will *try* to select a buddy on
                          a different physical host. If not able to do so
                          though, it will fall back to colocated nodes. -->
                     <property name="ignoreColocatedBuddies">true</property>
                  </bean>
               </property>
            </bean>
         </property>
         <property name="cacheLoaderConfig">
            <bean class="org.jboss.cache.config.CacheLoaderConfig">
               <!-- Do not change these -->
               <property name="passivation">true</property>
               <property name="shared">false</property>

               <property name="individualCacheLoaderConfigs">
                  <list>
                     <bean class="org.jboss.cache.loader.FileCacheLoaderConfig">
                        <!-- Where passivated sessions are stored -->
                        <property name="location">${jboss.server.data.dir}${/}session</property>
                        <!-- Do not change these -->
                        <property name="async">false</property>
                        <property name="fetchPersistentState">true</property>
                        <property name="purgeOnStartup">true</property>
                        <property name="ignoreModifications">false</property>
                        <property name="checkCharacterPortability">false</property>
                     </bean>
                  </list>
               </property>
            </bean>
         </property>
      </bean>
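      For anyone wiring this up outside the AS microcontainer (e.g. in a standalone reproducer), roughly the same buddy replication settings can be built in code; the XML property names above map one-to-one onto these bean setters. This is only a sketch, with the remaining settings left at their defaults:

      import org.jboss.cache.buddyreplication.NextMemberBuddyLocatorConfig;
      import org.jboss.cache.config.BuddyReplicationConfig;
      import org.jboss.cache.config.Configuration;

      // Programmatic equivalent of the buddy replication portion of the config above
      // (values copied from the XML; everything else left at JBoss Cache defaults).
      public class BuddyConfigFactory
      {
         public static Configuration buildConfig()
         {
            NextMemberBuddyLocatorConfig locator = new NextMemberBuddyLocatorConfig();
            locator.setNumBuddies(1);
            locator.setIgnoreColocatedBuddies(true);

            BuddyReplicationConfig br = new BuddyReplicationConfig();
            br.setEnabled(true);
            br.setBuddyPoolName("default");
            br.setBuddyCommunicationTimeout(17500);
            br.setAutoDataGravitation(false);
            br.setDataGravitationRemoveOnFind(true);
            br.setDataGravitationSearchBackupTrees(true);
            br.setBuddyLocatorConfig(locator);

            Configuration config = new Configuration();
            config.setClusterName("DefaultPartition-SessionCache");
            config.setCacheMode(Configuration.CacheMode.REPL_ASYNC);
            config.setFetchInMemoryState(true);
            config.setBuddyReplicationConfig(br);
            return config;
         }
      }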




      I have tried both UDP and TCP, with and without passivation, and confirmed that everything works fine again when using total rather than buddy replication.

      Has anyone else seen this? Let me know if more information is needed.




        • 1. Re: Buddy replication - initial state transfer after node restart
          brian.stansberry

          Thanks, Richard, for this thorough report. How are you restarting server B? Hard kill + restart, or normal shutdown plus restart? If it's a hard kill, have you edited the AS's JGroups configurations to remove FD_SOCK?

           I doubt you have removed FD_SOCK; the above questions are just to rule out one unlikely possibility.

           Manik, it occurs to me that a lot of our testing in this area involves multi-node clusters, not just two nodes. The QE failover testing uses 4 nodes, and my general impression is that the unit tests tend to set up 3 or 4 nodes. So I'll try a simple two-node unit test.
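           Just to sketch the idea (this is not the actual unit test): start two caches with buddy replication in one VM, put some session-like data on A, "restart" B, and see whether B's backup region gets repopulated. Cluster/JGroups settings are left at the JBoss Cache defaults, and the Fqn layout loosely mimics the session cache dumps above. One caveat: an in-VM restart gives B a new JGroups address, unlike restarting a real server on the same host and port, so it may not reproduce the exact scenario.

           import org.jboss.cache.Cache;
           import org.jboss.cache.DefaultCacheFactory;
           import org.jboss.cache.Fqn;
           import org.jboss.cache.config.BuddyReplicationConfig;
           import org.jboss.cache.config.Configuration;

           public class TwoNodeBuddyRestartTest
           {
              private static Configuration buddyConfig()
              {
                 Configuration cfg = new Configuration();
                 cfg.setCacheMode(Configuration.CacheMode.REPL_SYNC); // sync keeps the sketch deterministic
                 cfg.setFetchInMemoryState(true);

                 BuddyReplicationConfig br = new BuddyReplicationConfig();
                 br.setEnabled(true); // defaults: NextMemberBuddyLocator, one buddy
                 cfg.setBuddyReplicationConfig(br);
                 return cfg;
              }

              public static void main(String[] args) throws Exception
              {
                 Cache<String, String> cacheA = new DefaultCacheFactory<String, String>().createCache(buddyConfig());
                 Cache<String, String> cacheB = new DefaultCacheFactory<String, String>().createCache(buddyConfig());

                 Fqn session = Fqn.fromString("/JSESSION/ROOT_localhost/S1");
                 cacheA.put(session, "0", "16");
                 cacheA.put(session, "2", "metadata");

                 // The backup region name is the data owner's address with ':' replaced
                 // by '_', as seen in the dumps above (e.g. 192.168.71.60_7600).
                 String ownerRegion = cacheA.getLocalAddress().toString().replace(':', '_');
                 Fqn backup = Fqn.fromString("/_BUDDY_BACKUP_/" + ownerRegion + "/JSESSION/ROOT_localhost/S1");

                 System.out.println("Backup on B before restart: " + cacheB.getNode(backup));

                 // Simulate the restart of server B.
                 cacheB.stop();
                 cacheB.destroy();
                 cacheB = new DefaultCacheFactory<String, String>().createCache(buddyConfig());
                 Thread.sleep(5000); // give the buddy group time to (re)form

                 System.out.println("Backup on B after restart:  " + cacheB.getNode(backup));

                 cacheA.stop();
                 cacheB.stop();
              }
           }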

          • 2. Re: Buddy replication - initial state transfer after node restart
            triumphthepup

            Hi Brian, thanks for getting back to me....

            How are you restarting server B? Hard kill + restart, or normal shutdown plus restart?

            I see the same behavior with either a graceful restart or kill -9.

            If it's a hard kill, have you edited the AS's JGroups configurations to remove FD_SOCK?

            Nope, haven't tweaked FD_SOCK at all. I'll paste my jgroups config at the bottom.

            I'll try to reproduce this with three nodes.



            Our TCP JGroups config (default settings):
            <stack name="tcp"
             description="TCP based stack, with flow control and message bundling.
             TCP stacks are usually used when IP multicasting cannot
             be used in a network, e.g. because it is disabled (e.g.
             routers discard multicast)">
             <config>
             <TCP
             singleton_name="tcp"
             start_port="${jboss.jgroups.tcp.tcp_port:7600}"
             tcp_nodelay="true"
             loopback="false"
             recv_buf_size="20000000"
             send_buf_size="640000"
             discard_incompatible_packets="true"
             max_bundle_size="64000"
             max_bundle_timeout="30"
             use_incoming_packet_handler="true"
             enable_bundling="true"
             use_send_queues="false"
             sock_conn_timeout="300"
             skip_suspected_members="true"
             timer.num_threads="12"
             enable_diagnostics="${jboss.jgroups.enable_diagnostics:true}"
             diagnostics_addr="${jboss.jgroups.diagnostics_addr:224.0.0.75}"
             diagnostics_port="${jboss.jgroups.diagnostics_port:7500}"
            
             use_concurrent_stack="true"
            
             thread_pool.enabled="true"
             thread_pool.min_threads="20"
             thread_pool.max_threads="200"
             thread_pool.keep_alive_time="5000"
             thread_pool.queue_enabled="true"
             thread_pool.queue_max_size="1000"
             thread_pool.rejection_policy="discard"
            
             oob_thread_pool.enabled="true"
             oob_thread_pool.min_threads="1"
             oob_thread_pool.max_threads="20"
             oob_thread_pool.keep_alive_time="5000"
             oob_thread_pool.queue_enabled="false"
             oob_thread_pool.queue_max_size="100"
             oob_thread_pool.rejection_policy="run"/>
             <!-- Alternative 1: multicast-based automatic discovery. -->
             <MPING timeout="3000"
             num_initial_members="3"
             mcast_addr="${jboss.partition.udpGroup:230.11.11.11}"
             mcast_port="${jgroups.tcp.mping_mcast_port:45700}"
             ip_ttl="${jgroups.udp.ip_ttl:2}"/>
             <!-- Alternative 2: non multicast-based replacement for MPING. Requires a static configuration
             of *all* possible cluster members.
             <TCPPING timeout="3000"
             initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7600],localhost[7601]}"
             port_range="1"
             num_initial_members="3"/>
             -->
             <MERGE2 max_interval="100000" min_interval="20000"/>
             <FD_SOCK/>
             <FD timeout="6000" max_tries="5" shun="true"/>
             <VERIFY_SUSPECT timeout="1500"/>
             <pbcast.NAKACK use_mcast_xmit="false" gc_lag="0"
             retransmit_timeout="300,600,1200,2400,4800"
             discard_delivered_msgs="true"/>
             <UNICAST timeout="300,600,1200,2400,3600"/>
             <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
             max_bytes="400000"/>
             <pbcast.GMS print_local_addr="true" join_timeout="3000"
             shun="true"
             view_bundling="true"
             view_ack_collection_timeout="5000"/>
             <FC max_credits="2000000" min_threshold="0.10"
             ignore_synchronous_response="true"/>
             <FRAG2 frag_size="60000"/>
             <!-- pbcast.STREAMING_STATE_TRANSFER/ -->
             <pbcast.STATE_TRANSFER/>
             <pbcast.FLUSH timeout="0"/>
             </config>
             </stack>


            • 3. Re: Buddy replication - initial state transfer after node restart
              triumphthepup

              After some basic testing with 3 nodes, I don't immediately see the same issue. It seems to do a good job of reorganizing the buddy relationships any time a third node comes online, typically creating a nice ring where each node has only one buddy. It sounds like the issue might not exist with more than 2 nodes as a result of this reshuffle?

              Regarding FD_SOCK, I assume you're implying that server A never recognized that server B went away and came back, and hence continued normal (delta) replication instead of calling "AssignToBuddyGroupCommand"? That would make sense.

              • 4. Re: Buddy replication - initial state transfer after node restart
                brian.stansberry

                Yes, my line of thinking on FD_SOCK was as you stated. Very much a long shot. :-)

                Thanks for testing with 3 nodes. This afternoon I'm setting up a basic unit test with two nodes; will report back here.

                • 5. Re: Buddy replication - initial state transfer after node restart
                  brian.stansberry

                  Richard, this is most definitely a bug.

                  https://jira.jboss.org/jira/browse/JBCACHE-1549

                  Thanks again for the thorough report.

                  • 6. Re: Buddy replication - initial state transfer after node restart
                    triumphthepup

                    Thanks, we'll just hold off on moving to buddy replication until the 3.3 release. Total replication works well for us at this point.