Cross Data Center Replication Issues with JDG 6.2
vbchin2 Feb 28, 2014 11:34 AM
I have been trying to set up a demo of Cross Data Center Replication using JDG 6.2 and so far haven't been able to get it to work properly. I need help understanding what should be fixed so that the testing steps listed below complete successfully.
SETUP (please refer to the attachments)
- Single VM (Mac OS X) (with IP address configured to be 192.168.1.5 for testing purposes)
- Two sites, site-1 and site-2, each with its own cluster of 3 JDG 6.2 nodes
- One distributed cache, labCache, configured to be backed up to the other site and defined under its own cache-container: xsite
- The JDG nodes are launched with port offsets 100 apart: node-1 has a port offset of 100, node-2 has 200, and so on up to node-6 with 600
- Each node is given its own copy of the JDG 6.2 standalone folder: node-1 runs in folder standalone1, and so on up to node-6, which runs in standalone6
- Nodes in the site-1 cluster use a multicast address of 239.1.1.1 and nodes in the site-2 cluster use 239.2.2.2
- Because the two clusters use different multicast addresses, each cluster discovers the other via MPING using address: 234.99.54.14 and port: 12000
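The node layout in the list above can be summarized in a short shell sketch. It only prints one launch command per node and starts nothing. The 9999 base management port, the clustered.sh script name, and the -D system properties are assumptions based on stock JBoss AS 7 style defaults; they are not taken from the attached start-jdg-cluster.sh. The jdg-N node names follow the names that appear in the logs.

```shell
#!/bin/sh
# Sketch of the six-node layout from the SETUP list (prints commands only).

# Port offset: node-1 -> 100, node-2 -> 200, ..., node-6 -> 600.
node_offset() { echo $(( $1 * 100 )); }

# Nodes 1-3 belong to site-1, nodes 4-6 to site-2.
node_site() { if [ "$1" -le 3 ]; then echo site-1; else echo site-2; fi; }

# Intra-site multicast address of each cluster.
site_mcast() { if [ "$1" = site-1 ]; then echo 239.1.1.1; else echo 239.2.2.2; fi; }

# ASSUMPTION: 9999 is the default native management port, so the CLI
# endpoint of a node is 9999 plus its offset (10099 for node-1).
mgmt_port() { echo $(( 9999 + $(node_offset "$1") )); }

# ASSUMPTION: script name and -D properties are the usual JBoss AS 7 ones;
# the real start-jdg-cluster.sh may differ.
launch_cmd() {
  n=$1
  site=$(node_site "$n")
  echo "./bin/clustered.sh -c $site.xml" \
       "-Djboss.server.base.dir=standalone$n" \
       "-Djboss.node.name=jdg-$n" \
       "-Djboss.socket.binding.port-offset=$(node_offset "$n")" \
       "-Djboss.default.multicast.address=$(site_mcast "$site")" \
       "-b 192.168.1.5"
}

for n in 1 2 3 4 5 6; do launch_cmd "$n"; done
```

Under these assumptions, mgmt_port 1 gives 10099, which matches the port used in the CLI command in the testing steps.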
TESTING STEPS
- Bring both sites up and verify that the nodes in each site form a cluster
- Use the Infinispan CLI to push 100 entries into the cache hosted by node-1, then verify that they are distributed across the cluster and backed up to the other site:
./ispn-cli.sh -c remoting://192.168.1.5:10099/xsite/labCache -f input-data-site-1.txt
- Use the Infinispan CLI again to log in to the same node, this time just issue the clear command, and verify that the entries are erased from all the caches
- Repeat step #2
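For anyone trying to reproduce this, steps 2 through 4 can be sketched as a dry run that prints the CLI invocations instead of executing them. The attached input-data-site-1.txt is not reproduced here, so the batch format (one put per line, keys 0 to 99, value "Some data" as seen in the logs) is an assumption, and the put-100.txt and clear.txt file names are hypothetical.

```shell
#!/bin/sh
# Sketch of testing steps 2-4 as a dry run (prints the CLI calls).

CLI_URL="remoting://192.168.1.5:10099/xsite/labCache"   # from the post

# ASSUMPTION: one 'put <key> "<value>"' command per line, keys 0..99.
gen_input() {
  i=0
  while [ "$i" -lt 100 ]; do
    echo "put $i \"Some data\""
    i=$(( i + 1 ))
  done
}

# HYPOTHETICAL batch file names, written to a scratch directory.
WORK_DIR=$(mktemp -d)
gen_input > "$WORK_DIR/put-100.txt"
echo "clear" > "$WORK_DIR/clear.txt"

# Step 2 (populate), step 3 (clear), step 4 (populate again).
for batch in put-100.txt clear.txt put-100.txt; do
  echo "./ispn-cli.sh -c $CLI_URL -f $WORK_DIR/$batch"
done
```

Removing the echo in the final loop would run the three batches back to back, which is the sequence that leads to the failure described below.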
FAILURE
The failure happened at step #4: repopulating the cache with the same data dragged on for several minutes, and node-1 eventually timed out in the CLI, with the various warnings and exceptions shown below.
WARNINGS AND EXCEPTIONS
The following warnings and exceptions appear repeatedly:
standalone1.log:
19:54:51,027 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (remote-thread-18) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for jdg-2/site-1
standalone4.log:
19:54:02,268 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (remote-thread-2) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [30 seconds] on key [0] for requestor [Thread[remote-thread-2,5,main]]! Lock held by [Thread[remote-thread-0,5,main]]
standalone6.log:
19:54:45,728 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-7,shared=relay) ISPN000136: Execution error: org.infinispan.util.concurrent.TimeoutException: Replication timeout for jdg-4/site-2
standalone6.log:
19:54:45,765 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-3,shared=relay) ISPN000071: Caught exception when handling command SingleRpcCommand{cacheName='labCache', command=ClearCommand{flags=[IGNORE_RETURN_VALUES, SKIP_XSITE_BACKUP]}}: org.infinispan.util.concurrent.TimeoutException: Replication timeout for jdg-4/site-2
standalone6.log:
19:54:02,339 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread-8) ISPN000071: Caught exception when handling command SingleRpcCommand{cacheName='labCache', command=ClearCommand{flags=[IGNORE_RETURN_VALUES, SKIP_XSITE_BACKUP]}}: org.infinispan.util.concurrent.TimeoutException: Unable to acquire lock after [30 seconds] on key [2] for requestor [Thread[remote-thread-8,5,main]]! Lock held by [Thread[remote-thread-3,5,main]]
standalone5.log:
19:54:53,532 WARN [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-187,shared=tcp) ISPN000071: Caught exception when handling command SingleRpcCommand{cacheName='labCache', command=PutKeyValueCommand{key=0, value=Some data, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}}: org.infinispan.util.concurrent.TimeoutException: Node jdg-4/site-2 timed out
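To see how often each kind of timeout occurs per node, a small shell helper can count the three TimeoutException messages quoted above in each log file. This is only a sketch: summarize_timeouts is a hypothetical helper name, and the patterns are taken from the excerpts above, so other message variants in the full logs would not be counted.

```shell
#!/bin/sh
# Count each kind of TimeoutException per log file.
# Usage: summarize_timeouts standalone1.log standalone4.log ...
summarize_timeouts() {
  for f in "$@"; do
    # grep -c prints 0 but exits non-zero when nothing matches,
    # hence the '|| true' so the script keeps going under 'set -e'.
    repl=$(grep -c 'TimeoutException: Replication timeout' "$f" || true)
    lock=$(grep -c 'TimeoutException: Unable to acquire lock' "$f" || true)
    node=$(grep -c 'TimeoutException: Node .* timed out' "$f" || true)
    echo "$f replication=$repl lock=$lock node=$node"
  done
}

# Run only when log files are passed on the command line.
if [ "$#" -gt 0 ]; then summarize_timeouts "$@"; fi
```

Comparing the counts between the site-1 and site-2 logs may help show whether the lock timeouts or the replication timeouts come first.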
ATTACHMENTS
- site-1.xml 15.0 KB
- site-2.xml 15.0 KB
- start-jdg-cluster.sh 1.4 KB
- logs.zip 81.5 KB
- input-data-site-1.txt.zip 385 bytes