Distribution configuration problems
monty-temboo Jun 22, 2011 2:55 PM
I was able to get a simple test running with replication of a cache. One node puts in some data, many other nodes connect, replicate, and can read the data. When I switch to distributed mode, however, I'm seeing some problems. I'm sure I'm missing something basic.
My programs are run with -Djgroups.bind_addr=localhost, and all nodes are running on the same machine. I didn't see how to put that setting into the jgroups config file; it only seemed to work when I passed it as a JVM argument. But that's not my real question.
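For anyone who wants to avoid the command-line flag: a -D argument just sets a JVM system property, so the same value can be set in code before the cache manager (and therefore the JGroups channel) is created. The sketch below is only an illustration under that assumption; the class and method names are hypothetical and it does not touch Infinispan itself. JGroups transports such as TCP also have a bind_addr attribute in the XML stack file, which is probably the more conventional place for it, but I have not verified that against this particular jgroups.xml.

```java
public final class BindAddressDefault {
    // Property name taken from the -Djgroups.bind_addr flag used above.
    static final String KEY = "jgroups.bind_addr";

    // Hypothetical helper: applies a fallback only when no value was
    // supplied on the command line, and returns the effective value.
    static String applyDefault(String fallback) {
        String current = System.getProperty(KEY);
        if (current == null || current.isEmpty()) {
            System.setProperty(KEY, fallback);
            return fallback;
        }
        return current;
    }

    public static void main(String[] args) {
        // Call this before constructing the cache manager, so the
        // property is visible when the JGroups stack is initialised.
        System.out.println(KEY + "=" + applyDefault("localhost"));
    }
}
```

A value passed with -D still wins, so the same build can be run on one machine or on several.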
Here's my config:
<infinispan
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
      xmlns="urn:infinispan:config:5.0">
   <global>
      <globalJmxStatistics
            enabled="true"
            jmxDomain="org.infinispan"
            cacheManagerName="SampleCacheManager"/>
      <transport
            clusterName="infinispan-cluster"
            machineId="m1"
            rackId="r1" nodeName="Node-A">
         <properties>
            <property name="configurationFile" value="jgroups.xml" />
         </properties>
      </transport>
   </global>
   <default>
      <locking
            isolationLevel="READ_COMMITTED"
            lockAcquisitionTimeout="20000"
            writeSkewCheck="false"
            concurrencyLevel="5000"
            useLockStriping="false"
      />
      <transaction
            transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
            syncRollbackPhase="false"
            syncCommitPhase="false"
            useEagerLocking="true"
            eagerLockSingleNode="false"
            cacheStopTimeout="30000" />
      <jmxStatistics enabled="true"/>
      <clustering mode="distribution">
         <sync/>
         <hash
               numOwners="2"
               rehashWait="120000"
               rehashRpcTimeout="600000"
               rehashEnabled="true"
         />
         <l1
               enabled="true"
               lifespan="600000"
         />
      </clustering>
   </default>
</infinispan>
My cache has 10000 elements with 64 char strings as keys and 2K strings as values.
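As a rough sanity check on those numbers, the raw character data can be estimated and compared with the heap each node actually gets. This is a back-of-the-envelope sketch, not a measurement: it assumes "2K" means 2048 characters and that each character costs 2 bytes in a Java String, and it ignores object headers, cache entry wrappers, L1 copies, and the buffers used while state is being transferred. The class and method names are made up for the example.

```java
public final class CacheFootprintEstimate {
    // Assumption: 2 bytes per char (UTF-16 backed Strings).
    static long rawBytesPerEntry(int keyChars, int valueChars) {
        return 2L * (keyChars + valueChars);
    }

    // Raw bytes held across the whole cluster when every entry is
    // stored on numOwners nodes.
    static long clusterBytes(long entries, int keyChars, int valueChars, int numOwners) {
        return entries * rawBytesPerEntry(keyChars, valueChars) * numOwners;
    }

    public static void main(String[] args) {
        long oneCopy = clusterBytes(10000, 64, 2048, 1);   // a single copy of the data
        long allCopies = clusterBytes(10000, 64, 2048, 2); // numOwners="2" from the config
        long maxHeap = Runtime.getRuntime().maxMemory();   // heap limit of this JVM

        System.out.printf("one copy:   %d bytes%n", oneCopy);
        System.out.printf("all copies: %d bytes%n", allCopies);
        System.out.printf("max heap:   %d bytes%n", maxHeap);
    }
}
```

Under these assumptions one copy of the data is about 42 MB of characters and the two owner copies together about 84 MB, spread over the cluster. If the reported max heap is close to the single-copy figure, a node that briefly holds or deserializes a large share of the entries during a rehash could plausibly run out of memory.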
When I'm running my test, I bring up 4 nodes that each read a number of random entries out of the cache every few seconds and report on successes or failures. I have just one node that populates the cache, with a 20 ms delay between inserts. Things seem to start out OK, but after a few minutes I see OutOfMemory errors, which I really didn't expect. Apparently something is going wrong during rebalancing. After the cache is populated, the populating node does nothing at all; it just sits there acting as one of the distributed nodes, without modifying or reading the cache directly. This is the message from that node:
11:27:52,180 INFO JGroupsTransport -- ISPN00094: Received new cluster view: [Node-A-9813|5] [Node-A-9813, Node-A-4857, Node-A-15412, Node-A-19100]
11:28:07,732 ERROR UNICAST2 -- couldn't deliver OOB message [dst: Node-A-19100, src: Node-A-9813 (3 headers), size=60000 bytes, flags=OOB|DONT_BUNDLE|NO_FC]
    at org.jgroups.protocols.FRAG2.unfragment(FRAG2.java:296)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:170)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
    at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:671)
    at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:320)
    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
    at org.jgroups.protocols.BARRIER.up(BARRIER.java:119)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:133)
    at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:177)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275)
    at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
    at org.jgroups.protocols.Discovery.up(Discovery.java:291)
    at org.jgroups.protocols.PING.up(PING.java:66)
    at org.jgroups.protocols.MPING.up(MPING.java:176)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1102)
    at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1658)
    at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1640)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)
11:28:13,422 ERROR RebalanceTask -- ISPN00146: Error transferring state to node after rehash
java.util.concurrent.ExecutionException: org.infinispan.CacheException: Problems invoking command.
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.infinispan.util.concurrent.AggregatingNotifyingFutureImpl.get(AggregatingNotifyingFutureImpl.java:74)
    at org.infinispan.distribution.RebalanceTask.performRehash(RebalanceTask.java:172)
    at org.infinispan.distribution.RehashTask.call(RehashTask.java:72)
    at org.infinispan.distribution.RehashTask.call(RehashTask.java:48)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:637)
Caused by: org.infinispan.CacheException: Problems invoking command.
    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:149)
    at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:577)
    at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:488)
    at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:364)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:770)
    at org.jgroups.JChannel.up(JChannel.java:1484)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:263)
    at org.jgroups.protocols.FRAG2.unfragment(FRAG2.java:310)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:170)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
    at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:671)
    at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:320)
    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
    at org.jgroups.protocols.BARRIER.up(BARRIER.java:119)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:133)
    at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:177)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275)
    at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
    at org.jgroups.protocols.Discovery.up(Discovery.java:291)
    at org.jgroups.protocols.PING.up(PING.java:66)
    at org.jgroups.protocols.MPING.up(MPING.java:176)
    at org.jgroups.protocols.TP.passMessageUp(TP.java:1102)
    at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1658)
    at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1640)
    ... 3 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.StringValue.from(StringValue.java:24)
    at java.lang.String.<init>(String.java:178)
    at java.lang.String.valueOf(String.java:2840)
    at org.jboss.marshalling.UTFUtils.readUTFBytes(UTFUtils.java:207)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:288)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
    at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
    at org.infinispan.container.entries.ImmortalCacheValue$Externalizer.readObject(ImmortalCacheValue.java:127)
    at org.infinispan.container.entries.ImmortalCacheValue$Externalizer.readObject(ImmortalCacheValue.java:119)
    at org.infinispan.marshall.jboss.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:356)
    at org.infinispan.marshall.jboss.ExternalizerTable.readObject(ExternalizerTable.java:246)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
    at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
    at org.infinispan.marshall.MarshallUtil.unmarshallMap(MarshallUtil.java:66)
    at org.infinispan.marshall.exts.MapExternalizer.readObject(MapExternalizer.java:81)
    at org.infinispan.marshall.exts.MapExternalizer.readObject(MapExternalizer.java:47)
    at org.infinispan.marshall.jboss.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:356)
    at org.infinispan.marshall.jboss.ExternalizerTable.readObject(ExternalizerTable.java:246)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
    at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
    at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:107)
    at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:71)
    at org.infinispan.marshall.jboss.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:356)
    at org.infinispan.marshall.jboss.ExternalizerTable.readObject(ExternalizerTable.java:246)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
    at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
    at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:192)
    at org.infinispan.marshall.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:121)
On one of the reading nodes I see this set of messages:
11:28:24,803 ERROR Retransmitter -- failed retransmission task
11:28:51,482 ERROR TimeScheduler2 -- failed executing task 148870-148903
11:28:52,471 ERROR TimeScheduler2 -- task execution failed
11:28:51,482 WARN UNICAST2 -- failed sending the message
11:29:04,311 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null
11:29:10,585 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null
11:28:50,588 ERROR UNICAST2 -- couldn't deliver OOB message [dst: Node-A-47, src: Node-A-15412 (3 headers), size=92 bytes, flags=OOB|DONT_BUNDLE|NO_FC]
11:28:46,061 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null
11:28:41,900 ERROR TCP -- failure sending message to 127.0.0.1:7800: java.lang.OutOfMemoryError: Java heap space
11:28:40,467 ERROR jgroups -- uncaught exception in Thread[ConnectionMap.Acceptor,null,null,5,ConnectionMap] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )
11:28:40,467 ERROR TimeScheduler2 -- failed running task org.jgroups.protocols.FD_ALL$TimeoutChecker@c4703a8
11:28:39,880 ERROR UNICAST2 -- couldn't deliver OOB message [dst: Node-A-47, src: Node-A-15412 (3 headers), size=92 bytes, flags=OOB|DONT_BUNDLE|NO_FC]
11:28:37,500 ERROR TCP -- failed handling data from 127.0.0.1:7801
11:29:30,570 ERROR TCP -- failed handling incoming message
11:29:32,370 ERROR jgroups -- uncaught exception in Thread[OOB-63,infinispan-cluster,Node-A-47,5,Thread Pools] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )
11:29:29,713 ERROR TimeScheduler2 -- failed executing task FixedIntervalTask: task=org.jgroups.protocols.FD_ALL$TimeoutChecker@c4703a8, cancelled=false
java.lang.OutOfMemoryError: Java heap space
11:29:24,206 ERROR TCP -- failed sending message to cluster (61 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null
11:29:20,866 WARN UNICAST2 -- failed sending the message
java.lang.OutOfMemoryError: Java heap space
11:29:17,914 ERROR TCP -- failed handling incoming message
11:29:16,993 ERROR Retransmitter -- failed retransmission task
java.lang.OutOfMemoryError: Java heap space
11:29:15,068 ERROR Retransmitter -- failed retransmission task
11:29:46,225 ERROR TimeScheduler2 -- failed executing task 148615-148761
11:29:47,103 ERROR TimeScheduler2 -- task execution failed
11:29:47,847 ERROR jgroups -- uncaught exception in Thread[Timer-6,infinispan-cluster,Node-A-47,5,JGroups] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )
11:29:15,068 WARN UNICAST2 -- failed sending the message
11:29:10,992 ERROR RpcManagerImpl -- ISPN00073: Unexpected error while replicating
11:29:10,976 WARN UNICAST2 -- failed sending the message
11:29:53,541 ERROR RequestCorrelator -- failed sending the response
11:29:07,075 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null
11:29:06,027 ERROR TP$TransferQueueBundler -- exception sending bundled msgs: java.lang.OutOfMemoryError: Java heap space:, cause: null
Any ideas what might be causing the OutOfMemory errors?
Thanks,
Monty