3 Replies Latest reply on Jul 11, 2011 6:24 AM by manik

    Distribution configuration problems

    monty-temboo

      I was able to get a simple test running with replication of a cache.  One node puts in some data, many other nodes connect, replicate, and can read the data.  When I switch to distributed mode, however, I'm seeing some problems.  I'm sure I'm missing something basic.

       

      My programs are run with -Djgroups.bind_addr=localhost, and all nodes are running on the same machine.  I didn't see how to put that setting into the JGroups config file; it only seemed to work when I passed it as a JVM argument.  But that's not my real question.

       

      Here's my config:

       

      <infinispan
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
            xmlns="urn:infinispan:config:5.0">
         <global>
            <globalJmxStatistics
                  enabled="true"
                  jmxDomain="org.infinispan"
                  cacheManagerName="SampleCacheManager"/>
            <transport
                  clusterName="infinispan-cluster"
                  machineId="m1"
                  rackId="r1" nodeName="Node-A">
               <properties>
                  <property name="configurationFile" value="jgroups.xml" />
               </properties>
            </transport>
         </global>
         <default>
            <locking
               isolationLevel="READ_COMMITTED"
               lockAcquisitionTimeout="20000"
               writeSkewCheck="false"
               concurrencyLevel="5000"
               useLockStriping="false"
            />
            <transaction
                  transactionManagerLookupClass="org.infinispan.transaction.lookup.JBossStandaloneJTAManagerLookup"
                  syncRollbackPhase="false"
                  syncCommitPhase="false"
                  useEagerLocking="true"
                  eagerLockSingleNode="false"
                  cacheStopTimeout="30000" />
            <jmxStatistics enabled="true"/>
            <clustering mode="distribution">
               <sync/>
               <hash
                  numOwners="2"
                  rehashWait="120000"
                  rehashRpcTimeout="600000"
                  rehashEnabled="true"
               />
               <l1
                  enabled="true"
                  lifespan="600000"
               />
            </clustering>
         </default>
      </infinispan>

       

       

      My cache has 10,000 entries, with 64-character strings as keys and 2K strings as values.
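
      For scale, the raw string payload of that data set can be estimated with a back-of-the-envelope sketch like the one below. It is only a rough lower bound under stated assumptions: "2K" is taken to mean 2048 characters, each character is assumed to occupy 2 bytes on the heap (as with the char[]-backed strings of JVMs from this era), and the class and method names are hypothetical. Object headers, L1 entries, and marshalling buffers are ignored, and the heap size of the nodes is not stated in this post.

```java
// Rough estimate of the raw String payload held by the cache.
// All constants except ENTRIES, KEY_CHARS and NUM_OWNERS are assumptions.
public class CachePayloadEstimate {
    static final int ENTRIES = 10000;     // from the post
    static final int KEY_CHARS = 64;      // from the post
    static final int VALUE_CHARS = 2048;  // assumption: "2K" means 2048 characters
    static final int BYTES_PER_CHAR = 2;  // assumption: char[]-backed strings
    static final int NUM_OWNERS = 2;      // numOwners from the <hash> element

    // Character data for one key/value pair, in bytes.
    static long payloadBytesPerEntry() {
        return (long) (KEY_CHARS + VALUE_CHARS) * BYTES_PER_CHAR;
    }

    // Character data for one full copy of the data set, in bytes.
    static long payloadBytesOneCopy() {
        return ENTRIES * payloadBytesPerEntry();
    }

    // Character data across the whole cluster, counting every owner's copy.
    static long payloadBytesClusterWide() {
        return NUM_OWNERS * payloadBytesOneCopy();
    }

    public static void main(String[] args) {
        System.out.println("per entry:    " + payloadBytesPerEntry() + " bytes");
        System.out.println("one copy:     " + payloadBytesOneCopy() + " bytes");
        System.out.println("cluster-wide: " + payloadBytesClusterWide() + " bytes");
    }
}
```

      Under these assumptions one full copy of the data is about 42 MB of character data, so a node that temporarily holds most of the data set during a rehash, on top of unmarshalling buffers, could plausibly exhaust a small heap.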

       

      When I run my test, I bring up 4 nodes that each read a number of random entries out of the cache every few seconds and report successes or failures.  Just one node populates the cache, with a 20 ms delay between inserts.  Things seem to start out OK, but after a few minutes I see OutOfMemory errors, which I really didn't expect.  Apparently something is going wrong during rebalancing.  The messages below are from the node that populated the cache.  Once the cache is populated, that node does nothing at all: it just sits there as one of the distributed nodes, without modifying or reading the cache directly.

       

      11:27:52,180  INFO JGroupsTransport -- ISPN00094: Received new cluster view: [Node-A-9813|5] [Node-A-9813, Node-A-4857, Node-A-15412, Node-A-19100]

      11:28:07,732 ERROR UNICAST2 -- couldn't deliver OOB message [dst: Node-A-19100, src: Node-A-9813 (3 headers), size=60000 bytes, flags=OOB|DONT_BUNDLE|NO_FC]

       

      at org.jgroups.protocols.FRAG2.unfragment(FRAG2.java:296)
      at org.jgroups.protocols.FRAG2.up(FRAG2.java:170)
      at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
      at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
      at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
      at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
      at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:671)
      at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:320)
      at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
      at org.jgroups.protocols.BARRIER.up(BARRIER.java:119)
      at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:133)
      at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:177)
      at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275)
      at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
      at org.jgroups.protocols.Discovery.up(Discovery.java:291)
      at org.jgroups.protocols.PING.up(PING.java:66)
      at org.jgroups.protocols.MPING.up(MPING.java:176)
      at org.jgroups.protocols.TP.passMessageUp(TP.java:1102)
      at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1658)
      at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1640)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:637)

      11:28:13,422 ERROR RebalanceTask -- ISPN00146: Error transferring state to node after rehash

      java.util.concurrent.ExecutionException: org.infinispan.CacheException: Problems invoking command.

      at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
      at java.util.concurrent.FutureTask.get(FutureTask.java:83)
      at org.infinispan.util.concurrent.AggregatingNotifyingFutureImpl.get(AggregatingNotifyingFutureImpl.java:74)
      at org.infinispan.distribution.RebalanceTask.performRehash(RebalanceTask.java:172)
      at org.infinispan.distribution.RehashTask.call(RehashTask.java:72)
      at org.infinispan.distribution.RehashTask.call(RehashTask.java:48)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:637)

      Caused by: org.infinispan.CacheException: Problems invoking command.

      at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:149)
      at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:577)
      at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:488)
      at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:364)
      at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:770)
      at org.jgroups.JChannel.up(JChannel.java:1484)
      at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074)
      at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:263)
      at org.jgroups.protocols.FRAG2.unfragment(FRAG2.java:310)
      at org.jgroups.protocols.FRAG2.up(FRAG2.java:170)
      at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
      at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
      at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
      at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
      at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:671)
      at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:320)
      at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
      at org.jgroups.protocols.BARRIER.up(BARRIER.java:119)
      at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:133)
      at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:177)
      at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275)
      at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
      at org.jgroups.protocols.Discovery.up(Discovery.java:291)
      at org.jgroups.protocols.PING.up(PING.java:66)
      at org.jgroups.protocols.MPING.up(MPING.java:176)
      at org.jgroups.protocols.TP.passMessageUp(TP.java:1102)
      at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1658)
      at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1640)
      ... 3 more

      Caused by: java.lang.OutOfMemoryError: Java heap space

      at java.util.Arrays.copyOf(Arrays.java:2882)
      at java.lang.StringValue.from(StringValue.java:24)
      at java.lang.String.<init>(String.java:178)
      at java.lang.String.valueOf(String.java:2840)
      at org.jboss.marshalling.UTFUtils.readUTFBytes(UTFUtils.java:207)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:288)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
      at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
      at org.infinispan.container.entries.ImmortalCacheValue$Externalizer.readObject(ImmortalCacheValue.java:127)
      at org.infinispan.container.entries.ImmortalCacheValue$Externalizer.readObject(ImmortalCacheValue.java:119)
      at org.infinispan.marshall.jboss.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:356)
      at org.infinispan.marshall.jboss.ExternalizerTable.readObject(ExternalizerTable.java:246)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
      at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
      at org.infinispan.marshall.MarshallUtil.unmarshallMap(MarshallUtil.java:66)
      at org.infinispan.marshall.exts.MapExternalizer.readObject(MapExternalizer.java:81)
      at org.infinispan.marshall.exts.MapExternalizer.readObject(MapExternalizer.java:47)
      at org.infinispan.marshall.jboss.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:356)
      at org.infinispan.marshall.jboss.ExternalizerTable.readObject(ExternalizerTable.java:246)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
      at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
      at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:107)
      at org.infinispan.marshall.exts.ReplicableCommandExternalizer.readObject(ReplicableCommandExternalizer.java:71)
      at org.infinispan.marshall.jboss.ExternalizerTable$ExternalizerAdapter.readObject(ExternalizerTable.java:356)
      at org.infinispan.marshall.jboss.ExternalizerTable.readObject(ExternalizerTable.java:246)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:351)
      at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
      at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
      at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:192)
      at org.infinispan.marshall.VersionAwareMarshaller.objectFromByteBuffer(VersionAwareMarshaller.java:121)

       

       

       

      On one of the reading nodes I see this set of messages:

      11:28:24,803 ERROR Retransmitter -- failed retransmission task

      11:28:51,482 ERROR TimeScheduler2 -- failed executing task 148870-148903

      11:28:52,471 ERROR TimeScheduler2 -- task execution failed

      11:28:51,482  WARN UNICAST2 -- failed sending the message

      11:29:04,311 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null

      11:29:10,585 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null

      11:28:50,588 ERROR UNICAST2 -- couldn't deliver OOB message [dst: Node-A-47, src: Node-A-15412 (3 headers), size=92 bytes, flags=OOB|DONT_BUNDLE|NO_FC]

      11:28:46,061 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null

      11:28:41,900 ERROR TCP -- failure sending message to 127.0.0.1:7800: java.lang.OutOfMemoryError: Java heap space

      11:28:40,467 ERROR jgroups -- uncaught exception in Thread[ConnectionMap.Acceptor,null,null,5,ConnectionMap] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )

      11:28:40,467 ERROR TimeScheduler2 -- failed running task org.jgroups.protocols.FD_ALL$TimeoutChecker@c4703a8

      11:28:39,880 ERROR UNICAST2 -- couldn't deliver OOB message [dst: Node-A-47, src: Node-A-15412 (3 headers), size=92 bytes, flags=OOB|DONT_BUNDLE|NO_FC]

      11:28:37,500 ERROR TCP -- failed handling data from 127.0.0.1:7801

      11:29:30,570 ERROR TCP -- failed handling incoming message

      11:29:32,370 ERROR jgroups -- uncaught exception in Thread[OOB-63,infinispan-cluster,Node-A-47,5,Thread Pools] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )

      11:29:29,713 ERROR TimeScheduler2 -- failed executing task FixedIntervalTask: task=org.jgroups.protocols.FD_ALL$TimeoutChecker@c4703a8, cancelled=false

      java.lang.OutOfMemoryError: Java heap space

      11:29:24,206 ERROR TCP -- failed sending message to cluster (61 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null

      11:29:20,866  WARN UNICAST2 -- failed sending the message

      java.lang.OutOfMemoryError: Java heap space

      11:29:17,914 ERROR TCP -- failed handling incoming message

      11:29:16,993 ERROR Retransmitter -- failed retransmission task

      java.lang.OutOfMemoryError: Java heap space

      11:29:15,068 ERROR Retransmitter -- failed retransmission task

      11:29:46,225 ERROR TimeScheduler2 -- failed executing task 148615-148761

      11:29:47,103 ERROR TimeScheduler2 -- task execution failed

      11:29:47,847 ERROR jgroups -- uncaught exception in Thread[Timer-6,infinispan-cluster,Node-A-47,5,JGroups] (thread group=org.jgroups.util.Util$1[name=JGroups,maxpri=10] )

      11:29:15,068  WARN UNICAST2 -- failed sending the message

      11:29:10,992 ERROR RpcManagerImpl -- ISPN00073: Unexpected error while replicating

      11:29:10,976  WARN UNICAST2 -- failed sending the message

      11:29:53,541 ERROR RequestCorrelator -- failed sending the response

      11:29:07,075 ERROR TCP -- failed sending message to Node-A-15412 (2180 bytes): java.lang.OutOfMemoryError: Java heap space, cause: null

      11:29:06,027 ERROR TP$TransferQueueBundler -- exception sending bundled msgs: java.lang.OutOfMemoryError: Java heap space:, cause: null

       

       

      Any ideas what might be causing the OutOfMemory errors? 

       

      Thanks,

       

      Monty