9 Replies Latest reply on Feb 4, 2011 11:09 AM by gerbszt

    Limit on entries in Infinispan cache in cluster setup

    divyapt

      I put 2 million entries into the cache on one node.

       

      When I try to access those entries from another node in the cluster setup, I get this error:

       

      Exception in thread "main" org.infinispan.CacheException: Unable to invoke method public void org.infinispan.distribution.DistributionManagerImpl.waitForJoinToComplete() throws java.lang.Throwable on object

              at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:174)

              at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:889)

              at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:687)

              at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:589)

              at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:150)

              at org.infinispan.CacheDelegate.start(CacheDelegate.java:317)

              at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:516)

              at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:439)

              at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:408)

              at InfiniSpanTest2.main(InfiniSpanTest2.java:19)

      Caused by: java.lang.reflect.InvocationTargetException

              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

              at java.lang.reflect.Method.invoke(Method.java:597)

              at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:171)

              ... 9 more

      Caused by: org.infinispan.CacheException: Unexpected exception

              at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:142)

              at org.infinispan.distribution.RehashTask.call(RehashTask.java:53)

              at org.infinispan.distribution.RehashTask.call(RehashTask.java:33)

              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

              at java.util.concurrent.FutureTask.run(FutureTask.java:138)

              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

              at java.lang.Thread.run(Thread.java:619)

      Caused by: org.infinispan.CacheException: Remote (slave84-52573) failed unexpectedly

              at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:74)

              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:414)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:101)

              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:125)

              at org.infinispan.distribution.JoinTask.performRehash(JoinTask.java:113)

              ... 7 more

      Caused by: java.lang.OutOfMemoryError: Java heap space

              at org.infinispan.io.ExposedByteArrayOutputStream.write(ExposedByteArrayOutputStream.java:90)

              at org.jboss.marshalling.Marshalling$6.write(Marshalling.java:378)

              at org.jboss.marshalling.UTFUtils.writeUTFBytes(UTFUtils.java:134)

              at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:328)

              at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)

              at org.infinispan.container.entries.MortalCacheValue$Externalizer.writeObject(MortalCacheValue.java:100)

              at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.writeObject(ConstantObjectTable.java:322)

              at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:147)

              at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)

              at org.infinispan.marshall.MarshallUtil.marshallMap(MarshallUtil.java:59)

              at org.infinispan.marshall.exts.MapExternalizer.writeObject(MapExternalizer.java:61)

              at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.writeObject(ConstantObjectTable.java:322)

              at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:147)

              at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)

              at org.infinispan.remoting.responses.SuccessfulResponse$Externalizer.writeObject(SuccessfulResponse.java:59)

              at org.infinispan.marshall.jboss.ConstantObjectTable$ExternalizerAdapter.writeObject(ConstantObjectTable.java:322)

              at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:147)

              at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:423)

              at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectToObjectStream(GenericJBossMarshaller.java:98)

              at org.infinispan.marshall.VersionAwareMarshaller.objectToBuffer(VersionAwareMarshaller.java:93)

              at org.infinispan.marshall.AbstractMarshaller.objectToBuffer(AbstractMarshaller.java:31)

              at org.infinispan.remoting.transport.jgroups.MarshallerAdapter.objectToBuffer(MarshallerAdapter.java:22)

              at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:595)

              at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:489)

              at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:365)

              at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771)

              at org.jgroups.JChannel.up(JChannel.java:1465)

              at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:954)

              at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:430)

              at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:265)

              at org.jgroups.protocols.FRAG2.up(FRAG2.java:190)

              at org.jgroups.protocols.FlowControl.up(FlowControl.java:419)

       

       

      I get this error even after I increase the heap space.
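
      For reference, the real InfiniSpanTest2 does roughly the following (a sketch only; the configuration file name and the key/value types are placeholders):

          import org.infinispan.Cache;
          import org.infinispan.manager.DefaultCacheManager;

          public class InfiniSpanTest2 {
              public static void main(String[] args) throws Exception {
                  // Clustered, distributed-mode configuration file (placeholder name).
                  DefaultCacheManager cm = new DefaultCacheManager("infinispan-dist.xml");
                  Cache<String, String> cache = cm.getCache(); // on the joining node this is the call that fails

                  // On the first node: load the 2 million entries.
                  for (int i = 0; i < 2000000; i++) {
                      cache.put("key-" + i, "value-" + i);
                  }
              }
          }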

       

      Is there any limit on the number of entries in the cache? If not, how do I overcome this error?

       

       

      Thanks,

      Divya

        • 1. Limit on entries in Infinispan cache in cluster setup
          vblagojevic

          Divya,

           

          At the moment, rehashing of key/value pairs among the Infinispan nodes is done through RPC calls, so those calls have to serialize the entire payload for the rehash transfer. If you have 2 million entries on a single node, then after another node joins, roughly half of those entries need to be serialized and transferred to the joining node. As you can imagine, this will likely cause an OOM.
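
          As a very rough back-of-the-envelope illustration (the per-entry size below is purely hypothetical):

              2,000,000 entries / 2           ->  about 1,000,000 entries to ship to the joiner
              1,000,000 entries x ~200 bytes  ->  about 200 MB serialized into a single RPC payload

          and that payload has to be built in memory on top of the live cache data.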

           

          The related issue is https://issues.jboss.org/browse/ISPN-284

           

          Regards,

          Vladimir

          • 2. Limit on entries in Infinispan cache in cluster setup
            gerbszt

            I've got a similar problem, and after a bit of investigation I've found that the rehash itself isn't naturally that memory-hungry.

            The real culprit is a huge number of retransmissions in the UNICAST layer. In my case, a single rehash had UNICAST sending 300 MB in 5k messages; unfortunately it had to perform 14k retransmissions to complete the reliable transfer.

            At the UDP layer it ended up as 1.2 GB in 20k messages. So if you've got 300 MB in a single cache, which is not that much, you have to reserve another 1 GB of free memory just to survive the join of a new node. Worse yet, after broadcasting a JOIN_REHASH_END message the new node still receives retransmitted messages. The JOIN_REHASH_END response is at the end of this message stream, and if the stream is too long, the whole task stops on a timeout and the cache is left in the FAILED state.

             

            Do you have any suggestions on how to get rid of the retransmissions? Any special JGroups config? Or maybe the rehash transfer messages shouldn't be marked with the Message.NO_FC flag, which bypasses the UFC protocol?

            • 3. Re: Limit on entries in Infinispan cache in cluster setup
              an1310
              • 4. Re: Limit on entries in Infinispan cache in cluster setup
                gerbszt

                My configuration is very basic: a single distributed cache with hash numOwners=2 and L1 disabled. Most of the configuration parameters, like the eviction strategy or the locking/concurrency settings, don't have much impact on a rehash, which is part of the cache initialization sequence (please have a look at the org.infinispan.distribution.JoinTask class).
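
                Programmatically that is roughly the following (a sketch against the 4.x-era Configuration API; the exact setter names may differ slightly between versions):

                    import org.infinispan.config.Configuration;
                    import org.infinispan.config.GlobalConfiguration;
                    import org.infinispan.manager.DefaultCacheManager;

                    GlobalConfiguration gc = GlobalConfiguration.getClusteredDefault();

                    Configuration c = new Configuration();
                    c.setCacheMode(Configuration.CacheMode.DIST_SYNC); // single distributed cache
                    c.setNumOwners(2);                                  // hash numOwners=2
                    c.setL1CacheEnabled(false);                         // L1 disabled

                    DefaultCacheManager cm = new DefaultCacheManager(gc, c);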

                 

                I've tried jgroups-udp.xml and jgroups-tcp.xml from the standard distribution. They both suffer from the same performance problem on rehash. One thing I didn't test is removing the retransmitting UNICAST protocol from the JGroups stack. Do you know whether that could be done safely with TCP at the bottom of the stack? What are the potential risks?

                • 5. Re: Limit on entries in Infinispan cache in cluster setup
                  vblagojevic

                  Guys, this cannot be fixed with any configuration tweaking. Jacek, for now, fill the cluster with data slowly and steadily, if possible: add some data, then add nodes, add some more data, then more nodes, and so on. This will be fixed soon.

                  • 6. Re: Limit on entries in Infinispan cache in cluster setup
                    gerbszt

                    What about a rehash on leave, and another one on rejoin, when the cluster is full and warm? Even a small network failure could trigger a rehash with a big data transfer.

                    It seems like the rehash in the current version is not ready for production. I'm waiting impatiently for the new one.

                    • 7. Limit on entries in Infinispan cache in cluster setup
                      vblagojevic

                      That depends on the number of nodes. If you have lots of data and one Infinispan node, then yes, adding another node will cause a large data transfer, potentially causing an OOM. If you gradually fill your cluster with data and have a lot of Infinispan nodes, you will never experience this problem. Subscribe to the JIRA above and you'll get notified when this issue is fixed.
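
                      A rough illustration, assuming the data ends up spread evenly and ignoring replication overhead:

                          1 existing node holding 2 GB             ->  the joiner receives roughly 1 GB, all serialized by that single node
                          10 existing nodes holding 2 GB in total  ->  the joiner receives roughly 2 GB / 11, i.e. under 200 MB, spread across several senders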

                      • 8. Re: Limit on entries in Infinispan cache in cluster setup
                        nadirx

                        I am being bitten by this as well. If all nodes are running, everything is fine. If I stop the nodes, upgrade the application, and restart the nodes, the first one comes up OK, but the second one gets the OOM. All nodes are configured with a 1 GB heap (-Xmx1g); node 1 has about 15 MB of data, while node 2 has 220 MB of data.

                         

                        Is there anything I can do to avoid the problem before 5.1.x comes out?

                         

                        Tristan

                        • 9. Re: Limit on entries in Infinispan cache in cluster setup
                          gerbszt

                          If it's really the same problem, the only solution I've found is to disable rehash.
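
                          That is the rehashEnabled switch in the hash settings of the distributed cache; programmatically it looks roughly like this (a sketch, so double-check the exact setter name in your version):

                              Configuration c = new Configuration();
                              c.setCacheMode(Configuration.CacheMode.DIST_SYNC);
                              c.setNumOwners(2);
                              c.setRehashEnabled(false); // joining nodes no longer pull existing state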

                          But before you do that, I suggest registering the JGroups channel and all its protocols in JMX, in order to check how big the "real" data transfer was. Compare the NumBytesSent and NumBytesReceived attributes of the TCP/UDP MBean with the same attributes in UNICAST. If the former are substantially bigger than the latter and NumberOfRetransmissions >> 0, then that's it.
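
                          Something along these lines works for reading the counters, assuming the channel and its protocols are registered with the platform MBeanServer under the "jgroups" domain (the actual domain depends on how you registered them; the class name is just for the example):

                              import java.lang.management.ManagementFactory;
                              import javax.management.MBeanServer;
                              import javax.management.ObjectName;

                              public class JGroupsStats {
                                  public static void main(String[] args) throws Exception {
                                      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                                      // Walk every MBean under the assumed "jgroups" domain.
                                      for (ObjectName name : server.queryNames(new ObjectName("jgroups:*"), null)) {
                                          String n = name.toString();
                                          if (n.contains("UDP") || n.contains("TCP") || n.contains("UNICAST")) {
                                              System.out.println(name
                                                  + " NumBytesSent=" + server.getAttribute(name, "NumBytesSent")
                                                  + " NumBytesReceived=" + server.getAttribute(name, "NumBytesReceived"));
                                          }
                                          if (n.contains("UNICAST")) {
                                              System.out.println("  NumberOfRetransmissions="
                                                  + server.getAttribute(name, "NumberOfRetransmissions"));
                                          }
                                      }
                                  }
                              }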