13 Replies Latest reply on Dec 28, 2011 3:08 PM by Prasanth Manchambhatla

    Need advice on state retrieval

    henners Newbie

      I have two nodes in a replication cluster. I notice that state retrieval on restart causes problems when there are a few thousand entries to retrieve and the cluster is under load from traffic: the state provider cannot acquire locks, and the state retriever gets EOFExceptions.

       

      If the cluster is not receiving any requests from clients, state retrieval seems to go OK.

       

      Is there something I can configure to get around this? I don't mind if the client backs off from sending requests while the node restarts, but I don't know how to get it to do this. Or maybe I am just missing some obvious point?

       

      It's two Hot Rod servers in the cluster, and I am using version 5.0.1.

       

      Thanks

      Henners
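The "client backs off during the restart" idea from the question can be sketched on the application side as a retry loop with growing pauses. This is a minimal illustration only: `callWithBackoff` and its parameters are hypothetical names, not part of the Hot Rod client API, and the failing operation in `main` merely stands in for a cache call made while a node is restarting.

```java
import java.util.concurrent.Callable;

public class BackoffRetry {

    /**
     * Runs op; if it throws, waits and tries again, doubling the wait each time.
     * Rethrows the last failure once maxAttempts is exhausted.
     */
    public static <T> T callWithBackoff(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // give the restarting node time to rejoin
                    delay *= 2;          // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stand-in for a cache operation that fails twice while a node restarts.
        String value = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("node restarting");
            }
            return "value";
        }, 5, 10L);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

Whether backing off in the client is enough depends on why state retrieval fails under load; it only reduces traffic from clients that use the wrapper.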

        • 1. Re: Need advice on state retrieval
          Galder Zamarreño Master

          We've made a few substantial changes to how state transfer works, and also how Hot Rod servers are started in Infinispan 5.1. Would you mind trying with the latest Infinispan 5.1.0.BETA5 ?

          • 2. Re: Need advice on state retrieval
            henners Newbie

            Thanks for the quick response. I will try 5.1.0.

            Any idea when the final version will be released?

            • 3. Re: Need advice on state retrieval
              Galder Zamarreño Master

              I've just released 5.1.0.CR1, so final will be... soon...

              • 4. Re: Need advice on state retrieval
                henners Newbie

                Hi

                 

                I'm trying BETA5. I get the following exception in the first instance when the second instance starts up:

                thanks

                 

                 

                2011-12-07 14:35:21,186 DEBUG (notification-thread-3) [org.infinispan.transaction.TransactionTable] View changed, recalculating minViewId
                2011-12-07 14:35:21,273 DEBUG (ViewHandler,Infinispan-Cluster,osps039_1-53748) [org.jgroups.protocols.pbcast.STABLE] resuming message garbage collection
                2011-12-07 14:35:22,440 DEBUG (OOB-2,Infinispan-Cluster,osps039_1-53748) [org.infinispan.cacheviews.CacheViewsManagerImpl] myFirstCache: Node osps040_1-47570 is joining
                2011-12-07 14:35:22,448 DEBUG (CacheViewInstaller-1,osps039_1-53748) [org.infinispan.cacheviews.CacheViewsManagerImpl] Installing new view CacheView{viewId=2, members=[osps039_1-53748, osps040_1-47570]} for cache myFirstCache
                2011-12-07 14:35:22,785 ERROR (CacheViewInstaller-1,osps039_1-53748) [org.infinispan.cacheviews.CacheViewsManagerImpl] Failed to prepare view CacheView{viewId=2, members=[osps039_1-53748, osps040_1-47570]} for cache  myFirstCache, rolling back to view CacheView{viewId=1, members=[osps039_1-53748]}
                java.lang.IllegalStateException: Server address for osps040_1-47570 not present
                at org.infinispan.server.hotrod.ch.ServerHashSeed.getHashSeed(ServerHashSeed.scala:38)
                at org.infinispan.server.hotrod.ch.ServerHashSeed.getHashSeed(ServerHashSeed.scala:33)
                at org.infinispan.distribution.ch.AbstractWheelConsistentHash.setCaches(AbstractWheelConsistentHash.java:117)
                at org.infinispan.distribution.ch.TopologyAwareConsistentHash.setCaches(TopologyAwareConsistentHash.java:75)
                at org.infinispan.distribution.ch.ConsistentHashHelper.createConsistentHash(ConsistentHashHelper.java:117)
                at org.infinispan.statetransfer.ReplicatedStateTransferManagerImpl.createConsistentHash(ReplicatedStateTransferManagerImpl.java:56)
                at org.infinispan.statetransfer.BaseStateTransferManagerImpl.prepareView(BaseStateTransferManagerImpl.java:289)
                at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:449)
                at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:296)
                at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:244)
                at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:815)
                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:619)
                2011-12-07 14:35:22,807 DEBUG (CacheViewInstaller-1,osps039_1-53748) [org.infinispan

                • 5. Re: Need advice on state retrieval
                  henners Newbie

                  Ahem, but to qualify that: when I use the out-of-the-box startServer.sh it's OK. I get the above when I start the Hot Rod server from my own Java class. Maybe that is trickier to get right than it looks... and maybe pointless?

                  • 6. Re: Need advice on state retrieval
                    henners Newbie

                    Hmmm, well, with 5.1.0.BETA5 I get lots of problems when one of the nodes restarts.

                     

                    When the remote node goes down I see:

                    java.lang.IllegalStateException: Server address for osps039_1-7374 not present
                    at org.infinispan.server.hotrod.ch.ServerHashSeed.getHashSeed(ServerHashSeed.scala:38)
                    at org.infinispan.server.hotrod.ch.ServerHashSeed.getHashSeed(ServerHashSeed.scala:33)
                    at org.infinispan.distribution.ch.AbstractWheelConsistentHash.setCaches(AbstractWheelConsistentHash.java:117)
                    at org.infinispan.distribution.ch.TopologyAwareConsistentHash.setCaches(TopologyAwareConsistentHash.java:75)
                    at org.infinispan.distribution.ch.ConsistentHashHelper.createConsistentHash(ConsistentHashHelper.java:117)
                    at org.infinispan.statetransfer.ReplicatedStateTransferManagerImpl.createConsistentHash(ReplicatedStateTransferManagerImpl.java:56)
                    at org.infinispan.statetransfer.BaseStateTransferManagerImpl.prepareView(BaseStateTransferManagerImpl.java:289)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:449)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:296)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:244)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:815)
                    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                    at java.lang.Thread.run(Thread.java:619)
                    2011-12-07 15:41:03,173 DEBUG (CacheViewInstaller-1,osps039_1-7374) [org.infinispan.cacheviews.CacheViewsManagerImpl] myFirstCache: Rolling back to cache view 5, new view id is 7
                    2011-12-07 15:41:03,172 DEBUG (CacheViewInstaller-3,osps039_1-7374) [org.infinispan.statetransfer.StateTransferLockImpl] Unblocked write commands for cache view 6
                    2011-12-07 15:41:03,175 DEBUG (CacheViewInstaller-3,osps039_1-7374) [org.infinispan.statetransfer.BaseStateTransferTask] Node osps039_1-7374 completed state transfer for view 6 in 17 milliseconds!
                    2011-12-07 15:41:03,176 ERROR (CacheViewInstaller-1,osps039_1-7374) [org.infinispan.cacheviews.CacheViewsManagerImpl] ISPN000166: View installation failed for cache myFirstCache
                    java.lang.IllegalArgumentException: Cannot rollback to view 5, we are at view 3
                    at org.infinispan.statetransfer.BaseStateTransferManagerImpl.rollbackView(BaseStateTransferManagerImpl.java:320)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl.handleRollbackView(CacheViewsManagerImpl.java:498)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterRollbackView(CacheViewsManagerImpl.java:333)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:261)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:815)
                    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                    at java.lang.Thread.run(Thread.java:619)

                     

                     

                    Then, after the remote node restarts:

                     

                    Caused by: java.lang.IllegalStateException: Cannot prepare new view CacheView{viewId=12, members=[osps039_1-7374, osps040_1-38409]} on cache ___defaultcache, we are currently preparing view CacheView{viewId=8, members=[osps039_1-7374, osps040_1-38409]}
                    at org.infinispan.cacheviews.CacheViewInfo.prepareView(CacheViewInfo.java:98)
                    at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:444)
                    at org.infinispan.commands.control.CacheViewControlCommand.perform(CacheViewControlCommand.java:127)
                    at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:136)
                    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:162)
                    at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:141)
                    at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:447)
                    at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:354)
                    at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:230)
                    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:556)
                    at org.jgroups.JChannel.up(JChannel.java:716)
                    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1026)
                    at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
                    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
                    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
                    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:881)
                    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
                    at org.jgroups.protocols.UNICAST.up(UNICAST.java:332)
                    at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:700)
                    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:561)
                    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:140)
                    at org.jgroups.protocols.FD.up(FD.java:273)
                    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:284)
                    at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
                    at org.jgroups.protocols.Discovery.up(Discovery.java:354)
                    at org.jgroups.protocols.MPING.up(MPING.java:179)
                    at org.jgroups.protocols.TP.passMessageUp(TP.java:1174)
                    at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1709)
                    at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1691)
                    ... 3 more

                     

                    • 7. Re: Need advice on state retrieval
                      Galder Zamarreño Master

                      That code has now changed, so the issue should no longer be present. Can you please try with 5.1.0.CR1?

                      • 8. Re: Need advice on state retrieval
                        henners Newbie

                        Yes it works

                         

                        But I get another issue on the client side: it seems the client can't cope with one of the cluster members going down. I get this:

                         

                        java.lang.IllegalStateException: We should not reach here!
                        at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.execute(RetryOnFailureOperation.java:78)
                        at org.infinispan.client.hotrod.impl.RemoteCacheImpl.get(RemoteCacheImpl.java:333)

                         

                        When both cluster members are back up, I need to restart the client to be able to do more operations on the cache.

                        I'll investigate more in case I did not upgrade the client properly, and start another discussion thread.

                         

                        Thanks

                        • 9. Re: Need advice on state retrieval
                          henners Newbie

                          Hmmm, I spoke too soon: it doesn't transfer any state; my test case was lying to me. After the node restarts, every second get fails, I assume because the client tries the servers in round-robin.

                          • 10. Re: Need advice on state retrieval
                            Galder Zamarreño Master

                            If the cache is replicated then yes, it will use round-robin. Enable TRACE on org.infinispan and have a look in the logs to see if you can spot why state transfer is not happening.
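The "every second get fails" symptom from reply 9 follows directly from round-robin balancing over two servers when only one of them holds the data. The toy simulation below illustrates that reasoning; `Server`, `RoundRobinClient` and `simulate` are hypothetical stand-ins written for this example and do not reflect the actual Hot Rod client implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinDemo {

    /** Hypothetical stand-in for one server's local data. */
    static final class Server {
        final Map<String, String> data = new HashMap<>();
    }

    /** Sends each request to the next server in turn. */
    static final class RoundRobinClient {
        private final List<Server> servers;
        private int next = 0;

        RoundRobinClient(List<Server> servers) {
            this.servers = servers;
        }

        String get(String key) {
            Server target = servers.get(next);
            next = (next + 1) % servers.size();
            return target.data.get(key);
        }
    }

    /** Issues the given number of gets against a cluster where only one node has state. */
    public static List<String> simulate(int requests) {
        Server survivor = new Server();
        survivor.data.put("k", "v");
        Server restarted = new Server(); // came back up, but no state was transferred
        RoundRobinClient client = new RoundRobinClient(Arrays.asList(survivor, restarted));
        List<String> results = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            results.add(client.get("k"));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(simulate(4)); // prints [v, null, v, null]
    }
}
```

Once state transfer works, the restarted server holds the same entries and both branches of the rotation return the value.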

                            • 11. Re: Need advice on state retrieval
                              henners Newbie

                              In fact it's working now. I missed the stateRetrieval element in the clustering configuration. I'm still getting problems with the client though.

                               

                              Thanks for all the advice
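For readers hitting the same problem, the missing piece mentioned in reply 11 would look roughly like the fragment below. It is a sketch under assumptions: the cache name is taken from the logs earlier in the thread, the timeout values are illustrative, and the exact attributes should be checked against the configuration schema of the Infinispan version in use, since the poster's actual configuration was not shown.

```xml
<namedCache name="myFirstCache">
   <clustering mode="replication">
      <!-- Without this element the joining node starts empty (assumed cause in this thread) -->
      <stateRetrieval fetchInMemoryState="true" timeout="20000"/>
      <sync replTimeout="20000"/>
   </clustering>
</namedCache>
```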

                              • 12. Re: Need advice on state retrieval
                                Galder Zamarreño Master

                                Hmm, can you provide a test case for your Hot Rod client? Or at least show some Hot Rod client code, plus a TRACE log of the org.infinispan category on both server and client?