11 Replies Latest reply on Jun 1, 2018 4:26 AM by sivarni2003

    Infinispan blocking state during iteration through cache

    tomas11

      Hi all

       

      We have spotted in our environment following blocking state in Infinispan. This leads to high CPU load and this load does not goes down.

       

      One of our threads has stopped and this is line from its thread dump

      at sun.misc.Unsafe.park(Native Method)

          - parking to wait for <68f06> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)

          at java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)

          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(Unknown Source)

          at org.infinispan.stream.impl.DistributedCacheStream$IteratorSupplier.get(DistributedCacheStream.java:754)

          at org.infinispan.util.CloseableSuppliedIterator.getNext(CloseableSuppliedIterator.java:26)

          at org.infinispan.util.CloseableSuppliedIterator.hasNext(CloseableSuppliedIterator.java:32)

          at org.infinispan.stream.impl.RemovableIterator.getNextFromIterator(RemovableIterator.java:34)

          at org.infinispan.stream.impl.RemovableIterator.hasNext(RemovableIterator.java:43)

          at org.infinispan.commons.util.Closeables$IteratorAsCloseableIterator.hasNext(Closeables.java:93)

          at org.infinispan.stream.impl.RemovableIterator.getNextFromIterator(RemovableIterator.java:34)

          at org.infinispan.stream.impl.RemovableIterator.hasNext(RemovableIterator.java:43)

          at org.infinispan.commons.util.IteratorMapper.hasNext(IteratorMapper.java:26)

       

      So it seems that during iteration through cache cache.entrySet().iterator() Infinispan can get to some infinite loop? This thread stay in this state and not move.

      What is proper way to iterate through cache values?

      Is there always certainty that following code will end?

      final Iterator<> iterator = cache.entrySet().iterator();

      while (iterator.hasNext()) {

           iterator.next()

      }

       

      What could be reason for that?

       

      I read recently that JVM configuration and GC can influence behavior/stability of cluster generally. So I add here also our configuration we use in our environment:

      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:NewSize=128m -XX:MaxNewSize=128m -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=1024 -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60 -Xms300m -Xmx300m

       

      Is in configuration something suspicious?

       

      Thanks for help

      Tomas

        • 1. Re: Infinispan blocking state during iteration through cache
          william.burns

          The stack trace shown wouldn't consume CPU as it is in a WAITING state.  At this point it is waiting for a remote response from another node with data to be passed to the iterator.

           

          The code you posted, assuming you don't have an exception while iterating over the iterator should be fine.  However you should really enclose the iterator in a try finally block as the documentation states at [1] under Iterator use.  This way if there is an exception the iterator is closed making sure resources are returned properly.

           

          [1] Cache (Infinispan JavaDoc 8.2.2.Final API)

           

          Can you post the entire thread dump from the various nodes when it is in this state?

           

          Did you have any exceptions on any of the various nodes?

          • 2. Re: Infinispan blocking state during iteration through cache
            tomas11

            Currently I do not have thread dumps from that state. I can attached them if it will reproduce. It is 15 nodes environment.

             

            Also what is suspicious to me is that suddenly some of nodes starts to have bigger GC activity. Without seeing any specific reason for that. That is why I also posted our jvm configuration if this can somehow affect it.

             

            However I've attached snapshot of node where higher GC has occurred. From there it seams that SocketInputStream.read() and DatagramSocket.recieve() is consuming a lot. Is it this or something else? How can I find out what is behind this higher GC activity? It seems that it is some "background" staff happening. How I can find out what it is?

             

            Here an be see jmx statistics from 2 different nodes. On one seems GC fine on other happen quite often.

             

            jmx.png

                 Here is cpu sampling from the node where is high CPU load.

            hotspots.png

            • 3. Re: Infinispan blocking state during iteration through cache
              william.burns

              This gives a hint and may explain why your thread appears to be stuck waiting for more entries.  Basically your remote node doesn't seem to be feeding back results fast enough.  The iterator starts a request where a remote node (1 at a time) will send data back to the originator.  This is done in batch sizes (default value is the state transfer chunk size - 512).  Is it possible you have very large entries in your cache or you have configured chunk size to be quite large?  You can configure the batch size of the retrieval on a per request basis as well using the [1] method.

               

              Maybe you can tweak your call for

              cache.entrySet().iterator()
              

              to be

              cache.entrySet().stream().distributedBatchSize(64).iterator()
              

               

              This way it will use an iterator where it only retrieves 64 entries from a chunk at a time (technically it can do up to 2x the batch size to try to prefetch).  Is it possible you have multiple requests doing this as well?

               

              [1] CacheStream (Infinispan JavaDoc 9.0.0.Alpha2 API)

              • 4. Re: Infinispan blocking state during iteration through cache
                tomas11

                Thanks for the hint.

                 

                And how do you see this GC activity? Why it is running so often? What can influence this?

                 

                From node perspective it should be relatively "silent" so no active actions are done. But still there is lot of GC going on.

                 

                I also read that you suggest not to call cache.size() method too often and specially parallel on more nodes. What about simultaneous iterations over cache on more nodes in parallel? Is this a problem or not?

                • 5. Re: Infinispan blocking state during iteration through cache
                  william.burns

                  From my previous comment, my thought is that you are pulling down a lot of large objects in a given batch.  This is why I asked how large your objects are (that would help factor this in).  To be honest a heap of 312M seems a bit small to me in general   If you could provide a general idea of how large your objects are or how many you are storing in each node (counting backups) would be much appreciated.

                   

                  tomas11 wrote:

                   

                  I also read that you suggest not to call cache.size() method too often and specially parallel on more nodes. What about simultaneous iterations over cache on more nodes in parallel? Is this a problem or not?

                  If you are referring to [1] this shouldn't normally be an issue.  There is something causing an issue there and unfortunately I wasn't able to get enough information to ascertain why.  My guess re: size is that they had a cache loader that was causing a slowdown and thus it timed out.  Things like that are why I asked for more information regarding your usage; unfortunately a single thread dump with GC logs isn't enough tell you what is happening.  My best guess is that the objects retrieved in a single batch are too large causing a GC full run for each batch that comes in.

                   

                  I assume you saw no exceptions on the other nodes that could cause this.  Also can you provide the configuration for this cache, I am especially interested in whether or not you have a cache loader configured.

                   

                  To reiterate can you please provide the following:

                   

                  1. Please provide a general idea of how large your objects are or how many you are storing in each node (counting backups).
                  2. Confirm that you saw no exceptions on the various nodes.
                  3. Provide the configuration for this cache
                  4. If possible can you provide a thread dump while this is happening

                   

                  With that it should give me a better idea of what may be causing your issue.

                   

                  [1] Concurrent TimeoutException when running multi clustered server

                   

                  Side note: If you are able taking a heap dump while this was occurring should show you exactly what is using up the heap.

                  • 6. Re: Infinispan blocking state during iteration through cache
                    tomas11

                    Hi William

                     

                    We have some hw boundaries. This limit us in memory consumption - that's why there is only 300MB heap set.

                    Please provide a general idea of how large your objects are or how many you are storing in each node (counting backups).

                    Our objects size is average 1,6kB. Numbers can vary but lets say in one of our most overcrowded cache it could be 46 000.

                    Confirm that you saw no exceptions on the various nodes.

                    Yes I don't see direct exceptions to that. Here and then there are TimeoutExceptions but most probably not related to iteration. Usually connected to replication - and this is also not clear to me why they can happen. Memory leak with org.infinispan.remoting.transport.jgroups.CustomRequestCorrelator?

                    Provide the configuration for this cache

                    <distributed-cache name="services" owners="5" mode="ASYNC" deadlock-detection-spin="1000">

                         <store-as-binary keys="false" values="false"/>

                         <state-transfer enabled="true" timeout="2000000" />

                    </distributed-cache>

                    If possible can you provide a thread dump while this is happening

                    Unfortunately do not have thread dumps from other nodes. But to keep discussion going I am providing answers to other topics.

                     

                    Thanks

                    Tomas

                    • 7. Re: Infinispan blocking state during iteration through cache
                      tcgtam1

                      Is this problem resolved?

                      I have been R&D infinispan (the embedded Spring) recently using my local Eclipse (multiple node is NOT used). I encounter exactly the same symptom and tried all the tweaks in this thread but has no luck.

                       

                      It is clear that the problem is at the hasNext() call, during when multiple threads are accessing the same cache. I found that the problem occurs only when the cache is NOT local (say, not <local-cache>).

                       

                      Anyone can provide further help?

                      • 8. Re: Infinispan blocking state during iteration through cache
                        sivarni2003

                        The same issue is irritating me too. Our requirement is that in here we may use single or multiple nodes sharing same cache. So we keep the cache as distributed, so that we can add more nodes if we want, on the fly.

                         

                        But after running sometime, threads are blocked in iterating through cache in hasNext(). If we use the cache as local cache, this issue wasn't there.

                         

                        Any help will be appreciated as I am stuck in this problem for the past 2 weeks.  Thanks in advance

                        • 9. Re: Infinispan blocking state during iteration through cache
                          william.burns

                          I am pretty certain the symptoms that everyone was seeing was fixed in [ISPN-9080] Distributed cache stream should wait for new topology when segments aren't complete - JBoss Issue Tracker .

                           

                          But keep in mind this does not fix the real issue. The real issue is that a new topology was never installed, thus preventing streams from continuing work.

                           

                          Side note, but it was discovered that there are some environments that drop large UDP packets. We have found that setting the jgroups FRAG size to 8k has helped such environments, default was 60k.

                          • 10. Re: Infinispan blocking state during iteration through cache
                            sivarni2003

                            Thanks for the quick reply William Burns. As of now I can't immediately move to version 9.2. So, I will try with FRAG size and let you know the result.

                            • 11. Re: Infinispan blocking state during iteration through cache
                              sivarni2003

                              I tried with FRAG. But the same issue persists. I think we  must move to 9 version then. Thanks for the helpwilliam.burns.