7 Replies Latest reply on Mar 9, 2016 9:54 AM by william.burns

    Infinispan 7 - Map/Reduce very slow and does not scale

    nsahattchiev

      Hi all,

       

      I have a question regarding the scalability of map/reduce.


      In our software we use Infinispan 7 with replicated cache and currently we see performance drop when we have more than one cache node started.

      Last week we do some performance tests. You can see the results below.

       

      Our setup:

      Hardware: 2 boxes with Intel Xeon 2.4 GHz and 8 cores each (16 cores in total) and 24 GB memory (48 GB in total)

      Software: 4 JVMs (Java 7) with 6 GB heap each and Infinispan 7.2.5.Final (2 JVMs per box)

      Cache: Infinispan replicated cache with 400'000 objects

      Jgroups with unicast communication

       

      Test 1:

      Only one Infinispan node started. Searching in the cache with 10 parallel users. Throughput 186 requests/min, average response time 3077 millis.

       

      Test 2:

      Two Infinispan nodes started - both on the same box. Searching in the cache with 10 parallel users. Throughput 102 requests/min, average response time 5602 millis.

       

      Test 3:

        All 4 Infinispan nodes started. Searching in the cache with 10 parallel users. Throughput 102 requests/min, average response time 5445 millis.

       

      Our test show, that Infinispan perform best when only one node is running. With 2 or more nodes we get only about 55% of the throughput, even we have more resources (CPU and heap).

      Can somebody explain me, what could be wrong with our setup? Because I can't imagine that Infinispan does not scale. 

       

      Thanks

      Nikolai

        • 1. Re: Infinispan 7 - Map/Reduce very slow and does not scale
          rvansa

          In every distributed system, there's a penalty associated with the need to use slow RPC communication (marshalling, network latency...), so it's quite expected that local-only test gets the best results. When you can fit all the data on single node.

           

          The MR algorithm assumes that there's heavy weight computation involved, so it rather distributes the task across nodes (to utilize more CPUs) than run whole computation locally. However in your case, the network overhead outweights this advantage. Just check your CPU usage - I guess it's somewhat lower when you use 2 physical machines. You can also try to increase the number of parallel users; once you saturate all CPUs, 2 machines setup should be faster. Actually you won't see much linear scaling in any test when going from 1 to 2 or 4 nodes; the scaling (almost linear) will be seen once you go from, say, 8 nodes up to hundreds.

           

          Since you have all data on each node in replicated caches, it would probably make sense to add a switch for this behaviour on repl caches, but in Infinispan 8 MR is deprecated (in favor of standard Java 8 Streams API which is even more capable and you can use the local-only version there) so I don't think a feature request would be fulfilled. So my recommendation is to switch to Java 8 JVM and latest Infinispan, and use Streams.

          1 of 1 people found this helpful
          • 2. Re: Infinispan 7 - Map/Reduce very slow and does not scale
            william.burns

            Hello Nikolai,

             

            I am sorry to hear that Map/Reduce is not scaling in your scenario.  Its performance matters greatly on the code it is executing.

             

            Recently while developing the new Distributed Streams for Infinispan 8, the feature Radim mentioned, I had found there are some scalability issues with Map/Reduce.  One of these is the fact when you have a large number of collected values that have distinct keys.  As an example in word count if you have a lot of unique words for it to count it can choke a bit.  My guess is maybe you are running into the same thing.

             

            Luckily in our internal testing, which I hope we will blog about in the near future, the new Streams feature is able to handle almost 5 orders of magnitude more unique words in this test before we had to increase the heap size.  This also includes it performing better as well.

             

            One option of Streams is that you can pass the Flag.CACHE_MODE_LOCAL to your AdvancedCache before creating a stream which will force it to run only on the local node.  Also related to your situation is that until [1] is integrated (guessing ISPN 8.2 Final) Replication always runs on the local node only.

             

            [1][ISPN-6306] Distributed Streams should also be used for Replication Cache - JBoss Issue Tracker

            1 of 1 people found this helpful
            • 3. Re: Infinispan 7 - Map/Reduce very slow and does not scale
              nsahattchiev

              Radim, William,

               

              thanks for your swift response and competent opinion!

               

              @Radim:

               

              I'm aware, that there is overhead, when a system is using RPC communication. We are using JGroups to distribute asynchronous jobs in a grid of JVMs, but the overhead in our case is about 5% and in Map/Reduce it is 45%. That's why I doubt, that only the RPC is the reason for the poor performance. I monitored also the GC behaviour during Map/Reduce and the GC time was 20% of the whole time, which is pretty much.

               

              About the scalability: I do not expect a linear scaling, when increase the Infinispan nodes from 2 to 4, but to have 2x more resources and zero performance improvement is quite disappointing.

               

              Additionally to the test with the replicated cache I did a test starting the cache in distributed mode, but unfortunately the behaviour was exactly the same - one node performed best and there was no performance gain with 4 nodes vs 2 nodes.

               

              @William:

               

              Yes, in our case all 400'000 object in the cache have distinct keys. Unfortunately we are not able to switch to Infinispan 8, because our software is running Java 7, but probably I can do some performance tests with the Distributed Streams, when 8.2.Final is released. Is there an estimate, when it will be available?   

               

              btw: I did the performance tests also with Infinispan 6.0.2.Final. The same behaviour as with 7.2.5.Final regarding the scalability, but the positive thing is, that Infinispan 7 is about 20% faster than 6. So I hope that the new Stream feature in Infinispan 8.2 would be even faster, because with the current performance Map/Reduce is not usable.

                 

              Our approach now is to replace all Map/Reduce tasks with Infinispan Queries, because they perform about 10x faster and the GC overhead is under 1%.

               

              Regards

              Nikolai

              • 4. Re: Infinispan 7 - Map/Reduce very slow and does not scale
                nadirx

                Infinispan 8.2.0.Final will be released next week.

                • 5. Re: Infinispan 7 - Map/Reduce very slow and does not scale
                  william.burns

                  Nikolai Sahattchiev wrote:

                   

                  Yes, in our case all 400'000 object in the cache have distinct keys. Unfortunately we are not able to switch to Infinispan 8, because our software is running Java 7, but probably I can do some performance tests with the Distributed Streams, when 8.2.Final is released.

                  I just wanted to clarify.  The scalability isn't tied to how many keys or entries you have in your cache, but rather how many unique keys are emitted to the Collector [1].

                   

                  Nikolai Sahattchiev wrote:

                    

                  Our approach now is to replace all Map/Reduce tasks with Infinispan Queries, because they perform about 10x faster and the GC overhead is under 1%.

                  When you say Infinispan Queries do you mean to use Indexed queries or you are using Indexless queries?  Depending on your use case Indexed Queries should be faster, if all you want to do is retrieve those values.  The power of Distributed Streams is when you want to do distributed computations.  Actually Indexless Queries internally use Distributed Streams; in Infinispan 7 it is using the first generation implementation of it, which is further improved in Infinispan 8.

                   

                  Either way if you can describe what you are doing with the data we can try to figure out what is the most performant way to do so.

                   

                  [1] Collector (Infinispan JavaDoc 8.2.0.CR1 API)

                  • 6. Re: Infinispan 7 - Map/Reduce very slow and does not scale
                    nsahattchiev
                    Either way if you can describe what you are doing with the data we can try to figure out what is the most performant way to do so.

                     

                    We have an AssetDocument object which have ‘code’, ‘isin’ and about 15 other fields. The use case is to find all documents which have certain ‘isin’. The ’isin’ field can appear in many documents, but the combination of 'code' and 'isin' is always unique. During our tests the emitted documents were not more than 5 (from 400’000).

                     

                    You can see the used Mapper and Reducer below:

                    private static class IsinMapper implements Mapper<String, AssetDocument, String, AssetDocument> {

                      private final String isin;


                      public
                    IsinMapper(final String pIsin) {
                         isin = pIsin;
                      }

                      @Override
                      public void
                    map(final String key, final AssetDocument doc, final Collector<String, AssetDocument> collector) {
                         if (isin.equals(doc.getISIN())) {
                      collector.emit(value.getCode(), doc);
                      }
                      }
                    }

                    private static class IsinReducer implements Reducer<String, AssetDocument> {
                      @Override
                       public AssetDocument reduce(final String key, final Iterator<AssetDocument> iterator) {
                      // there is always only one document returned by the iterator
                       return iterator.next();
                      }
                    }

                     

                    Theoretically executing the task on 4 nodes should be faster than on 2 nodes, but as you can see in my first post, there is no any performance gain.

                    • 7. Re: Infinispan 7 - Map/Reduce very slow and does not scale
                      william.burns

                      That is unfortunate that even with only 4 emitted values you are seeing this performance degradation.  I however believe that the new Distributed Streams should be much more performant for you.  By the way 8.2 was just released yesterday, you can get it at Download - Infinispan

                       

                      Btw you can replace all of the map reduce code with just the following with Distributed Streams:

                       

                      Cache<String, AssetDocument> cache = ...
                      String isin = ...
                      
                      List<AssetDocument>results = cache.values().parallelStream()
                         .filter((Serializable & Predicate<? super AssetDocument>) ad -> isin.equals(ad.getISIN())
                         .collect(CacheCollectors.serializableCollector(() -> Collectors.toList());
                      
                      

                       

                      However looking at your use case, I think it might be even better if you instead used an indexed query since you are pulling down very few entries and don't require compute grid capabilities.  You can read all about it at Infinispan User Guide.  Unfortunately it requires some additional configuration and jars but should give you even better performance as you don't have to iterate upon your entire data set.