9 replies. Latest reply on Dec 12, 2012 1:38 PM by sannegrinovero

    Mass-indexer and MapReduce issues

    aolean

      Hi all,

      I'm currently testing Infinispan to see whether I could use it on a small distributed grid to run basic queries over approximately 20 million entries.

      I'm running Infinispan 5.2 Beta5 on Fedora 16 with Oracle Java 1.7 (64-bit). For the moment I'm only using one node (my own PC), but I configured it in distributed mode. The index is stored in RAM for now.

      On the classpath I included all jars from the "lib" directory, the Infinispan core jar, and the two jars related to query and Lucene.

      The two problems I'm facing so far:

      • It seems that the property
        <property name="hibernate.search.default.indexing_strategy" value="manual" />
        is not taken into account in the index configuration. I would like to set this property so that I can insert all the values in batches without indexing them (later using a cache loader; for now I'm just inserting dummy values into the cache), and then run the mass indexer once all the values are loaded.
        Is this the proper way to do it?
      • When launching the mass indexer using
        searchManager.getMassIndexer().start();
        I get the following error:
        java.lang.NoClassDefFoundError: org/infinispan/cdi/InfinispanExtension
        I get the same error when using the MapReduce feature directly. I don't have this problem with version 5.1.
        Do I need to include more jars than the ones I already mentioned?

      Thanks!

        • 1. Re: Mass-indexer and MapReduce issues
          vblagojevic

          Yeah, as far as 2) goes, you need infinispan-cdi-versionxyz.jar on your classpath; maybe just put the whole lib directory on your classpath! For question 1) I'll ping my colleague to help you!

          Cheers,

          Vladimir

          • 2. Re: Mass-indexer and MapReduce issues
            sannegrinovero
            • It seems that the property
              <property name="hibernate.search.default.indexing_strategy" value="manual" />
              is not taken into account in the index configuration. I would like to set this property so that I can insert all the values in batches without indexing them (later using a cache loader; for now I'm just inserting dummy values into the cache), and then run the mass indexer once all the values are loaded.
              Is this the proper way to do it?

            Hi, good point. I opened https://issues.jboss.org/browse/ISPN-2610 to improve on that (see also the issue description for how to achieve the same today).

             

            Note that the MassIndexer is not necessarily more efficient than on-the-fly indexing; it depends on your data and on the tuning applied to the IndexWriter.

            You might prefer to enable asynchronous indexing so that the backend won't slow you down too much (it will still block once the async buffer is full).
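A toy illustration of the behavior described above, assuming nothing about Hibernate Search internals: writers hand index work to a bounded buffer and return immediately, a single background worker drains it, and the writer only blocks when the buffer is full. The class name AsyncIndexingSketch and the list standing in for the IndexWriter are hypothetical. In Hibernate Search 4.x the real switch is, to our understanding, a configuration property (hibernate.search.default.worker.execution set to async), not code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of an asynchronous indexing backend with a bounded buffer.
// Hypothetical names; this is not Hibernate Search's implementation.
public class AsyncIndexingSketch {
    // Unique marker object that tells the worker to stop.
    private static final Object POISON = new Object();

    private final BlockingQueue<Object> buffer;
    // Stands in for the IndexWriter: records keys in the order they were "indexed".
    private final List<String> indexed = new ArrayList<>();
    private final Thread worker;

    public AsyncIndexingSketch(int capacity) {
        buffer = new ArrayBlockingQueue<>(capacity);
        worker = new Thread(this::drain, "index-worker");
        worker.start();
    }

    // Called on the writing thread: returns at once unless the buffer is full,
    // in which case put() blocks until the worker frees a slot.
    public void enqueue(String key) throws InterruptedException {
        buffer.put(key);
    }

    // Background loop: takes work items in FIFO order until the marker arrives.
    private void drain() {
        try {
            while (true) {
                Object item = buffer.take();
                if (item == POISON) {
                    return;
                }
                synchronized (indexed) {
                    indexed.add((String) item);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Lets the worker finish everything already queued, then returns a snapshot.
    public List<String> shutdown() throws InterruptedException {
        buffer.put(POISON);
        worker.join();
        synchronized (indexed) {
            return new ArrayList<>(indexed);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncIndexingSketch backend = new AsyncIndexingSketch(16);
        for (int i = 0; i < 1000; i++) {
            backend.enqueue("key-" + i);
        }
        System.out.println(backend.shutdown().size()); // prints 1000
    }
}
```

The capacity is the trade-off: a larger buffer absorbs longer write bursts before writers start blocking, but the indexing work itself still has to be done by the worker.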

            • 3. Re: Mass-indexer and MapReduce issues
              aolean

              Hi Vladimir,

               

              Thanks for your help on this!

              The problem is that I have already included CDI; these two jars are on my classpath:

              - infinispan-cdi.jar (from the cdi module)

              - infinispan-cdi-5.2.0.Beta5.jar from lib

               

              I checked: InfinispanExtension is indeed included in infinispan-cdi-5.2.0.Beta5.jar, under the correct package path.
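One way to double-check this at runtime, sketched with plain JDK calls: ask the class loader for every classpath entry that provides the class file. ClassLoader.getResources also searches the parent loaders, so if both infinispan-cdi.jar and infinispan-cdi-5.2.0.Beta5.jar ship the class, you would expect two locations. The class name ClassLocator is hypothetical, and this only reports where the file is visible, not why loading fails.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Lists every classpath location that provides a given class file.
public class ClassLocator {

    // Converts a binary class name to its resource path, e.g. a.b.C -> a/b/C.class.
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // getResources() consults the parent loaders as well as this one,
    // so the result covers everything visible from the given loader.
    static List<URL> locate(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(resourceName(className)));
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "org.infinispan.cdi.InfinispanExtension";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> hits = locate(name, loader);
        System.out.println(hits.size() + " location(s) for " + name);
        for (URL url : hits) {
            System.out.println("  " + url);
        }
        if (hits.size() > 1) {
            System.out.println("Warning: more than one classpath entry provides this class.");
        }
    }
}
```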

               

              I also declared a variable of type InfinispanExtension in my code:

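              // Compile-time check only: the class must resolve on the build path
              org.infinispan.cdi.InfinispanExtension dummy = null;

              // c1 is the indexed Cache instance
              SearchManager searchManager = org.infinispan.query.Search.getSearchManager(c1);

              searchManager.getMassIndexer().start();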

               

              The code compiles successfully, so InfinispanExtension is visible on my classpath, but it still fails when launching the mass indexer (the last line of the extract above) with the following stack trace:

               

              Exception in thread "main" org.infinispan.CacheException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/infinispan/cdi/InfinispanExtension
                  at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
                  at org.infinispan.query.impl.massindex.MapReduceMassIndexer.start(MapReduceMassIndexer.java:43)
                  at net.sebhome.ion.test.PerfTest1.main(PerfTest1.java:66)

               

              Do you have any idea how this is possible? Is the MapReduce module using a custom class loader?

               

              Thanks

              • 4. Re: Mass-indexer and MapReduce issues
                aolean

                Hi Sanne,

                 

                Thanks for your explanation. In that case I'll probably do some benchmarking before going straight for the MassIndexer approach.

                 

                I took a look at the workaround proposed in JIRA. Does the SKIP flag also take effect during mass indexing? That would be unfortunate.

                For the moment I'm using a custom EntityIndexingInterceptor that disables indexing depending on the value of an external variable; it works fine so far.

                 

                Thanks

                • 5. Re: Mass-indexer and MapReduce issues
                  vblagojevic

                  You definitely do not need to use InfinispanExtension in your code. This is an internal class, never intended to be used by clients; it is the mechanism Infinispan uses to hook into the CDI runtime. No custom class loading is used. Make sure that you downloaded the official Infinispan distribution from the download page, include the Infinispan jar and the lib directory on your classpath, and run your example again.

                   

                  Regards,

                  Vladimir

                  • 6. Re: Mass-indexer and MapReduce issues
                    sannegrinovero

                    I took a look at the workaround proposed in JIRA. Does the SKIP flag also take effect during mass indexing? That would be unfortunate.

                     

                    No, the flags only affect the currently executing command; they are "forgotten" after that. On indexing performance: this weekend I have been running some new performance tests and found some new bottlenecks which need to be fixed before we tag Final. It might be an unfortunate moment to measure performance; please monitor https://issues.jboss.org/browse/ISPN-2613
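A toy model of the flag semantics described above, not Infinispan's API: a flag attached to a write affects that write only, and a mass reindex rebuilds the index from every stored entry regardless of how each entry was written (assuming, as this answer implies, that the MassIndexer reindexes all entries). SkipIndexingModel, its Flag enum and massIndex() are hypothetical stand-ins; in Infinispan itself the flag is Flag.SKIP_INDEXING, applied through cache.getAdvancedCache().withFlags(...), and the rebuild is SearchManager.getMassIndexer().start().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Infinispan's API: per-call flags plus a full reindex.
public class SkipIndexingModel {
    // Hypothetical stand-in for Infinispan's Flag.SKIP_INDEXING.
    public enum Flag { SKIP_INDEXING }

    private final Map<String, String> data = new HashMap<>();
    // Keys that queries can "see"; sorted so printed output is stable.
    private final Set<String> index = new TreeSet<>();

    // Flags apply to this single call only; nothing is remembered afterwards.
    public void put(String key, String value, Flag... flags) {
        data.put(key, value);
        if (!Arrays.asList(flags).contains(Flag.SKIP_INDEXING)) {
            index.add(key);
        }
    }

    // Rebuilds the index from every stored entry, however it was written.
    public void massIndex() {
        index.clear();
        index.addAll(data.keySet());
    }

    public Set<String> indexedKeys() {
        return new TreeSet<>(index);
    }

    public static void main(String[] args) {
        SkipIndexingModel cache = new SkipIndexingModel();
        cache.put("a", "1", Flag.SKIP_INDEXING); // bulk load, index untouched
        cache.put("b", "2", Flag.SKIP_INDEXING);
        cache.put("c", "3");                     // no flag: indexed immediately
        System.out.println(cache.indexedKeys()); // prints [c]
        cache.massIndex();
        System.out.println(cache.indexedKeys()); // prints [a, b, c]
    }
}
```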

                    • 7. Re: Mass-indexer and MapReduce issues
                      aolean

                      @Sanne

                      OK, understood, thanks for the update. No problem regarding performance, but on JIRA the index manager referenced in the issue is NRT: is that the one Infinispan uses by default? I ask because the Hibernate Search documentation describes this feature as "extreme low-latency writes as a tradeoff of non-clustered and non-shared index". So did you rewrite the NRT functionality to be "Infinispan compliant", for example by putting the buffers in the cache?

                       

                      @Vladimir

                      Yes, I know that InfinispanExtension is an internal class, but I'm currently using Eclipse's auto-deployment to test Infinispan, so my point was just that if declaring a dummy variable of this type raises no compile-time error, the class must also be available at runtime, since Eclipse manages both the build and the launch. In fact, when I leave out the additional modules and put only the jars from the lib directory on the classpath, I can make it work only by removing infinispan-cdi-version.jar from the classpath. So... I don't know. I tried several things (I scanned all the jars to see whether InfinispanExtension was included in another jar, displayed the class loader hierarchy to see whether one of the loaders could be hiding the class from a child, re-downloaded the software, re-deployed in Eclipse, ...). Nothing worked.

                      Anyway, for now I simply don't include this jar, and it works.
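For the class loader check mentioned above, a small JDK-only sketch: it prints the delegation chain from the thread's context class loader up to the bootstrap loader, then tries to load a class without initializing it. The class name LoaderChain is hypothetical. If loading fails with a LinkageError such as NoClassDefFoundError, the message may name a different class than the one requested (for example a missing supertype of the target), so it is worth reading carefully.

```java
import java.util.ArrayList;
import java.util.List;

// Prints a class loader's delegation chain and tries to load a class through it.
public class LoaderChain {

    // Walks getParent() up to the bootstrap loader, which the JDK represents as null.
    static List<String> describe(ClassLoader loader) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(cl)));
        }
        chain.add("<bootstrap>");
        return chain;
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "org.infinispan.cdi.InfinispanExtension";
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("Context loader chain: " + describe(context));
        try {
            // initialize=false: loads the class (and its supertypes) without running static initializers.
            Class<?> c = Class.forName(target, false, context);
            System.out.println(target + " defined by " + describe(c.getClassLoader()).get(0));
        } catch (ClassNotFoundException | LinkageError e) {
            // A NoClassDefFoundError here may name a dependency of the target, not the target itself.
            System.out.println(target + " could not be loaded: " + e);
        }
    }
}
```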

                       

                      Thanks

                      • 8. Re: Mass-indexer and MapReduce issues
                        vblagojevic

                        Eclipse + Infinispan test + CDI runtime = an exercise in self-flagellation. Do not do it!

                        • 9. Re: Mass-indexer and MapReduce issues
                          sannegrinovero

                          @Sanne

                          OK, understood, thanks for the update. No problem regarding performance, but on JIRA the index manager referenced in the issue is NRT: is that the one Infinispan uses by default? I ask because the Hibernate Search documentation describes this feature as "extreme low-latency writes as a tradeoff of non-clustered and non-shared index". So did you rewrite the NRT functionality to be "Infinispan compliant", for example by putting the buffers in the cache?

                          No, I didn't do anything special, but there are still cases in which NRT is fine to use with Infinispan:

                          1. you have a single node
                          2. each node has its own index (so indexes are not shared)
                          3. it's acceptable for updates to become visible to other nodes only after a (longer) while