12 Replies Latest reply on May 31, 2013 8:48 AM by geensb

    ModeShape + InfiniSpan cluster: indexes are not updated

    geensb

      We are trying to set up a clustered ModeShape repository backed by a replicated Infinispan cache in JBoss 5 on MS Windows, using SQL Server (ModeShape 3.2.0, Infinispan 5.2.5, JGroups 3.2.7, SQL Server 2008; JBoss itself is not clustered).

       

      The problem is that when data is modified on the first machine, the index on the second machine is not updated. The changes can, however, be found in Infinispan, so our Infinispan configuration appears to be correct (I've attached it anyway for completeness).

       

      A bunch of these messages appear in the second machine's log, which seems to indicate that it is getting "nudged" about the updated data:

       

      2013-05-22 16:07:58,162 TRACE [org.modeshape.jcr.bus.ClusteredRepositoryChangeBus] (Incoming-2,HA2-BRMS-23320) Received on cluster 'modeshape-cluster' 10 changes on workspace system made by <anonymous> from process '4b0a01e9-27d3-4c6b-bd2a-fa09a585334a' at 2013-05-22T16:07:58.300+02:00
      2013-05-22 16:07:58,224 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-5,HA2-BRMS-5605) About to send back response null for command PrepareCommand {modifications=[PutKeyValueCommand{key=b099a6e7505d64fb52aaa6-4560-4ab6-9ae8-f0d028de2be3, value=SchematicEntryWholeDelta{ "metadata" : { "id" : "b099a6e7505d64fb52aaa6-4560-4ab6-9ae8-f0d028de2be3" , "contentType" : "application/json" } , "content" : { "key" : "b099a6e7505d64fb52aaa6-4560-4ab6-9ae8-f0d028de2be3" , "parent" : "b099a6e7505d64/" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "{http://www.lettergen.com/common/}Request" } , "created" : { "$date" : "2013-05-22T16:07:58.222+02:00" } , "mixinTypes" : [ { "$name" : "mix:lockable" } ] , "createdBy" : "<anonymous>" , "lockOwner" : "<anonymous>" , "lockIsDeep" : true } , "http://www.lettergen.com/common/" : { "guid" : "07c40c43f3b82e08227382940545ba58" } } , "children" : [ { "key" : "b099a6e7505d649b3eeaa6-bcfd-43c6-b2f1-b8aba017fb6a" , "name" : "lg:requestData" } ] , "childrenInfo" : { "count" : 1 } } }, flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true}, PutKeyValueCommand{key=b099a6e7505d649b3eeaa6-bcfd-43c6-b2f1-b8aba017fb6a, value=SchematicEntryWholeDelta{ "metadata" : { "id" : "b099a6e7505d649b3eeaa6-bcfd-43c6-b2f1-b8aba017fb6a" , "contentType" : "application/json" } , "content" : { "key" : "b099a6e7505d649b3eeaa6-bcfd-43c6-b2f1-b8aba017fb6a" , "parent" : "b099a6e7505d64fb52aaa6-4560-4ab6-9ae8-f0d028de2be3" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "nt:file" } , "created" : { "$date" : "2013-05-22T16:07:58.409+02:00" } , "createdBy" : "<anonymous>" } } , "children" : [ { "key" : "b099a6e7505d64c74c1749-7b13-4a5e-b703-50677337d991" , "name" : "jcr:content" } ] , "childrenInfo" : { "count" : 1 } } }, flags=null, putIfAbsent=true, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true}, 
PutKeyValueCommand{key=545c23ac44b1f83fbe696e9db8b4beb961446731-ref, value=SchematicEntryWholeDelta{ "metadata" : { "id" : "545c23ac44b1f83fbe696e9db8b4beb961446731-ref" , "contentType" : "application/json" } , "content" : { "sha1" : "545c23ac44b1f83fbe696e9db8b4beb961446731" , "refCount" : 13 } }, flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true}, PutKeyValueCommand{key=b099a6e7505d64c74c1749-7b13-4a5e-b703-50677337d991, value=SchematicEntryWholeDelta{ "metadata" : { "id" : "b099a6e7505d64c74c1749-7b13-4a5e-b703-50677337d991" , "contentType" : "application/json" } , "content" : { "key" : "b099a6e7505d64c74c1749-7b13-4a5e-b703-50677337d991" , "parent" : "b099a6e7505d649b3eeaa6-bcfd-43c6-b2f1-b8aba017fb6a" , "properties" : { "http://www.jcp.org/jcr/1.0" : { "primaryType" : { "$name" : "nt:resource" } , "data" : { "$sha1" : "545c23ac44b1f83fbe696e9db8b4beb961446731" , "$len" : 672 } , "lastModified" : { "$date" : "2013-05-22T16:07:58.409+02:00" } , "lastModifiedBy" : "<anonymous>" , "mimeType" : "text/xml" } } } }, flags=null, putIfAbsent=true, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true}], onePhaseCommit=true, gtx=GlobalTransaction:<HA1-BRMS-47165>:288:remote, cacheName='BinRepository', topologyId=6}
      2013-05-22 16:07:58,255 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-5,HA2-BRMS-5605) About to send back response null for command TxCompletionNotificationCommand{ xid=null, internalId=0, topologyId=6, gtx=GlobalTransaction:<HA1-BRMS-47165>:288:local, cacheName=DataStoreRepository} 
      2013-05-22 16:07:58,287 TRACE [org.modeshape.jcr.bus.ClusteredRepositoryChangeBus] (Incoming-1,HA2-BRMS-23320) Received on cluster 'modeshape-cluster' 10 changes on workspace default made by <anonymous> from process '4b0a01e9-27d3-4c6b-bd2a-fa09a585334a' at 2013-05-22T16:07:58.409+02:00
      

       

      However, when we query for this node (using JCR-SQL2) on the second machine, it can't be found. It seems that, for some reason, the index doesn't get updated.

      Note that getting the root node and iterating over the entire repository works as expected; presumably the index isn't used when navigating this way (which would make sense).

       

      We have been struggling with this for a while now and haven't been able to dig up anything helpful so far. Possibly we haven't been looking for the right thing; any pointers or information you could provide would be much appreciated.

        • 1. Re: ModeShape + InfiniSpan cluster: indexes are not updated
          rhauch

          Have you tried using the JCR API to verify that the content is correctly updated and available on all machines in the cluster? For example, you should be able to register a listener in each of the processes, and each listener should see all of the same events. If that is the case but you're not seeing the changes in the JCR-SQL2 query results, then the problem is related to indexing. However, if the changes on one machine do not appear on another machine using the JCR API, then that is likely caused by ModeShape processes not seeing the other machines.
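          The triage rule above can be written down as a small decision function. This is a hypothetical, stdlib-only Java sketch, not ModeShape code: it assumes the inputs were collected separately (the node paths reported by a JCR event listener registered in each process, and the paths returned by the same JCR-SQL2 query run on the second process), and that the query is expected to match every changed node. The example paths are made up.

```java
import java.util.Set;

// Hypothetical helper encoding the triage rule: compare what the listeners saw
// on each process with what the query returns on the second process.
public class ClusterTriage {

    enum Diagnosis { OK, REPLICATION_OR_MEMBERSHIP, INDEXING }

    static Diagnosis diagnose(Set<String> eventsOnFirst,
                              Set<String> eventsOnSecond,
                              Set<String> queryHitsOnSecond) {
        // The second process never heard about some changes:
        // the processes are probably not seeing each other.
        if (!eventsOnSecond.containsAll(eventsOnFirst)) {
            return Diagnosis.REPLICATION_OR_MEMBERSHIP;
        }
        // Both processes saw the same events, but the query misses some nodes:
        // the content arrived, the index did not.
        if (!queryHitsOnSecond.containsAll(eventsOnFirst)) {
            return Diagnosis.INDEXING;
        }
        return Diagnosis.OK;
    }

    public static void main(String[] args) {
        // Made-up paths standing in for the changed nodes.
        Set<String> changed = Set.of("/requests/r1", "/requests/r1/lg:requestData");
        System.out.println(diagnose(changed, changed, Set.of())); // prints INDEXING
    }
}
```

          With the same events seen on both processes but an empty query result, the sketch reports INDEXING, which matches the symptoms in the original question.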

           

          Based upon your configuration, I suspect the problem is that you're not actually setting up clustered indexing. Your ModeShape configuration contains the following:

           

               "query" : {
                  "enabled" : true,
                  "indexStorage" : {
                      "type" : "filesystem",
                      "location" : "C:/modeshape/index",
                  },
          

           

          This means you're storing the indexes on the file system, and that each process is only indexing the changes that are made locally.

           

          However, a couple of things would need to change:

           

          1. Each ModeShape process needs to have its own, separate directory where it can store the indexes. So if you're running multiple processes in the same cluster on the same machine, the above configuration will not work since multiple processes would attempt to store their indexes in the same location. Essentially, you will have to specify a different directory for each process. (Remember that our configuration files do support variables with values defined via system properties, and they make it a lot easier to share configuration files.)
          2. You should be setting up JMS master/slave or JGroups master/slave index storage. We have a sample test configuration for JGroups (see the master config and the slave config), but the JMS configuration is fairly similar.
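          A sketch of what this might look like for the JGroups master process follows. It assumes the ModeShape 3.x configuration keys `filesystem-master` (index storage with a `sourceLocation` directory shared with the slaves, plus `refreshInSeconds`) and a `jgroups-master` backend under `indexing`; these names are an assumption and should be checked against the sample master and slave configs. The `${modeshape.index.dir}` and `${modeshape.index.shared.dir}` variables are hypothetical system properties illustrating the first point: the file stays identical while each process supplies its own local directory. The channel name and refresh interval are arbitrary.

```json
"query" : {
    "enabled" : true,
    "indexStorage" : {
        "type" : "filesystem-master",
        "location" : "${modeshape.index.dir}/index",
        "sourceLocation" : "${modeshape.index.shared.dir}/index-source",
        "refreshInSeconds" : 10
    },
    "indexing" : {
        "backend" : {
            "type" : "jgroups-master",
            "channelName" : "modeshape-indexing"
        }
    }
}
```

          A slave would presumably use the same file with `filesystem-slave` and `jgroups-slave` as the types.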
          • 2. Re: ModeShape + InfiniSpan cluster: indexes are not updated
            geensb

            Randall Hauch wrote:
               

            Have you tried using the JCR API to verify that the content is correctly updated and available on all machines in the cluster? For example, you should be able to register a listener in each of the processes, and each listener should see all of the same events. If that is the case but you're not seeing the changes in the JCR-SQL2 query results, then the problem is related to indexing. However, if the changes on one machine do not appear on another machine using the JCR API, then that is likely caused by ModeShape processes not seeing the other machines.

            At first we tried to cluster the indexes by also storing them in Infinispan, but we couldn't get this to work; in fact, we found information stating that it wouldn't work (unfortunately I can't find the link right now). Also, there are only two nodes so far, so it's hard to compare the logging.

             

            Each ModeShape process needs to have its own, separate directory where it can store the indexes. So if you're running multiple processes in the same cluster on the same machine, the above configuration will not work since multiple processes would attempt to store their indexes in the same location. Essentially, you will have to specify a different directory for each process. (Remember that our configuration files do support variables with values defined via system properties, and they make it a lot easier to share configuration files.)

            The machines are separate virtual machines, so their index directories are guaranteed to be separate.

             

            You should be setting up JMS master/slave or JGroups master/slave index storage. We have a sample test configuration for JGroups (see the master config and the slave config), but the JMS configuration is fairly similar.

            I wasn't aware of this; I'll look into it first thing tomorrow.

             

            I have tried the same configuration files locally (though in JBoss EAP 6.1 on Linux with PostgreSQL, with everything embedded in the application WAR), and oddly enough things seem to work as expected there. Here too, the index directories are separate, as are the databases Infinispan uses for each instance.

            • 3. Re: ModeShape + InfiniSpan cluster: indexes are not updated
              albertdev

              Since I'm part of the "We" my colleague is referring to, I'd better join in.

               

              I changed the configuration to a master / slave setup using JGroups, as shown in the samples. It works, although a lot depends on index replication.

               

              Now I was wondering if there is also support for a peer-to-peer indexing setup?

              It appears from my tests that the whole cluster setup becomes unusable if the master goes down, and the slaves can write to the cluster but can't query for those nodes until the index is replicated. Would it be possible to let both nodes have a local copy of the index AND forward changes to each other?
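              The peer-to-peer idea in the question can be illustrated with a toy model. This is a hypothetical in-memory Java sketch, not ModeShape's implementation: each node indexes its own writes, forwards them to its peers, and answers queries from its local copy only, so there is no master whose loss takes the whole cluster down. Node names, paths and values are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not ModeShape code): every node keeps a local index and
// forwards each local change to all of its peers.
public class PeerIndexDemo {

    static class PeerNode {
        final String name;
        final Map<String, String> localIndex = new HashMap<>();
        final List<PeerNode> peers = new ArrayList<>();
        boolean up = true;

        PeerNode(String name) { this.name = name; }

        // A write made on this node: index it locally, then forward it.
        void write(String path, String value) {
            localIndex.put(path, value);
            for (PeerNode peer : peers) {
                if (peer.up) {
                    peer.receive(path, value);
                }
            }
        }

        // A change forwarded by a peer: index it locally but don't forward it
        // again, which avoids endless re-broadcasts.
        void receive(String path, String value) {
            localIndex.put(path, value);
        }

        // Queries only ever consult the local copy.
        String query(String path) { return localIndex.get(path); }
    }

    public static void main(String[] args) {
        PeerNode a = new PeerNode("node-a");
        PeerNode b = new PeerNode("node-b");
        a.peers.add(b);
        b.peers.add(a);

        a.write("/requests/r1", "doc-r1");
        b.write("/requests/r2", "doc-r2");

        a.up = false; // no master to lose: b still answers from its own copy
        System.out.println(b.query("/requests/r1")); // prints doc-r1
    }
}
```

              One thing the toy makes obvious: a node that is down simply misses the changes forwarded while it was away, so after a restart it needs some way to catch up, such as rebuilding its index from the content.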

              • 4. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                hchiorean

                Peer-to-peer index replication is something we're working on atm (see https://issues.jboss.org/browse/MODE-1943). Hopefully it will be part of 3.3, which is due out towards the end of next week.

                 

                In 3.2, the only fault-tolerant cluster mode for indexes is via JMS, assuming the latter is configured in such a way. This is not always simple to configure, which is why we're working on the above peer-to-peer mode.

                • 5. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                  albertdev

                  Horia Chiorean wrote:

                   

                  Peer-to-peer index replication is something we're working on atm (see https://issues.jboss.org/browse/MODE-1943). Hopefully it will be part of 3.3, which is due out towards the end of next week.

                   

                  In 3.2, the only fault-tolerant cluster mode for indexes is via JMS, assuming the latter is configured in such a way. This is not always simple to configure, which is why we're working on the above peer-to-peer mode.

                  I've seen your pull request with a fix for MODE-1943. I tested it, and it appears to work as advertised.

                   

                  Now it seems that in a write-heavy setup, the indexing sometimes can't keep up with the flood of input, which causes the query manager to return null. However, I think this is more a limitation of the way the query manager depends on the index. On the whole, your solution looks better than any of the Hibernate Search clustering options, for the following reasons:

                  • There's no index copying overhead when nothing gets written during an idle period, and no need to tweak the refresh interval.
                  • The whole cluster can't go belly-up because the master died (Hibernate Search reportedly has a master re-election process for the JGroups backend, but its documentation doesn't say anything about what it does for the index copy process).
                  • No need for a shared filesystem or an extra Infinispan cluster (as a matter of fact, I believe there is no master + slave backend for Infinispan?).

                   

                  I do wonder, though, what direction the ModeShape project wants to take with regard to clustering. Infinispan and JGroups can do some clever tricks, but indexing seems to be the Achilles' heel of the query manager.

                  • 6. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                    hchiorean

                    In retrospect, we've seen/are seeing that HSearch + Lucene (the way we're using them right now) are not optimal for real high-performance clustered scenarios. We've been looking at/thinking about a couple of improvements in this area:

                    • replacing our in-house/custom query engine with the one from the Teiid project (http://www.jboss.org/teiid/): it's not only more mature, but also a lot more performant
                    • moving to Solr/Elastic Search for scalability & performance.

                     

                    Randall Hauch could probably provide more insight into these topics.

                    • 7. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                      rhauch

                      In retrospect, we've seen/are seeing that HSearch + Lucene (the way we're using them right now) are not optimal for real high-performance clustered scenarios.

                       

                      +1, although I'd be a bit more specific: it's not ideal for real high-performance clustered scenarios with high write volumes. Of course, those scenarios are quite common.

                       

                      • replacing our in-house/custom query engine with the one from the Teiid project (http://www.jboss.org/teiid/): it's not only more mature, but also a lot more performant

                      This is the plan, but it is also completely unrelated to maintaining the indexes and keeping them up-to-date as the content changes.

                       

                       

                      • moving to Solr/Elastic Search for scalability & performance.

                       

                      This will likely have the greatest impact on performance and scalability. We also plan to change the structure of the indexes, though choosing the design of how many indexes and what exactly they contain is full of compromises and tradeoffs. (The current design is heavily weighted toward query performance, and was based largely upon what we've learned from 2.x. IMO we went too far toward that end of the spectrum, and this is even more true with the peer-to-peer approach.)

                       

                      Unfortunately, the whole indexing and query system is a big compromise of living within the constraints of query performance and indexing performance as well as the basic design/requirements of Lucene (no clustering support) and Hibernate Search (one approach of clustering on top of Lucene).

                      • 8. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                        albertdev

                        Okay, so things seem to work fine when we start from a clean environment and store a couple of thousand nodes. The peers exchange messages and as long as we use Infinispan's replication transport, everything stays synced.

                         

                        What doesn't work so great is a reboot or a sudden stop of one of the nodes: either the index gets out of sync, or we need to trigger a complete rebuild. I'm not sure if the out-of-sync problem resolves itself, so we tried the rebuilding approach. However, this process seems to take ages, even though all Infinispan nodes should have a local copy of the data.

                         

                        Is there a way to see what gets indexed? Our config doesn't have any text-extractors, so I guess it should only index nodes and their attributes.

                        Any logging "categories" which can be set to a given level to have an idea of what is going on?

                        • 9. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                          rhauch

                          Okay, so things seem to work fine when we start from a clean environment and store a couple of thousand nodes. The peers exchange messages and as long as we use Infinispan's replication transport, everything stays synced.

                           

                          What doesn't work so great is a reboot or a sudden stop of one of the nodes: either the index gets out of sync, or we need to trigger a complete rebuild. I'm not sure if the out-of-sync problem resolves itself, so we tried the rebuilding approach. However, this process seems to take ages, even though all Infinispan nodes should have a local copy of the data.

                          What index clustering approach are you using?

                           

                          Is there a way to see what gets indexed? Our config doesn't have any text-extractors, so I guess it should only index nodes and their attributes.

                          Any logging "categories" which can be set to a given level to have an idea of what is going on?

                           

                          You can enable TRACE logging on "org.modeshape.jcr.query.lucene.basic.BasicLuceneSchema" (or higher) to see exactly what node changes are getting indexed.

                          • 10. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                            geensb

                            Randall Hauch wrote:

                             

                            Okay, so things seem to work fine when we start from a clean environment and store a couple of thousand nodes. The peers exchange messages and as long as we use Infinispan's replication transport, everything stays synced.

                             

                            What doesn't work so great is a reboot or a sudden stop of one of the nodes: either the index gets out of sync, or we need to trigger a complete rebuild. I'm not sure if the out-of-sync problem resolves itself, so we tried the rebuilding approach. However, this process seems to take ages, even though all Infinispan nodes should have a local copy of the data.

                            What index clustering approach are you using?

                             

                            We are currently using the peer-to-peer indexing introduced with https://issues.jboss.org/browse/MODE-1943, so each time a cluster node is restarted we do a full index rebuild (as recommended in that issue). This seems to take ages (more than 20 minutes), even with a relatively small number of nodes (we test with ~2500 nodes).
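                            To put those figures in perspective, the reported rebuild corresponds roughly to the following throughput. This is a back-of-the-envelope estimate from the numbers above; since the rebuild took more than 20 minutes, the real rate is lower still and the time per node higher.

```latex
\frac{2500\ \text{nodes}}{20 \times 60\ \text{s}} = \frac{2500}{1200} \approx 2.1\ \text{nodes/s},
\qquad
\frac{1200\ \text{s}}{2500\ \text{nodes}} = 0.48\ \text{s} = 480\ \text{ms per node}
```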

                            • 11. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                              hchiorean

                              When you say "seems to take ages", do you mean that (a) the server node is "unresponsive" until indexing has completed, or (b) queries executed on that node do not return the correct results for more than 20 minutes?

                               

                              If it's (a): re-indexing is done asynchronously by default, so are you sure that re-indexing is the cause, and that Infinispan has finished its data "reconciliation" with the other nodes? Following Randall's suggestion above, if you enable TRACE logging (not only in org.modeshape.jcr.query but also in org.modeshape.jcr.cache), you should get a clearer picture of which step is taking so long.

                               

                              If it's (b): TRACE logging (as suggested above) will tell you whether re-indexing really is going on for the full 20+ minutes. If that is the case, the only thing I can think of is to attach VisualVM (or a similar profiler) to the process and look at the running threads to see what's going on. For example, we've noticed that Lucene storing indexes on an NTFS (Windows) filesystem is considerably slower than storing them on an ext filesystem (I suspect due to file locks).
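                              If attaching a profiler isn't practical, a thread dump shows much of the same information about what the threads are doing. `jstack <pid>` produces one from outside the JVM; the minimal Java sketch below does it from inside, using only `Thread.getAllStackTraces()`, and keeps only threads with at least one frame whose class name contains a filter string. Running it inside the server (for example from a hypothetical admin page) and filtering on `org.apache.lucene` is an assumption about where the time goes, not a confirmed diagnosis.

```java
import java.util.Map;

// Stdlib-only thread dump, filtered to threads that have at least one stack
// frame whose class name contains the given string ("" keeps every thread
// that has frames or not).
public class ThreadDump {

    static String dump(String classFilter) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] frames = e.getValue();
            boolean matches = classFilter.isEmpty();
            for (StackTraceElement frame : frames) {
                if (frame.getClassName().contains(classFilter)) {
                    matches = true;
                    break;
                }
            }
            if (!matches) {
                continue;
            }
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : frames) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Outside a server running Lucene this prints nothing.
        System.out.print(dump("org.apache.lucene"));
    }
}
```

                              Taking a few dumps some seconds apart during the rebuild and comparing them gives a rough idea of whether the indexing threads are busy in file I/O or waiting on locks.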

                              • 12. Re: ModeShape + InfiniSpan cluster: indexes are not updated
                                geensb

                                Horia Chiorean wrote:

                                 

                                When you say "seems to take ages", do you mean that (a) the server node is "unresponsive" until indexing has completed, or (b) queries executed on that node do not return the correct results for more than 20 minutes?

                                 

                                We update the indexes in synchronous mode, so it's (a); otherwise we would expect (b) to happen.

                                 

                                 

                                If it's (b): TRACE logging (as suggested above) will tell you whether re-indexing really is going on for the full 20+ minutes. If that is the case, the only thing I can think of is to attach VisualVM (or a similar profiler) to the process and look at the running threads to see what's going on. For example, we've noticed that Lucene storing indexes on an NTFS (Windows) filesystem is considerably slower than storing them on an ext filesystem (I suspect due to file locks).

                                Unfortunately, both are true: it really is indexing all that time, and this is on a Windows system (so NTFS). It seems that, until another indexing strategy becomes available (as you and Randall mentioned earlier in the thread), we have to choose between possibly out-of-date indexes (by updating asynchronously) and a delay at startup while the indexes are rebuilt.