1 Reply Latest reply on Apr 12, 2013 2:13 AM by hchiorean

    rebuild index on startup defaults and startup times

    bwallis42

      I have a test system with about 164000 nodes and find that it takes about 5 minutes to initialise the repository on restart of the appserver. The cause is the default repository indexing settings, which are:

       

      {code}<indexing rebuild-upon-startup="IF_MISSING" rebuild-upon-startup-mode="SYNC"/>{code}

       

      What this seems to mean is that, on initialisation of the repository, the startup sequence synchronously walks the complete node tree checking whether any indexes are missing. The rate of 546 nodes/second (on my Mac laptop) is not too bad, but it does make for a slow appserver/repository startup.

       

      Setting the startup mode to ASYNC changes the behaviour so that the check happens in the background and you can use the repository for both node access and for queries. Of course the query results may not be accurate if there are any missing indexes.
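
      To illustrate the difference between the two modes, here is a minimal Java sketch. It is not ModeShape code: `StartupSketch`, `Mode` and `start` are hypothetical names, and the `indexCheck` argument stands in for whatever work the repository does when it checks its indexes. It only shows why SYNC holds up the caller while ASYNC hands back control immediately.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of synchronous vs. asynchronous startup work (not ModeShape's implementation). */
final class StartupSketch {
    enum Mode { SYNC, ASYNC }

    /**
     * Runs the index check according to the mode and returns a future that
     * completes once the check has finished.
     */
    static CompletableFuture<Void> start(Mode mode, Runnable indexCheck) {
        if (mode == Mode.SYNC) {
            // The caller is blocked until the whole check is done.
            indexCheck.run();
            return CompletableFuture.completedFuture(null);
        }
        // The check runs on a background thread; the caller can carry on,
        // accepting that queries may see incomplete indexes until it finishes.
        return CompletableFuture.runAsync(indexCheck);
    }
}
```

      With `Mode.ASYNC` the returned future can be ignored, or joined later if the caller needs to know that the check has completed.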

       

      Four questions:

      1. Should the default setting be async?
      2. Is there a way to launch an "if_missing" check programmatically and get a report of any indexes that were missing? This might be useful for a production server installation to run periodically.
      3. Does the value of async-thread-pool-size affect the asynchronous re-indexing at startup? I couldn't see evidence of multiple threads when I set it to 3, and there was no discernible change in the time taken.
      4. Not really a startup question: I'm working on the assumption that the indexes are updated within the scope of the current transaction during the call to session.save(). Is this the case? Of course, this probably depends on the type of index storage used; I expect the filesystem-stored indexes are not transactional, but the cache-stored ones will be once they work correctly.
        • 1. Re: rebuild index on startup defaults and startup times
          hchiorean

          Hi Brian,

           

          With the fixes for https://issues.jboss.org/browse/MODE-1872 and https://issues.jboss.org/browse/MODE-1876, re-indexing is performed asynchronously by default and if_missing basically means that re-indexing is only performed if there aren't any indexes at all.

          In other words, the entire index tree is no longer walked/read. The mere presence of the indexes is enough.

           

          So regarding your questions:

          1. Yes, we believe async is the correct setting, because it's more important to have a "responsive" repository ASAP, as opposed to having an unresponsive repository when having to perform re-indexing for lots of nodes.

          2. There isn't a way to do this programmatically, and the code which walked the entire index tree has been removed, so I don't know that this would make sense. The reason it was removed is that loading all indexes into memory is a bad idea for a real system.

          3. Because ModeShape uses just 1 index (with lots of segments), I really doubt it makes a difference, because the update/add index segment jobs would still "bubble down" to the same index writer instance. That being said, we haven't really tested/profiled this, so I can't be 100% sure.

          4. Index updates during a transient session, in the case of an active transaction, are sent to Hibernate Search together with the transaction context. Hibernate Search, in turn, will run the index updates only when the transaction has been committed successfully (via a Synchronization listener). If the transaction is rolled back, or something unexpected occurs, the listener won't be fired and no index changes will be performed.
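
          The commit-time behaviour described in point 4 can be sketched roughly as follows. This is a simplified illustration, not the Hibernate Search or ModeShape code: `SketchTransaction` and `DeferredIndexer` are hypothetical classes, and the `onCommit` callback stands in for a real transaction Synchronization listener. Index work is queued against the transaction and only applied if the transaction commits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical minimal transaction that fires its callbacks only on a successful commit. */
final class SketchTransaction {
    private final List<Runnable> commitCallbacks = new ArrayList<>();

    void onCommit(Runnable callback) {
        commitCallbacks.add(callback);
    }

    void commit() {
        // Callbacks run in registration order, then are discarded.
        commitCallbacks.forEach(Runnable::run);
        commitCallbacks.clear();
    }

    void rollback() {
        // Nothing is fired: the queued work is simply dropped.
        commitCallbacks.clear();
    }
}

/** Hypothetical indexer that defers every update until the owning transaction commits. */
final class DeferredIndexer {
    private final List<String> index = new ArrayList<>();

    /** Queues an index update for nodePath; it is applied only when tx commits. */
    void enqueue(SketchTransaction tx, String nodePath) {
        tx.onCommit(() -> index.add(nodePath));
    }

    /** Read-only view of what has actually been written to the index. */
    List<String> indexed() {
        return Collections.unmodifiableList(index);
    }
}
```

          In this sketch, updates queued on a transaction that is rolled back never reach the index, which mirrors the behaviour described above.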