1 Reply Latest reply on Mar 25, 2015 5:52 AM by hchiorean

    Repository diagnostic tools

    folch

      Hi,

       

      Is there any kind of tool or API we can use for repository diagnosis or troubleshooting in ModeShape 3.8.0?

      Recently, we had some repository corruption while stress testing with high concurrency. While looking for the root cause, we decided to add locking on certain nodes and change the Infinispan configuration to avoid this in the future. However, our Product Owner raised a question: what can we do if this happens in a production environment, apart from defining a backup plan for the data? Can we do a sanity check on the repository? Maybe on the Infinispan cache? Can we check consistency? Can we recover corrupted data? Maybe manually?

      Basically, we wanted to know what kinds of tools or APIs are available to answer those questions.

       

      Thanks in advance

        • 1. Re: Repository diagnostic tools
          hchiorean

          The short answer is none - there are no such tools for ModeShape.

          The problem IMO comes from the sheer complexity of the JCR spec. To check whether a repository is "consistent" or not we'd have to define what "consistency" means:

          • consistency in the sense of ModeShape's internal data structures (the /jcr:system area) and/or their relationship with the "outside"/client-stored data. For example: are all the JCR namespaces present, do all internal locks have corresponding nodes, etc.
          • consistency in the sense of the client data being correct - e.g. hard referential integrity (JCR hard references), binary data integrity (binary data referenced by nodes actually exists), etc.
          • consistency when running in a cluster - this is something which IMO is quasi-impossible to measure, since you can have split-brain scenarios where each partition is data-consistent, but the actual clustered state would be inconsistent...

          and the list probably goes on. Because of this, once data becomes corrupted, the only option at the moment is to restore (via backup/restore) to a previously known, consistent state.
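
          To make the "hard referential integrity" bullet a bit more concrete, here is a minimal sketch of what such a check would have to do. It is not ModeShape or JCR API: the class ReferenceCheck, its method and the map-based model (node identifier -> identifiers it hard-references) are all hypothetical stand-ins. A real check would have to obtain this information by walking the repository through a JCR session, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for repository content; not a ModeShape or JCR type.
final class ReferenceCheck {

    /**
     * Finds hard references whose target node does not exist.
     *
     * @param referencesByNode node identifier -> identifiers that node hard-references
     * @return one "source -> target" entry per dangling reference
     */
    static List<String> findDanglingReferences(Map<String, List<String>> referencesByNode) {
        // Every key of the map is treated as an existing node.
        Set<String> existing = referencesByNode.keySet();
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : referencesByNode.entrySet()) {
            for (String target : entry.getValue()) {
                if (!existing.contains(target)) {
                    problems.add(entry.getKey() + " -> " + target);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<String>> nodes = new LinkedHashMap<>();
        nodes.put("a", List.of("b"));
        nodes.put("b", List.of());
        nodes.put("c", List.of("a", "missing"));
        System.out.println(findDanglingReferences(nodes)); // prints [c -> missing]
    }
}
```

          Note that this only covers one narrow definition of "consistent". It says nothing about the /jcr:system area, binary storage or clustered state, which is exactly why a general-purpose tool is hard to define.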

           

          What aspects of consistency are relevant for your use case?

           

          In the case of both ModeShape 3.x and 4.x, one possible cause of data corruption under stress is a known ISPN issue: see [MODE-2280] "Child node not found under high concurrency when eviction is enabled" in the JBoss Issue Tracker and the linked ISPN bug. This will only be fixed once ModeShape is able to move to at least ISPN 7.2.