2 Replies Latest reply on Feb 26, 2014 10:19 AM by rhauch

    Connector and children refresh operation

    dr_nickson

      We're trying to keep ModeShape (3.7) from calling the connector's getChildren operation for specific nodes, or at least to have it do so only rarely.

       

      The situation is as follows:

      We're using CmisConnector to access an external CMIS repository.

      The external CMIS repo has one folder with lots of documents (possibly millions). The contents of this folder are rarely browsed; it is mostly used for creating documents inside it.

      Reading those external documents degrades ModeShape's performance, or blocks it from starting when indexing is on.

       

      To improve performance, we made the connector return a node's (folder's) children as a paged list. We also specified PageWriter.UNKNOWN_TOTAL_SIZE as the children size so that ModeShape uses getChildReference where appropriate instead of the getChildren operation.

       

      But in some cases, such as document creation, getChildren is called anyway and walks through all the pages until it gets to the end. This is caused, for example, by the LazyCachedNode.getChildReferences().getChildCount(name) operation when SNS (same-name siblings) is checked before the actual node creation. This takes a long time, and the time grows with the number of items.
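To illustrate why this gets slower as the folder grows, here is a minimal standalone sketch. It assumes a simplified paged folder in which each page only knows the offset of the next one; `ChildPage`, `PagedFolder` and `PagedCountDemo` are hypothetical names for illustration and are not ModeShape or connector SPI classes.

```java
import java.util.ArrayList;
import java.util.List;

/** One page of child names plus a pointer to the next page. */
final class ChildPage {
    final List<String> names;
    final int nextOffset; // -1 when this is the last page

    ChildPage(List<String> names, int nextOffset) {
        this.names = names;
        this.nextOffset = nextOffset;
    }
}

/** Hypothetical stand-in for a paged folder; not the ModeShape connector SPI. */
final class PagedFolder {
    private final List<String> childNames;
    private final int pageSize;
    int pageRequests = 0; // number of pages fetched so far

    PagedFolder(List<String> childNames, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.childNames = childNames;
        this.pageSize = pageSize;
    }

    /** Fetches the page starting at the given offset. */
    ChildPage getPage(int offset) {
        pageRequests++;
        int end = Math.min(offset + pageSize, childNames.size());
        int next = end < childNames.size() ? end : -1;
        return new ChildPage(childNames.subList(offset, end), next);
    }

    /** Counts children with the given name; with no index, every page must be read. */
    int countByName(String name) {
        int count = 0;
        int offset = 0;
        while (offset != -1) {
            ChildPage page = getPage(offset);
            for (String child : page.names) {
                if (child.equals(name)) count++;
            }
            offset = page.nextOffset;
        }
        return count;
    }
}

final class PagedCountDemo {
    /** Builds a folder with children named doc-0, doc-1, ... */
    static PagedFolder folderWith(int children, int pageSize) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < children; i++) names.add("doc-" + i);
        return new PagedFolder(names, pageSize);
    }

    public static void main(String[] args) {
        PagedFolder folder = folderWith(10_000, 100);
        int matches = folder.countByName("new-doc");
        // prints: matches=0, pageRequests=100
        System.out.println("matches=" + matches + ", pageRequests=" + folder.pageRequests);
    }
}
```

Even when the answer is "no child with that name", the count cannot be known until the last page has been read, so the cost of each creation is proportional to the number of pages.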

       

      So the question:

      Is there any way to make ModeShape not call getChildren, or to minimize the frequency of these calls?

       

      Note: We cannot define the external folder as empty, because then, after a document is created or queried, ModeShape fails with an error stating that the parent has no reference to its child. The parent (folder) must still be able to provide its children for reference validation.

       

      Also, is it possible to delegate some of the logic related to children processing to the connector, e.g. getChildCount(name) for the SNS check? Then there would be no need to read a node's children completely, since the connector could issue an optimized query for the specified name.
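What we have in mind could look roughly like the sketch below. `ChildCountByName` is a hypothetical interface that does not exist in ModeShape 3.x, and `IndexedConnector` is an in-memory stand-in for a connector whose backing store can answer a by-name lookup directly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical extension point; ModeShape 3.x does not define this interface. */
interface ChildCountByName {
    int getChildCount(String parentId, String childName);
}

/** In-memory stand-in for a connector whose store can look children up by name. */
final class IndexedConnector implements ChildCountByName {
    // parentId -> (child name -> number of children with that name)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    int lookups = 0; // number of by-name lookups served

    void addChild(String parentId, String childName) {
        counts.computeIfAbsent(parentId, k -> new HashMap<>())
              .merge(childName, 1, Integer::sum);
    }

    @Override
    public int getChildCount(String parentId, String childName) {
        lookups++;
        // A real connector would run a targeted query against its source here
        // instead of paging through every child.
        return counts.getOrDefault(parentId, Collections.emptyMap())
                     .getOrDefault(childName, 0);
    }
}

final class DelegatedCountDemo {
    /** SNS-style check before creation: the new node's index is the existing count plus one. */
    static int nextSnsIndex(ChildCountByName connector, String parentId, String name) {
        return connector.getChildCount(parentId, name) + 1;
    }

    public static void main(String[] args) {
        IndexedConnector connector = new IndexedConnector();
        for (int i = 0; i < 1_000; i++) connector.addChild("folder", "doc-" + i);
        int index = nextSnsIndex(connector, "folder", "new-doc");
        // prints: snsIndex=1, lookups=1
        System.out.println("snsIndex=" + index + ", lookups=" + connector.lookups);
    }
}
```

With this shape the cost of the check is one lookup per created node, regardless of how many children the folder holds.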

       

      Thanks in advance

        • 1. Re: Connector and children refresh operation
          hchiorean

          The logic behind the getChildren() calls is dictated by the JCR spec (e.g. SNS counting), so there isn't a way to turn that off (nor is it desired IMO).

           

          However, in the case of external nodes we can look into optimizing the calls which, in the case of pages of children, load each page (like getChildCount(name)). One thing we can try is what you suggested - delegate these calls to the connector implementation. Please feel free to log an enhancement request describing this case.

           

          Note: We cannot define the external folder as empty, because then, after a document is created or queried, ModeShape fails with an error stating that the parent has no reference to its child. The parent (folder) must still be able to provide its children for reference validation.


          Not sure I understand what you mean by this.

          • 2. Re: Connector and children refresh operation
            rhauch

            But in some cases, such as document creation, getChildren is called anyway and walks through all the pages until it gets to the end.

            So it sounds like you've made the CmisConnector implement Pageable, which is great.

             

            We also specified PageWriter.UNKNOWN_TOTAL_SIZE as the children size so that ModeShape uses getChildReference where appropriate instead of the getChildren operation.

             

            That is correct. If you look at the logic here and here, returning PageWriter.UNKNOWN_TOTAL_SIZE for children size will make ModeShape call directly to the connector for the child reference by key and parent key, and this would be faster than going through all the segments.

             

            But in some cases, such as document creation, getChildren is called anyway and walks through all the pages until it gets to the end. This is caused, for example, by the LazyCachedNode.getChildReferences().getChildCount(name) operation when SNS (same-name siblings) is checked before the actual node creation. This takes a long time, and the time grows with the number of items.

             

            So the question:

            Is there any way to make ModeShape not call getChildren, or to minimize the frequency of these calls?

             

            Yes, there may be a way to improve this, as Horia mentioned above. Unfortunately, pushing more of this logic down to the connector means breaking changes to the connector SPI. That's something that we can do in the 4.0 timeframe (our current release effort), but is something we try to avoid in minor releases (e.g., 3.x). In 4.0 it also would have to compete with our other very high-priority items. Any assistance here would be greatly appreciated.

             

            BTW, even though some of the methods seem like they are related to SNS, they aren't always used that way. For example, getting the number of children with a particular name does indeed allow us to make ModeShape behave properly with SNS, but it also allows us to find out whether there is a child with a given name (e.g., the method can return 0 or 1).

             

            In the meantime, one thing that you can try is to play with the sizes of your pages. Increasing the page size, and thus returning fewer pages, will mean your app consumes more memory, but it will require fewer calls to the connector. We can't really say what the best page size is, since it depends so much on your particular environment (hardware, available memory, CPU utilization, network latency, etc.), your CMIS source, your application, and of course your data.
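As a rough way to reason about that trade-off, the number of connector calls needed for a full walk is the child count divided by the page size, rounded up. The sketch below only does that arithmetic; `PageSizeEstimate` is a hypothetical helper, and the figures are illustrative rather than measurements.

```java
/** Back-of-the-envelope estimate of connector calls for a full walk of a paged folder. */
final class PageSizeEstimate {
    /** Pages needed to cover totalChildren at the given page size (ceiling division). */
    static long pagesNeeded(long totalChildren, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        return (totalChildren + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        long totalChildren = 1_000_000L;
        for (int pageSize : new int[] {100, 1_000, 10_000}) {
            // Each page holds up to pageSize child references in memory at once.
            System.out.println("pageSize=" + pageSize
                    + " -> connector calls=" + pagesNeeded(totalChildren, pageSize));
        }
    }
}
```

For a folder of one million children this gives 10000, 1000 and 100 calls respectively, at the price of holding proportionally more child references per page.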

             

            Also, can you add more detail in terms of the performance goals and actuals? How far off are you?