5 Replies Latest reply on Dec 5, 2011 2:03 PM by rhauch

    LargeValue: gc and streaming

    jamat

      Hello,

       

      Our app is using the JPA connector and some of the data that we are storing are files.

      What I have noticed is that the entries in the MODE_SIMPLE_LARGE_VALUES table were not removed when I deleted the corresponding property. I also found out that there is a 'garbage collector' that is supposed to clear those entries.

      But nothing was happening.

      Reading through the code, I saw that the JPA connector enables garbage collection by default, but this setting is lost when certain properties are set. For instance, if the creatingWorkspacesAllowed property is set, the 'setCreatingWorkspacesAllowed()' method in JpaSource is called and a new RepositorySourceCapabilities is built, but autoGarbageCollection is not passed and so falls back to its default value of 'false'.

       

      My second question/comment is about how those large values are managed. More precisely, is it possible to stream directly to/from the db, or is the corresponding property loaded fully in memory? My concern here of course is about big files. I would like to avoid OOM.

      Related question: when are the properties of a node loaded in memory? Is it only when I access a property? IOW, if I iterate over the children of a node without accessing any property, will the properties of the children be loaded in memory? (Yes, I know this is not a very good example.)

       

      TIA.

        • 1. Re: LargeValue: gc and streaming
          rhauch

          Our app is using the JPA connector and some of the data that we are storing are files.

          What I have noticed is that the entries in the MODE_SIMPLE_LARGE_VALUES table were not removed when I deleted the corresponding property. I also found out that there is a 'garbage collector' that is supposed to clear those entries.

          But nothing was happening.

          Reading through the code, I saw that the JPA connector enables garbage collection by default, but this setting is lost when certain properties are set. For instance, if the creatingWorkspacesAllowed property is set, the 'setCreatingWorkspacesAllowed()' method in JpaSource is called and a new RepositorySourceCapabilities is built, but autoGarbageCollection is not passed and so falls back to its default value of 'false'.

          Can you provide a specific location of the particular code you're looking at? Perhaps a link to the code in our GitHub repository? Which DBMS are you using, and are you specifying the correct Hibernate dialect in your configuration file?

           

           

          My second question/comment is about how those large values are managed. More precisely, is it possible to stream directly to/from the db, or is the corresponding property loaded fully in memory? My concern here of course is about big files. I would like to avoid OOM.

           

          Very large files could be a problem at the moment.

           

           

          Related question: when are the properties of a node loaded in memory? Is it only when I access a property? IOW, if I iterate over the children of a node without accessing any property, will the properties of the children be loaded in memory? (Yes, I know this is not a very good example.)

           

          How and when ModeShape loads the properties of a node depends on the access pattern. In your example, the properties of the children should not be loaded when you iterate over the children, but are loaded only when you ask for the properties on one of the child nodes.
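The access pattern described above can be pictured with a toy model. This is only a sketch of load-on-first-access, not ModeShape's implementation: `LazyNode`, its `loader` and the `loadCount` counter are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical toy node: properties are fetched only on first property access.
final class LazyNode {
    private final Supplier<Map<String, String>> loader; // stands in for a store lookup
    private final List<LazyNode> children = new ArrayList<>();
    private Map<String, String> properties;             // null until first access
    int loadCount = 0;                                   // how often the loader ran

    LazyNode(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    void addChild(LazyNode child) {
        children.add(child);
    }

    // Iterating over this list does not touch any child's properties.
    List<LazyNode> getChildren() {
        return children;
    }

    // The first call triggers the load; later calls reuse the cached map.
    String getProperty(String name) {
        if (properties == null) {
            properties = loader.get();
            loadCount++;
        }
        return properties.get(name);
    }
}
```

In this model, walking `getChildren()` leaves every child's `loadCount` at zero, and only a `getProperty()` call on a particular child runs that child's loader.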

          • 2. Re: LargeValue: gc and streaming
            jamat

            I was not clear enough (even though I provided the name of the file). I was looking at:

             

            ./extensions/modeshape-connector-store-jpa/src/main/java/org/modeshape/connector/store/jpa/JpaSource.java

             

            and, for instance, we have:

             

            public synchronized void setCreatingWorkspacesAllowed( boolean allowWorkspaceCreation ) {
                capabilities = new RepositorySourceCapabilities(capabilities.supportsSameNameSiblings(), capabilities.supportsUpdates(),
                                                                capabilities.supportsEvents(), allowWorkspaceCreation,
                                                                capabilities.supportsReferences());
            }

            So when this method is called we build a new RepositorySourceCapabilities object but we do not provide any autoGarbageCollection.
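The effect can be reproduced in isolation. The sketch below is a simplified model, not ModeShape's code: `GcFlagDemo`, `Capabilities`, `Source`, both constructors and the fallback default of `false` are assumptions made for the illustration. It shows how rebuilding an immutable holder through a shorter constructor silently resets a flag, and how carrying the old value over avoids that.

```java
// Hypothetical model of the pattern in the setter above; all names are invented.
final class GcFlagDemo {

    static final class Capabilities {
        // Assumed fallback used when the flag is not passed explicitly.
        static final boolean DEFAULT_AUTO_GC = false;

        final boolean creatingWorkspacesAllowed;
        final boolean autoGarbageCollection;

        // Shorter constructor: the GC flag falls back to the default.
        Capabilities(boolean creatingWorkspacesAllowed) {
            this(creatingWorkspacesAllowed, DEFAULT_AUTO_GC);
        }

        Capabilities(boolean creatingWorkspacesAllowed, boolean autoGarbageCollection) {
            this.creatingWorkspacesAllowed = creatingWorkspacesAllowed;
            this.autoGarbageCollection = autoGarbageCollection;
        }
    }

    static final class Source {
        // The source starts out with garbage collection enabled.
        Capabilities capabilities = new Capabilities(true, true);

        // Rebuilds the holder without the GC flag, so the flag is lost.
        void setCreatingWorkspacesAllowedLossy(boolean allow) {
            capabilities = new Capabilities(allow);
        }

        // Rebuilds the holder and carries the current GC flag over.
        void setCreatingWorkspacesAllowedPreserving(boolean allow) {
            capabilities = new Capabilities(allow, capabilities.autoGarbageCollection);
        }
    }
}
```

With the lossy setter, a source that started with garbage collection enabled ends up with it disabled after any call; the preserving variant keeps whatever value was there before.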

             

            As for which DBMS, I do not think that it matters.

             

            Thank you for the other comments. We may have to work around large files, maybe by splitting them.
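One way the splitting could look is sketched below. It uses only the JDK and says nothing about how the pieces would be stored; `ChunkSplitter`, `split` and the `sink` callback are hypothetical names, and the idea is that each chunk would be handed off (for example, written as its own property) before the next one is read, so only one chunk is held in memory at a time.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical helper: feeds a stream to a callback in pieces of at most chunkSize bytes.
final class ChunkSplitter {

    // Returns the number of chunks handed to the sink.
    static int split(InputStream in, int chunkSize, Consumer<byte[]> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int filled = 0;
        int chunks = 0;
        try {
            int read;
            // read() may return fewer bytes than requested, so fill the buffer incrementally.
            while ((read = in.read(buffer, filled, chunkSize - filled)) != -1) {
                filled += read;
                if (filled == chunkSize) {
                    sink.accept(Arrays.copyOf(buffer, filled));
                    chunks++;
                    filled = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Emit the final, possibly shorter, chunk.
        if (filled > 0) {
            sink.accept(Arrays.copyOf(buffer, filled));
            chunks++;
        }
        return chunks;
    }
}
```

The chunk size is a trade-off: smaller chunks lower peak memory use but create more values to store and reassemble on read.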

             

            • 3. Re: LargeValue: gc and streaming
              rhauch

              Okay, I think I understand. I've logged the issue as MODE-1327. While there's not a workaround, as soon as the bug is fixed you could try getting the latest code and building locally.

              • 4. Re: LargeValue: gc and streaming
                jamat

                Actually in my case the workaround is to simply not use 'creatingWorkspacesAllowed' in the configuration file.

                 

                BTW, an unrelated comment. We use ModeShape on JBoss AS 6 as a service. What we do is modify the

                configuration file (deploy/modeshape-services.jar/modeshape-config.xml) for our needs.

                Is this the correct way to do it?

                Also, if I want to define some node types the way the default ones are defined:

                <jcr:nodeTypes>
                                <mode:resource>/org/modeshape/sequencer/teiid/teiid.cnd</mode:resource>
                ...

                what is the best way to do it? IOW, where should I put my CND files and how do I reference them?

                (What I am doing for now is to define them programmatically in my app.)

                 

                Thank you again.

                • 5. Re: LargeValue: gc and streaming
                  rhauch

                  Actually in my case the workaround is to simply not use 'creatingWorkspacesAllowed' in the configuration file.

                  Okay. I didn't understand that was an option.

                   

                   

                  BTW, an unrelated comment. We use ModeShape on JBoss AS 6 as a service. What we do is modify the

                  configuration file (deploy/modeshape-services.jar/modeshape-config.xml) for our needs.

                  Is this the correct way to do it?

                  Yes, that's the correct configuration file to change.

                   

                   

                  Also, if I want to define some node types the way the default ones are defined:

                  <jcr:nodeTypes>
                                  <mode:resource>/org/modeshape/sequencer/teiid/teiid.cnd</mode:resource>
                  ...

                  what is the best way to do it? IOW, where should I put my CND files and how do I reference them?

                  (What I am doing for now is to define them programmatically in my app.)

                  I think programmatic registration is the best way to go, for a number of reasons. First, registering node types via the "mode:resource" CND mechanism only works if the CND file is on the classpath, and that's not always possible (e.g., when the web app contains the CND file). Secondly, programmatic registration uses the standard API. Thirdly, we're going to move toward strongly encouraging this style of registration in 3.0.

                   

                  The disadvantage is that the application may want to read the node types from one or more CND files, and that's not something covered by the JCR API. ModeShape has a CndNodeTypeReader class that can be used, but we're going to be making this significantly easier in 2.7 and 3.0.

                  We're defining a new interface 'org.modeshape.jcr.api.nodetype.NodeTypeManager' that extends 'javax.jcr.nodetype.NodeTypeManager' and that will be implemented by ModeShape, and this new interface will have methods to register node types by supplying CND files. See MODE-1328 for the details.