9 Replies Latest reply on Feb 4, 2015 4:00 PM by rhauch

    ModeShape 4.1 and disk space

    mhunter-aap

      I've just started using ModeShape, but I've run into some issues.  Essentially, I'm building a tree with approximately 136000 nodes in the entire tree structure.  The data is originating from a relational database, but I'm trying to remove that datastore from the equation and use ModeShape as my source of record.  The tree consists of 48 nodes at the root level of the tree, with the distribution as follows:

       

      1: 6

      2: 156

      3: 117

      4: 7

      5: 8

      6: 77

      7: 28

      8: 5

      9: 16

      10: 3

      11: 8

      12: 5

      13: 2678

      14: 8103

      15: 2151

      16: 96

      17: 70

      18: 103

      19: 3

      20: 9

      21: 253

      22: 30

      23: 3

      24: 5

      25: 134

      26: 10

      27: 6

      28: 3371

      29: 331

      30: 3

      31: 9

      32: 26

      33: 10

      34: 4

      35: 4

      36: 2652

      37: 2726

      38: 58

      39: 6

      40: 5

      41: 81

      42: 2051

      43: 12

      44: 5

      45: 22

      46: 110215

      47: 3

      48: 373

      As I add new nodes into the structure, I'm noticing the Infinispan file size grow out of control.

      Record count    File size
      5000            250 MB
      10000           761 MB
      15000           1.5 GB
      20000           2.0 GB
      40000           13 GB

      At that point, performance starts to visibly degrade, but given the size of the objects I'm trying to store I wouldn't ever think the file size could be that large.  The raw data file size I have from an SQL query is only 7.5 kb.
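To make the growth easier to read, here is a minimal sketch that turns the figures from the table into an average on-disk cost per record. The class and method names are made up for illustration, and it assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB), which the post does not state.

```java
import java.util.Locale;

// Average file-store cost per record, computed from the measurements in the table above.
// Assumption: 1 GB = 1024 MB and 1 MB = 1024 KB (the post does not say which units were used).
class StoreGrowth {
    static final int[] RECORDS = {5000, 10000, 15000, 20000, 40000};
    static final double[] SIZE_MB = {250, 761, 1.5 * 1024, 2.0 * 1024, 13 * 1024};

    // Average kilobytes of file store consumed per record at the given measurement.
    static double kbPerRecord(int index) {
        return SIZE_MB[index] * 1024.0 / RECORDS[index];
    }

    public static void main(String[] args) {
        for (int i = 0; i < RECORDS.length; i++) {
            System.out.printf(Locale.ROOT, "%6d records: %8.1f KB per record%n",
                    RECORDS[i], kbPerRecord(i));
        }
    }
}
```

Under those assumptions the average rises from about 51 KB per record at 5000 records to about 341 KB per record at 40000, so the file is growing faster than linearly with the number of nodes.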

      Here's the code I'm using to build the nodes in my workspace:

       

      @Named
      @RequestScoped
      public class Service {
           @Inject
           @Repository // Internal Qualifier
           private Session session;
      
           @Inject
           private Logger log;
      
           public void addComponent(final int record, final Component component) {
                try {
                     /*
                      * Component object structure: a component name, a type, and a key value.
                      * "Component" / "Component Type" / 1 / Properties:["C";"A";3;"Catalog"...]
                      */
                     Node rootNode = session.getNode("/");
                     Node childNode = addNode(rootNode, component.getNodeLabel());
                     childNode = addNode(childNode, component.getComponentType());
                     childNode = addNode(childNode, component.getComponentKey() + "");
                     
                     childNode.setProperty("child-type", component.getComponentChildType());
                     childNode.setProperty("parent-type", component.getComponentParentType());
                     childNode.setProperty("level", component.getComponentLevel());
                     childNode.setProperty("priority", component.getComponentPriorityOrder());
                     childNode.setProperty("source-primary-key", component.getSourcePrimaryKey());
                     session.save();
                     log.info("Added component #%d: %s", record, component.getComponentType());
                } catch (RepositoryException e) {
                     log.error("Caught a repository exception trying to add component: %s [%s]", 
                               component, e.getMessage());
                }
           }
      
           private Node addNode(final Node rootNode, final String path) throws RepositoryException {
                Node childNode = null;
      
                try {
                     childNode = rootNode.getNode(path);
                } catch (PathNotFoundException ex) {
                     log.debug("Need to add path for %s", path);
                } 
                
                if (childNode == null) {
                     childNode = rootNode.addNode(path);
                }
                
                return childNode;
           }
      }
      

      Here is my configuration in WildFly:

              <subsystem xmlns="urn:jboss:domain:infinispan:2.0">
                  <cache-container name="modeshape" default-cache="sample" module="org.modeshape">
                      <local-cache name="sample">
                          <locking isolation="READ_COMMITTED"/>
                          <transaction mode="NON_XA" locking="PESSIMISTIC"/>
                          <eviction strategy="LRU" max-entries="10000"/>
                          <file-store shared="true" passivation="false" purge="false" path="modeshape/store/sample"/>
                      </local-cache>
                  </cache-container>
              </subsystem>
              <subsystem xmlns="urn:jboss:domain:modeshape:2.0">
                  <repository name="sample" anonymous-roles="admin">
                      <workspaces>
                          <workspace name="default"/>
                          <workspace name="other"/>
                      </workspaces>
                  </repository>
              </subsystem>

       

       

      Environment:

      WildFly 8.2.0.Final with ModeShape 4.1

      Centos 6.5

      Java SE Runtime (HotSpot 1.7.0_55)

        • 1. Re: ModeShape 4.1 and disk space
          hchiorean

          ModeShape stores the nodes as BSON documents in Infinispan, but has absolutely no control over the storage itself, which is up to Infinispan. So the only thing I can suggest is looking at other cache stores (JDBC, LevelDB etc) and comparing the storage size of these stores.

          • 2. Re: ModeShape 4.1 and disk space
            mhunter-aap

            Can I define an alternate cache store for ModeShape?  I assumed it was file store only.  Are there example configurations for using either of those you mention?

            • 3. Re: ModeShape 4.1 and disk space
              hchiorean

              You basically need to define/configure the appropriate ISPN cache in Wildfly: for an example of a JDBC based store you can look here: https://github.com/ModeShape/modeshape/blob/master/integration/modeshape-jbossas-integration-tests/src/main/resources/kit/jboss-wf8/standalone/configuration/standalone-modeshape.xml#L521

              In addition, if you want other storage options (for example LevelDB) you need to add the appropriate cache loader as a dependency in Wildfly and configure it. You should search the Infinispan documentation/Github for an example of how to do this.

              • 4. Re: ModeShape 4.1 and disk space
                mhunter-aap

                Now I'm running into errors with my configuration when I use that example (I'm sure I'm missing something silly).  The exception is related to the JCR Configuration:

                 

                org.modeshape.jcr.ConfigurationException: The 'sample' repository cannot be started because transactions are not enabled for the 'sample' cache. This can happen either because the <transaction> element is not present in the Infinispan configuration file, or the 'vehicle' cache name from the repository configuration does not match the name of the cache from the Infinispan configuration.

                 

                Infinispan section in standalone.xml:

                <cache-container name="sample">

                                <local-cache name="sample-data">

                                    <locking isolation="READ_COMMITTED"/>

                                    <transaction mode="NON_XA"/>

                                    <eviction strategy="LIRS" max-entries="5"/>

                                    <string-keyed-jdbc-store shared="false" preload="false" passivation="false" purge="false" datasource="jboss/datasources/ModeshapeBinaryStoreDS">

                                        <string-keyed-table prefix="stringbased">

                                            <id-column name="id" type="VARCHAR(200)"/>

                                            <data-column name="datum" type="BYTEA"/>

                                            <timestamp-column name="version" type="BIGINT"/>

                                        </string-keyed-table>

                                    </string-keyed-jdbc-store>

                                </local-cache>

                                <local-cache name="sample-metadata">

                                    <locking isolation="READ_COMMITTED"/>

                                    <transaction mode="NON_XA"/>

                                    <eviction strategy="LIRS" max-entries="2000"/>

                                    <string-keyed-jdbc-store shared="false" preload="false" passivation="false" purge="false" datasource="jboss/datasources/ModeshapeBinaryStoreDS">

                                        <string-keyed-table prefix="stringbased">

                                            <id-column name="id" type="VARCHAR(200)"/>

                                            <data-column name="datum" type="BYTEA"/>

                                            <timestamp-column name="version" type="BIGINT"/>

                                        </string-keyed-table>

                                    </string-keyed-jdbc-store>

                                </local-cache>

                            </cache-container>

                 

                ModeShape section in standalone:

                <repository name="sample" anonymous-roles="admin">

                                <cache-binary-storage data-cache-name="sample-data" metadata-cache-name="sample-metadata" cache-container="sample"/>

                • 5. Re: Re: ModeShape 4.1 and disk space
                  rhauch

                  Why aren't you using "standalone-modeshape.xml"?

                   

                  Anyway, your repository config is:

                  <repository name="sample" anonymous-roles="admin">

                                  <cache-binary-storage data-cache-name="sample-data" metadata-cache-name="sample-metadata" cache-container="sample"/>

                   

                  The reason it does not work is that "cache-binary-storage" sets up the *BINARY* storage, but not the normal *NODE* storage. Because you're not defining an Infinispan cache for the repository, it uses an in-memory transient one (which does not have transactions enabled).

                   

                  To fix it, use:

                   

                      <repository name="sample" cache-container="sample" cache-name="sample-storage" anonymous-roles="admin">

                   

                  and then add another cache named "sample-storage" to your Infinispan subsystem section.

                   

                  But there are a couple of other problems with the Infinispan section. As noted below, you're using the same "prefix" value for both binary data and metadata tables. Don't do that, since that will make ModeShape use the same table for both, and that won't work. To fix, make each of these "prefix" values unique within the same database. Perhaps "data" for the first one, and "metadata" for the second.

                   

                  <cache-container name="sample">

                                  <local-cache name="sample-data">

                                      <locking isolation="READ_COMMITTED"/>

                                      <transaction mode="NON_XA"/>

                                      <eviction strategy="LIRS" max-entries="5"/>

                                      <string-keyed-jdbc-store shared="false" preload="false" passivation="false" purge="false" datasource="jboss/datasources/ModeshapeBinaryStoreDS">

                                          <string-keyed-table prefix="stringbased">

                                              <id-column name="id" type="VARCHAR(200)"/>

                                              <data-column name="datum" type="BYTEA"/>

                                              <timestamp-column name="version" type="BIGINT"/>

                                          </string-keyed-table>

                                      </string-keyed-jdbc-store>

                                  </local-cache>

                                  <local-cache name="sample-metadata">

                                      <locking isolation="READ_COMMITTED"/>

                                      <transaction mode="NON_XA"/>

                                      <eviction strategy="LIRS" max-entries="2000"/>

                                      <string-keyed-jdbc-store shared="false" preload="false" passivation="false" purge="false" datasource="jboss/datasources/ModeshapeBinaryStoreDS">

                                          <string-keyed-table prefix="stringbased">

                                              <id-column name="id" type="VARCHAR(200)"/>

                                              <data-column name="datum" type="BYTEA"/>

                                              <timestamp-column name="version" type="BIGINT"/>

                                          </string-keyed-table>

                                      </string-keyed-jdbc-store>

                                  </local-cache>

                              </cache-container>
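If it helps, a shared "prefix" can be caught mechanically before starting the server. The following is only a sketch using the JDK's XML parser; the class name `PrefixCheck` and the "datasource|prefix" key format are invented for this example, and it assumes the element and attribute names shown in the configuration above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reports table prefixes that more than one JDBC store shares on one datasource.
class PrefixCheck {
    // Returns "datasource|prefix" keys that appear in more than one <string-keyed-jdbc-store>.
    static Set<String> duplicatePrefixes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList stores = doc.getElementsByTagName("string-keyed-jdbc-store");
            Set<String> seen = new HashSet<>();
            Set<String> duplicates = new LinkedHashSet<>();
            for (int i = 0; i < stores.getLength(); i++) {
                Element store = (Element) stores.item(i);
                NodeList tables = store.getElementsByTagName("string-keyed-table");
                for (int j = 0; j < tables.getLength(); j++) {
                    String prefix = ((Element) tables.item(j)).getAttribute("prefix");
                    String key = store.getAttribute("datasource") + "|" + prefix;
                    if (!seen.add(key)) {
                        duplicates.add(key); // second store with the same datasource and prefix
                    }
                }
            }
            return duplicates;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration fragment", e);
        }
    }

    public static void main(String[] args) {
        String store = "<string-keyed-jdbc-store datasource='ds'>"
                + "<string-keyed-table prefix='stringbased'/></string-keyed-jdbc-store>";
        String xml = "<cache-container name='sample'>"
                + "<local-cache name='sample-data'>" + store + "</local-cache>"
                + "<local-cache name='sample-metadata'>" + store + "</local-cache>"
                + "</cache-container>";
        System.out.println(duplicatePrefixes(xml));
    }
}
```

An empty result means every store on a given datasource has its own prefix, which is the state you want.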

                   

                   

                  BTW, ModeShape adopts some naming conventions. If you use "modeshape" for the name of the cache-container in the Infinispan section, then you don't have to name it on each "repository" element in the ModeShape section (since "modeshape" is the default). Also, by default, ModeShape looks for an Infinispan cache that has the same name as the repository. This is explained in the comment that appears in the Infinispan subsystem config in the "standalone-modeshape.xml" file:

                   

                  <!-- Each ModeShape repository uses one (or more) cache in a cache container. We define a single container
                                   named "modeshape" (other names require specifying the container names in each repository configuration,
                                   with a "sample" cache (each repository assumes the cache name matches the repository name). -->
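Spelled out as code, the two defaults look roughly like this. This is purely an illustration of the convention described above, not ModeShape's actual code; the class and method names are hypothetical, and `null` stands for an attribute that is absent from the "repository" element.

```java
// Hypothetical helper that mirrors the naming defaults described above (not ModeShape's own code).
class CacheNames {
    // The cache container defaults to "modeshape" when the repository does not name one.
    static String container(String cacheContainerAttr) {
        return cacheContainerAttr == null ? "modeshape" : cacheContainerAttr;
    }

    // The cache name defaults to the repository name when "cache-name" is not given.
    static String cache(String repositoryName, String cacheNameAttr) {
        return cacheNameAttr == null ? repositoryName : cacheNameAttr;
    }

    public static void main(String[] args) {
        // <repository name="sample"/> resolves to container "modeshape", cache "sample"
        System.out.println(container(null) + " / " + cache("sample", null));
        // Explicit attributes override both defaults
        System.out.println(container("sample") + " / " + cache("sample", "sample-storage"));
    }
}
```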
                  

                   

                  Also, by default ModeShape stores binaries as files on disk (one file per unique binary value), and you can state this explicitly in the configuration via the "<file-binary-storage/>" element (rather than the "<cache-binary-storage/>" element you used). You can even specify the location of the directory via the "relative-to" and "path" attributes (both are required). This is very efficient storage, and will be much easier for your initial testing. The only time not to use it is if you are clustering your ModeShape instances, in which case you'll want all of the instances to share all storage (including binary storage).

                   

                  Start simple. Here's what I'd probably use to start out:

                          <subsystem xmlns="urn:jboss:domain:infinispan:2.0">
                              ...
                              <!-- Each ModeShape repository uses one (or more) cache in a cache container. We define a single container
                                   named "modeshape" (other names require specifying the container names in each repository configuration,
                                   with a "sample" cache (each repository assumes the cache name matches the repository name). -->
                              <cache-container name="modeshape" default-cache="sample" module="org.modeshape">
                                  <!--ModeShape using JDBC persistence-->
                                  <local-cache name="sample">
                                      <!--READ_COMMITTED is required to ensure multiple writer threads can update the same node(s)-->
                                      <locking isolation="READ_COMMITTED"/>
                                      <transaction mode="NON_XA"/>
                                      <string-keyed-jdbc-store datasource="java:jboss/datasources/ModeshapeJDBCStoreDS" shared="false"
                                                               preload="false" passivation="false" purge="false">
                                          <string-keyed-table prefix="my-app">
                                              <!-- The "type" fields depend on the DBMS you're using-->
                                              <id-column name="id" type="VARCHAR(200)"/>
                                              <data-column name="datum" type="BYTEA"/>
                                              <timestamp-column name="version" type="BIGINT"/>
                                          </string-keyed-table>
                                      </string-keyed-jdbc-store>
                                  </local-cache>
                              </cache-container>
                          </subsystem>
                          <subsystem xmlns="urn:jboss:domain:modeshape:2.0">
                              <!-- A sample repository that uses the "sample" cache in the "modeshape" container. All content, binary values,
                                   and indexes are stored within the server's data directory. This is the simplest way to configure a repository
                                   that uses defaults for everything; feel free to change and specify other configuration options.  -->
                              <repository name="sample" anonymous-roles="admin">
                                 ...
                                 <file-binary-storage relative-to="jboss.server.data.dir" path="modeshape/sample/binaries"/>
                              </repository>
                              ...
                  

                   

                  Obviously the "..." lines should not be used, but are there just to show that there may be other lines in those sections.

                  • 6. Re: Re: Re: ModeShape 4.1 and disk space
                    mhunter-aap

                    Thanks, Randall (and Horia).  I'm getting much closer.  First off, I did use standalone-modeshape.xml (minor oversight in typing the message).  Also, I found that the datasource name was incorrect.


                    datasource="jboss/datasources/ModeshapeBinaryStoreDS">

                    This should have been:

                    datasource="java:jboss/datasources/ModeshapeBinaryStoreDS">

                     

                    Then I ran into something that may be more related to multiple datasource definitions and transactions (not ModeShape/Infinispan configuration):

                     

                    13:20:00,111 WARN  [com.arjuna.ats.arjuna] (EJB default - 1) ARJUNA012140: Adding multiple last resources is disallowed. Trying to add LastResourceRecord(XAOnePhaseResource(LocalXAResourceImpl@2645db[connectionListener=74e0321d connectionManager=16076236 warned=false currentXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b182:-53a099d3:54d2628b:17, node_name=1, branch_uid=0:ffffc0a8b182:-53a099d3:54d2628b:44, subordinatenodename=null, eis_name=java:jboss/datasources/ModeshapeBinaryStoreDS > productName=H2 productVersion=1.3.173 (2013-07-28) jndiName=java:jboss/datasources/ModeshapeBinaryStoreDS])), but already have LastResourceRecord(XAOnePhaseResource(LocalXAResourceImpl@76111797[connectionListener=fe596d1 connectionManager=3d1f8dbe warned=false currentXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8b182:-53a099d3:54d2628b:17, node_name=1, branch_uid=0:ffffc0a8b182:-53a099d3:54d2628b:1c, subordinatenodename=null, eis_name=java:jboss/datasources/source-data > productName=PostgreSQL productVersion=9.2.9 jndiName=java:jboss/datasources/source-data]))

                    13:20:00,115 ERROR [org.infinispan.persistence.jdbc.connectionfactory.ManagedConnectionFactory] (EJB default - 1) ISPN008018: Sql failure retrieving connection from datasource: java.sql.SQLException: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@74e0321d[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@599c706 connection handles=0 lastUse=1423074000114 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@32d95dc9 mcp=SemaphoreArrayListManagedConnectionPool@6cadc53d[pool=ModeshapeBinaryStoreDS] xaResource=LocalXAResourceImpl@2645db[connectionListener=74e0321d connectionManager=16076236 warned=false currentXid=null productName=H2 productVersion=1.3.173 (2013-07-28) jndiName=java:jboss/datasources/ModeshapeBinaryStoreDS] txSync=null]

                     

                    I added the following property to my standalone-modeshape.xml configuration:

                    <system-properties>

                        <property name="com.arjuna.ats.arjuna.allowMultipleLastResources" value="true"/>

                    </system-properties>

                     

                    And my initial data load was successful.  However, I did see the following warning at the end of the load:

                    13:39:45,843 WARN  [com.arjuna.ats.arjuna] (EJB default - 1) ARJUNA012141: Multiple last resources have been added to the current transaction. This is transactionally unsafe and should not be relied upon. Current resource is LastResourceRecord(XAOnePhaseResource(LocalXAResourceImpl@2475b7ea[connectionListener=7453f2a5 connectionManager=3d3bff94 warned=false currentXid=null productName=H2 productVersion=1.3.173 (2013-07-28) jndiName=java:jboss/datasources/ModeshapeBinaryStoreDS]))

                    • 7. Re: Re: Re: ModeShape 4.1 and disk space
                      rhauch

                      If you were using your configuration, then I would have expected this since you had multiple caches using the same database (table). Please test with my fixes as mentioned above, and let us know how that goes.

                       

                      BTW, I've never heard about anyone needing to use the "com.arjuna.ats.arjuna.allowMultipleLastResources" property. Sounds like you may be trying to fix the symptom of another problem.

                      • 8. Re: Re: Re: ModeShape 4.1 and disk space
                        mhunter-aap

                        I agree about the "com.arjuna.ats.arjuna.allowMultipleLastResources" property.  I think it's unrelated.

                         

                        The configuration you gave is working, but I may be running into a GC memory issue.  Things start slowing down after loading about 18k nodes very quickly and then it begins to crawl.  About 10-15 nodes are inserted, and then it pauses in the console -- presumably there's some garbage collection taking place.  The heap is maxed out for both used and committed (~1024MB).

                        • 9. Re: Re: Re: ModeShape 4.1 and disk space
                          rhauch

                          My example configuration did not specify any eviction settings, which means everything is kept in memory and the JVM will eventually bog down. You can add eviction under the "local-cache" element via:

                           

                               <eviction strategy="LRU" max-entries="1000"/>

                           

                          or see it in context. Be sure to use a value that is right for your JVM and machine. Too small and you'll be throwing things out too often and not fully utilizing your JVM; too large and it will bog down. Also, the larger your nodes (children and properties, excluding binary values and large strings), the fewer you'll be able to fit into memory.
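For intuition about what "max-entries" with an LRU strategy means, here is a conceptual sketch of a bounded least-recently-used map built on the JDK's `LinkedHashMap`. It is not how Infinispan implements eviction, and the class name `LruSketch` is made up; it only shows that once the limit is reached, the entry that was touched longest ago is dropped from memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch of a bounded LRU cache; this is NOT Infinispan's implementation.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: ordered from least to most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the least recently used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("/a", "node a");
        cache.put("/b", "node b");
        cache.get("/a");            // touch /a, so /b is now the least recently used
        cache.put("/c", "node c");  // exceeds the limit of 2, so /b is evicted
        System.out.println(cache.keySet()); // prints [/a, /c]
    }
}
```

With a cache store configured and passivation disabled, as in the configurations in this thread, my understanding is that an evicted entry is only removed from memory and can be reloaded from the store on the next access, at the cost of a read.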


                          Oh, and 1GB of RAM is probably pretty small for a database-like service. Both ModeShape and Infinispan use extra memory very well and it makes everything much faster, so provide more if you can.
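To confirm how much heap the JVM actually has, something like the following can be run inside the server (for example from a test servlet or EJB). It uses only the standard `Runtime` API; the class name `HeapInfo` is made up for the example.

```java
// Small sketch that reports the JVM's heap limits using the standard Runtime API.
class HeapInfo {
    private static final long MB = 1024L * 1024L;

    // Upper bound the heap may grow to (what -Xmx controls), in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently reserved by the JVM ("committed"), in megabytes.
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    // Heap currently in use (committed minus free), in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max=" + maxHeapMb() + "MB committed=" + committedHeapMb()
                + "MB used=" + usedHeapMb() + "MB");
    }
}
```

If used and committed both sit at the maximum, as reported earlier in the thread, the JVM has no headroom left and will spend most of its time collecting garbage.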