This article is applicable only for ModeShape 2.x, and does NOT apply to the new architecture in ModeShape 3.x. See the 3.x documentation for more details.
Once you have started to build your application, you will probably start thinking about ways to make it faster. Hopefully, you have already chosen the connectors for your repository based on the heuristics in the "How To Select the Right Connectors" article. Regardless, there are some easily modifiable settings and guidelines for each connector and for your JCR repository configuration itself that can help boost ModeShape's performance in your environment.
In general, ModeShape attempts to ship with default settings that provide the best performance unless doing so would break backwards compatibility. This guide will identify the settings that improve performance and attempt to note how changing them may or may not break backwards compatibility for your application.
Configuring the JCR Repository for Performance
The behavior of the ModeShape JCR repository is configured through the repository options. Please refer to the ModeShape Reference Guide for details on how to change repository options in your configuration. Multiple examples are provided.
Consider changing the following options to improve performance:
- INDEX_READ_DEPTH - This setting is similar to READ_DEPTH, but is only used when building or rebuilding indexes. The default value of "4" will be sensible for most applications.
- PERFORM_REFERENTIAL_INTEGRITY_CHECKS - This setting has a default value of "true", indicating that every node removal will be checked against an index to ensure that no REFERENCE properties still refer to the removed node. Setting this value to "false" will improve the performance of Session.save() and Node.remove() calls, but will essentially make any REFERENCE properties behave as WEAKREFERENCE properties. If your application uses REFERENCE properties or makes use of any feature that uses REFERENCE properties (like versioning), you should leave this setting set to "true".
- QUERY_EXECUTION_ENABLED - This defaults to true, meaning that JCR queries are supported. However, supporting queries requires ModeShape to index all content. If you are certain that your application will not be using queries, set this to false to reduce the I/O and CPU load generated by ModeShape.
- QUERY_INDEXES_REBUILT_SYNCHRONOUSLY - If the query indexes need to be rebuilt on startup based on the REBUILD_QUERY_INDEX_ON_STARTUP option, this setting controls whether the JCR repository will allow logins before the rebuild completes. The default value for this setting, "true", indicates that the JCR repository will not allow logins until the rebuild has completed. This can cause the JCR repository to appear unresponsive for a very long time if the repository contains a lot of data. Changing the value of this setting to "false" will cause the repository to be available as soon as the rebuild process begins, greatly decreasing apparent startup time. However, queries that are performed while the indexes are being rebuilt are likely to generate incomplete results.
- QUERY_INDEXES_UPDATED_SYNCHRONOUSLY - This setting controls whether method calls that modify data (e.g., session.save(), item.remove()) will return before the query indexes are fully updated. This defaults to true, meaning that your data modifications will be reflected in results of queries as soon as the method returns. Of course, this means that the method will take longer to return. If your application is tolerant of slightly stale queries, set this value to false to make your method calls return faster. The query indexes will still be updated, but the update will now be asynchronous.
- READ_DEPTH - This value controls how deeply ModeShape eagerly loads descendants each time that a node is read. The default value is "1", indicating that no descendants should be read. However, if your repository access patterns frequently cause reads of subgraphs, this can be set to a higher number, which slows the first read of a node but greatly reduces subsequent read times for its descendants. The eagerly read descendants are essentially prefetched into the session's node cache, but it is important to note that the cache is flushed each time that the session is saved or refreshed.
- REMOVE_DERIVED_CONTENT_WITH_ORIGINAL - This setting defaults to "true", indicating that every node removal will require a check to see if any content was derived from that node through the execution of a sequencer. If any derived nodes are found, they will be deleted. If your application does not use sequencers, there will not be any derived content and you can skip this extra step by changing the value of this setting to "false". If your application does use sequencers, you may still be able to change the value of this setting to "false" and clean up extraneous derived content through a custom process that you can run during periods of low usage.
- REBUILD_QUERY_INDEX_ON_STARTUP - The default value of this setting, "ifMissing", only rebuilds the query index on startup if the entire index is missing. Setting this value to "always" will cause the entire index to be rebuilt each time ModeShape starts up. Since rebuilding the query index can take a very long time, do not set this value to "always" unless your repository is based on connectors that allow external applications to modify underlying data. In that case, you may want to rebuild the indexes to make sure that they reflect any external changes.
- VERSION_HISTORY_STRUCTURE - This setting controls whether all version history nodes will be stored directly under /jcr:system/jcr:versionStorage or whether they will be nested in a hierarchical structure. The default value of "hierarchical" will provide improved performance for most repositories.
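To make the READ_DEPTH trade-off concrete, the following is a toy model in plain Java rather than ModeShape code. The names (ReadDepthSketch, StoredNode, saveOrRefresh) are hypothetical, and the sketch assumes that a read depth of d loads the requested node plus d-1 levels of descendants in a single trip to the store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of depth-limited prefetching into a session cache; not ModeShape code. */
final class ReadDepthSketch {

    /** Hypothetical stand-in for a stored node: a path plus the paths of its children. */
    static final class StoredNode {
        final String path;
        final List<String> childPaths = new ArrayList<>();

        StoredNode(String path) {
            this.path = path;
        }
    }

    private final Map<String, StoredNode> store = new HashMap<>();        // stands in for the connector
    private final Map<String, StoredNode> sessionCache = new HashMap<>(); // stands in for the session's node cache
    private int storeReads = 0;

    /** Adds a node to the backing store; pass a null parentPath for the root. */
    void addNode(String parentPath, String path) {
        store.put(path, new StoredNode(path));
        if (parentPath != null) {
            store.get(parentPath).childPaths.add(path);
        }
    }

    /** Reads a node, counting one trip to the store for every cache miss. */
    StoredNode read(String path, int readDepth) {
        StoredNode cached = sessionCache.get(path);
        if (cached != null) {
            return cached;
        }
        storeReads++;
        load(path, Math.max(1, readDepth));
        return sessionCache.get(path);
    }

    /** Copies the node and (remainingDepth - 1) levels of descendants into the cache. */
    private void load(String path, int remainingDepth) {
        StoredNode node = store.get(path);
        if (node == null || remainingDepth < 1) {
            return;
        }
        sessionCache.put(path, node);
        for (String child : node.childPaths) {
            load(child, remainingDepth - 1);
        }
    }

    /** Mirrors the cache flush that happens when a session is saved or refreshed. */
    void saveOrRefresh() {
        sessionCache.clear();
    }

    int storeReads() {
        return storeReads;
    }
}
```

In this model, walking a three-level path at depth 1 costs three trips to the store, while a depth of 3 costs one. The saving disappears if the session is saved or refreshed between reads, because the prefetched nodes are discarded.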
Configuring the Connectors for Performance
Each of the connectors has unique configuration properties, some of which can be modified to improve performance. The ModeShape Reference Guide has complete documentation for the properties on each connector, but this guide will describe the properties that affect performance.
In-Memory Connector
The in-memory connector does not have any properties that can be modified to improve performance.
Infinispan Connector
The primary way to improve performance for the Infinispan connector is to tune the underlying Infinispan configuration. Despite the documentation in the Reference Guide, you may be able to use DIST_ASYNC for your replication mode if your application is tolerant of different nodes in your ModeShape cluster having slightly out-of-date data.
JPA Connector
There are many properties that can be modified to improve the performance of the JPA connector. Consider changing each of the following properties:
- autoGenerateSchema - This controls whether the JPA connector will attempt to drop and recreate your database schema on startup, attempt to update your schema, or validate your schema. For production ModeShape instances, the only recommended value for this property is "disabled". Production ModeShape schemas should be generated with the DDL generator that ships with ModeShape. The schema should then be given to a database administrator for review, index tuning, filespace allocation/extent tuning, and installation onto the production database. In addition to keeping ModeShape from dropping all of your data, setting this property to "disabled" allows ModeShape to start without waiting for additional database schema checks to occur.
- cacheConcurrencyStrategy - This value defaults to "read-write". If your repository does not allow updates, set this to "read-only". If you're using the Infinispan cache provider (see the cacheProviderClassName property), set this to "transactional". Otherwise, leave the setting as "read-write".
- cacheProviderClassName - The value of this property should be set to the name of a Hibernate cache provider. The default value of "null" means that no Hibernate second level cache should be used, which is quite detrimental for performance. If your application is clustered, use the Infinispan cache provider ("org.hibernate.cache.infinispan.InfinispanRegionFactory" or "org.hibernate.cache.infinispan.JndiInfinispanRegionFactory") and tune Infinispan appropriately. Alternatively, you could use Ehcache by specifying one of the Ehcache region factories ("net.sf.ehcache.hibernate.EhCacheRegionFactory" or "net.sf.ehcache.hibernate.SingletonEhCacheRegionFactory"). If your application is not clustered, or for testing purposes, you could use the Hashtable cache provider ("org.hibernate.cache.HashtableCacheProvider"), although we recommend using Infinispan or Ehcache for production. (ModeShape 2.6 uses Hibernate 3.5, but if you're using a version of Hibernate earlier than 3.3, use the CacheProvider implementation class name instead.)
- cacheManagerLookup - (Added in ModeShape 2.7) This value corresponds to the location in JNDI where the Infinispan cache manager can be found. This should only be set when setting the cacheProviderClassName to "org.hibernate.cache.infinispan.JndiInfinispanRegionFactory", and is typically set to "java:CacheManager/entity" with JBoss AS6 or AS7. For more information, see the Infinispan documentation.
- compressData - This setting defaults to "true", indicating that the serialized properties of each node should be compressed with GZip before being stored in the database. This reduces the size of the data stored in the database and hence the I/O between ModeShape and the database, but increases CPU load on the ModeShape server. You may want to experiment with setting this property to "false" to see if this performs better in your environment.
- largeValueSizeInBytes - Individual property values that exceed the size specified in this property will be stored outside of the node's row in the database. This allows other properties to be set and modified on the node without requiring the large properties to be rewritten each time. Large values are also shared between nodes. That is, if /node1 and /node2 each have properties with the same value, only one copy of the value will be stored in the database. The value will be preserved until no references to it exist. After that, it will be eligible for automatic deletion at a future time. Nodes with large values require two queries to load instead of one, so it is advisable to set this property high enough that most nodes will not have a large value. The default value of "1024" may be too low for the data in your environment.
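One way to judge whether compressData is worth its CPU cost is to measure how well representative property data compresses. The sketch below is not part of ModeShape: CompressDataSketch is a hypothetical helper, and it assumes that java.util.zip.GZIPOutputStream is a reasonable approximation of the connector's GZip step.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

/** Measures how much a payload shrinks under GZip; a rough stand-in for compressData. */
final class CompressDataSketch {

    /** Returns the GZip-compressed form of the given bytes. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Compressed size divided by raw size; values above 1.0 mean the data grew. */
    static double ratio(byte[] raw) {
        if (raw.length == 0) {
            throw new IllegalArgumentException("payload must not be empty");
        }
        return (double) gzip(raw).length / raw.length;
    }
}
```

Feeding this helper a sample of your own serialized values gives a first indication: highly repetitive data shrinks considerably, while very small or already-compressed values can come out larger than they went in because of the fixed GZip header and trailer.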
It hopefully goes without saying that you may be able to tune your database to improve response time and thus ModeShape performance. Such tuning would be database specific and is outside the scope of this document.
Disk Connector
This connector has several properties that can be modified to improve performance:
- largeValueSizeInBytes - Much like the JPA connector, the disk connector will also store large properties in a separate area apart from the owning node. Also like the JPA connector, only one copy of each large value will be stored, even if several nodes have a property with the same large value. Storing a large value requires two separate writes to disk, one for the large value itself and another for a file containing back-references from the large value to any nodes that are using the large value. These are in addition to the write for the node itself. Reading large values from disk also requires one extra read per large value. ModeShape lazily loads large values, only reading them from disk if and when they are accessed, so using this feature properly can greatly improve performance. The default value, "8192", may be too small for many repositories. Try setting this to larger values to see if performance is improved for your access patterns.
- lockFileUsed - This property defaults to "false" and should be set to "true" if and only if ModeShape is installed in a clustered environment. Setting this to "true" will cause a slight performance hit.
- nodeCachePolicy - The default value for this property indicates that no caching should occur. Currently, there is only one cache implementation, so use that or write your own. You can use ModeShape's cache implementation by creating a new instance of InMemoryNodeCache$MapCachePolicy, setting the timeToLive property on the policy to something sensible for your repository, like 300 seconds, and then setting the nodeCachePolicy property to that cache policy object. This might sound complicated, but it only takes three lines of code. An XML-based example is also provided in the Reference Guide.
Setting the repositoryRootPath property to a location on the fastest disk possible will also have a large impact on performance. Solid state disks should provide a noticeable boost to this connector's performance.
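The large-value handling described for the JPA and disk connectors (a size threshold, one shared copy per distinct value, and back-references to the nodes using it) can be pictured with the following sketch. It is an illustration only: LargeValueStoreSketch and its methods are hypothetical, keying values by a SHA-1 hash is an assumption, and the sketch deletes an unreferenced value immediately rather than at some later time.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of shared large-value storage with back-references; not ModeShape code. */
final class LargeValueStoreSketch {

    private final int largeValueSizeInBytes;
    private final Map<String, byte[]> valuesByKey = new HashMap<>();
    private final Map<String, Set<String>> usersByKey = new HashMap<>(); // back-references to node paths

    LargeValueStoreSketch(int largeValueSizeInBytes) {
        this.largeValueSizeInBytes = largeValueSizeInBytes;
    }

    /** Stores the value out of line if it exceeds the threshold; returns its key, or null if it stays inline. */
    String store(String nodePath, byte[] value) {
        if (value.length <= largeValueSizeInBytes) {
            return null;
        }
        String key = sha1Hex(value);
        valuesByKey.putIfAbsent(key, value.clone()); // identical content is stored only once
        usersByKey.computeIfAbsent(key, k -> new HashSet<>()).add(nodePath);
        return key;
    }

    /** Drops one node's reference; the value is removed once nothing refers to it. */
    void release(String nodePath, String key) {
        Set<String> users = usersByKey.get(key);
        if (users == null) {
            return;
        }
        users.remove(nodePath);
        if (users.isEmpty()) {
            usersByKey.remove(key);
            valuesByKey.remove(key);
        }
    }

    int storedCopies() {
        return valuesByKey.size();
    }

    /** Hex-encoded SHA-1 of the content, used here as the sharing key (an assumption). */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }
}
```

The sketch also shows why the threshold matters: every value above it costs extra bookkeeping on write and an extra lookup on read, so a threshold that most values stay under keeps the common path cheap.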
File System Connector
The file system connector has several properties that can be modified to improve performance:
- customPropertiesFactory - Setting this property to any value other than the default will allow the connector to store extra properties for each node type, but will impact performance, as the number of disk accesses to read or write a node will likely double.
- nodeCachePolicy - The default value for this property indicates that no caching should occur. Currently, there is only one cache implementation, so use that or write your own. You can use ModeShape's cache implementation by creating a new instance of InMemoryNodeCache$PathCachePolicy, setting the timeToLive property on the policy to something sensible for your repository, like 300 seconds, and then setting the nodeCachePolicy property to that cache policy object. This might sound complicated, but it only takes three lines of code. An XML-based example is also provided in the Reference Guide.
- temporaryStoragePath - When nodes in this connector are modified, the new version of all nodes in the transaction is first written to disk in the temporary area specified by the temporaryStoragePath property. After all nodes in the transaction are written to this area, they are moved to their final location when the transaction is committed, or deleted if the transaction is rolled back. If the temporary area and the final location of the node are on the same file system, the temporary nodes can be moved to their final location very efficiently with a call to File.renameTo(File). If the temporary nodes are on a different file system than their final location, the entire temporary node has to be re-read and re-written to the final location. In the case of large files, this can cause a very significant performance penalty. The default value of this property is "/tmp".
Setting the workspaceRootPath property to a location on the fastest disk possible will also have a large impact on performance. Solid state disks should provide a noticeable boost to this connector's performance.
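Before settling on a temporaryStoragePath, it can be useful to check whether the temporary area and the workspace root share a file system, since that decides whether a commit is a cheap rename or a full copy. The sketch below uses java.nio.file instead of the File.renameTo(File) call mentioned above, and TemporaryStorageSketch is a hypothetical helper rather than connector code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrates the rename-versus-copy distinction behind temporaryStoragePath. */
final class TemporaryStorageSketch {

    /** True when both existing paths live on the same file store (volume or partition). */
    static boolean sameFileStore(Path first, Path second) {
        try {
            return Files.getFileStore(first).equals(Files.getFileStore(second));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Moves a temporary file to its final location.
     * Returns true if a single atomic rename sufficed, false if the content had to be copied.
     */
    static boolean commit(Path temporary, Path target) {
        try {
            try {
                Files.move(temporary, target, StandardCopyOption.ATOMIC_MOVE);
                return true;
            } catch (AtomicMoveNotSupportedException e) {
                // Typically a different file system: fall back to copy-then-delete.
                Files.move(temporary, target, StandardCopyOption.REPLACE_EXISTING);
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If sameFileStore returns false for your temporaryStoragePath and workspaceRootPath, every commit is likely to take the slower copy path, and pointing temporaryStoragePath at a directory on the workspace's own file system is worth trying.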
JCR Connector
Modifying the defaultCachePolicy property will impact performance.
JDBC Metadata Connector
There is one property on the connector that will impact performance:
- nodeCachePolicy - The default value for this property indicates that no caching should occur. Currently, there is only one cache implementation, so use that or write your own. You can use ModeShape's cache implementation by creating a new instance of InMemoryNodeCache$PathCachePolicy, setting the timeToLive property on the policy to something sensible for your repository, like 300 seconds, and then setting the nodeCachePolicy property to that cache policy object. This might sound complicated, but it only takes three lines of code. An XML-based example is also provided in the Reference Guide.
Subversion Connector
There are two ways to improve Subversion connector performance. The first is deployment: running ModeShape on a server that has fast access to the Subversion repository will have an enormous impact on the connector's performance. Deploying ModeShape on the same server that hosts the Subversion repository is ideal from a performance standpoint, but may not be practical for other reasons.
The second is configuration. There is one property on the connector that will impact performance:
- nodeCachePolicy - The default value for this property indicates that no caching should occur. Currently, there is only one cache implementation, so use that or write your own. You can use ModeShape's cache implementation by creating a new instance of InMemoryNodeCache$PathCachePolicy, setting the timeToLive property on the policy to something sensible for your repository, like 300 seconds, and then setting the nodeCachePolicy property to that cache policy object. This might sound complicated, but it only takes three lines of code. An XML-based example is also provided in the Reference Guide.
JBoss Cache Connector
The JBoss Cache connector is only provided for legacy compatibility. Tuning JBoss Cache itself will provide the greatest performance improvement for this connector.
Example: nodeCachePolicy for in-memory usage
Using an in-memory caching approach for the nodeCachePolicy of the Disk Connector, File System Connector, JDBC Metadata Connector and Subversion Connector is quite easy.
All one has to do is to put this into the mode:source definition of the connector:
{code:xml}
<mode:cachePolicy jcr:name="nodeCachePolicy"
mode:classname="org.modeshape.graph.connector.base.cache.InMemoryNodeCache$MapCachePolicy"
mode:timeToLive="300"/>
{code}
In this example, the time to live for cached nodes is set to 300 seconds (mode:timeToLive="300").
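For readers unsure what a time-to-live setting implies for reads, the following plain-Java sketch shows the general idea: an entry is served from memory until its age reaches the limit, after which the next read misses and must go back to the source. This is not ModeShape's InMemoryNodeCache; TimeToLiveCacheSketch is hypothetical, and the injected clock exists only to make the behavior easy to exercise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Minimal time-to-live cache; illustrates the concept only, not ModeShape's implementation. */
final class TimeToLiveCacheSketch<K, V> {

    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long timeToLiveMillis;
    private final LongSupplier clockMillis;

    /** The clock supplies "now" in milliseconds, for example System::currentTimeMillis. */
    TimeToLiveCacheSketch(long timeToLiveSeconds, LongSupplier clockMillis) {
        this.timeToLiveMillis = timeToLiveSeconds * 1000L;
        this.clockMillis = clockMillis;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clockMillis.getAsLong() + timeToLiveMillis));
    }

    /** Returns the cached value, or null if it was never cached or has expired. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clockMillis.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key); // expired: the caller must reload from the source
            return null;
        }
        return entry.value;
    }
}
```

The practical consequence is that a longer time to live saves more reads from the underlying source but lets changes made outside the cache go unnoticed for longer, so the value should reflect how stale your application can tolerate its data being.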