17 Replies Latest reply on Sep 6, 2013 7:27 AM by K. Bachl

    ModeShape 3.5: performance

    K. Bachl Novice

      Hi,

       

      we finally switched from ModeShape 2.8.1 to ModeShape 3.5.0. Everything works very well so far, but we have seen some slowdowns in terms of performance compared to 2.8 (where we used the DiskStore).

      Our whole web app stores its content in a ModeShape JCR repository, so the page rendering times clearly show the difference. With 2.8 we got ~120-160 ms for a static page, while we now get about ~220-260 ms for the same page on a warmed-up server. This increase of about 100 ms is visible across all pages. This is with "org.infinispan.loaders.file.FileCacheStore", which we chose after seeing even worse numbers with "org.infinispan.loaders.bdbje.BdbjeCacheStore" and, worst of all, "org.infinispan.loaders.jdbm.JdbmCacheStore".

       

      Interestingly, the BdbJE store should outperform the FileCacheStore by an order of magnitude according to the Infinispan docs. After some testing it turns out that the rather slow I/O of the hard disk in our server might be the reason: when I test here on a Fusion Drive (meaning on the SSD part), the speed difference not only disappears, ModeShape 3.5 is even faster than the old 2.8.

       

      For testing purposes I switched to a purely in-memory configuration, and then the numbers were blazing: the time for the test page dropped to ~80-100 ms.

       

      So it seems the cache store configuration is the vital part that's not going so well here. Our current production JSON config reads as follows:

       

      {
          "name": "wwRepo",
          "jndiName": "",
      
          "workspaces": {
              "default": "default",
              "allowCreation": true
          },
      
          "security": {
              "anonymous": {
                  "useOnFailedLogin": true
              }
          },
      
          "storage": {
              "cacheConfiguration": "cache-ms3.xml",
              "cacheName": "wwRepoCache",
              "transactionManagerLookup": "org.infinispan.transaction.lookup.GenericTransactionManagerLookup",
              "binaryStorage": {
                  "type": "file",
                  "directory": "/z-modeshape/appshop-mode3/jcr",
                  "minimumBinarySizeInBytes": 524288
              }
          },
      
          "sequencing": {
              "removeDerivedContentWithOriginal": true,
              "sequencers": {
                  "cndSequencer": {
                      "description": "CND Sequencer Same Location",
                      "classname": "CNDSequencer",
                      "pathExpressions": [ "default://(*.cnd)/jcr:content[@jcr:data]" ]
                  }
              }
          }
      }
      
      
      
      
      
      

       

      And the Infinispan configuration:

       

      <infinispan
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
              xmlns="urn:infinispan:config:5.1">
          <!-- Global settings shared by all caches managed by this cache container. -->
          <global/>
          <!-- Individually named caches. -->
          <namedCache name="wwRepoCache">
              <transaction
                      transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
                      transactionMode="TRANSACTIONAL"
                      lockingMode="OPTIMISTIC"/>
              <loaders
                      passivation="false"
                      shared="false"
                      preload="true">
      
      
                  <loader class="org.infinispan.loaders.file.FileCacheStore"
                          fetchPersistentState="false"
                          purgeOnStartup="false">
                      <!-- See the documentation for more configuration examples and flags. -->
                      <properties>
                          <!-- We have to set the location where we want to store the data. -->
                          <property name="location" value="/z-modeshape/appshop-mode3/filecache"/>
                      </properties>
        
                  </loader>
              </loaders>
          </namedCache>
      </infinispan>
      
      
      

       

       

      So, this config is rather basic, and I wonder: how can we configure ModeShape and Infinispan so that it performs well by keeping the cache fully in memory while also writing it to a cache store, never querying the cache store for reads (since everything should already be in memory), yet not losing all data when the system is shut down?

       

      Regards,

       

      KB

       

      PS: The write times don't really matter so far, as we write new data very rarely (and it is then usually a whole new workspace that gets imported or cloned over an existing one).
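      For what it's worth, the behaviour I am after (full data set in memory plus a persistent copy on disk) seems to correspond to a main cache with no <eviction> element at all. This is a hedged sketch based on our existing cache-ms3.xml, not a verified recipe:

```xml
<namedCache name="wwRepoCache">
    <transaction
            transactionManagerLookupClass="org.infinispan.transaction.lookup.GenericTransactionManagerLookup"
            transactionMode="TRANSACTIONAL"
            lockingMode="OPTIMISTIC"/>
    <!-- No <eviction> element: nothing is ever dropped from memory.
         preload="true" warms the in-memory state from the store at startup;
         passivation="false" keeps the store as a full copy rather than an overflow area. -->
    <loaders passivation="false" shared="false" preload="true">
        <loader class="org.infinispan.loaders.file.FileCacheStore"
                fetchPersistentState="false" purgeOnStartup="false">
            <properties>
                <property name="location" value="/z-modeshape/appshop-mode3/filecache"/>
            </properties>
        </loader>
    </loaders>
</namedCache>
```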

        • 1. Re: ModeShape 3.5: performance
          Horia Chiorean Master

          Unfortunately, in ISPN 5.x (which ModeShape uses) the FileCacheStore is known to perform very poorly and has lots of issues (e.g. https://issues.jboss.org/browse/ISPN-2806). Until we are able to switch to Infinispan 6 (which has an updated and hopefully better-performing implementation), I can think of the following things you can try:

           

          • replace the FileCacheStore altogether with another cache store implementation. For example (as you mentioned above), the BdbjeCacheStore should be a lot faster
          • by default, ISPN stores data both in memory and in the cache store. One of the settings that controls how much data is kept in memory is the eviction strategy: https://docs.jboss.org/author/display/ISPN/Eviction (see also the eviction examples in the Infinispan documentation). In ModeShape there are 2 places where you can tweak this configuration: via the global ISPN configuration (cache-ms3.xml above) and at workspace level, by providing a custom ISPN configuration file to be used by the workspace caches. By workspace level I mean that you can add a "cacheConfiguration" element to the "workspaces" element of the JSON configuration:

                "workspaces" : {
                    "default" : "default",
                    "allowCreation" : true,
                    "cacheConfiguration" : "default-workspace-cache-config.xml"
                }

                This controls how many nodes are stored in / evicted from a JCR workspace cache. The configuration used by default is: https://github.com/ModeShape/modeshape/blob/master/modeshape-jcr/src/main/resources/org/modeshape/jcr/default-workspace-…
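          As a rough sketch (the maxEntries value here is an arbitrary example, not a recommendation), such a workspace-level configuration file could look like this:

```xml
<infinispan xmlns="urn:infinispan:config:5.1">
    <!-- Used for every "{repositoryName}/{workspaceName}" workspace cache that has
         no matching namedCache definition. Eviction is safe at this level because
         evicted nodes are simply re-read from the main, persistent cache. -->
    <default>
        <eviction strategy="LIRS" maxEntries="10000"/>
    </default>
</infinispan>
```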

          • 2. Re: ModeShape 3.5: performance
            K. Bachl Novice

            Thank you for the answer, Horia. I took a look at default-workspace-cache-config.xml and there is a comment in it:

             

            <!--
              Define the configuration for each workspace cache. This configuration will be used if there is no "namedCache"
              definition for the workspace name. Note that the names of caches following the convention
              "{repositoryName}/{workspaceName}".
            -->

             

            Can it be that in my case not even this strategy is chosen, since I use a specific namedCache here?

             

            Because if the default were used, then I should at least see better numbers for the first 10,000 entries within the first 60 to 120 seconds.

             

            Can you also maybe explain what one Infinispan entry corresponds to in ModeShape JCR terms? Is 1 entry equal to 1 node value?

            • 3. Re: ModeShape 3.5: performance
              Horia Chiorean Master

              Can it be that in my case not even this strategy is chosen, since I use a specific namedCache here?

               

              Because if the default were used, then I should at least see better numbers for the first 10,000 entries within the first 60 to 120 seconds.

              The default-workspace-cache-config.xml, which out of the box only contains a <default> section, is used as a "default" configuration for each workspace cache. The comment in the XML file simply means that, as an alternative (perhaps in a complex scenario), a user can define different caching behavior for different workspaces of a repository. To do so, one would have to define (in an ISPN XML configuration file) named caches using the naming convention <namedCache name="repository_name/workspace_name"> and then pass this configuration file via the workspace attribute (as described above). If that is not the case, whatever is configured in the <default/> section will be used.

               

              It's also worth repeating that ModeShape actually uses 2 layers of ISPN caches: one is always in-memory and is used by each individual workspace to cache its nodes (this is transparent to users 99% of the time), and one is the repository-wide cache, which is required in the JSON configuration file and represents the primary data store.

               

              In your particular case, you need to profile your application in order to determine exactly where the bottleneck lies. It might not be related in any way to eviction & passivation; I was only mentioning the above as "advanced" tuning options which you could try. Again, this is highly dependent on each application.

               

              Can you also maybe explain what one Infinispan entry corresponds to in ModeShape JCR terms? Is 1 entry equal to 1 node value?

               

              Conceptually, each JCR node is transformed by ModeShape into a BSON document (JSON extended with binary value support), and this document is then stored in Infinispan as a cache entry. So 1 JCR node = 1 entry in the cache. However, depending on the ISPN cache store implementation, 1 cache entry may materialize (on disk, for example) in any number of different ways. AFAIK this is highly dependent on the cache store implementation, so you'd have to check the ISPN documentation of a particular cache store for more information.
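              Purely as an illustration of that mapping (this is not ModeShape's actual document schema; the field names and keys below are made up), one could picture a node document roughly like:

```json
{
  "key"        : "e41748bd...",
  "properties" : { "jcr:primaryType" : "nt:unstructured", "title" : "Home" },
  "children"   : [ { "key" : "e41748be...", "name" : "subpage" } ]
}
```

              One such document per JCR node becomes one Infinispan cache entry.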

              • 4. Re: ModeShape 3.5: performance
                K. Bachl Novice

                Interesting. I tested the caching behaviour, and it seems that, at least in my config posted above, eviction and expiration were somehow not respected. When I change the XML for the named cache to include the following:

                <namedCache name="wwRepoCache">
                    <clustering mode="LOCAL"/>
                    <eviction maxEntries="1000000" strategy="LIRS"/>
                    <expiration lifespan="-1" maxIdle="3600000"/>
                ....

                 

                the numbers immediately improved dramatically: the times went from "worse than" ModeShape 2.8 to "way better" than 2.8. One thing I noticed is that setting maxEntries in eviction too high leads to dramatic "hiccups" in the application cycle. If you set it to e.g. 100,000,000 instead of the above 1,000,000, the response rhythm looks like: 90ms - 90ms - 700ms - 90ms - 90ms - 700ms ....

                 

                If the size is down to 1,000,000 the rhythm is: 95ms - 95ms - 100ms - 95ms - 95ms - 100ms ....

                 

                It seems Infinispan doesn't like numbers that are too big. I also tried making the cache async; however, if I then still use the FileCacheStore I sometimes get weird JCR exceptions during workspace import and cloning. If I change the cache store to BerkeleyDB or JDBM this doesn't seem to happen. Either way, I will put my new config into production today to see how it performs in "reality" (tm).

                • 5. Re: ModeShape 3.5: performance
                  K. Bachl Novice

                  <namedCache name="wwRepoCache">
                      <clustering mode="LOCAL"/>
                      <eviction maxEntries="1000000" strategy="LIRS"/>
                      <expiration lifespan="-1" maxIdle="3600000"/>
                  ....

                   

                  WARNING: DO NOT USE THIS CONFIGURATION!

                  It turns out that the ModeShape JCR will lose unused data after 3600000 ms, because Infinispan empties those entries and ModeShape therefore loses data.

                   

                  Back to topic:

                   

                  I have now changed to BdbJE and have it in production. It works slightly faster than the FileCacheStore, but is still a big way slower than ModeShape 2.8 used to be.

                   

                  Interestingly, I put a config without any store into production for a few minutes to test. It turns out that the performance is practically not strongly affected by the cache store itself; even in pure in-memory mode the performance was very, very bad! This, however, only occurs on our production machine; on our development machine the performance is way better (roughly 250 ms vs. 70 ms for a simple test page).

                   

                  How can this happen?

                  I somehow fear that it might be the production setup, with the firewall and/or multiple IPs on the machine.

                  Are there any particular VM options known to cause trouble?

                  What is ModeShape's default for JGroups, and what might help there?

                   

                  Any help would be really appreciated!

                  • 6. Re: ModeShape 3.5: performance
                    Horia Chiorean Master

                    Unfortunately there aren't any "silver bullet" JVM settings I can recommend for any given setup. This kind of performance bottleneck is very context (environment) specific.

                     

                    However, I can think of the following things you could try:

                    • assuming it's a webapp, make sure you isolate the tier where the problem is occurring. Check/monitor request/response round trip latency and make sure the bottleneck is on the server (ModeShape) and not on the client and/or network.
                    • if the persistence tier is the problem, you really need to profile it and analyze where on the server the bottleneck is occurring. One very easy way you can do this is to attach VisualVM (http://visualvm.java.net/) to the server JVM and monitor: memory consumption, thread activity, CPU usage. This alone will tell you if you're dealing with a CPU/IO bottleneck, excessive memory consumption/large GC pauses, thread starvation etc.
                    • JGroups is only relevant in a clustered setup. Based on the configurations you listed in the first post, you don't seem to have ModeShape clustering configured. If you want to run in a cluster, you need to a) configure ModeShape in cluster mode and b) configure the ISPN cache in either replicated or distributed mode. If you were running in a cluster, both Infinispan and ModeShape would use JGroups for inter-node communication. There isn't any JGroups configuration "we recommend", but for testing we use this file: https://github.com/ModeShape/modeshape/blob/master/modeshape-jcr/src/test/resources/config/jgroups-test-config.xml
                    • 7. Re: ModeShape 3.5: performance
                      Randall Hauch Master

                      You do NOT want to use eviction or expiration in your normal Infinispan cache configuration, because this will definitely throw persisted data away!

                       

                      The only place you should be using eviction and/or expiration is in the workspace cache configuration (e.g., the "org/modeshape/jcr/default-workspace-cache-config.xml" file on the classpath). It is okay to do this here because the workspace cache is a more traditional in-memory cache that simply fronts the backend persistent storage (e.g., the normal Infinispan cache). If anything gets thrown out of the workspace cache, it is simply reloaded from the normal Infinispan cache that contains the persisted content.

                       

                      When you store content on a slow hard drive (using any of the file-based cache stores), you are going to have slower I/O. I'd suggest getting a faster drive, since you've already seen that ModeShape is much faster on one. You may also consider using the JDBC cache store (probably the string-based form), which stores the content in a remote database. Normally this will not be very fast, but in your case, with a slow local drive, it may actually be faster (assuming the network and the remote database are fast).
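                      For reference, a string-based JDBC cache store loader would look roughly like this. This is a hedged sketch: the class and property names follow the Infinispan 5.x JDBC cache store documentation, and the table names, connection URL, driver, and user name are placeholders you would have to adapt and verify against your exact ISPN version:

```xml
<loaders passivation="false" shared="false" preload="true">
    <loader class="org.infinispan.loaders.jdbc.stringbased.JdbcStringBasedCacheStore"
            fetchPersistentState="false" purgeOnStartup="false">
        <properties>
            <!-- table layout (placeholder names/types) -->
            <property name="stringsTableNamePrefix" value="ISPN_STRING_TABLE"/>
            <property name="idColumnName" value="ID_COLUMN"/>
            <property name="idColumnType" value="VARCHAR(255)"/>
            <property name="dataColumnName" value="DATA_COLUMN"/>
            <property name="dataColumnType" value="BINARY"/>
            <property name="timestampColumnName" value="TIMESTAMP_COLUMN"/>
            <property name="timestampColumnType" value="BIGINT"/>
            <!-- connection (placeholders) -->
            <property name="connectionFactoryClass"
                      value="org.infinispan.loaders.jdbc.connectionfactory.PooledConnectionFactory"/>
            <property name="connectionUrl" value="jdbc:postgresql://dbhost/ispn"/>
            <property name="driverClass" value="org.postgresql.Driver"/>
            <property name="userName" value="ispn_user"/>
        </properties>
    </loader>
</loaders>
```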

                       

                      It is difficult to say what is going on with the production machine, which appears slow even when running in-memory. Running in-memory will always be faster than using a cache store, and you saw this yourself earlier, when you said:

                      For testing purposes I switched to a purely in-memory configuration, and then the numbers were blazing: the time for the test page dropped to ~80-100 ms.

                      Horia's suggestion to use VisualVM is great -- at this point, you need to know what's going on in that machine.

                      • 8. Re: ModeShape 3.5: performance
                        K. Bachl Novice

                        Thank you for your answers, Horia and Randall; I really appreciate your time.

                         

                        I did what you suggested and got some fresh numbers from our running live server. What's interesting is that the live server seems to be hit harder than our dev machine, presumably because the live server gets traffic that dev doesn't. Here is the CPU burn:

                         

                        Bildschirmfoto 2013-09-03 um 18.06.45.png

                         

                        You can see that 11% is burned by LinkedBlockingQueue.poll and jcr.bus.RepositoryChangeBus.

                        The concurrent lock is also bad; LinkedTransferQueue is at 28%, which seems to be Grizzly here, though I'm not sure what that can be.

                         

                        I also got some metrics regarding my app:

                         

                        Bildschirmfoto 2013-09-03 um 18.19.20.png

                         

                        as well as

                         

                        Bildschirmfoto 2013-09-03 um 18.22.56.png

                         

                         

                        So basically we lose 26% of all wall-clock time to org.brixcms.jcr.SessionBehavior.wrap, a rather simple piece of code:  https://github.com/brix-cms/brix-cms/blob/master/brix-core/src/main/java/org/brixcms/jcr/SessionBehavior.java

                         

                        With ModeShape 2.8 this time loss wasn't nearly as bad, AFAIK; here are some old shots taken with ModeShape 2.7 at the time:

                         

                        Bildschirmfoto 2013-09-03 um 18.29.19.pngBildschirmfoto 2013-09-03 um 18.29.00.png

                        Bildschirmfoto 2013-09-03 um 18.28.25.png

                        Bildschirmfoto 2013-09-03 um 18.31.00.png

                         

                        You can see here that under ModeShape 2.x we had a much better burn rate (3% CPU and 0.67% wall-clock CPU time vs. 11% and 11.7% wall-clock CPU time), and that no real lock contention or race conditions were hit. Any ideas?

                         

                        PS: Results were taken by probing our live system with newrelic.com. Only threads/functions having an impact are shown.

                        • 9. Re: ModeShape 3.5: performance
                          Horia Chiorean Master

                          Regarding RepositoryChangeBus#ChangeSetDispatcher: the current code does indeed use BlockingQueue.poll, but this code is executed asynchronously from the main repository thread. Nonetheless, I did some research (http://supercoderz.in/2012/02/04/using-linkedblockingqueue-for-high-throughput-java-applications/) and it seems that, for CPU load, BlockingQueue#poll isn't the best option. I'll open a JIRA issue and see if we can improve the code there.
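                          To make the CPU concern concrete, here is a plain-JDK sketch (not the actual ModeShape dispatcher; the class name, STOP marker, and method names are invented) of a poll-with-timeout loop: on an idle queue, every timeout is a wake-up that does no useful work, whereas take() would block until an element arrives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Minimal sketch of a dispatcher loop in the style of a change bus:
// poll(timeout) returns null on every timeout even when the queue is empty,
// so an idle dispatcher keeps waking up and burning CPU.
public class PollSketch {
    static final String STOP = "STOP"; // hypothetical shutdown marker

    // Drains the queue via poll-with-timeout until the STOP marker is seen.
    static List<String> drain(LinkedBlockingQueue<String> queue) {
        List<String> consumed = new ArrayList<>();
        while (true) {
            String item;
            try {
                item = queue.poll(10, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (item == null) continue; // empty wake-up: pure overhead on an idle bus
            if (STOP.equals(item)) break;
            consumed.add(item);
        }
        return consumed;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.addAll(Arrays.asList("change-1", "change-2", STOP));
        System.out.println(drain(queue)); // prints [change-1, change-2]
    }
}
```

                          Replacing the timed poll with a blocking take() (plus an explicit shutdown element) is one common way to avoid the idle wake-ups.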

                           

                          From your application's usage, I see that out of the 28%, 11% (almost half) is spent in MenuNode$1#canWrap, so you'd have to look into that further. It would be great if the tool you are using could go deeper into the stack traces, to actually show ModeShape & Infinispan calls.

                          • 10. Re: ModeShape 3.5: performance
                            K. Bachl Novice

                            Hello Horia,

                             

                            I got some deeper insights. This data is the result of 2829 samples collected over some time. It also shows the granularity, meaning we now know where the time is spent; see here:

                             

                            Bildschirmfoto 2013-09-05 um 03.11.38.png

                             

                            Bildschirmfoto 2013-09-05 um 03.12.59.png

                             

                            I'm investigating this further. Let me know what you think. All data here is from ModeShape 3.5 with Berkeley DB Java Edition.

                            • 11. Re: ModeShape 3.5: performance
                              K. Bachl Novice

                              PS: I've spent some time looking at brix's code; much of the trouble seems to come from code like this:

                               

                              public void load(BrixNode node) {
                                  detach();
                                  if (node.hasProperty(PROP_INNER_CONTAINER_STYLE_CLASS)) {
                                      setInnerContainerStyleClass(node.getProperty(PROP_INNER_CONTAINER_STYLE_CLASS).getString());
                                  }
                                  if (node.hasProperty(PROP_OUTER_CONTAINER_STYLE_CLASS)) {
                                      setOuterContainerStyleClass(node.getProperty(PROP_OUTER_CONTAINER_STYLE_CLASS).getString());
                                  }
                                  if (node.hasProperty(PROP_SELECTED_ITEM_STYLE_CLASS)) {
                                      setSelectedItemStyleClass(node.getProperty(PROP_SELECTED_ITEM_STYLE_CLASS).getString());
                                  }
                                  if (node.hasProperty(PROP_ITEM_WITH_SELECTED_CHILD_STYLE_CLASS)) {
                                      setItemWithSelectedChildStyleClass(node.getProperty(PROP_ITEM_WITH_SELECTED_CHILD_STYLE_CLASS).getString());
                                  }
                                  if (node.hasProperty(PROP_MENU)) {
                                      setMenuNode((BrixNode) node.getProperty(PROP_MENU).getNode());
                                  }
                                  if (node.hasProperty(PROP_START_AT_LEVEL)) {
                                      setStartAtLevel((int) node.getProperty(PROP_START_AT_LEVEL).getLong());
                                  }
                                  if (node.hasProperty(PROP_RENDER_LEVELS)) {
                                      setRenderLevels((int) node.getProperty(PROP_RENDER_LEVELS).getLong());
                                  }
                              }
                              
                              

                               

                              This isn't anything special, IMHO. The code above (or similar code) can run dozens of times for a single page rendering, and that of course happens in parallel. Can it be that, because of some kind of transaction isolation, reads also somehow get locked when there might be a write somewhere?

                               

                              From the POV of our CMS, writes to the JCR are really exotic, meaning write performance isn't an issue. Massively parallel reads are what we need, and IMHO Infinispan's numbers for that kind of scenario are a magnitude beyond what I would even dream of: http://infinispan.blogspot.de/2010/02/infinispan-as-local-cache.html

                               

                              Is there a way to completely disable transactions for the Infinispan part? I tried to disable it, but failed.

                              • 12. Re: ModeShape 3.5: performance
                                Horia Chiorean Master

                                Before you tweak ISPN transactions, you should really look deeper into the stack traces and inspect calls from the org.infinispan.* packages.

                                One thing that seems very odd to me about the above images is that, for example, calls to org.modeshape.jcr.AbstractNodeType#isType appear above calls to org.brixcms.*. In other words, it's as if ModeShape code were calling brix code and not the other way around.

                                This doesn't make sense to me. I would expect to see (from a stack-trace perspective) org.brixcms.* -> org.modeshape.* -> org.infinispan.*

                                 

                                Regarding transactions, you can configure the transactions in ISPN in a number of ways:

                                • you can test with DummyTransactionManagerLookup as a transaction manager - although this is not really recommended in a production environment (see below)
                                • you can configure a cache to be non-transactional:
                                     <namedCache name="transactional">
                                       <transaction transactionMode="NON_TRANSACTIONAL"/>
                                     </namedCache>

                                The documentation on ISPN transactions can be found here: https://docs.jboss.org/author/display/ISPN/Infinispan+transactions
                                However, you should always configure an ISPN transaction manager, because ModeShape can't work without one. I haven't tested this, but I'm thinking you should be able to disable ISPN transactions by combining the 2 settings from above (dummy transaction manager + NON_TRANSACTIONAL cache).
                                If you know for a fact that your application doesn't require user/custom transactions, you should also add this to the ModeShape JSON configuration: "transactionMode" : "none". This will make ModeShape defer all transactional work to ISPN's transaction manager. It also means that ModeShape will *not be aware* of any transaction contexts outside of itself (ISPN).
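                                Putting those untested suggestions together, the ModeShape side of such a configuration might look like this (a sketch only, trimmed to the relevant fields; this combination has not been verified):

```json
{
    "name" : "wwRepo",
    "transactionMode" : "none",
    "storage" : {
        "cacheConfiguration" : "cache-ms3.xml",
        "cacheName" : "wwRepoCache",
        "transactionManagerLookup" : "org.infinispan.transaction.lookup.DummyTransactionManagerLookup"
    }
}
```

                                with the corresponding cache in cache-ms3.xml declared with transactionMode="NON_TRANSACTIONAL".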
                                • 13. Re: ModeShape 3.5: performance
                                  K. Bachl Novice

                                  @Horia: Regarding the images: they are ordered "bottom up", meaning the one below calls the one above, as this is the only way to have the profiled times ordered by worst performance; otherwise you get very long rows going down and ending up at the same calls.

                                  • 14. Re: Re: ModeShape 3.5: performance
                                    K. Bachl Novice

                                    I've now also taken some speed measurements to compare ModeShape 2.8 vs. 3.4/3.5 (the latter two are equal). I expected 3.5 to be faster than 2.8, yet the opposite happened. Both were run against the same app, only ModeShape was swapped, and both faced exactly the same workspaces. Apache JMeter was used to query all the data in a localhost setup, so that no external influence could interfere, and all numbers were taken after about 2-3 minutes of each test cycle, once the numbers had settled on a level for at least 30 seconds.

                                     

                                    parallel threads | throughput/min, ModeShape 2.8 | throughput/min, ModeShape 3.4/3.5
                                    4                | 2650                          | 2550
                                    16               | 2650                          | 2500
                                    32               | 2600                          | 2400
                                    64               | 2500                          | 2350
                                    128              | 2350-2400                     | 2150

                                     

                                    throughput/min refers to delivered HTML pages per minute, where those pages contain pure logic and data backed by the ModeShape JCR and no other DB calls are made. A set of 3 different pages was used, where a big part came from the same outer nodes that put the layout on the page (simply speaking).

                                     

                                    What's interesting is that ModeShape 2.8 only seems limited by available CPU and degrades much more gracefully as threads increase compared to 3.4/3.5. ModeShape 3.4/3.5 loses momentum much faster even though it starts from a lower throughput level (ModeShape 2.8: 100% at 4 threads down to 88.7% at 128 threads, vs. ModeShape 3.4/3.5: 100% at 4 threads down to 84% at 128 threads).

                                     

                                    ModeShape 2.8 was backed by diskStore and a org.modeshape.graph.connector.base.cache.InMemoryNodeCache$MapCachePolicy;

                                    ModeShape 3.4/ 3.5 was backed by BdbJE 5.0.84, infinispan-cachestore-bdbje 5.2.7.Final, Optimistic transaction setting with useSynchronization="true";

                                     

                                    Data was on an SSD (Apple Fusion Drive) to minimize the impact of the storage device, and to exclude the performance issue I'm currently facing on our prod server.
