14 Replies Latest reply on May 10, 2012 9:43 AM by dmitry.zhuravlev

    Using FileSystemBinary?

    d95sld95

      Using ModeShape with a FileSystemSource, I am trying to add a 3GB binary file. I am getting an OutOfMemoryError. After a little investigation, I believe the default implementation uses InMemoryBinary, which loads all bytes into memory. My heap was only 256M.

       

      Does anyone know how to enable the FileSystemBinary class instead of InMemoryBinary?

        • 1. Re: Using FileSystemBinary?
          bcarothers

          It should be using FileSystemBinary by default, unless you set eagerFileLoading to true on the FileSystemSource.  Could you post a stack trace from the OOM?

          • 2. Re: Using FileSystemBinary?
            d95sld95

            Here is my configuration of the FileSystemSource repository. Maybe something is wrong in the configuration?

             

                      JcrConfiguration configuration = new JcrConfiguration();
                      configuration.repositorySource("store")
                                 .usingClass(FileSystemSource.class)
                                 .setDescription("The repository for our content")
                                 .setProperty("workspaceRootPath", "/home/nextgen/content")
                                 .setProperty("updatesAllowed", true);
            
                      configuration.repository(repositoryId)
                                 .setSource("store");
            
                      try {
                                  // Start the ModeShape engine ...
                                  this.engine = configuration.build();
                                  this.engine.start();
            
                                  // Now get the JCR repository instance ...
                                  this.repository = this.engine.getRepository(repositoryId);
                       } catch (Exception e) {
                                  this.repository = null;
                                  throw e;
                       }
            
            

             

            Below is the code that inserts the large file

             

                      // Insert a folder "video" and add an "abc.mp4" video file
                      Node root = session.getRootNode();
            
                      // Create folder node
                      Node videoNode = root.addNode("video", "nt:folder");
                      Node fileNode = videoNode.addNode("abc.mp4", "nt:file");
            
                      // Insert file
                      Node resNode = fileNode.addNode("jcr:content", "nt:resource");
                      resNode.setProperty("jcr:mimeType", "video/mp4");
                      File file = new File("/home/nextgen/abc.mp4");
                      Binary binary = session.getValueFactory().createBinary(new FileInputStream(file));
                      resNode.setProperty("jcr:data", binary);
                      session.save();
            
                      binary.dispose();
            

             

            And here is the stack trace I get with the OutOfMemoryError. The heap size is set to 512 MB.

             

            java.lang.OutOfMemoryError: Java heap space
                      at java.util.Arrays.copyOf(Arrays.java:2786)
                      at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
                      at org.modeshape.common.util.IoUtil.readBytes(IoUtil.java:66)
                      at org.modeshape.graph.property.basic.AbstractBinaryValueFactory.create(AbstractBinaryValueFactory.java:229)
                      at org.modeshape.graph.property.basic.AbstractBinaryValueFactory.create(AbstractBinaryValueFactory.java:55)
                      at org.modeshape.graph.property.basic.AbstractValueFactory.create(AbstractValueFactory.java:123)
                      at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:111)
                      at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:45)
                      at com.nextgen.core.repository.ModeShapeLargeFileInsertTest.testInsert(RespositoryTest.java:132)
                      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                      at java.lang.reflect.Method.invoke(Method.java:597)
                      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
                      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
                      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
                      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
                      at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
                      at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
                      at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
                      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
                      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
                      at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
                      at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
                      at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
                      at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
                      at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
                      at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
                      at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
                      at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
                      at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
                      at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
            
            • 3. Re: Using FileSystemBinary?
              d95sld95

              Adding the configuration property "eagerFileLoading=false" (which is the default according to the docs) does not change anything.

              • 4. Re: Using FileSystemBinary?
                bcarothers

                No, you're doing everything right.  We don't have any provision for using a FileSystemBinary when writing to a repository, only when reading from a FileSystemSource.  With hindsight, that looks like a fairly important omission.
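To illustrate why the write path runs out of heap: the stack trace above shows the stream being read into a ByteArrayOutputStream, which needs at least as much heap as the file is large. A file-backed approach copies the stream through a fixed-size buffer instead. The following is only a minimal sketch of that idea in plain Java; the class name, method name, and buffer size are hypothetical, and this is not ModeShape's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration, not part of ModeShape.
class StreamSpoolSketch {

    // Arbitrary buffer size chosen for the sketch.
    static final int BUFFER_SIZE = 8192;

    // Copies the stream to a temporary file through a fixed-size buffer,
    // so heap usage stays near BUFFER_SIZE regardless of the input size.
    static File spoolToTempFile(InputStream in) throws IOException {
        File temp = File.createTempFile("binary-", ".tmp");
        OutputStream out = new FileOutputStream(temp);
        try {
            byte[] buffer = new byte[BUFFER_SIZE];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            out.close();
            in.close();
        }
        return temp;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory sample standing in for a large file stream.
        byte[] sample = new byte[100000];
        File spooled = spoolToTempFile(new ByteArrayInputStream(sample));
        System.out.println("Spooled " + spooled.length() + " bytes to " + spooled);
        spooled.delete();
    }
}
```

A binary value backed by such a file can then hand out a fresh FileInputStream on each read rather than holding the bytes on the heap.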

                 

                Would you mind creating a JIRA issue for this?  I'm pretty confident that we could turn around a fix ASAP.

                • 5. Re: Using FileSystemBinary?
                  d95sld95

                  I created JIRA issue MODE-1201

                  • 6. Re: Using FileSystemBinary?
                    bcarothers

                    Thanks, Steen.  We should be able to get this fix into the trunk by Monday.

                    • 7. Re: Using FileSystemBinary?
                      bcarothers

                      I've got a pull request in for this at https://github.com/ModeShape/modeshape/pull/131.  You can apply it locally if you're brave enough to build from trunk[1].  Thanks for the great description of the issue and the very helpful steps-to-reproduce.  I've incorporated a very similar test to verify that this is no longer an issue once the patch is applied.

                       

                      The patch still has to pass review before it gets added into trunk though, so it may or may not get in on Monday.

                       

                      [1] - Actually, you don't have to be particularly brave to do this.  Our trunk almost always compiles.

                      • 8. Re: Using FileSystemBinary?
                        rhauch

                        I'll be merging that change into the 'master' branch this morning. Thanks for working on this, Brian, and thanks Steen for finding and reporting this in a very thorough manner! That helped a lot!

                         

                        bcarothers wrote: "[1] - Actually, you don't have to be particularly brave to do this.  Our trunk almost always compiles."

                        Our 'master' branch (aka, trunk) is very stable at this point. We do all our development in other branches, and merge to 'master' only when things are ready. So our 'master' branch not only almost always compiles, it's almost always very stable.

                        • 9. Re: Using FileSystemBinary?
                          rhauch

                          I've merged the changes into the 'master' branch, and resolved the issue.

                           

                          If you want to try it, get the latest code and build locally, and the "2.6-SNAPSHOT" version will go into your local Maven repository. You can use it in your Maven application by then specifying "2.6-SNAPSHOT" in your POM. Let us know if you have any problems.

                          • 10. Re: Using FileSystemBinary?
                            d95sld95

                            Thanks for the quick turnaround. I tried out the fix and it works well.

                             

                            I noticed that the insert time (on my system) for a 3GB file using JCR Binary is about 167 seconds, while reading the file takes about 88 seconds. Just copying the file (no JCR) using Apache Commons IO's IOUtils.copyLarge(InputStream, OutputStream) takes about 49 seconds.

                             

                            I am not sure if I am doing anything wrong or if there is room for performance optimizations somewhere in the code?

                             

                             

                            This takes about 49 seconds

                            @Test
                            public void copy() throws IOException {
                                      long begin = System.currentTimeMillis();
                                      // try-with-resources (Java 7+) closes both streams even if the copy fails
                                      try (InputStream is = new FileInputStream(new File("/opt/vmware/Windows 7 x64/Windows7x64.jpg"));
                                           OutputStream os = new FileOutputStream(new File("/home/steen/vm.vm"))) {
                                                long copied = IOUtils.copyLarge(is, os);
                                                System.out.println("Total time: " + (System.currentTimeMillis() - begin) + " ms to copy " + copied + " bytes");
                                      }
                            }
                            
                            • 11. Re: Using FileSystemBinary?
                              rhauch

                              Glad it worked. We're doing a few more things than the copy utility, including writing the file to a temporary file before moving it over any existing file (to handle any error conditions during reads; we don't want to corrupt the file that's there if there's an error reading the new binary value). Also, we're not using Apache Commons' IOUtils, and our utility is using a smaller byte buffer. Not sure how much difference that makes.
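One way to find out how much the buffer size matters is to time the same copy with several sizes. The sketch below is a hypothetical stand-alone harness in plain Java (Java 7+ for try-with-resources); the class name and the buffer sizes are arbitrary choices for illustration and do not reflect the values used by ModeShape or Commons IO.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical timing harness, not part of ModeShape or Commons IO.
class BufferSizeCopySketch {

    // Copies everything from in to out using a buffer of the given size
    // and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Copies source to dest and returns the elapsed time in milliseconds.
    static long timeCopy(File source, File dest, int bufferSize) throws IOException {
        long begin = System.nanoTime();
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(dest)) {
            copy(in, out, bufferSize);
        }
        return (System.nanoTime() - begin) / 1000000L;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: BufferSizeCopySketch <source> <dest>");
            return;
        }
        File source = new File(args[0]);
        File dest = new File(args[1]);
        // Arbitrary sizes to compare; adjust as needed.
        int[] sizes = {1024, 4096, 65536};
        for (int size : sizes) {
            System.out.println(size + " byte buffer: " + timeCopy(source, dest, size) + " ms");
        }
    }
}
```

Results from a harness like this are sensitive to the operating system's file cache, so repeated runs on the same file may be much faster than the first one.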

                              • 12. Re: Using FileSystemBinary?
                                bcarothers

                                Steen,

                                 

                                By any chance, is your /tmp directory on a different filesystem than where your FileSystemSource.repositoryRootPath is located?  Even if it's on the same HDD, being on a different filesystem would make a big difference at the moment.  I'm profiling some of the impact now, but that could explain the very large discrepancy.
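A quick way to answer this question from Java is to compare the file stores of the two directories. This is a hedged sketch that assumes Java 7 or later (java.nio.file) and that both paths exist; the class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ModeShape.
class SameFileStoreCheck {

    // Returns true when both existing paths are located on the same file store
    // (volume or partition), as reported by the JVM.
    static boolean onSameFileStore(Path a, Path b) throws IOException {
        FileStore storeA = Files.getFileStore(a);
        FileStore storeB = Files.getFileStore(b);
        return storeA.equals(storeB);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        // Pass the repository root directory as the first argument;
        // the current directory is used as a fallback for the demo.
        Path content = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Same file store: " + onSameFileStore(tmp, content));
    }
}
```

If this prints false, a rename from the temporary directory into the repository directory cannot be a simple metadata operation and falls back to copying the data.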

                                • 13. Re: Using FileSystemBinary?
                                  bcarothers

                                  The more I think about this, the more I think that we're not quite doing this right.  As Randall noted above, our current algorithm for updating file content goes like this:

                                   

                                  1.  Write the content to a temp file in java.io.tmpdir to make sure that we have a safe copy of the data

                                  2.  Delete the existing target file (if it exists)

                                  3.  Rename the temp file to the target file

                                   

                                  This isn't the worst solution, but it could be improved.  First, if java.io.tmpdir happens to point to a different filesystem than the target file is on, the rename turns from a call to File.renameTo() into another file copy and delete.  I'm pretty sure that's what Steen is seeing above, because I get roughly equivalent performance on my MBP (with only one filesystem) whether I copy a 3G file directly with Commons IO or ModeShape's FileUtil or whether I write the 3G file into a file system connector.

                                   

                                  I've opened MODE-1206 to describe this and will submit a patch that allows users to specify the temporary directory that is used, allowing them to keep everything on one filesystem. 
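The write-then-rename approach with a caller-supplied temporary directory could look roughly like the sketch below. It assumes Java 7 or later (java.nio.file) and is not the code from the pull request; the class name, method name, and parameters are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration, not the actual ModeShape patch.
class SafeReplaceSketch {

    // Writes the new content to a temp file in tempDir, then moves it over target.
    // If tempDir is on the same filesystem as target, the move is a cheap rename.
    static void replaceContent(InputStream content, Path target, Path tempDir) throws IOException {
        Path temp = Files.createTempFile(tempDir, "upload-", ".tmp");
        try {
            // Spool the stream to the temp file first, so a failed read
            // never damages the existing target file.
            Files.copy(content, temp, StandardCopyOption.REPLACE_EXISTING);
            try {
                // Try to swap the new file into place in a single step.
                Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            } catch (AtomicMoveNotSupportedException e) {
                // Typically a different filesystem: fall back to a
                // non-atomic move, which copies the data and deletes the source.
                Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            // No-op after a successful move; cleans up after a failure.
            Files.deleteIfExists(temp);
        }
    }
}
```

Passing the target's parent directory as tempDir keeps both files on one filesystem, which is the situation the configurable temporary directory is meant to make possible.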

                                   

                                  I added a pull request at https://github.com/ModeShape/modeshape/pull/132.

                                  • 14. Re: Using FileSystemBinary?
                                    dmitry.zhuravlev

                                    As I understand it, this solution was rejected. If so, why is MODE-1201 marked as "Closed"? This problem still exists in ModeShape 2.7. Please provide a patch for this problem for the ModeShape 2.x versions.