-
1. Re: Using FileSystemBinary?
bcarothers Jun 16, 2011 10:11 PM (in response to d95sld95)It should be using FileSystemBinary by default, unless you set eagerFileLoading to true on the FileSystemSource. Could you post a stack trace from the OOM?
-
2. Re: Using FileSystemBinary?
d95sld95 Jun 17, 2011 8:03 AM (in response to bcarothers)Here is my configuration of the FileSystemSource repository. Maybe something is wrong in the configuration?
JcrConfiguration configuration = new JcrConfiguration();
configuration.repositorySource("store")
             .usingClass(FileSystemSource.class)
             .setDescription("The repository for our content")
             .setProperty("workspaceRootPath", "/home/nextgen/content")
             .setProperty("updatesAllowed", true);
configuration.repository(repositoryId)
             .setSource("store");
try {
    // Start the ModeShape engine ...
    this.engine = configuration.build();
    this.engine.start();
    // Now get the JCR repository instance ...
    this.repository = this.engine.getRepository(repositoryId);
} catch (Exception e) {
    this.repository = null;
    throw e;
}
Below is the code that inserts the large file
// Insert a folder "video" and add a "abc.mp4" video file
Node root = session.getRootNode();
// Create folder node
Node videoNode = root.addNode("video", "nt:folder");
Node fileNode = videoNode.addNode("abc.mp4", "nt:file");
// Insert file
Node resNode = fileNode.addNode("jcr:content", "nt:resource");
resNode.setProperty("jcr:mimeType", "video/mp4");
File file = new File("/home/nextgen/abc.mp4");
Binary binary = session.getValueFactory().createBinary(new FileInputStream(file));
resNode.setProperty("jcr:data", binary);
session.save();
binary.dispose();
And here is the stack trace from the OutOfMemoryError. The heap size is set to 512 MB.
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at org.modeshape.common.util.IoUtil.readBytes(IoUtil.java:66)
    at org.modeshape.graph.property.basic.AbstractBinaryValueFactory.create(AbstractBinaryValueFactory.java:229)
    at org.modeshape.graph.property.basic.AbstractBinaryValueFactory.create(AbstractBinaryValueFactory.java:55)
    at org.modeshape.graph.property.basic.AbstractValueFactory.create(AbstractValueFactory.java:123)
    at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:111)
    at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:45)
    at com.nextgen.core.repository.ModeShapeLargeFileInsertTest.testInsert(RespositoryTest.java:132)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
-
3. Re: Using FileSystemBinary?
d95sld95 Jun 17, 2011 8:05 AM (in response to d95sld95)Adding the configuration property "eagerFileLoading=false" (which is the default according to the docs) does not change anything.
-
4. Re: Using FileSystemBinary?
bcarothers Jun 17, 2011 8:06 AM (in response to d95sld95)No, you're doing everything right. We don't have any provision for using a FileSystemBinary when writing to a repository, only when reading from a FileSystemSource. With hindsight, that looks a fairly important omission.
Would you mind creating a JIRA issue for this? I'm pretty confident that we could turn around a fix ASAP.
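For context, the stack trace earlier in the thread shows IoUtil.readBytes filling a ByteArrayOutputStream, which means the whole stream ends up on the heap before the value is created. Below is a minimal sketch of the general alternative: spooling the stream to a temporary file through a fixed-size buffer so that memory use does not grow with file size. It is not the ModeShape patch itself; the class name StreamSpooler, the method spoolToTempFile, and the 8 KB buffer size are hypothetical choices made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of ModeShape.
final class StreamSpooler {

    /** Copies the stream into a new temp file using a fixed-size buffer and returns that file. */
    static Path spoolToTempFile(InputStream in, int bufferSize) throws IOException {
        Path temp = Files.createTempFile("binary-", ".tmp");
        byte[] buffer = new byte[bufferSize];
        try (OutputStream out = Files.newOutputStream(temp)) {
            int read;
            // Only bufferSize bytes are held in memory at any time.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return temp;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1_000_000]; // stand-in for a large upload
        try (InputStream in = new ByteArrayInputStream(data)) {
            Path temp = spoolToTempFile(in, 8192);
            System.out.println("Spooled " + Files.size(temp) + " bytes to " + temp);
            Files.delete(temp);
        }
    }
}
```

With this shape the heap cost is the buffer size regardless of input length, whereas a ByteArrayOutputStream has to hold (and repeatedly re-copy) the entire content.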
-
5. Re: Using FileSystemBinary?
d95sld95 Jun 17, 2011 9:04 AM (in response to bcarothers)I created JIRA issue MODE-1201
-
6. Re: Using FileSystemBinary?
bcarothers Jun 17, 2011 9:18 AM (in response to d95sld95)Thanks, Steen. We should be able to get this fix into the trunk by Monday.
-
7. Re: Using FileSystemBinary?
bcarothers Jun 18, 2011 9:58 PM (in response to bcarothers)I've got a pull request in for this at https://github.com/ModeShape/modeshape/pull/131. You can apply it locally if you're brave enough to build from trunk[1]. Thanks for the great description of the issue and the very helpful steps-to-reproduce. I've incorporated a very similar test to verify that this is no longer an issue once the patch is applied.
The patch still has to pass review before it gets added into trunk though, so it may or may not get in on Monday.
[1] - Actually, you don't have to be particularly brave to do this. Our trunk almost always compiles.
-
8. Re: Using FileSystemBinary?
rhauch Jun 20, 2011 9:27 AM (in response to bcarothers)I'll be merging that change into the 'master' branch this morning. Thanks for working on this, Brian, and thanks Steen for finding and reporting this in a very thorough manner! That helped a lot!
[1] - Actually, you don't have to be particularly brave to do this. Our trunk almost always compiles.
Our 'master' branch (aka, trunk) is very stable at this point. We do all our development in other branches, and merge to 'master' only when things are ready. So our 'master' branch not only almost always compiles, it's almost always very stable.
-
9. Re: Using FileSystemBinary?
rhauch Jun 20, 2011 4:21 PM (in response to rhauch)I've merged the changes into the 'master' branch, and resolved the issue.
If you want to try it, get the latest code and build locally, and the "2.6-SNAPSHOT" version will go into your local Maven repository. You can use it in your Maven application by then specifying "2.6-SNAPSHOT" in your POM. Let us know if you have any problems.
-
10. Re: Using FileSystemBinary?
d95sld95 Jun 21, 2011 12:25 PM (in response to rhauch)Thanks for the quick turnaround. I tried out the fix and it works well.
I noticed that the insert time (on my system) for a 3GB file using JCR Binary is about 167 seconds, but reading the file is about 88 seconds. Just copying the file (no JCR) using apache-commons IOUtils.copyLarge(InputStream, OutputStream) takes about 49 seconds.
I am not sure whether I am doing anything wrong or whether there is room for performance optimizations somewhere in the code.
This takes about 49 seconds
@Test
public void copy() throws IOException {
    long begin = System.currentTimeMillis();
    InputStream is = new FileInputStream(new File("/opt/vmware/Windows 7 x64/Windows7x64.jpg"));
    OutputStream os = new FileOutputStream(new File("/home/steen/vm.vm"));
    long copied = IOUtils.copyLarge(is, os);
    System.out.println("Total time: " + (System.currentTimeMillis() - begin) + " to copy " + copied + " bytes");
}
-
11. Re: Using FileSystemBinary?
rhauch Jun 21, 2011 1:25 PM (in response to d95sld95)Glad it worked. We're doing a few more things than the copy utility, including writing the file to a temporary file before moving it over any existing file (to handle any error conditions during reads; we don't want to corrupt the file that's there if there's an error reading the new binary value). Also, we're not using Apache Commons' IOUtils, and our utility is using a smaller byte buffer. Not sure how much difference that makes.
-
12. Re: Using FileSystemBinary?
bcarothers Jun 25, 2011 12:14 PM (in response to d95sld95)Steen,
By any chance, is your /tmp directory on a different filesystem than where your FileSystemSource.repositoryRootPath is located? Even if it's on the same HDD, being on a different filesystem would make a big difference atm. I'm profiling some of the impact now, but that could explain the very large discrepancy.
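One way to answer that question from Java is to compare the file stores of the two directories. This is a rough sketch using java.nio.file (Java 7 or later); the class name FileStoreCheck and the method sameFileStore are hypothetical, and the second path in main is just a placeholder for wherever the repository content lives. Both paths must exist for Files.getFileStore to succeed.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of ModeShape.
final class FileStoreCheck {

    /** Returns true if both existing paths are reported to live on the same file store. */
    static boolean sameFileStore(Path a, Path b) throws IOException {
        FileStore storeA = Files.getFileStore(a);
        FileStore storeB = Files.getFileStore(b);
        return storeA.equals(storeB);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        // Pass the repository content directory as the first argument; "." is only a placeholder.
        Path target = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(tmp + " and " + target + " on same file store: "
                + sameFileStore(tmp, target));
    }
}
```

If this prints false, a rename from the temp directory to the target cannot be a simple rename and will fall back to copying the data.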
-
13. Re: Using FileSystemBinary?
bcarothers Jun 25, 2011 8:26 PM (in response to bcarothers)The more I think about this, the more I think that we're not quite doing this right. As Randall noted above, our current algorithm for updating file content goes like this:
1. Write the content to a temp file in java.io.tmpdir to make sure that we have a safe copy of the data
2. Delete the existing target file (if it exists)
3. Rename the temp file to the target file
This isn't the worst solution, but it could be improved. First, if java.io.tmpdir happens to point to a different filesystem than the target file is on, the rename turns from a call to File.renameTo() into another file copy and delete. I'm pretty sure that's what Steen is seeing above, because I get roughly equivalent performance on my MBP (with only one filesystem) whether I copy a 3G file directly with Commons IO or ModeShape's FileUtil or whether I write the 3G file into a file system connector.
I've opened MODE-1206 to describe this and will submit a patch that allows users to specify the temporary directory that is used, allowing them to keep everything on one filesystem.
I added a pull request at https://github.com/ModeShape/modeshape/pull/132.
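To illustrate the idea of keeping the temporary file on the same filesystem as the target, here is a minimal sketch using java.nio.file. It is not the code from the pull request; SafeFileReplace, replaceContent, and the tempDir parameter are hypothetical names, and the sketch assumes the caller picks a tempDir on the same filesystem as the target (for example, the target's parent directory).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of ModeShape.
final class SafeFileReplace {

    /**
     * Writes the new content to a temp file in tempDir, then moves it over the target.
     * If reading the content fails, the existing target file is left untouched.
     */
    static void replaceContent(Path target, InputStream content, Path tempDir) throws IOException {
        Path temp = Files.createTempFile(tempDir, "upload-", ".tmp");
        try {
            // Step 1: get a safe copy of the data on disk.
            Files.copy(content, temp, StandardCopyOption.REPLACE_EXISTING);
            // Steps 2 and 3: replace the target. On the same filesystem this can be a rename;
            // across filesystems the JDK falls back to copy-and-delete.
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the copy or the move failed.
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("content-");
        Path target = dir.resolve("abc.mp4");
        Files.write(target, "old".getBytes(StandardCharsets.UTF_8));
        replaceContent(target, new ByteArrayInputStream("new".getBytes(StandardCharsets.UTF_8)), dir);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Passing the target's own directory as tempDir keeps the final move on a single filesystem, which is the case where the extra copy described above is avoided.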
-
14. Re: Using FileSystemBinary?
dmitry.zhuravlev May 10, 2012 9:43 AM (in response to d95sld95)As I understand it, you rejected this solution. If so, why is MODE-1201 marked as "Closed"? This problem still exists in ModeShape 2.7. Please provide a patch for this problem for the 2.x ModeShape versions.