5 Replies Latest reply on Mar 26, 2012 10:55 AM by cluxig

    How to store large binaries

    cluxig

      Hi,

      we have to store large binaries in a content repository using a DiskStore. If we submit InputStreams whose content is larger than the Java heap space, we get an OutOfMemoryError. This is caused by the AbstractBinaryValueFactory, which extracts the bytes from the InputStream (via IoUtil.readBytes) and tries to create an InMemoryBinary by default.
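To illustrate the difference from reading the whole stream into a byte[], here is a minimal sketch in plain Java that copies an InputStream to a file in fixed-size chunks, so memory use stays bounded by the buffer size. The class name StreamingCopy and the method copyToFile are hypothetical and are not part of ModeShape.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a ModeShape class.
final class StreamingCopy {
    private StreamingCopy() {
    }

    /**
     * Copies the stream to the target file in 8 KiB chunks and returns
     * the number of bytes written. Only the buffer is held in memory.
     */
    static long copyToFile(InputStream in, Path target) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

The caller remains responsible for closing the InputStream in this sketch.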

       

      We are using ModeShape 2.8.0.Final via JCR. I tried to provide our own ValueFactories (to supply a different BinaryFactory), but this seems to be impossible, because your JcrRepository implementation makes a call in its constructor like ExecutionContext.with(NamespaceRegistry). This call creates a new ExecutionContext without carrying over the (patched) ValueFactories:

      public ExecutionContext with( NamespaceRegistry namespaceRegistry ) {
              // Don't supply the value factories or property factories, since they'll have to be recreated
              // to reference the supplied namespace registry ...
              return new ExecutionContext(getSecurityContext(), namespaceRegistry, null, getPropertyFactory(), getMimeTypeDetector(),
                                          getTextExtractor(), getClassLoaderFactory(), getData(), getProcessId());
      }
      

       

      As a result, the StandardValueFactories are used again, providing the default BinaryFactory.

       

      My questions are:

      1. How can we store large binaries in ModeShape?
      2. If there is no solution to 1, is it possible to provide a custom value factory or a custom javax.jcr.Binary?

       

      Thanks in advance

        • 1. Re: How to store large binaries
          rhauch

          We should allow you to provide a custom BinaryFactory. Would you mind logging an issue so we can fix that in 2.8.1.Final?

           

          Unfortunately, providing your own javax.jcr.Binary implementation won't really work, since we expect the implementation to also implement some other interfaces. You could try implementing those, too, but I'm not sure what'll happen.

           

          BTW, 3.0 already does the right thing here. First of all, our javax.jcr.Binary values never store the content in-memory. When clients create a new Binary object (given an InputStream), we always stream that directly into persistent storage (the kind of storage is configurable, but defaults to storing the values on the file system keyed by SHA-1). Secondly, we will accept any javax.jcr.Binary implementation; yes, if it doesn't implement our Binary extension to javax.jcr.Binary, we'll stream it into our own storage. However, I don't think there's much advantage to using a custom Binary implementation in 3.x, since I think we're handling them much better than in 2.x.
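The idea of streaming content straight into file-system storage keyed by SHA-1 can be sketched roughly as follows. This is not ModeShape's implementation; Sha1FileStore and its store method are hypothetical names, and the layout (one file per key in a single directory) is an assumption made for brevity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical content-addressed store, not a ModeShape class.
final class Sha1FileStore {
    private final Path root; // must be an existing directory

    Sha1FileStore(Path root) {
        this.root = root;
    }

    /**
     * Streams the content into a temporary file while hashing it, then
     * renames the file to its SHA-1 hex key. The input stream is closed.
     */
    String store(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        Path tmp = Files.createTempFile(root, "upload", ".tmp");
        try (DigestInputStream din = new DigestInputStream(in, sha1);
             OutputStream out = Files.newOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = din.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        StringBuilder key = new StringBuilder();
        for (byte b : sha1.digest()) {
            key.append(String.format("%02x", b));
        }
        // Identical content maps to the same key, so replacing is harmless.
        Files.move(tmp, root.resolve(key.toString()), StandardCopyOption.REPLACE_EXISTING);
        return key.toString();
    }
}
```

Because the key is derived from the content, storing the same bytes twice yields a single file, and the value held by the repository can be just the key rather than the bytes.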

          • 2. Re: How to store large binaries
            jonathandfields

            Hi, could you please point to some examples, docs, or code in 3.0 illustrating the use of the configurable binary storage? I am quite interested in this aspect of ModeShape, and would like to see if it fulfills my requirements. Thanks!

            • 3. Re: How to store large binaries
              rhauch

              I'll start a new discussion thread for the 3.0 handling of binaries.

              • 4. Re: How to store large binaries
                jonathandfields

                Great, thank you.

                • 5. Re: How to store large binaries
                  cluxig

                  I am currently working around this by creating an EnhancedExecutionContext that overrides all with(...) methods. The custom BinaryFactory is used there, but that alone does not achieve the main goal.

                  Now I get a NotSerializableException, because the internal InputStream of my custom binary is (of course) not serializable. I use the DiskStore to store nodes, but the DiskStore writes all properties and nodes through Java ObjectStreams. So in that case only a simple type like byte[] can be serialized, and that array must (of course) be held in memory.

                  At the moment I don't see a way to solve the storage problem with ModeShape. Should we try the FileSystemStore alongside our custom Binary? Hopefully ModeShape 3.0 will let us handle this easily. Any further hints or ideas?
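For reference, the serialization problem described above can be sidestepped in plain Java by serializing a reference to the content instead of the stream itself. The sketch below uses a hypothetical FileBackedBinary class that stores only a file path and a length; whether ModeShape 2.x and its DiskStore would accept such an object as a binary value is not confirmed anywhere in this thread, so treat it purely as an illustration of the Java side.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;

// Hypothetical value holder, not a ModeShape or JCR class.
final class FileBackedBinary implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only the path and the length are serialized, never the content.
    private final String path;
    private final long size;

    FileBackedBinary(File file) {
        this.path = file.getAbsolutePath();
        this.size = file.length();
    }

    /** Opens a fresh stream on each call; the caller must close it. */
    InputStream openStream() throws IOException {
        return new FileInputStream(path);
    }

    long size() {
        return size;
    }
}
```

The trade-off is that the serialized form is only valid while the referenced file exists at the same path, so the file's lifecycle has to be managed outside the object store.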