3 Replies Latest reply on Jun 13, 2013 12:20 PM by rhauch

    asynchronized custom binary store

    jay709

      Hi All,

       

      Could anybody please suggest how to implement my own binary store in asynchronous mode?

       

      We have data in AWS Glacier. When we send a get request, we have to wait about 4 hours for the stream to become available for download. At that point, a notification message arrives saying the stream is ready.

       

      Does ModeShape support an asynchronous getStream()?

       

      Currently I am thinking of temporarily saving the files from Glacier locally. When an end user requests a file, ModeShape checks whether it is cached locally. If not, it sends a message back telling the user to try again 4 hours later. If it is cached, it returns the stream.

       

      But I would like to know how this looks from ModeShape's point of view.

       

      Thanks,

       

      Jay

        • 1. Re: asynchronized custom binary store
          rhauch

          There might be a way you can do this within our current SPI and API.

           

          IIUC, client applications creating new binary values doesn't seem to be a problem. Rather, the challenge is that a ModeShape client application might get a property value from some node, and that value might be a BINARY value whose content is not yet available. Is this correct?

           

          If my assumption is correct, then one way to solve this might be for the GlacierBinaryStore implementation to return a different Binary implementation - perhaps FutureBinary. This FutureBinary might have a Future that clients can use to wait for the availability. (BTW, does it really make sense that a client might wait 4 hours for the content? If not, maybe the FutureBinary does not contain a Future and instead is simply a signal to the client that the value is not yet available, and to check back later.)
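
          The two variants described above could look roughly like the following minimal sketch. It is only an illustration: FutureBinary and ContentNotReadyException are hypothetical names, not existing ModeShape classes, and the sketch uses Java's CompletableFuture with a byte array in place of a real binary value.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

// Hypothetical types for illustration only; these are not ModeShape classes.
class ContentNotReadyException extends RuntimeException {
    ContentNotReadyException(String message) {
        super(message);
    }
}

class FutureBinary {
    private final String key;
    private final CompletableFuture<byte[]> content = new CompletableFuture<>();

    FutureBinary(String key) {
        this.key = key;
    }

    // Called by the store once the content has been retrieved.
    void complete(byte[] bytes) {
        content.complete(bytes);
    }

    boolean isAvailable() {
        return content.isDone() && !content.isCompletedExceptionally();
    }

    // Variant 1: expose a Future that clients can wait on.
    Future<byte[]> future() {
        return content;
    }

    // Variant 2: fail fast, signalling that the client should check back later.
    InputStream getStream() {
        if (!isAvailable()) {
            throw new ContentNotReadyException(
                "Binary " + key + " is not yet available; check back later");
        }
        return new ByteArrayInputStream(content.join());
    }
}
```

          A client that cannot wait would call isAvailable() or catch the exception, while a background worker could block on future().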

           

          Unfortunately, adding another Binary implementation class currently requires us to change some internal code that serializes and deserializes BinaryValue instances to and from our internal JSON documents that we use to represent node states. We could improve this so that it can go back to the BinaryStore to obtain the instance (upon deserialization), but it would require a little work on our part.

           

          Does this help at all?

          • 2. Re: asynchronized custom binary store
            jay709

            Thanks Randall, I can always depend on you.

             

             

             

            IIUC, client applications creating new binary values doesn't seem to be a problem. Rather, the challenge is that a ModeShape client application might get a property value from some node, and that value might be a BINARY value whose content is not yet available. Is this correct?

             

            Yes, that is correct from the user's point of view. The content is in Glacier, but it is not available for the user to download. About 4 hours after the request, the stream is ready, but it is only available for another 24 hours. After that, the user has to start over again, even if he wants the same content. This is the Glacier rule.

             

             

            If my assumption is correct, then one way to solve this might be for the GlacierBinaryStore implementation to return a different Binary implementation - perhaps FutureBinary. This FutureBinary might have a Future that clients can use to wait for the availability. (BTW, does it really make sense that a client might wait 4 hours for the content? If not, maybe the FutureBinary does not contain a Future and instead is simply a signal to the client that the value is not yet available, and to check back later.)

             

            Yes, I agree nobody could wait the whole 4 hours. But they could wait while doing something else, that is, asynchronously.

             

            In the upper layer, the user could immediately get an error message saying that the node value will only be available after 4 hours. Meanwhile, the GlacierBinaryStore would interact with Glacier in the background.

            It would send the request to Glacier and wait for 4 hours. Once it receives the notification from Glacier, it would download the stream somewhere for local caching. Then, when the upper-layer user later requests the same node value again, ModeShape could return the stream right away, the same as any other node value.

             

            For local caching (or another caching mechanism), we at least need to maintain the mapping between the cached temporary node and the original node property, so that the user gets exactly the content he wants.

            But where is the best place to keep such info: in the BinaryStore, or somewhere else?

            Or should we add a new property to the same node to set up such a mapping?

             

            Alternatively, in a non-cached approach, the GlacierBinaryStore could return the Glacier stream (not a locally cached copy, but the one from Glacier itself) directly to the upper-layer user 4 hours after the user's initial request. But the user could only access it for 24 hours, as if he had requested it directly from Glacier. It would be better if ModeShape sent the stream-ready notification to the upper-layer user. (By the GlacierBinaryStore or something else?)

             

            Please let me know what you think,

             

            Thanks a lot,

             

            Jay

            • 3. Re: asynchronized custom binary store
              rhauch

              I don't think the GlacierBinaryStore should expose the Glacier stream. Instead, I think it should get the content from Glacier when it is ready and store it locally (likely in another BinaryStore) for however long it is configured to do so. After that, it would get removed from the other BinaryStore.
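
              As a rough illustration of "store it locally for however long it is configured to do so", here is a minimal in-memory sketch. ShortTermStore is a hypothetical class, not part of ModeShape, and the injected clock is an assumption made only to keep the example testable; a real setup would more likely delegate to an existing BinaryStore.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical short-term store for illustration; not a ModeShape class.
class ShortTermStore {
    private static final class Entry {
        final byte[] content;
        final long expiresAtMillis;

        Entry(byte[] content, long expiresAtMillis) {
            this.content = content;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long retentionMillis;
    private final LongSupplier clock; // supplies the current time in milliseconds

    ShortTermStore(long retentionMillis, LongSupplier clock) {
        this.retentionMillis = retentionMillis;
        this.clock = clock;
    }

    // Stores content retrieved from long-term storage for the configured retention period.
    void put(String key, byte[] content) {
        entries.put(key, new Entry(content, clock.getAsLong() + retentionMillis));
    }

    // Returns the content, or null if it was never stored or has expired.
    byte[] get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry); // drop the expired copy
            return null;
        }
        return entry.content;
    }
}
```

              In production the clock argument would simply be System::currentTimeMillis.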

               

              Have you looked at the new CompositeBinaryStore? It might be possible to use a CompositeBinaryStore with a GlacierBinaryStore (e.g., "long-term") and another binary store (e.g., file system, database, or ISPN cache) for "short-term" storage. When creating the binary value for the first time, the user might be able to use a storage hint to tell the system to store it in short-term or long-term. Or, there might be some other logic (somewhere) that determines which binary values should be moved from short-term to long-term storage by moving them from one BinaryStore to the GlacierBinaryStore. (If this latter capability is desirable, we'd have to think more about where this logic should be.)

               

              When a client makes a request to read a BINARY value, ModeShape would delegate to the CompositeBinaryStore which would then consult each of the underlying stores. Here, order is important, so I'd put the short-term BinaryStore first, and the long-term GlacierBinaryStore last. (There could even be multiple "short-terms".)
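
              The ordering idea can be sketched as follows. This is not the CompositeBinaryStore implementation; NamedStore, MapStore and OrderedLookup are hypothetical names, and strings stand in for binary content.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical minimal store abstraction; not the ModeShape BinaryStore interface.
interface NamedStore {
    Optional<String> find(String key);
}

// Simple map-backed store used only to demonstrate lookup order.
class MapStore implements NamedStore {
    private final Map<String, String> values = new HashMap<>();

    MapStore with(String key, String value) {
        values.put(key, value);
        return this;
    }

    @Override
    public Optional<String> find(String key) {
        return Optional.ofNullable(values.get(key));
    }
}

// Consults the stores in the order given and returns the first hit.
class OrderedLookup {
    private final List<NamedStore> stores;

    OrderedLookup(NamedStore... stores) {
        this.stores = Arrays.asList(stores);
    }

    Optional<String> find(String key) {
        for (NamedStore store : stores) {
            Optional<String> hit = store.find(key);
            if (hit.isPresent()) {
                return hit; // earlier (short-term) stores win
            }
        }
        return Optional.empty();
    }
}
```

              With the short-term store listed first, a value that has already been pulled from Glacier is served locally and the long-term store is never consulted for it.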

               

              BTW, if the GlacierBinaryStore did return a custom BinaryValue implementation (e.g., AsynchronousBinaryValue?) that contained a Future<BinaryValue>, then the client could call "await" on that future and be notified when the underlying GlacierBinaryStore successfully obtained the value from Glacier (and loaded it into the short-term store).

               

              Alternatively, the AsynchronousBinaryValue could have methods to allow a client to register a callback that the GlacierBinaryStore would invoke when completed. However, this approach requires more interfaces and methods, and actually could be implemented on top of the Future approach.
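
              To show how callbacks could sit on top of the Future approach, here is a small sketch. AsyncValue is a hypothetical class, and it assumes Java's CompletableFuture (a plain Future has no completion hook, so it would need a waiting thread instead); a string stands in for the BinaryValue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical value wrapper for illustration; not a ModeShape class.
class AsyncValue {
    private final CompletableFuture<String> future = new CompletableFuture<>();

    // Future-style access: callers may block on or poll the returned future.
    CompletableFuture<String> future() {
        return future;
    }

    // Callback-style access, implemented purely in terms of the future.
    void onAvailable(Consumer<String> callback) {
        future.thenAccept(callback);
    }

    // Called by the store when the content has arrived.
    void complete(String value) {
        future.complete(value);
    }
}
```

              A callback registered after completion runs immediately, so late clients do not miss the notification.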

               

              In both approaches, I'd expect the GlacierBinaryStore would keep the AsynchronousBinaryValue instances that are "in-process" and awaiting availability in Glacier. That way, if multiple requests came in for the same BinaryValue (e.g., same SHA-1), the GlacierBinaryStore would reuse and return the same AsynchronousBinaryValue instance. So, if client A asked for a particular BinaryValue, and 3hr 45min later client B asked for the same one, client B only has to wait 15 more minutes.
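
              The reuse of "in-process" instances could be sketched like this. PendingRetrievals is a hypothetical helper, not ModeShape code; the SHA-1 is treated as an opaque string key, and the jobsStarted counter merely stands in for starting a real Glacier retrieval job.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical registry of in-process retrievals; not a ModeShape class.
class PendingRetrievals {
    private final ConcurrentMap<String, CompletableFuture<byte[]>> pending =
        new ConcurrentHashMap<>();
    private final AtomicInteger jobsStarted = new AtomicInteger();

    // Returns the shared future for this key, starting a retrieval only on the first request.
    CompletableFuture<byte[]> request(String sha1) {
        return pending.computeIfAbsent(sha1, key -> {
            jobsStarted.incrementAndGet(); // placeholder for starting the real retrieval job
            return new CompletableFuture<byte[]>();
        });
    }

    // Called when the content arrives; completes the shared future for every waiting client.
    void retrieved(String sha1, byte[] content) {
        CompletableFuture<byte[]> future = pending.remove(sha1);
        if (future != null) {
            future.complete(content);
        }
    }

    int jobsStarted() {
        return jobsStarted.get();
    }
}
```

              Client A and client B both receive the same future from request(...), so B's wait ends when A's retrieval finishes.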

               

              WDYT?