6 Replies Latest reply on Nov 7, 2007 10:44 AM by jahlborn

    remoting streams

    jahlborn

      Earlier this year I had an email exchange with Tom Elrod about the JBoss Remoting implementation of an InputStream. I had identified a variety of weaknesses in the implementation, and, more importantly, I wanted to get Tom's feelings on replacing it with a package I was working on. At the time, my company was working on open sourcing the package, but for a variety of reasons, this did not actually happen until very recently. Long story short, the http://openhms.sourceforge.net/rmiio/ package has now been open sourced under the LGPL, but Tom no longer works at JBoss. I was wondering if there are any other developers on the remoting team at JBoss who might be interested in working with me to look at integrating the package.
      thanks,
      -james

        • 1. Re: remoting streams
          dmlloyd

          The Remoting 3 prototype has a generic streaming mechanism that's pretty similar to what you propose. Basically any object that is recognized as a stream type is replaced with a proxy that moves the data over the wire. This way you can create custom stream types beyond the predefined types, and a single request can contain multiple streams; data can be pushed from server to client (e.g. pass an OutputStream as part of the request), or pulled from client to server (e.g. pass an InputStream).

          The predefined stream types are defined here:

          http://anonsvn.jboss.org/repos/sandbox/david.lloyd/remoting3/api-proto/src/main/java/org/jboss/cx/remoting/stream/

          Also, Reader/Writer and Input/OutputStreams will be supported by default.

          Please have a look over this material and let me know if there are any key features that are missing... specifically if you have any thoughts about things like flow control, or any insights based on your implementation.

          • 2. Re: remoting streams
            jahlborn

            So, a couple of things jump out at me initially. the first is that it is a very interesting choice to put the iterator-like API beneath the stream API. the rmiio library has a remote iterator API which is built on top of the streaming API. the choice made in the remoting impls leads to some performance consequences. every remote retrieval operation for inputstream requires two remote calls (hasNext and next). additionally, for the collection based impls (IteratorObjectSource), every object retrieval requires two remote calls! since rmiio implements the iterator api on top of the streaming api (arguably more complicated), multiple objects can be retrieved in one call. and, there is no remote "hasNext" equivalent, just "read", which merely returns null when the end of data (EOD) is reached. lastly, since rmiio uses streams, over-the-wire compression can be used to dramatically reduce the amount of data transmitted.
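
            A minimal sketch of the "iterator on top of a batched read" idea described above. This is not the rmiio API; BatchSource, CountingListSource and BatchedIterator are hypothetical names, and the in-memory source merely stands in for a remote object so the number of calls can be counted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical remote source: one call returns up to max items, or null at end of data. */
interface BatchSource<T> {
    List<T> readBatch(int max);
}

/** In-memory stand-in for the remote side; counts how many "remote" calls were made. */
class CountingListSource<T> implements BatchSource<T> {
    private final List<T> items;
    private int pos = 0;
    int calls = 0;

    CountingListSource(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public List<T> readBatch(int max) {
        calls++;
        if (pos >= items.size()) {
            return null; // end of data: no separate hasNext call is needed
        }
        int end = Math.min(pos + max, items.size());
        List<T> batch = new ArrayList<>(items.subList(pos, end));
        pos = end;
        return batch;
    }
}

/** Local iterator layered on top of the batched read. */
class BatchedIterator<T> implements Iterator<T> {
    private final BatchSource<T> source;
    private final int batchSize;
    private Iterator<T> current; // items already fetched but not yet handed out
    private boolean done;        // true once the source has returned null

    BatchedIterator(BatchSource<T> source, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Go back to the source only when the local batch is exhausted.
        while (!done && (current == null || !current.hasNext())) {
            List<T> batch = source.readBatch(batchSize);
            if (batch == null) {
                done = true;
            } else {
                current = batch.iterator();
            }
        }
        return current != null && current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

            With this layering, iterating five items at a batch size of two takes four calls to the source (three batches plus the null end-of-data reply), rather than two calls per item.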

            the other big thing that stuck out to me is the lack of any idempotent guarantees. it's possible this is provided by the underlying framework, but, if not, it's a significant weakness. when systems get burdened, it's not unlikely for a given call to fail. looking at these implementations alone, that means lost data. the rmiio implementations allow the current read/write operation to be retried indefinitely.
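
            One common way to make a remote read safely retryable is to tag each request with a sequence number and have the server replay its last reply when the same number arrives again. The sketch below illustrates that technique only; it is an assumption about the general approach, not the rmiio implementation, and IdempotentChunkServer, FlakyLink and RetryingReader are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical server side: serves data in chunks keyed by a packet id. */
class IdempotentChunkServer {
    private final byte[] data;
    private final int chunkSize;
    private int pos = 0;
    private int lastId = -1;
    private byte[] lastChunk = null;

    IdempotentChunkServer(byte[] data, int chunkSize) {
        this.data = data.clone();
        this.chunkSize = chunkSize;
    }

    /** Returns the chunk for packetId, or null at end of data. Repeating the latest id replays it. */
    synchronized byte[] readPacket(int packetId) {
        if (packetId != lastId) {
            if (packetId != lastId + 1) {
                throw new IllegalArgumentException("unexpected packet id " + packetId);
            }
            if (pos >= data.length) {
                lastChunk = null;
            } else {
                int end = Math.min(pos + chunkSize, data.length);
                lastChunk = Arrays.copyOfRange(data, pos, end);
                pos = end;
            }
            lastId = packetId;
        }
        // A retry (packetId == lastId) falls through and gets the same chunk again.
        return lastChunk == null ? null : lastChunk.clone();
    }
}

/** Simulated transport: the server does the work, but every other reply is "lost". */
class FlakyLink {
    private final IdempotentChunkServer server;
    private int calls = 0;

    FlakyLink(IdempotentChunkServer server) {
        this.server = server;
    }

    byte[] readPacket(int id) throws IOException {
        byte[] reply = server.readPacket(id);
        if (++calls % 2 == 1) {
            throw new IOException("simulated lost reply");
        }
        return reply;
    }
}

/** Client side: retries the same packet id until a reply arrives or attempts run out. */
class RetryingReader {
    static byte[] readAll(FlakyLink link, int maxAttempts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int id = 0; ; id++) {
            byte[] chunk = readWithRetry(link, id, maxAttempts);
            if (chunk == null) {
                return out.toByteArray();
            }
            out.write(chunk, 0, chunk.length);
        }
    }

    private static byte[] readWithRetry(FlakyLink link, int id, int maxAttempts) {
        IOException last = new IOException("no attempts made");
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return link.readPacket(id);
            } catch (IOException e) {
                last = e; // safe to retry: the same id yields the same chunk
            }
        }
        throw new UncheckedIOException(last);
    }
}
```

            Without the packet id, the lost replies in this simulation would silently drop chunks, because the server would already have advanced past them.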

            another problem that's more related to remoting as a whole (at least in previous remoting versions), is the lack of an interface similar to java.rmi.server.Unreferenced. this becomes more important for things like streams which can be holding system resources which are relatively scarce (open file handles, database connections, etc). it's not uncommon for a client to die unexpectedly before the "close" method is invoked on the server. this will leave the stream resource open indefinitely. the rmiio implementations utilize the Unreferenced API so that when used with RMI proper, server resources will eventually be reclaimed if clients die before calling the remote "close".
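
            For reference, a rough sketch of how java.rmi.server.Unreferenced can be used to release a stream when no client holds a reference any longer. RemoteByteSource and RemoteByteSourceImpl are hypothetical names, not rmiio classes. The RMI runtime only calls unreferenced() on objects that have actually been exported (for example via UnicastRemoteObject.exportObject), which this sketch does not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.rmi.Remote;
import java.rmi.server.Unreferenced;
import java.util.Arrays;

/** Hypothetical remote interface for pulling bytes from a server-side stream. */
interface RemoteByteSource extends Remote {
    /** Returns up to max bytes, or null at end of data. */
    byte[] read(int max) throws IOException;

    void close() throws IOException;
}

/** Server-side object that frees its stream on explicit close or when unreferenced. */
class RemoteByteSourceImpl implements RemoteByteSource, Unreferenced {
    private final InputStream in;
    private boolean closed = false;

    RemoteByteSourceImpl(InputStream in) {
        this.in = in;
    }

    @Override
    public synchronized byte[] read(int max) throws IOException {
        if (closed) {
            throw new IOException("source closed");
        }
        byte[] buf = new byte[max];
        int n = in.read(buf);
        return n < 0 ? null : Arrays.copyOf(buf, n);
    }

    @Override
    public synchronized void close() throws IOException {
        if (!closed) {
            closed = true;
            in.close();
        }
    }

    /** Called by the RMI runtime once no remote references remain (e.g. the client died). */
    @Override
    public void unreferenced() {
        try {
            close();
        } catch (IOException ignored) {
            // Nobody is left to report the failure to.
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```

            How quickly the callback fires depends on the RMI distributed garbage collector's lease settings, so resources are reclaimed eventually rather than immediately.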

            one last minor point is that i see no support for duplicate close calls (again, this could be managed elsewhere by the underlying framework). often, with appropriate usage of finally blocks, you may attempt to close a remote source multiple times. since the first successful call often shuts down the remote server, this can result in lots of annoying exceptions on the client side (because the source no longer exists when the second call goes through).

            • 3. Re: remoting streams
              dmlloyd

               

              "jahlborn" wrote:
              the choice made in the remoting impls leads to some performance consequences. every remote retrieval operation for inputstream requires two remote calls (hasNext and next). additionally, for the collection based impls (IteratorObjectSource), every object retrieval requires two remote calls!


              Ah, but the secret here is that you don't make a network round-trip for each method call. Every time an item or a batch of items is sent, an indicator can easily be added to signify that there are more items (or not). This is because each stream handler has its own wire implementation. So it's actually more efficient than RPC-style invocations (in the spirit of Remoting 3 in general, which seeks to move away from the RPC style).

              Not only does this mean that you don't make the extra remote request for "hasNext", but you can actually have higher throughput overall since the stream handler can forward multiple objects at one time rather than making a round trip for each one.
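
              To illustrate the idea of carrying the "more items" indicator along with each batch, here is a small sketch of one possible frame layout. The layout (item count, the items, then a boolean flag) and the names Batch and BatchCodec are assumptions made up for this example; the actual Remoting 3 wire format is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One decoded frame: a batch of items plus a flag saying whether more will follow. */
final class Batch {
    final List<String> items;
    final boolean more;

    Batch(List<String> items, boolean more) {
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
        this.more = more;
    }
}

/** Hypothetical frame layout: int count, count UTF strings, boolean more. */
final class BatchCodec {
    private BatchCodec() {
    }

    static byte[] encode(List<String> items, boolean more) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(items.size());
            for (String item : items) {
                out.writeUTF(item);
            }
            out.writeBoolean(more); // replaces a separate remote hasNext() request
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Batch decode(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            int count = in.readInt();
            List<String> items = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                items.add(in.readUTF());
            }
            return new Batch(items, in.readBoolean());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

              A receiver built on such frames can answer hasNext() locally from the last flag it saw, and only waits on the network when its current batch is empty and the flag said more data was coming.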

              "jahlborn" wrote:
              the other big thing that stuck out to me is the lack of any idempotent guarantees. it's possible this is provided by the underlying framework, but, if not, it's a significant weakness.


              Yes, the idea is that reliability is provided by the framework, and the user need not worry about it. If a request fails irreparably, a RemotingException is thrown.

              "jahlborn" wrote:
              another problem that's more related to remoting as a whole (at least in previous remoting versions), is the lack of an interface similar to java.rmi.server.Unreferenced. this becomes more important for things like streams which can be holding system resources which are relatively scarce (open file handles, database connections, etc). it's not uncommon for a client to die unexpectedly before the "close" method is invoked on the server. this will leave the stream resource open indefinitely.


              This is up to the individual stream handlers; however, for the default stream handlers the close() method will be called when the remote side either (a) closes the stream explicitly or (b) lets it die unreferenced while still open. It is possible that there is a use case to handle these situations differently, but so far I know of no such case.

              "jahlborn" wrote:
              one last minor point is that i see no support for duplicate close calls (again, this could be managed elsewhere by the underlying framework).


              Every usage of close() uses the java.io.Closeable interface, which specifies that "If the stream is already closed then invoking this method has no effect". I do intend to keep those semantics.
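
              As a small illustration of those semantics, a wrapper along the following lines makes any Closeable tolerate repeated close() calls. CloseOnce is a hypothetical name for this sketch, not a class from Remoting or rmiio.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Forwards the first close() to the delegate and turns later calls into no-ops. */
class CloseOnce implements Closeable {
    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnce(Closeable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void close() throws IOException {
        // compareAndSet ensures only one caller ever reaches the delegate,
        // even if several threads (or nested finally blocks) call close().
        if (closed.compareAndSet(false, true)) {
            delegate.close();
        }
    }

    boolean isClosed() {
        return closed.get();
    }
}
```

              Note that this version marks the stream closed even if the delegate's close() throws, so a failed close is not retried; whether that is desirable depends on the resource being wrapped.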

              • 4. Re: remoting streams
                jahlborn

                It sounds like a lot of the interesting code is not in the classes you originally referenced. where is the code that handles this stuff (the batching, retrying, close handling)?

                • 5. Re: remoting streams
                  dmlloyd

                  This is all in the implementation classes. I posted the interfaces because I was explaining the "what", not the "how". There's not a lot of code at all in the API. The implementation is still in flux, but what there is can be found here:

                  http://anonsvn.jboss.org/repos/sandbox/david.lloyd/remoting3/core-proto

                  Though I rewrite large chunks of it on a pretty regular basis at this point.

                  Also the JIRA bug tracker has some milestone tasks.

                  • 6. Re: remoting streams
                    jahlborn

                    I wasn't able to get much out of the implementation classes, probably because I don't know enough about how the framework as a whole works. Anyway, all of the features I mentioned in the long post have been implemented in the rmiio package, and have been heavily used and tested at our company. So, if you are interested in integrating a mature, existing package instead of writing one from scratch, I'd be happy to work with you to do that.