CompressingMarshaller causing improper socket closure
bpiper Dec 22, 2009 9:00 PM
I'm using JBoss 5.1.0.GA with JBoss Remoting 2.5.2, and EJB 2 invocations over the default 'socket' transport with the UnifiedInvoker. To make use of CompressingMarshaller and CompressingUnMarshaller, I've written a thin wrapper around each that overrides the zero-arg constructor to call the constructor that wraps another marshaller, passing InvocationMarshaller and InvocationUnMarshaller respectively. The client is a Swing-based application using the appropriate JBoss libraries. The link I'm working with (which is what drove my colleagues and me to look at compression) has 512 kbps of bandwidth and a latency of about 300 ms. Hopefully that's enough background info.
Firstly, the actual problem is not fatal... the application works fine with the compressing (un)marshaller enabled, which is to say that it's functionally correct. However, what I noticed is that while EJB calls that returned a large amount of data were quicker with compression enabled, calls that returned virtually nothing were actually significantly slower (far more so than you would expect from any fixed CPU overhead of using compression).
So after digging around with client and server debugging and Wireshark, I found that the server socket that handles the remote invocations from the client is getting closed after an invocation, instead of blocking on a socket read while it waits for the next invocation (whenever that might be), as it normally does with compression disabled. The reason is that once it has handled one invocation, the GZIPInputStream has read the gzip trailer and considers the stream finished (per a boolean field called 'eos'). When the server then attempts another read (the one that would normally wait for the next invocation), an EOFException is thrown and the socket gets closed. When the client later tries to re-use that connection from its pool, it gets a RST and has to retry on a new connection, which is costly when you're working with 300 ms latency. From the user's perspective everything seems OK, but performance suffers for small invocations.
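The end-of-stream behaviour can be reproduced outside JBoss. The following is only a minimal sketch using plain java.util.zip: the class and method names (GzipEosDemo, readAfterEnd) are made up for illustration, and a ByteArrayInputStream stands in for the socket with no further data pending. It shows that once the trailer has been consumed, every later read on the same GZIPInputStream reports end of stream, so anything layered on top that expects more data would see an EOF condition.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipEosDemo {

    // Compress a payload into one complete gzip member (header, data, trailer).
    static byte[] gzip(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(payload);
        }
        return bos.toByteArray();
    }

    // Drain one gzip member, then read again from the SAME GZIPInputStream
    // and return what that extra read reports.
    static int readAfterEnd(byte[] payload) {
        try {
            GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(gzip(payload)));
            byte[] buf = new byte[256];
            while (in.read(buf) != -1) {
                // discard the decompressed bytes of the first "invocation"
            }
            // The trailer has been consumed; the stream now considers itself finished.
            return in.read(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterEnd("small reply".getBytes())); // prints -1
    }
}
```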
It just seems like GZIPInputStream isn't appropriate for re-use across multiple invocations. Each invocation will have a gzip trailer (as it must to be decompressed) to read, thus only one invocation can be processed per GZIPInputStream instance. However, org.jboss.remoting.transport.socket.ServerThread is essentially making the assumption that the input stream is re-usable across multiple invocations and doesn't need 'refreshing'. Is that a fair analysis or am I missing something?
Using 'http' transport gets around this issue, since it just re-creates the unmarshaller for each request, but I would prefer to stick with socket transport if possible, as the HTTP overhead isn't really wanted.
Does anyone have any suggestions on anything I might be doing wrong or may have misconfigured? I imagine that a re-usable (or perpetual) GZIPInputStream is possible in theory, but I'm not in a rush to go and write one if I can avoid it.
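To make the "re-usable" idea concrete, here is a rough sketch of one way it might work: frame each invocation as a length-prefixed, self-contained gzip member and create a fresh GZIPInputStream per frame, so the underlying stream is never told it has ended. This is not JBoss Remoting API. FramedGzip, writeFrame, readFrame and roundTrip are hypothetical names, the 4-byte length prefix is my own assumption about a wire format, and a byte array stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class FramedGzip {

    // Write one payload as: 4-byte compressed length, then a complete gzip member.
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(payload);
        }
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(compressed.size());
        compressed.writeTo(data);
        data.flush();
    }

    // Read exactly one frame. The GZIPInputStream only ever sees this frame's
    // bytes, so reaching its trailer says nothing about the underlying stream.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] compressed = new byte[data.readInt()]; // length is trusted in this sketch
        data.readFully(compressed);
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                payload.write(buf, 0, n);
            }
        }
        return payload.toByteArray();
    }

    // Send several messages over one in-memory "connection" and read them back.
    static List<String> roundTrip(String... messages) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            for (String m : messages) {
                writeFrame(wire, m.getBytes(StandardCharsets.UTF_8));
            }
            InputStream in = new ByteArrayInputStream(wire.toByteArray());
            List<String> decoded = new ArrayList<>();
            for (int i = 0; i < messages.length; i++) {
                decoded.add(new String(readFrame(in), StandardCharsets.UTF_8));
            }
            return decoded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("first invocation", "second invocation"));
    }
}
```

On a real socket, the readInt() at the start of readFrame would be the call that blocks between invocations, and an EOFException there would mean the peer really had closed the connection. The cost of this approach is that each frame is buffered in memory before it is sent or decompressed.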
Thanks,
Ben