Serialization with NIO
dmlloyd Jul 9, 2007 8:15 AM
I'm looking at adapting JBoss Remoting to NIO, and I've encountered a problem with serialization. Ideally, using NIO, a small pool of threads can handle a large number of connections. The amount of time that a server thread spends processing a request should be minimized; ideally only invocations should be using a thread for any substantial amount of time.
Because the serialization API uses the old stream I/O methodology, it requires that blocking I/O be used to process the serialization/deserialization of objects. This causes a scalability issue (thread-per-connection) that cannot be completely solved with NIO. Two ways in which the problem may partially be solved:
1. Use (simulate) blocking I/O only during (de)serialization. There are several ways this can be done with similar results.
2. On the sending side, serialize objects to a local buffer first; use NIO to send the buffer (perhaps with a size header). On the receiving side, read the network data into a buffer, and once the buffer is filled, deserialize the buffer data.
The advantage of option 1 is that it streams the data instead of buffering a whole object; however, a server thread will be monopolized during the entire transmission of a serialized object. If the sending side is slow or intermittent, or if the data to transmit is large, a thread pool can be tied up indefinitely by a small number of clients. Blocking timeouts can mitigate (but not solve) this problem.
The second approach does not suffer from this problem. The serialization process can run from start to finish without blocking; the buffers can then be sent in a non-blocking fashion. The receiving side does not begin deserialization until the whole buffer is in memory, so it will not block either. However, memory overhead might become substantial for larger objects.
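To make option 2 concrete, here is a minimal sketch of the buffer-then-send idea using plain Java Serialization and a 4-byte size header. The names FramedSerialization, encode, FrameDecoder and feed are made up for illustration and are not part of JBoss Remoting or JBoss Serialization; a real implementation would also want to cap the frame size before allocating.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class FramedSerialization {

    /** Serializes obj entirely in memory; returns [4-byte length][payload], ready to be written. */
    static ByteBuffer encode(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        byte[] payload = bytes.toByteArray();
        ByteBuffer frame = ByteBuffer.allocate(4 + payload.length);
        frame.putInt(payload.length).put(payload);
        frame.flip();
        return frame;
    }

    /** Accumulates chunks from non-blocking reads; deserializes only once a frame is complete. */
    static final class FrameDecoder {
        private final ByteBuffer header = ByteBuffer.allocate(4);
        private ByteBuffer body; // null while the size header is still incomplete

        List<Object> feed(ByteBuffer chunk) throws IOException, ClassNotFoundException {
            List<Object> decoded = new ArrayList<>();
            while (chunk.hasRemaining()) {
                if (body == null) {
                    copy(chunk, header);
                    if (header.hasRemaining()) break; // need more header bytes
                    header.flip();
                    body = ByteBuffer.allocate(header.getInt()); // assumption: size is trusted
                    header.clear();
                }
                copy(chunk, body);
                if (body.hasRemaining()) break; // need more payload bytes
                // The whole object is in memory, so this stream never blocks on the network.
                try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(body.array()))) {
                    decoded.add(in.readObject());
                }
                body = null;
            }
            return decoded;
        }

        private static void copy(ByteBuffer from, ByteBuffer to) {
            while (from.hasRemaining() && to.hasRemaining()) to.put(from.get());
        }
    }
}
```

The memory cost mentioned above is visible here: both encode and FrameDecoder hold the full serialized form of each object before anything else can happen with it.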
An ideal solution to this problem (from my perspective) would be to provide an NIO-compatible API. For serialization, the API would have the user push in an object, and then push in buffers repeatedly until the object is fully serialized:
    public interface Serializer {
        void setObject(Object source);

        /** @return {@code true} when the object is serialized fully */
        boolean fillBuffer(ByteBuffer target);
    }
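As a rough illustration of the calling contract, the sketch below repeats the proposed interface and adds a hypothetical EagerSerializer plus a drain helper. Note the assumption: EagerSerializer simply serializes the whole object up front with Java Serialization and then hands out the bytes piece by piece, so it does not save memory the way a truly incremental implementation would. It only shows how a caller would keep supplying buffers until fillBuffer reports completion.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical names throughout; only the Serializer interface comes from the proposal.
final class SerializerSketch {

    /** The interface proposed above, repeated so the sketch compiles on its own. */
    interface Serializer {
        void setObject(Object source);
        boolean fillBuffer(ByteBuffer target);
    }

    /** Serializes eagerly in setObject, then copies out as much as each buffer can hold. */
    static final class EagerSerializer implements Serializer {
        private ByteBuffer pending = ByteBuffer.allocate(0);

        @Override
        public void setObject(Object source) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            pending = ByteBuffer.wrap(bytes.toByteArray());
        }

        @Override
        public boolean fillBuffer(ByteBuffer target) {
            int n = Math.min(pending.remaining(), target.remaining());
            ByteBuffer slice = pending.duplicate();
            slice.limit(slice.position() + n);
            target.put(slice);                          // copy at most n bytes
            pending.position(pending.position() + n);   // remember how far we got
            return !pending.hasRemaining();
        }
    }

    /** Drains a serializer through one small buffer; bufferSize must be positive. */
    static byte[] drain(Serializer serializer, int bufferSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a channel
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        boolean done;
        do {
            buf.clear();
            done = serializer.fillBuffer(buf);
            buf.flip();
            sink.write(buf.array(), 0, buf.limit());
        } while (!done);
        return sink.toByteArray();
    }
}
```

In a selector-driven server the loop in drain would be spread across write-readiness events rather than run in one go, with the channel write replacing the sink.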
For deserialization, the API would accept buffers, and allow the user to pull objects (accounting for the possibility of more than one object coming from a single buffer):
    public interface Deserializer {
        boolean hasNext();

        Object next();

        void setBuffer(ByteBuffer input);
    }
This API would be used like this:
    // ... we just got a buffer ...
    deserializer.setBuffer(input);
    while (deserializer.hasNext()) {
        Object nextObj = deserializer.next();
        // ... do something with nextObj ...
    }
This API would allow for maximum scalability by lifting the blocking I/O requirement. A stream API can be implemented on top of this API; however the reverse is not true.
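To illustrate the claim that a stream API can sit on top of the buffer API, here is a small sketch of a blocking readObject built on the proposed Deserializer. Everything apart from the interface itself is hypothetical: IntDeserializer is a toy stand-in that treats every 4 bytes as one Integer (a real deserializer would decode an object graph), and the sketch assumes that hasNext() consumes as much of the current buffer as it can before returning false.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.NoSuchElementException;

// Hypothetical names throughout; only the Deserializer interface comes from the proposal.
final class StreamOverBuffers {

    /** The interface proposed above, repeated so the sketch compiles on its own. */
    interface Deserializer {
        boolean hasNext();
        Object next();
        void setBuffer(ByteBuffer input);
    }

    /** Toy deserializer: every 4 bytes on the wire becomes one Integer. */
    static final class IntDeserializer implements Deserializer {
        private final ByteBuffer partial = ByteBuffer.allocate(4); // survives across buffers
        private ByteBuffer input = ByteBuffer.allocate(0);

        @Override
        public void setBuffer(ByteBuffer input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            // Pull bytes from the current buffer until a full value is assembled.
            while (partial.hasRemaining() && input.hasRemaining()) {
                partial.put(input.get());
            }
            return !partial.hasRemaining();
        }

        @Override
        public Object next() {
            if (!hasNext()) throw new NoSuchElementException();
            partial.flip();
            int value = partial.getInt();
            partial.clear();
            return value;
        }
    }

    /** Blocking read on top of the buffer API; bufferSize must be positive. */
    static Object readObject(InputStream in, Deserializer d, int bufferSize) throws IOException {
        while (!d.hasNext()) {
            byte[] chunk = new byte[bufferSize];
            int n = in.read(chunk); // the only call that may block
            if (n < 0) throw new EOFException("stream ended mid-object");
            d.setBuffer(ByteBuffer.wrap(chunk, 0, n));
        }
        return d.next();
    }
}
```

The reverse direction is the hard one: an ObjectInputStream-style reader expects to pull bytes whenever it wants them, so it cannot pause in the middle of an object and hand the thread back when the buffer runs dry.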
Is this type of API a possibility for JBoss Serialization? For extra credit, would it be possible to make an implementation that is wire-compatible with Java Serialization?