The ServerInvoker has a bounded pool of "ServerThread"s, so the number of invocations that can be handled simultaneously is limited by the invoker's maxPoolSize configuration parameter. Once a "ServerThread" is allocated to a client connection, it stays associated with that connection until the client closes the socket or a socket timeout occurs. This could lead to thread starvation in the presence of a large number of clients. Increasing the number of server threads will degrade performance because of the large number of context switches.
A non-blocking NIO model would probably be more appropriate here. Other messaging projects use this approach already.
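To make the starvation concern concrete, here is a minimal sketch in plain java.util.concurrent. It is not the actual ServerInvoker code: the class and method names are made up for illustration, and a latch stands in for "the client has not closed the socket yet". It only shows that with a pool bounded by maxPoolSize and one thread pinned per connection, any client beyond the pool size waits until some other client goes away.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; not part of JBoss Remoting. */
public class BoundedPoolStarvation {

    /**
     * Simulates `clients` keep-alive connections against a pool of
     * `maxPoolSize` worker threads and returns how many connections are
     * being served while every connection is still held open.
     */
    public static int servedWhileConnectionsHeld(int maxPoolSize, int clients) {
        ExecutorService pool = Executors.newFixedThreadPool(maxPoolSize);
        // Stand-in for "client closes the socket or a socket timeout occurs".
        CountDownLatch release = new CountDownLatch(1);
        // Counts down once per worker that has been pinned to a connection.
        CountDownLatch pinned = new CountDownLatch(Math.min(maxPoolSize, clients));
        AtomicInteger served = new AtomicInteger();
        try {
            for (int i = 0; i < clients; i++) {
                pool.execute(() -> {
                    served.incrementAndGet();
                    pinned.countDown();
                    try {
                        release.await(); // worker stays tied to its connection
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            pinned.await();      // every available worker is now occupied
            return served.get(); // the remaining clients sit in the queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // "close" all connections so the pool can drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(servedWhileConnectionsHeld(4, 10));
    }
}
```

With a pool of 4 and 10 connected clients, only 4 are served until a connection is released; the other 6 are starved regardless of how little traffic the first 4 generate.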
Tom Elrod wrote:
So guess would be good to cover scenarios and best approaches for them. The first scenario would be having a limited set (in the low hundreds) of clients that each regularly make invocations. In this scenario, having optimal throughput is probably the desired behavior and the current remoting code is designed for this. At the other end of the spectrum would be having a large number of clients (in the thousands) that each make periodic requests (meaning is not in tight loop that continually makes invocations). The best scenario for this is to use non-blocking i/o. The main reason for this is if use blocking i/o (besides running out of resources) will have thread per request and will be slow as balls because of all the thread context switching for all the concurrent requests. Then there are the scenarios in between (btw can add config so that remoting does not hold connection after invocation on the server side so that will free worker threads faster, which I am pretty sure is already in jira).
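For the "thousands of clients making periodic requests" end of the spectrum, a rough sketch of the non-blocking approach with java.nio might look like the following. This is an assumption-laden toy (an echo server, with the hypothetical names SelectorEchoServer and echoRoundTrips), not a proposal for how Remoting should implement it. The point is only that a single thread holds all connections open and does work only for the ones that have data ready.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical sketch: one thread multiplexes every connection through a Selector. */
public class SelectorEchoServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    public SelectorEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        if (client.read(buf) < 0) { // peer closed the connection
                            client.close();
                            continue;
                        }
                        buf.flip();
                        // Simplification: a real server would register OP_WRITE
                        // instead of spinning when the send buffer is full.
                        while (buf.hasRemaining()) client.write(buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector or channel was closed: leave the loop.
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }

    /** Opens `clients` connections at once, then counts how many get their message echoed. */
    public static int echoRoundTrips(int clients) {
        try (SelectorEchoServer srv = new SelectorEchoServer()) {
            Thread loop = new Thread(srv, "selector-loop");
            loop.setDaemon(true);
            loop.start();
            Socket[] socks = new Socket[clients];
            int ok = 0;
            try {
                for (int i = 0; i < clients; i++) {
                    socks[i] = new Socket("127.0.0.1", srv.port());
                    socks[i].setSoTimeout(5000);
                }
                for (int i = 0; i < clients; i++) {
                    byte[] msg = ("hello-" + i).getBytes(StandardCharsets.UTF_8);
                    socks[i].getOutputStream().write(msg);
                    byte[] echo = new byte[msg.length];
                    new DataInputStream(socks[i].getInputStream()).readFully(echo);
                    if (Arrays.equals(msg, echo)) ok++;
                }
            } finally {
                for (Socket s : socks) if (s != null) s.close();
            }
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling SelectorEchoServer.echoRoundTrips(20) keeps 20 connections open at the same time while a single server thread answers each of them. Whether this beats the pooled model for a given number of clients is exactly the thing that would need measuring.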
IIRC, Tom took a lot from the PooledInvoker. The PooledInvoker had an LRU queue that closed idle connections when one was needed so that there was no starving. Also, the PooledInvoker tried to avoid a large number of context switches by associating the client connection with a specific server thread. Therefore, no thread context switches were required. Of course, this design was predicated on "keep alive" clients. So if the use case was very "keep alive" oriented, it performed really well.
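The LRU idea can be sketched roughly as below. This is not the PooledInvoker source; LruConnectionTable and its methods are hypothetical names, and "closing" a connection is reduced to recording the client id. It relies on LinkedHashMap in access order, where the first key is always the least recently used one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

/** Hypothetical sketch of LRU eviction of idle connections; not the PooledInvoker code. */
public class LruConnectionTable {
    private final int maxPoolSize;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Integer> active =
            new LinkedHashMap<>(16, 0.75f, true);
    private final List<String> closed = new ArrayList<>();

    public LruConnectionTable(int maxPoolSize) {
        this.maxPoolSize = maxPoolSize;
    }

    /** Admits a new client, closing the idlest connection if the pool is full. */
    public void open(String clientId) {
        if (!active.containsKey(clientId) && active.size() >= maxPoolSize) {
            String idlest = active.keySet().iterator().next();
            active.remove(idlest);
            closed.add(idlest); // stand-in for closing the socket and freeing its thread
        }
        active.put(clientId, 0);
    }

    /** Records an invocation, which moves the client to the most-recently-used end. */
    public boolean invoke(String clientId) {
        Integer count = active.get(clientId); // get() counts as an access
        if (count == null) return false;      // connection was evicted or never opened
        active.put(clientId, count + 1);
        return true;
    }

    public List<String> activeClients() {
        return new ArrayList<>(active.keySet());
    }

    public List<String> closedClients() {
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) {
        LruConnectionTable table = new LruConnectionTable(2);
        table.open("a");
        table.open("b");
        table.invoke("a"); // "b" is now the least recently used
        table.open("c");   // pool is full, so "b" gets closed
        System.out.println(table.closedClients() + " " + table.activeClients());
    }
}
```

A busy "keep alive" client never loses its thread under this scheme, while a client that has gone quiet is the first to be dropped when a new one shows up.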
Bill and Tom raised good issues. Where is the cutoff line? At what number of clients does an asynchronous non-blocking server-side approach become more efficient than the current model?
Intuitively, one can imagine (and it could probably be proven very easily) that if only two clients connect to the server, the thread-per-connection model is as efficient as the asynchronous approach, if not more so. That probably no longer holds true for 1000 clients.
Moreover, the usage pattern for a typical JMS application is to open a connection and keep it open for the lifetime of the client instance (unless one uses the anti-pattern of creating a Connection for each message sent). This seems to me a "keep alive" type of client.
We could poll our users for usage patterns; that would be interesting data.
So, in the end, it's not a matter of beliefs, but statistics. You optimize your implementation for the most common usage pattern. If the average usage pattern of a typical JMS application is to create 10-100 clients that keep sending messages periodically over a long-running Connection, I don't think it makes sense to even think about NIO at this point. Add to this that with the new distributed destinations, you can "spread" processing across nodes, so if you have 1000 clients and 4 machines, that yields 250 clients/node.
The "right" long-term solution would be for Remoting to support both patterns and make this configurable. That'll make everyone happy.
I think you should make a few prototypes and benchmark them.
At the end of the day, we will be benchmarked against QPid, ActiveMQ and others and the bottom line is, it doesn't matter how wonderful our internal architecture is, we won't be able to touch them because the benchmarks will be decided on how fast we can push bytes over sockets, and we will lose right now. Period.
Very good points. One more reason to start benchmarking during the early implementation phase, so we won't have surprises at the end. I am totally with you both on this.
Much as I hate to say it, our competition has it right when it comes to their approach to remoting. Actually, they all seem to do it pretty much the same way (apart from us).
You're probably right. Do you have numbers? Maybe they're right, maybe they aren't. Would you bet your house without seeing numbers?
I agree that there is a conflict in terms of timeframe and existing features. I don't see that there is any inherent conflict between an efficient messaging transport and an rpc transport. It's just an issue of correctly layering the concepts. The only question in my mind is whether there are sufficient issues in the tradeoff between performance and the overhead of a layered transport implementation. At this point I don't see that we can make that call.
Not exactly sure what you mean by "layered transport implementation". Could you please clarify?