Here's a quick summary of the changes I've recently made.
I have implemented consumer prefetch - this has dramatically increased our throughput :) I have produced a new set of performance graphs which show we are better than MQ in everything I've tried.
To boost throughput I've changed the way a few things work. Now *everything* is pushed from the channel to the server consumer endpoint to the client. There are no active receive() calls from the client to the server.
Channel delivery and channel handle methods are now executed on the same thread (deliver is never called directly), resulting in less lock contention. This works using a "SEDA style" approach where the channel maintains a QueuedExecutor and requests are handed off to it rather than executed directly. Right now this is only done for handle and deliver, but we should consider doing it for acknowledge and cancel too.
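To make the hand-off idea concrete, here is a rough sketch rather than the actual channel code. The class SedaChannel and its method names are made up for illustration, and a single-threaded ExecutorService from java.util.concurrent stands in for the QueuedExecutor. The point is only that handle and deliver both queue work onto the same thread, so the channel state needs no lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical channel: handle() and deliver() never do their work on the
// caller's thread; both are queued onto one single-threaded executor.
class SedaChannel {
    // Stand-in for the channel's QueuedExecutor: one thread, FIFO queue.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Both lists are touched only by the executor thread, so no lock is needed.
    private final List<String> pending = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    // Called by producers: store the message, then attempt a delivery pass.
    void handle(String message) {
        executor.execute(() -> {
            pending.add(message);
            deliverInternal();
        });
    }

    // Called when a consumer becomes ready: also a hand-off, never direct.
    void deliver() {
        executor.execute(this::deliverInternal);
    }

    // Runs only on the executor thread.
    private void deliverInternal() {
        delivered.addAll(pending);
        pending.clear();
    }

    // Stops accepting work, waits for queued requests to finish, and
    // returns the messages in the order they were delivered.
    List<String> shutdownAndGetDelivered() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("executor did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return delivered;
    }
}
```

Because every request goes through one FIFO queue, requests are processed in submission order, and acknowledge and cancel could be added the same way as further execute() hand-offs.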
Once remoting supports a non-blocking API we can implement a full end-to-end SEDA approach and should be able to scale to many thousands of connections :)
Other things to note:
All receive() calls are now handled on the client side (they don't go to the server) - this is vital for good performance. This includes receiveNoWait(), so it means there is a chance receiveNoWait() returns null after a send to the destination has returned ok. I looked into the semantics of receiveNoWait() and this seems ok to me, I don't think receiveNoWait makes any guarantees that it will definitely return a message if send has returned. It also seems that other messaging systems work this way. We could make receiveNoWait go to the server but I don't think this is necessary and it will screw performance.
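Here is a small sketch of what client-side receives look like under this scheme. It is an illustration only: ClientConsumerBuffer and onPush are hypothetical names, not the real client classes. Messages pushed from the server land in a local buffer, and receiveNoWait() only ever looks at that buffer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical client-side buffer filled by server pushes.
class ClientConsumerBuffer {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();

    // Assumed callback: invoked when the server pushes a message to this client.
    void onPush(String message) {
        buffer.add(message);
    }

    // Never goes to the server: returns a buffered message, or null if
    // nothing has arrived on the client yet.
    String receiveNoWait() {
        return buffer.poll();
    }

    // Blocking variant: waits locally for a push, up to the timeout.
    String receive(long timeoutMillis) throws InterruptedException {
        return buffer.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

If a send has returned but the resulting push is still in flight, the buffer is empty and receiveNoWait() returns null, which is exactly the case described above.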
I have combined ChannelState and ChannelSupport. The division between the two was arbitrary now that we don't have different types of state any more, and having to maintain the extra API added complexity.
The pluggable thread pool has gone. This is because when we send messages from server to client we need to ensure that all messages from a particular consumer are sent in sequence. Using a thread pool doesn't give us that guarantee since a later delivery could overtake an earlier delivery in flight. So now the consumer always uses the same QueuedExecutor for its lifetime. Since each QueuedExecutor has its own thread, we don't want a unique one for each consumer (we would run out of threads), so there is a pool of QueuedExecutors maintained by the server. QueuedExecutors are allocated from the pool in a rotating fashion, so it's possible for two or more consumers to be using the same pooled executor.
(Anyway it shouldn't really have been pluggable in the first place since it's part of the *design* of the server, not something that we should allow users to configure)
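The rotating allocation can be sketched roughly like this. It is a minimal illustration, assuming a fixed pool size chosen at startup. RotatingExecutorPool is a hypothetical name, and single-threaded ExecutorService instances again stand in for QueuedExecutors.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical server-side pool of single-threaded executors.
class RotatingExecutorPool {
    private final ExecutorService[] executors;
    private final AtomicInteger next = new AtomicInteger();

    RotatingExecutorPool(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        executors = new ExecutorService[size];
        for (int i = 0; i < size; i++) {
            // One thread per executor keeps each consumer's deliveries in order.
            executors[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Called once per consumer; the consumer keeps the result for its lifetime.
    ExecutorService allocate() {
        // floorMod keeps the index valid even if the counter wraps negative.
        int index = Math.floorMod(next.getAndIncrement(), executors.length);
        return executors[index];
    }

    void shutdown() {
        for (ExecutorService executor : executors) {
            executor.shutdown();
        }
    }
}
```

With a pool of size 2, the first and third consumers end up sharing an executor. Ordering per consumer is still preserved because each executor runs its tasks one at a time in submission order.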
I have also made some changes around session recovery (since we were doing it wrong).
The remote API is now much simpler :)
BTW I have changed some tests since the semantics have now changed (particularly of receiveNoWait) - so beware!
Also, I don't think one test in CTSMiscallaneousTest can be correct, since it implies that all receives must act directly on the channel, which would prevent us from doing buffering. I have left that failing for now.
In all it's been a bit of a mission (more complex than I thought) but necessary for great performance.