I've just completed a set of changes, mainly to enhance performance.
The good news is we're seeing some very good performance figures for non-persistent messages, especially considering we still have a lot more avenues for optimisation to explore :)
Just to give you a taste: where the JMS client and JMS server are in the same VM, our performance already exceeds JBossMQ by a long way - this holds for straight message sends, concurrent sends and receives, etc.
On my laptop I'm seeing around 19000 msgs/sec for this use case.
I put this down largely to the smart copying I implemented some time back. With this, only those parts of the message that need to be cloned are cloned, and only lazily. This means that if a message is sent and then received in the same VM, and nothing on it is changed, then no cloning occurs at all - i.e. it's fast.
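A minimal sketch of what that lazy (copy-on-write) cloning could look like - the class and fields here are purely illustrative, not the actual JBoss Messaging code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical message class: copies alias the sender's state until
// someone actually mutates it, at which point the real copy is made.
class LazyMessage {
    private Map<String, Object> headers;
    private boolean headersShared; // true while we still alias another message's map

    LazyMessage(Map<String, Object> headers) {
        this.headers = headers;
        this.headersShared = false;
    }

    // "Clone" for in-VM delivery: just alias the underlying state.
    LazyMessage shallowCopy() {
        LazyMessage copy = new LazyMessage(this.headers);
        copy.headersShared = true;
        this.headersShared = true;
        return copy;
    }

    // Only on mutation do we pay for a real copy.
    void setHeader(String name, Object value) {
        if (headersShared) {
            headers = new HashMap<>(headers); // the lazy clone happens here
            headersShared = false;
        }
        headers.put(name, value);
    }

    Object getHeader(String name) {
        return headers.get(name);
    }
}
```

So a send-then-receive in the same VM where nobody touches the message never triggers a copy at all.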
For inter-VM interactions, with the JMS client in one VM and the JMS server in another VM on the same machine, performance is also very good.
Until a few days ago, on my laptop, I was typically seeing send rates for simple messages with no payloads of around 800 msgs/sec; this compares with JBossMQ, which handles about 5000 on my box.
Now, we are seeing send rates of around 7500 msgs/sec, which already exceeds JBossMQ and, as I say, we have much more scope for further optimisation. Receive rates and concurrent send/receive rates are also very good.
This is not far off the capacity of 100 Mbit/s Ethernet.
In fact we have already exceeded JBossMQ by a long way in all tests involving non-persistent messages.
Persistent message performance is not so good, but this is to be expected since we have a bunch of things to do to optimise the db access, which we should be looking at very shortly.
Getting the performance boost involved some changes to our read/writeExternal implementations, but the main thing was the creation of our own custom marshaller, which gives us fine-grained control over the wire format.
Previously, for a simple send, the message was wrapped in an AOP Invocation object, which in turn sat inside a Remoting InvocationRequest object; the whole lot was serialized and sent down the socket.
Then the response came back as a serialized InvocationResponse object which, in a lot of cases, simply signified a null response (something that could be sent in one byte).
These serialized wrapper objects added a lot of overhead when all we really wanted to send was the message itself (and a couple of other pieces of information). So that is what we now do, and it accounts for the bulk of the performance gain.
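To illustrate the idea, here is a hedged sketch of a hand-rolled marshaller that writes only the fields we care about, plus the one-byte null response; the field layout and constants are assumptions for illustration, not the real JBoss Messaging wire format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class MiniMarshaller {
    static final byte NULL_RESPONSE = 0; // the whole response fits in one byte

    // Write exactly the bytes we need - no serialized wrapper objects.
    static byte[] marshalSend(String messageId, byte[] payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(messageId);      // 2-byte length prefix + UTF bytes
            out.writeInt(payload.length); // explicit payload length
            out.write(payload);           // raw payload bytes
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // can't happen on a byte array
        }
    }

    static byte[] marshalNullResponse() {
        return new byte[] { NULL_RESPONSE };
    }
}
```

Compare that single byte with a full serialized InvocationResponse object, and the saving per round trip is obvious.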
As I mentioned earlier there are a bunch of other optimisations yet to be done, including:
1) Batching operations in the same JDBC tx - particularly useful for transactions.
2) Optimisations of the message and message ref tables.
3) Smart serialization of messages - this one deserves a bit more discussion.
For a persistent message being sent from a client in one VM to a server in a second VM, then on to a receiving client in a third VM, we have:
a) The message is serialized just before it is sent.
b) The message is deserialized on receipt at the server.
c) The message is serialized just before it is persisted.
d) The message is serialized just before being sent to the receiver.
e) The message is deserialized after receipt at the receiver.
When you consider that the message payload is opaque to the server, most of the above steps are unnecessary. In fact, the message only needs to be serialized once (at send) and deserialized once (at receipt); it can be stored in the database as a byte array (blob).
Implementing this should (hopefully) give a significant perf boost.
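A sketch of that pass-through idea, under the assumption that the server can treat the client's serialized form as an opaque byte[]; the class, table, and column names here are made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class OpaqueMessageStore {
    // The server receives raw bytes off the wire and never deserializes them;
    // the same bytes go straight into the database as a blob.
    static void persist(Connection db, String messageId, byte[] wireForm)
            throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "INSERT INTO MESSAGE (ID, BODY) VALUES (?, ?)")) {
            ps.setString(1, messageId);
            ps.setBytes(2, wireForm); // stored as-is, no serialize/deserialize
            ps.executeUpdate();
        }
    }

    // Forwarding to a receiver reuses the same bytes - no re-serialization.
    static byte[] forward(byte[] wireForm) {
        return wireForm;
    }
}
```

Only the two endpoints (sending client and receiving client) ever pay the serialization cost.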
4) More fine tuning of readExternal, writeExternal
5) Various other bits and pieces.
Clebert is also making some optimisations to JBossSerialization (which we use for our serialization), so hopefully that will make a difference too.
Fantastic news! And this is the beginning ...
As is very often the case, it's just a matter of setting your objectives straight, and then pursuing them one step at a time.
Our objective is now to make Messaging the best messaging system on the planet! :)
Back to earth:
- You should produce JBossMQ/Messaging comparison graphs. Send them to me and I'll post them on the wiki page. We need a system in place to produce these continuously. We will talk about this during the London meeting.
- The next step should be, as you mentioned, persistence optimization. This is critical.
1) Batching operations in the same JDBC tx - particularly useful for transactions.
You should also look at sharing the same JDBC tx for multiple JMS transactions/units
This will improve performance (throughput) at the expense of a bit of individual latency,
and a failure will affect more client units of work.
In more detail, when you have a lot of concurrent units of work by the client(s),
you put them all in 1 JTA/JDBC transaction and do them all in one request.
The JTA/JDBC transaction proceeds when it has N units of work
or when a certain amount of time has passed since the first unit of work was added.
Each unit of work gets the result of the whole group.
This should be internal to the persistence manager, the user of the
persistence manager just does add/remove/prepare/commit() as normal
but doesn't know it might block for a bit while other units are added.
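A rough sketch of that group-commit behaviour (class and method names are assumptions; a real implementation would wrap the group in one JDBC/JTA transaction and propagate the group's failure to every unit in it):

```java
import java.util.ArrayList;
import java.util.List;

class GroupCommitter {
    private final int batchSize;       // N units of work per group
    private final long maxWaitMillis;  // max wait since the first unit arrived
    private List<Runnable> pending = new ArrayList<>();
    private long firstArrival = -1;
    private long flushedGroups = 0;

    GroupCommitter(int batchSize, long maxWaitMillis) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
    }

    // The caller just calls commit() as normal; it may block for a bit
    // while other concurrent units of work join the same group.
    synchronized void commit(Runnable unitOfWork) {
        pending.add(unitOfWork);
        if (firstArrival < 0) firstArrival = System.currentTimeMillis();
        long myGroup = flushedGroups;
        while (flushedGroups == myGroup
                && pending.size() < batchSize
                && System.currentTimeMillis() - firstArrival < maxWaitMillis) {
            try {
                wait(maxWaitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        if (flushedGroups == myGroup) {
            flush(); // this caller triggers the single group commit
        }
    }

    private void flush() {
        // In the real thing this would be one JDBC transaction wrapping
        // every pending unit of work; here we just run them in order.
        for (Runnable unit : pending) {
            unit.run();
        }
        pending = new ArrayList<>();
        firstArrival = -1;
        flushedGroups++;
        notifyAll(); // wake blocked callers; the group shares one result
    }
}
```

This keeps the batching entirely inside the persistence manager, exactly as described: callers see the normal blocking commit() and never learn they shared a transaction.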
I have looked through the current jboss-jms tree's usage of serialization and we are making the same mistake with respect to versioning of the stream format. There is no versioning. Each writeExternal/readExternal pair should have a version number that allows the contents to evolve over time in a backward-compatible way. Adding data that can be reasonably defaulted is a backward-compatible change that we need to support.
I'll make sure we fix this.
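For illustration, a versioned writeExternal/readExternal pair might look like this. The class and fields are hypothetical, but the pattern - a leading version byte plus defaults for fields that older writers did not emit - is the one described above:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class VersionedHeader implements Externalizable {
    static final byte VERSION = 2;

    String destination = "";
    int priority = 4; // field added in version 2; can be reasonably defaulted

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(VERSION);    // version always goes first
        out.writeUTF(destination);
        out.writeInt(priority);    // only written since version 2
    }

    public void readExternal(ObjectInput in) throws IOException {
        byte version = in.readByte();
        destination = in.readUTF();
        if (version >= 2) {
            priority = in.readInt();
        } else {
            priority = 4; // backward-compatible default for old streams
        }
    }

    // Helpers for round-tripping through a byte array.
    static byte[] toBytes(VersionedHeader h) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            h.writeExternal(out);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static VersionedHeader fromBytes(byte[] bytes) {
        try {
            VersionedHeader h = new VersionedHeader();
            h.readExternal(new ObjectInputStream(new ByteArrayInputStream(bytes)));
            return h;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A version-1 reader can still skip ahead safely because the version byte tells it what to expect, and a version-2 reader of a version-1 stream just falls back to the default.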
I'd like to explore the compatibility requirements for JBossMessaging a bit more as this is currently a bit of a grey area for me.
I'm assuming a version x client should work with any version y server where y >= x.
Should a version x client work with a version y server where x > y ?
When can we make changes to interface/wire format? Is it only on major releases?
I'd like to get a better understanding of this so we can make sure we're building it in properly.
The requirement is long-term compatibility, or else the messaging framework is not suitable for the backbone of an ESB, which by definition will operate in a heterogeneous environment. If a binary-incompatible change has to be introduced, this is really a new invoker protocol that needs to be a derivative of the existing protocol. This is like introducing UIL2 in addition to UIL. Both will need to be supported at that point. Certainly one can have features never available in the other.
There is no x, y for which interoperability does not exist.
In other words, total backward compatibility.
Of course, we need to add tests for that (TODO)
But more importantly, Scott wants forward compatibility as well ;-)
Depending on how this is designed/works, the protocol/version could be negotiated once
at initial connection rather than passing it on every request.
There are some "interesting" edge cases when a client
wants to know whether it can take advantage of certain features in later versions
of the protocol.
The problem being if it is talking to a cluster with transparent failover
and the cluster is heterogeneous in terms of the versions it supports.
Interesting. How about the client and the server being "open" about the protocol they use, negotiating it via some sort of handshake?
Or even better, the client has the wiring to "adapt" its wire protocol based on the information it finds in the ConnectionFactory. The server "advertises" the wire protocol it understands, and the client has a built-in mechanism to adapt to that.
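A tiny sketch of that adaptation step, assuming client and server each advertise a supported version range (the method and parameters are hypothetical):

```java
class ProtocolNegotiation {
    // Pick the highest protocol version both sides support,
    // or -1 if their advertised ranges don't overlap at all.
    static int negotiate(int clientMax, int serverMax,
                         int clientMin, int serverMin) {
        int chosen = Math.min(clientMax, serverMax);
        return chosen >= Math.max(clientMin, serverMin) ? chosen : -1;
    }
}
```

Done once per connection (e.g. from the version the ConnectionFactory advertises), this avoids passing version information on every request; the heterogeneous-cluster edge case above would still need handling, since a failover target might advertise a different range.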
I (we) need to think about this.
Note that if your processes are on the same machine and you're using TCP to send messages, your NIC will *not* be used; the traffic goes over the loopback interface, so the performance measurements bypass the network entirely.
I suggest having the 2 processes on separate machines to really measure the throughput and/or message rate.
Yes, this is really just a test of serialization performance.
We had been prevented from doing real tests over a real network by some problems with JBoss Serialization, which are now resolved; initial results over a network already show we are significantly faster than JBossMQ, but I don't have anything more concrete here as yet.