That is the new JBoss Messaging! Or at least it will be when it is finished.
Except it doesn't need active/standby/singleton like HAJMS or what you
describe for Sonic.
If you have enough memory/nodes you run it "serverless" with the cluster
acting as the "backing store".
As long as the whole cluster never goes down
and you have "buddy replication" you never lose a message.
If you don't have enough memory/nodes, the senders can persist messages
for offline receivers.
Or if they can't persist, they can find somebody else (preferably more than one)
in the cluster that can (a sort of distributed persistent store).
Running both clustered and the distributed persistent store gives you
extra reliability, obviously.
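To make the "buddy replication" idea concrete, here is a minimal sketch in plain Java. It is only an illustration under assumptions: the class `BuddyCluster`, its methods, and the "next node in the ring is the buddy" rule are all made up for this example and are not JBoss Messaging or JGroups API.

```java
import java.util.*;

// Hypothetical illustration only; not JBoss Messaging or JGroups API.
class BuddyCluster {
    // One in-memory message map per node (message id -> body).
    private final List<Map<String, String>> nodes = new ArrayList<>();
    private final Set<Integer> down = new HashSet<>();

    BuddyCluster(int size) {
        for (int i = 0; i < size; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Keep the message on its owner and on one buddy.
    // Assumption: the buddy is simply the next node in the ring.
    void send(int owner, String id, String body) {
        nodes.get(owner).put(id, body);
        nodes.get((owner + 1) % nodes.size()).put(id, body);
    }

    // Simulate a crash: the node's in-memory state is lost.
    void crash(int node) {
        nodes.get(node).clear();
        down.add(node);
    }

    // A message is recoverable while any live node still holds a copy.
    Optional<String> recover(String id) {
        for (int i = 0; i < nodes.size(); i++) {
            if (!down.contains(i) && nodes.get(i).containsKey(id)) {
                return Optional.of(nodes.get(i).get(id));
            }
        }
        return Optional.empty();
    }
}
```

With three nodes, a message sent via node 0 survives the loss of node 0 because node 1 holds the copy. Lose node 1 as well and the message is gone, which is where the persistent store comes in.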
I even want a variation where a sender/receiver outside the cluster can "work offline".
It persists the sends/acknowledgements locally and sends them in a
batch when it is able to connect.
And a receiver can pull down its current batch of messages into its local
persistent store to work on later.
Of course, "offline" semantics require a looser definition of a transaction.
But that is no different to "message bridges".
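The "work offline" variation could look roughly like the sketch below. Everything in it is an assumption for illustration: `OfflineSender` is a hypothetical class, the in-memory `outbox` stands in for a real local persistent store, and the `Consumer` stands in for whatever transport delivers a batch to the cluster.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical illustration only; not JBoss Messaging API.
class OfflineSender {
    // Stands in for a local persistent store.
    private final Deque<String> outbox = new ArrayDeque<>();
    // Stands in for the connection that delivers a batch to the cluster.
    private final Consumer<List<String>> transport;
    private boolean connected;

    OfflineSender(Consumer<List<String>> transport) {
        this.transport = transport;
    }

    // Always record the send locally first; forward at once only if connected.
    void send(String message) {
        outbox.add(message);
        if (connected) {
            flush();
        }
    }

    // On reconnect, everything queued while offline goes out as one batch.
    void setConnected(boolean connected) {
        this.connected = connected;
        if (connected) {
            flush();
        }
    }

    int pending() {
        return outbox.size();
    }

    private void flush() {
        if (outbox.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(outbox);
        transport.accept(batch);
        // Cleared only after the transport returned without throwing.
        outbox.clear();
    }
}
```

The sender considers a message "sent" as soon as it is in the local store, long before the cluster sees it. That gap is exactly the looser transaction definition mentioned above.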
The key tasks still to work out will be cleverly routing messages
to stores/nodes close to the receivers and moving receivers/subscriptions
(that are movable like MDBs) to distribute the load evenly.
The aim being to minimize network traffic and overly complicated tx semantics across machines.
Like you say, transaction failures are never transparent.
And RAID is very limiting in terms of inhomogeneous hardware and its locality.
Okay, this has nothing to do with current JBoss Messaging development; it's just a thought, since I've been playing around with their software and reading their documentation.
Quite the opposite, it has lots to do with current JBoss Messaging development :)
Anyway, the neat thing about SonicMQ is that they have essentially an active/standby configuration, dubbed "Continuous Availability Architecture", which duplicates message state (via a dedicated network channel) and uses a proprietary message store that is synchronized between the two nodes.
The nice thing about it is that persistence is guaranteed without RAID, a standalone database, or a replicated database configuration.
I am envisioning the same thing for Messaging. The key here is "in-memory replication" which can be an intermediate QoS level, so we can achieve close-to-persistent reliability without actually using a persistent store. We can use replication across the cluster, which is very safe but relatively slow, or wait for the new "buddy replication" JGroups feature.
As Tim put it, everything would go fine as long as the power doesn't go down in the datacenter, for which case we would need persistence. However, this degree of reliability may be acceptable for some, in exchange for better performance.
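One way to picture "in-memory replication" as an intermediate QoS level is the sketch below. The enum and its names are hypothetical; standard JMS only defines non-persistent and persistent delivery modes, so the middle level is an assumption for illustration.

```java
// Hypothetical QoS levels; only the first and last correspond to standard JMS delivery modes.
enum Reliability {
    NON_PERSISTENT(false, false),
    REPLICATED_IN_MEMORY(true, false), // assumed intermediate level
    PERSISTENT(false, true);

    private final boolean replicated;
    private final boolean persisted;

    Reliability(boolean replicated, boolean persisted) {
        this.replicated = replicated;
        this.persisted = persisted;
    }

    // A single node failing: a replica elsewhere or a copy on disk is enough.
    boolean survivesNodeCrash() {
        return replicated || persisted;
    }

    // The whole cluster going down (e.g. datacenter power loss): only disk helps.
    boolean survivesClusterOutage() {
        return persisted;
    }
}
```

The middle level survives a single node crash but not a full outage, which is the trade-off against performance described above.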
I also wonder if replicated state over a network layer is really any better than replicated data files over a RAID configuration.
Good question. I don't have enough data yet to answer this question. We will measure and see.
Another interesting subject in this category is "federation", as in routing over WAN as opposed to LAN. I have some ideas in this area (I come from a JGroups background, where I worked on the WAN support for JGroups), but I would first like to release a version that's stable and reasonably performant in a standalone and LAN configuration, and then go further.
I am reasonably confident that the current "peer" architecture together with appropriate support in JGroups would prove flexible enough to add the features I was talking about relatively painlessly.