Currently it is possible to configure a WildFly/AS server to have both a live and a backup HornetQ server. The configuration of this, and how it works, can be improved though, and that is what I am looking at.
The first step is to update the configuration to allow backup servers to be configured within the main server's configuration, so you would see something like:
<backup-server name="myBackup" port-offset="100" inherit-configuration="true"/>
This would create a new HornetQ server that inherits the configuration of the live server but changes the name and the ports used by the connectors and the acceptors (NB you can also nest another configuration with overrides). You can see a prototype of this on my branch at https://github.com/andytaylor/hornetq/tree/colocated-backups.
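To make the override idea concrete, a nested configuration might look something like the sketch below. The nested `<configuration>` element and the settings inside it are assumptions for illustration, not a finalized schema:

```xml
<!-- hypothetical sketch: the nested <configuration> element and its
     contents are illustrative, not a finalized schema -->
<backup-server name="myBackup" port-offset="100" inherit-configuration="true">
    <configuration>
        <!-- override inherited settings here, e.g. a separate journal location -->
        <journal-directory>${data.dir}/backup-journal</journal-directory>
    </configuration>
</backup-server>
```

Everything not overridden would still be inherited from the live server's configuration, with the port offset applied to the connectors and acceptors.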
Currently I am creating the new servers within the HornetQServerImpl.java class, but I'm not sure whether we should do it there, use JMSServer, or maybe create a new 'ColocatedServer' class that manages all of this.
All I am doing at the minute is simplifying the way these can be configured, so when a backup comes live it takes over full responsibility for its live server, including clients and cluster connections. In an ephemeral topology, though, where you want an elastic cluster, you don't really want this, so I'm looking at adding the following:
- The journal is replayed on start and its messages are merged into the journal of the live server.
- Clients are informed that they must reconnect to another live node.
- Cluster connections are told that they must stop and redistribute any messages they have.
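The steps above can be sketched as a toy model. None of the types or method names below are real HornetQ classes; they just illustrate the intended flow, with the journal merge done as a simple append (which is one reason ordering semantics are lost):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the scale-down steps; not the real HornetQ API.
public class ScaleDownSketch {

    // Step 1: replay the backup journal and merge its messages into the
    // live server's journal. Backup messages are simply appended here.
    static List<String> mergeJournals(List<String> liveJournal, List<String> backupJournal) {
        List<String> merged = new ArrayList<>(liveJournal);
        merged.addAll(backupJournal);
        return merged;
    }

    // Step 2: clients are told to reconnect to another live node.
    static String redirectClient(String clientId, String otherLiveNode) {
        return clientId + " -> reconnect to " + otherLiveNode;
    }

    // Step 3: a cluster connection stops and hands back its messages
    // so they can be redistributed across the remaining nodes.
    static List<String> redistribute(List<String> clusterConnectionMessages) {
        List<String> toRedistribute = new ArrayList<>(clusterConnectionMessages);
        clusterConnectionMessages.clear(); // the connection is stopped
        return toRedistribute;
    }

    public static void main(String[] args) {
        List<String> merged = mergeJournals(List.of("m1", "m2"), List.of("b1"));
        System.out.println(merged); // [m1, m2, b1]
        System.out.println(redirectClient("client-1", "node-2"));
    }
}
```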
Of course this will break ordering semantics, so it would be as configurable as possible: users could choose to have the backup just operate in normal mode, as a node in its own right (as it is now). I'm also not sure yet how to deal with in-flight transactions or recovery.
Basically, what we would do is inject the live server into the backup server and, on startup, redirect everything via the live server's post office.
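This injection-and-redirect idea can be sketched as a simple delegation pattern. PostOffice, LiveServer and BackupServer here are illustrative stand-ins, not the real HornetQ interfaces:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative stand-ins,
// not the real HornetQ interfaces.
interface PostOffice {
    void route(String message);
}

class LiveServer {
    final PostOffice postOffice;
    LiveServer(PostOffice postOffice) { this.postOffice = postOffice; }
}

class BackupServer {
    private final LiveServer live; // the colocated live server, injected at creation

    BackupServer(LiveServer live) { this.live = live; }

    // The backup does not route anything itself; everything is
    // redirected through the live server's post office.
    void route(String message) { live.postOffice.route(message); }
}

public class ColocatedSketch {
    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        LiveServer live = new LiveServer(delivered::add);
        BackupServer backup = new BackupServer(live);
        backup.route("hello");
        System.out.println(delivered); // [hello]
    }
}
```

The design point is simply that the backup holds a reference to its injected live server rather than owning a post office of its own while the live server is running.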
A couple of other things I'm not sure about:
- Do we tell clients to move to the live server after they have failed over to the backup, or just inform them on startup that the live server is where they should reconnect?
- Should the client wait to fail over until the backup journal has been replayed?
Any thoughts or ideas are most welcome.