We don't have it on the roadmap yet, but it may be a good time to kick off the discussion.
My first question would be: why would you cluster Errai? What are you striving for?
- Fault tolerance?
Scalability could already be achieved by using load balancing with sticky sessions.
Fault tolerance (failover) is a different topic.
But regardless of the outcome of this discussion, we should look into leveraging existing
technologies that ship with JBoss or any other EE5 server.
I.e., before diving into low-level details like JGroups, we should evaluate the use of
JMS, HTTP session replication, etc.
I am pretty sure we can find a suitable approach without implementing any of the low-level bits ourselves.
In my case we cluster for both scalability and high availability.
I don't entirely agree that scalability can be achieved using sticky sessions, unless you assume that the information for each session is isolated from all other sessions. A quick example of this is a chat room. If two users try to join a room with a given "topic" and they are load balanced to separate nodes in the cluster, they will essentially be in different chat rooms as things stand now.
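To make the chat room case concrete, here is a minimal sketch. The `Node` class and the user names are made up for illustration and are not Errai APIs; each node simply keeps its rooms in its own memory, which is the assumption behind the problem.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StickySessionSketch {

    /** Hypothetical server node that keeps chat rooms only in its own memory. */
    static final class Node {
        private final Map<String, Set<String>> rooms = new HashMap<>();

        void join(String topic, String user) {
            rooms.computeIfAbsent(topic, k -> new HashSet<>()).add(user);
        }

        Set<String> members(String topic) {
            return rooms.getOrDefault(topic, new HashSet<>());
        }
    }

    public static void main(String[] args) {
        Node node1 = new Node();
        Node node2 = new Node();

        // Assume the load balancer pinned alice to node1 and bob to node2.
        node1.join("errai", "alice");
        node2.join("errai", "bob");

        // Same topic, but each node only knows about its own user.
        System.out.println(node1.members("errai")); // [alice]
        System.out.println(node2.members("errai")); // [bob]
    }
}
```

Sticky sessions keep each user on one node, but nothing here lets the two nodes learn about each other's rooms.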
The "One Degree of Separation" paradigm (http://www.jboss.org/errai/ErraiBus) is the limitation here. I can certainly understand why it is important but I would propose that it be extended such that traversing a cluster on the server side is included in that one degree.
Note that the Atmosphere project (https://atmosphere.dev.java.net/) has attempted to tackle this issue as well. In fact, publishing messages to "other" sources (JMS, EJB, Cluster) seems to have been one of their reasons for doing this (http://weblogs.java.net/blog/jfarcand/archive/2009/07/cluster_cluster.html). I have not had a chance to try their method, but they claim to support both Shoal (Glassfish) and JGroups clustering. I would guess that this means they came up with an abstraction for clustering their nodes and are now able to add providers. That sounds reasonable, and a similar abstraction would allow Errai to be extended.
I do plan on making this a priority in the coming releases.
It's worth mentioning that one community user has been successful clustering ErraiBus using Terracotta. But obviously it would be nice for there to be some flexibility here, and the design of the bus architecture practically begs for this kind of functionality in any case.
I'd also like to note that I have accomplished this using the JBossCache embedded in JBoss.org 5.1 as my communication mechanism. I can provide the code if someone else would like to use it, but essentially it comes down to 3 things, and there is very little Errai-specific code once you figure out the two hook points into the Errai server bus.
1 - Extend the servlet implementation. In my case I want to use the JBoss Comet servlet so I extended it in order to inject my own ServerMessageBus implementation. Note that this could easily be configurable in the future.
2 - Extend the ServerMessageBusImpl. The main thing here is to be able to attach listeners to your message bus implementation. To do this I look up my communication mechanism in the ServerMessageBusImpl constructor (I just used a singleton for now) and pass it a reference to the bus.
3 - Write a communication mechanism. I used JBossCache but you can use whatever you want. You need to use the "addGlobalListener" method to listen for all messages in order to pass them to other nodes. You also need to be able to listen for messages from other nodes and then use the "sendGlobal" method to pass those messages to your local bus.
A few things to note. You probably want to disregard messages from the "ServerBus" subject. The heartbeats and subscription messages are not needed on the other nodes since you are adding a listener for all messages. I chose to create a special MessagePart called "ShareMessage" so that I can determine whether to send a given message to other nodes. This may not be ideal but it works to limit the number of inter-node communications. Without that mechanism there are some fun scenarios where messages get fired multiple times on other nodes.
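The three steps and the notes above can be sketched roughly as follows. This is not the code I run and it does not use the real Errai classes: `Msg`, `LocalBus`, `Transport` and `Bridge` are hypothetical stand-ins, with `LocalBus` modelling only the two hooks ("addGlobalListener" and "sendGlobal") and `Transport` being an in-memory placeholder for JBossCache/JGroups. Clearing the "ShareMessage" part before forwarding is one assumed way to keep messages from being fired again on other nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ClusterBridgeSketch {

    /** Hypothetical message: a subject plus string parts. */
    static final class Msg {
        final String subject;
        final Map<String, String> parts = new HashMap<>();
        Msg(String subject) { this.subject = subject; }
        Msg with(String key, String value) { parts.put(key, value); return this; }
    }

    /** Hypothetical stand-in for the local server bus (only the two hooks). */
    static final class LocalBus {
        private final List<Consumer<Msg>> globalListeners = new ArrayList<>();
        final List<Msg> delivered = new ArrayList<>();

        void addGlobalListener(Consumer<Msg> listener) { globalListeners.add(listener); }

        void sendGlobal(Msg m) {
            delivered.add(m);                               // local delivery
            for (Consumer<Msg> l : globalListeners) l.accept(m);
        }
    }

    /** Hypothetical in-memory transport; a real one would be JBossCache or JGroups. */
    static final class Transport {
        private final List<Bridge> members = new ArrayList<>();
        void join(Bridge b) { members.add(b); }
        void publish(Bridge sender, Msg m) {
            for (Bridge b : members) if (b != sender) b.onRemote(m);
        }
    }

    /** Connects one node's bus to the transport. */
    static final class Bridge {
        static final String SHARE = "ShareMessage";
        private final LocalBus bus;
        private final Transport transport;

        Bridge(LocalBus bus, Transport transport) {
            this.bus = bus;
            this.transport = transport;
            transport.join(this);
            bus.addGlobalListener(this::onLocal);
        }

        private void onLocal(Msg m) {
            if ("ServerBus".equals(m.subject)) return;          // skip heartbeats/subscriptions
            if (!"true".equals(m.parts.get(SHARE))) return;     // only explicitly shared messages
            Msg copy = new Msg(m.subject);
            copy.parts.putAll(m.parts);
            copy.parts.remove(SHARE);                           // assumption: receiver must not re-share
            transport.publish(this, copy);
        }

        void onRemote(Msg m) { bus.sendGlobal(m); }
    }

    public static void main(String[] args) {
        Transport transport = new Transport();
        LocalBus nodeA = new LocalBus();
        LocalBus nodeB = new LocalBus();
        new Bridge(nodeA, transport);
        new Bridge(nodeB, transport);

        nodeA.sendGlobal(new Msg("ChatRoom").with("text", "hello").with(Bridge.SHARE, "true"));
        nodeA.sendGlobal(new Msg("ServerBus").with(Bridge.SHARE, "true"));

        System.out.println("node B received " + nodeB.delivered.size() + " message(s)");
    }
}
```

In this sketch node B ends up with only the chat message. The "ServerBus" message stays on node A, and because the forwarded copy no longer carries the flag, node B's own listener does not send it back.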
If the node the GWT app was loaded from goes down, the first question is:
- how does this relate to SOP (same origin policy)?
SOP basically means an Ajax client can only connect to the origin (scheme, host and port) it was loaded from.
The second question would then be:
- what HTTP load balancing strategy is in place?
Round robin basically sends the next Errai request to an active node.
A new node needs to establish a new session, which means the client bus reconnects and re-initializes.
It's basically like hitting the refresh button. From that point on you should be working off another node like before.
If there is any server-side state that you rely on, make sure it's replicated and survives failover. But that is basically nothing Errai needs
to deal with.
The bottom line is:
You cannot simply "send a message to a different node in a cluster". There is an HTTP connection involved that
needs to fail over. This (most likely) directly impacts the same origin policy. Any server-side state should not be
managed by Errai.
Hope that makes sense.
In my case, I was using HTTP load balancing via mod_cluster with sticky sessions in place. mod_cluster, and for that matter most load-balancer-based solutions, prevent any problems due to SOP because you always use the same URL. The load balancer decides which back-end machine will process the requests. Sticky sessions also get around the problem of round robin requests from a single user to some extent; however, they did not help me at all in my scenario. I had two different machines that needed to share Errai sessions, and they were not aware of each other's HTTP sessions. The solution I described above was to create an Errai bus listener on each node in the cluster. Whenever Node X received a message, it would fire that exact same message to the other nodes in the cluster using JGroups/JBossCache as the transport mechanism. There were a few types of message I had to be careful with, but I was able to get this working. It was very fast even across nodes and very reliable as well.
If I understand both of you correctly, you are trying to broadcast messages created on a single node to any GWT client on any other node, right?
(Think of a chat example maybe)
Before you reach for something as low-level as JGroups (which works, but isn't advised), I would rather look at the JMS integration and delegate the responsibility for distributing messages in a cluster to the JMS server implementation.
Errai-JMS allows you to directly listen to JMS topics from GWT clients and send messages to them. But you might as well just use JMS from your service implementation to broadcast messages within a cluster. IMO that is simpler to maintain and more portable as well.