In our installation we run one instance of the jUDDI web application per node, but all of those instances work against the same shared database schema. I have attached a diagram.
Our ESB application performs "internal routing": an ESB service called via a gateway may invoke several internal ESB services using the ServiceInvoker. When both servers in the cluster are up and running, the cluster works fine, with one notable property: for a single call to the ESB from an external application, some of the work may be performed on one node and some on another, depending on where the ServiceInvoker routed the "internal service" requests.
We have noticed the following problem that relates to this clustering setup.
When a server fails (or is taken down on purpose), the ServiceInvoker may still route requests to the server that is down, and it then waits for a timeout before resending the request to another server, which may be up or may also be down. We think this happens because it simply asks jUDDI for the service with "category x" and "servicename y", and jUDDI can respond with an endpoint on any of the servers in the cluster.
So when a node in the cluster is down, the cluster works but is slow.
We were wondering whether anybody else has seen this and whether there are any suggestions. I think the right behaviour would probably be for jUDDI to return only endpoints local to the calling node of the cluster. This would also make sense when all servers are running, because it would reduce network traffic.
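To make the idea concrete, here is a minimal sketch of the selection rule I have in mind, written as plain Java and not against the real ServiceInvoker or jUDDI APIs. The class name `LocalFirstEndpoints`, the method `localFirst`, the host names and the `jms://` addresses are all made up for illustration. Note that the sketch keeps remote endpoints as a fallback after the local ones, which is a softer variant of "local only"; dropping the remote list would give the strict behaviour.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JBoss ESB or jUDDI.
final class LocalFirstEndpoints {

    // Returns the endpoints reordered so that those hosted on localHost
    // come first; remote endpoints are kept afterwards as a fallback.
    static List<URI> localFirst(List<URI> endpoints, String localHost) {
        List<URI> local = new ArrayList<URI>();
        List<URI> remote = new ArrayList<URI>();
        for (URI endpoint : endpoints) {
            // getHost() may return null for unusual URIs; those count as remote.
            if (localHost.equalsIgnoreCase(endpoint.getHost())) {
                local.add(endpoint);
            } else {
                remote.add(endpoint);
            }
        }
        local.addAll(remote);
        return local;
    }

    public static void main(String[] args) {
        // Example registry answer for "category x" / "servicename y" (invented addresses).
        List<URI> fromRegistry = Arrays.asList(
                URI.create("jms://node2:1099/queue/serviceY"),
                URI.create("jms://node1:1099/queue/serviceY"));
        for (URI endpoint : localFirst(fromRegistry, "node1")) {
            System.out.println(endpoint);
        }
    }
}
```

With this ordering, a caller on node1 would try its own endpoint first and would only hit the timeout on a dead node if the local service itself were unavailable.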
One way this could work is to append the name of the node to the "category x" name, but I do not know whether there is an easy way to do that in the configuration XML files, other than changing the jboss-esb.xml files for all .esb archives, including the ones that come with JBoss ESB.
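As a rough illustration of the naming scheme only, the following sketch derives a node-specific category name. It does not show how to hook this into the ESB configuration, which is exactly the part I am unsure about. The class `NodeScopedCategory`, the method `scoped`, the "-" separator and the system property `esb.node.name` are all assumptions of mine, not existing JBoss ESB settings.

```java
// Hypothetical helper, not part of JBoss ESB or jUDDI.
final class NodeScopedCategory {

    // Appends the node name to the category so that each node registers
    // and looks up its services under its own category.
    // Falls back to the plain category when no node name is configured.
    static String scoped(String category, String nodeName) {
        if (nodeName == null || nodeName.trim().isEmpty()) {
            return category;
        }
        return category + "-" + nodeName.trim();
    }

    public static void main(String[] args) {
        // "esb.node.name" is an invented property name used only for this sketch.
        String node = System.getProperty("esb.node.name");
        System.out.println(scoped("categoryX", node));
    }
}
```

Both the registering side and the looking-up side on a given node would have to apply the same rule, otherwise the lookup would no longer find the service.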
If anyone has any thoughts on this they would be highly appreciated.
Attachment: cluster.pdf (12.7 KB)