We get intermittent NameNotFoundExceptions when a client does a lookup against our JBoss 4.0.2 cluster. The client uses the HA-JNDI ports. When it uses the non-HA JNDI ports, we don't see the exceptions. At the same time, another client can do lookups without any problems. Once a client gets the first exception, a large number of subsequent lookups fail too. Sometimes, after an unpredictable amount of time, the client's lookups succeed again; sometimes they don't.
Has anybody seen this too, and is there a known solution? Or, put another way, is it OK to use a non-HA JNDI call to obtain clustered session facades?
Here are the details:
We have a JBoss-4.0.2-Cluster with three nodes. The cluster has its own partition name and multicast address.
The clients are a number of servers running a clustered web application. For simplicity, we can assume that there are only two client machines. The cluster nodes and the client machines are in the same subnet and see each other's multicasts and broadcasts.
We stripped down the code on the clients and built a servlet containing the following code snippet:
Hashtable env = new Hashtable();
env.put("java.naming.factory.initial","org.jnp.interfaces.NamingContextFactory");
env.put("java.naming.factory.url.pkgs","org.jboss.naming:org.jnp.interfaces");
env.put("java.naming.provider.url","jnp://server1:<ha-port>,jnp://server2:<ha-port>,jnp://server3:<ha-port>");
Context ctx = new InitialContext(env);
Object obj = ctx.lookup(<existing path>);
<call create on home interface after narrowing etc.>
All variables are local, so there is no singleton or other data shared between two servlet calls. We use non-standard port numbers, so no default can affect the behaviour. The port number is the same on all cluster nodes. However, there is another server in the subnet which uses the default ports. When the client fails, we get a NameNotFoundException on the lookup line.
We experimented with the properties for switching off auto-discovery (jnp.disableDiscovery) and for setting a connection timeout (jnp.timeout), but nothing changed.
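For reference, this is roughly how the two properties were added to the environment. It is only a minimal sketch: the helper class HaJndiEnv and its parameters are hypothetical names made up for this post, and it assumes the JNP client reads both properties as string values, with jnp.timeout given in milliseconds.

```java
import java.util.Hashtable;

// Hypothetical helper, not part of the original servlet code.
class HaJndiEnv {

    // Builds the JNDI environment from the snippet above, plus the two
    // properties we experimented with. Values are stored as strings
    // (assumption: the JNP client parses them from strings).
    static Hashtable<String, String> build(String providerUrl,
                                           boolean disableDiscovery,
                                           int timeoutMillis) {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
        env.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
        env.put("java.naming.provider.url", providerUrl);
        // Switch off multicast auto-discovery of HA-JNDI servers.
        env.put("jnp.disableDiscovery", String.valueOf(disableDiscovery));
        // Connection timeout (assumed to be in milliseconds).
        env.put("jnp.timeout", String.valueOf(timeoutMillis));
        return env;
    }
}
```

The resulting table is passed to new InitialContext(env) exactly as in the snippet above, with the same comma-separated provider URL for the three nodes.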
We called the servlet on two machines of the web client cluster. Both used the HA-JNDI ports of the three JBoss nodes. After a few seconds, one of the web clients got the exception mentioned above, and many of the following calls got the same exception. We ran tcpdump and saw that the network traffic on the JNDI ports stopped after the exception. At the same time, the other web client was still able to do lookups. Sometimes the web client with the exceptions went back to normal, but not every time.
We used a small script on a third machine to call the two servlets. With a sleep of 1 second between two calls to the same web client node, we saw the behaviour described above. When we ran the script without the sleep, both web clients got exceptions after a few seconds, and in most cases they did not go back to normal.
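The test loop can be sketched in Java as follows. This is not our actual script (that one calls the servlets over HTTP from a third machine); it is an illustrative in-process equivalent, and LookupProbe, run and the parameter names are hypothetical. The lookup itself is passed in as a Callable, so the same loop can wrap the ctx.lookup(...) call from the snippet above.

```java
import java.util.concurrent.Callable;

// Hypothetical test driver, not the script we actually used.
class LookupProbe {

    // Calls the given lookup `attempts` times, optionally sleeping between
    // calls, and returns how many of the calls threw an exception.
    static int run(Callable<?> lookup, int attempts, long sleepMillis) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                lookup.call();
            } catch (Exception e) {
                failures++;
                System.err.println("attempt " + i + " failed: " + e);
            }
            if (sleepMillis > 0 && i < attempts - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop the test early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return failures;
    }
}
```

Running it once with sleepMillis set to 1000 and once with 0 corresponds to the two variants described above.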
The interesting point is that we can't reproduce the problem when we make the JNDI calls from a development machine on a different network. In that case the JBoss cluster answered all JNDI lookups without any exceptions.
Another interesting point is that the exceptions on the web clients disappeared when we changed the port in the provider URL to the non-HA JNDI port. This port is also a non-standard one.
We would like to use the HA-JNDI ports again. Is there a known solution for this problem?
Thanks for all your help!