1 Reply Latest reply on Aug 31, 2006 10:03 AM by bernd.koecke

Lookup Failure with HA-JNDI

bernd.koecke Jul 20, 2006 8:46 AM

Hi,

we get intermittent NameNotFoundExceptions when a client performs a lookup against our JBoss 4.0.2 cluster. The client uses the HA-JNDI ports. When it uses the non-HA JNDI ports, we don't see the exceptions. At the same time, another client can perform lookups without any problems. Once the client gets the first exception, a large number of subsequent lookups fail too. Sometimes the client recovers after an unpredictable amount of time, sometimes it does not.

Has anybody seen this as well, and is there a known solution? Or, put differently, is it OK to make a non-HA JNDI call to obtain clustered session facades?

Here are the details:

We have a JBoss 4.0.2 cluster with three nodes. The cluster has its own partition name and multicast address.
The clients are a number of servers running a clustered web application. For simplicity, we can assume that there are only two client machines. The cluster nodes and the client machines are in the same subnet and see each other's multicasts and broadcasts.
We stripped down the code on the clients and built a servlet containing the following code snippet:


Hashtable env = new Hashtable();
env.put("java.naming.factory.initial","org.jnp.interfaces.NamingContextFactory");
env.put("java.naming.factory.url.pkgs","org.jboss.naming:org.jnp.interfaces");
env.put("java.naming.provider.url","jnp://server1:<ha-port>,jnp://server2:<ha-port>,jnp://server3:<ha-port>");

Context ctx = new InitialContext(env);
Object obj = ctx.lookup(<existing path>);
// narrow obj to the home interface, then call create() on it

All variables are local, so there is no singleton sharing data between two servlet calls. We use non-standard port numbers, so no default can affect the behaviour. The port number is the same on all cluster nodes. However, there is another server in the subnet which uses the default ports. When the client fails, we get a NameNotFoundException on the lookup line.
We experimented with the property for switching off auto-discovery (jnp.disableDiscovery) and with setting a connection timeout (jnp.timeout), but nothing changed.
We called the servlet on two machines of the web client cluster. Both used the HA-JNDI ports of the three JBoss nodes. After a few seconds, one of the web clients got the exception mentioned above. After that, many subsequent calls got the same exception. We ran a tcpdump and saw that the network traffic on the JNDI ports stopped after the exception. At the same time, the other web client was still able to perform lookups. Sometimes the failing web client went back to normal, but not every time. We used a small script on a third machine to call the two servlets. With a sleep of 1 second between two calls to the same web client node, we saw the behaviour described above. When we ran the script without the sleep, both web clients got exceptions after a few seconds, and in most cases they did not go back to normal.
The interesting point is that we can't reproduce the problem when we make the JNDI calls from a development machine in a different network. In that case the JBoss cluster answered all JNDI lookups without any exceptions.
Another interesting point is that the exceptions on the web clients disappeared when we changed the port in the provider URL to the non-HA JNDI port. This port is a non-standard one too.
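
For reference, here is a minimal sketch of how the two settings mentioned above (jnp.disableDiscovery and jnp.timeout) could be added to the environment from the servlet snippet. The class name HaJndiEnvSketch, the method buildEnv and its parameters are made up for illustration; the host names, the port and the timeout value are placeholders, not our real configuration. The sketch only builds the Hashtable, so it runs without the JBoss client jars.

```java
import java.util.Hashtable;

// Hypothetical helper, for illustration only (not part of JBoss).
final class HaJndiEnvSketch {

    // Builds the JNDI environment for a comma-separated list of HA-JNDI nodes.
    // hosts, haPort and timeoutMillis are placeholders supplied by the caller.
    static Hashtable<String, String> buildEnv(String[] hosts, int haPort, int timeoutMillis) {
        StringBuilder url = new StringBuilder();
        for (int i = 0; i < hosts.length; i++) {
            if (i > 0) {
                url.append(',');
            }
            url.append("jnp://").append(hosts[i]).append(':').append(haPort);
        }

        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "org.jnp.interfaces.NamingContextFactory");
        env.put("java.naming.factory.url.pkgs", "org.jboss.naming:org.jnp.interfaces");
        env.put("java.naming.provider.url", url.toString());
        // Switch off multicast auto-discovery so only the listed nodes are tried.
        env.put("jnp.disableDiscovery", "true");
        // Connection timeout; the value is passed as a string.
        env.put("jnp.timeout", Integer.toString(timeoutMillis));
        return env;
    }
}
```

The returned table would then be passed to new InitialContext(env) exactly as in the snippet above. As described, these settings did not change the behaviour in our case.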

We would like to use the HA-JNDI ports again. Is there a known solution for this problem?

Thanks for all your help!
Bernd

1. Re: Lookup Failure with HA-JNDI

bernd.koecke Aug 31, 2006 10:03 AM (in response to bernd.koecke)

Hi,

for all who may be interested, we found the reason for the described behaviour:
The problem was that there were two clusters with different multicast addresses etc., but the same partition name, and both clusters were accessed from the same client for different services. When we installed these clusters, we only checked on the cluster nodes that the nodes of each cluster talked only to each other. But here the problem occurs on the client. If a client connects to more than one cluster, these clusters must have distinct partition names. I think this is noted somewhere in the docs, but I don't remember an explanation of why the partition names must be distinct when the server names and multicast addresses are already distinct.

Some details:
The HA-JNDI client has an internal cache for the connection data. This cache is accessed in more than one stage at JNDI lookup time. The first stage maps the cluster node's server name to an internal object instance. This instance has a static map (a class field) in which the partition name is the key for getting the connection object. All JNDI lookups on the client share this static map, keyed by partition name. As a result, the most recent JNDI lookup overwrites the internal connection data of the earlier one, and all lookups to the first cluster are internally mapped to the second. So every lookup for a service on the first cluster gets a NameNotFoundException after the first successful lookup against the second cluster.
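
To make the overwrite easier to picture, here is a small, simplified model of the effect. It is not the real JBoss client code: the class PartitionCacheSketch, its methods and the idea of representing a connection as the set of names bound on a cluster are all assumptions made for illustration. The only point it shows is what happens when a static map is keyed by partition name alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the client-side cache; NOT the JBoss classes.
final class PartitionCacheSketch {

    // Static (class-level) map shared by every lookup in the JVM.
    // The key is the partition name only; server names play no role here.
    private static final Map<String, Set<String>> CONNECTION_BY_PARTITION = new HashMap<>();

    // Models a successful lookup against a cluster: the connection data for
    // that partition name is stored, replacing whatever was stored before.
    static void connect(String partitionName, Set<String> namesBoundOnCluster) {
        CONNECTION_BY_PARTITION.put(partitionName, namesBoundOnCluster);
    }

    // Models a later lookup: it goes to whichever cluster was stored last
    // under this partition name and succeeds only if the name is bound there.
    static boolean canResolve(String partitionName, String jndiName) {
        Set<String> bound = CONNECTION_BY_PARTITION.get(partitionName);
        return bound != null && bound.contains(jndiName);
    }
}
```

If two clusters both register under the same partition name, the second connect call replaces the first entry, and a name that exists only on the first cluster can no longer be resolved. With distinct partition names, each cluster keeps its own entry and both lookups succeed.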

Best regards,
Bernd