3 Replies Latest reply on Dec 22, 2010 11:48 AM by galder.zamarreno

HAPartition deadlocked

lexsoto Dec 1, 2010 10:56 AM

Hello,

I am running JBoss 5.1.0.GA. The web application and the EJBs are deployed on separate servers in the same cluster. A deadlock occurred in which the web application blocked while looking up the EJBs in JNDI.

A thread dump from the deadlocked web application server shows multiple threads with the same trace:

- waiting on <0x59cfe529> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
org.jboss.seam.Component.getInstanceFromFactory(Component.java:2082)

They are all waiting on synchronization object 0x59cfe529.

The one thread that appears to be holding this lock shows:

- waiting on <0x7091fea5> (a org.jboss.ha.framework.server.ClusterPartition$ThreadGate)
java.lang.Object.wait(Native Method)
org.jboss.ha.framework.server.ClusterPartition$ThreadGate.await(ClusterPartition.java:2336)
org.jboss.ha.framework.server.ClusterPartition.callMethodOnCluster(ClusterPartition.java:1089)
org.jboss.ha.framework.server.ClusterPartition.callMethodOnCluster(ClusterPartition.java:1074)
org.jboss.ha.jndi.HAJNDI.lookupRemotely(HAJNDI.java:248)
org.jboss.ha.jndi.HAJNDI.lookup(HAJNDI.java:206)
sun.reflect.GeneratedMethodAccessor624.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.jboss.ha.framework.interfaces.HARMIClient.invoke(HARMIClient.java:318)
$Proxy424.lookup(Unknown Source)
org.jnp.interfaces.NamingContext.lookup(NamingContext.java:726)
org.jnp.interfaces.NamingContext.lookup(NamingContext.java:686)
javax.naming.InitialContext.lookup(InitialContext.java:392)
org.jnp.interfaces.NamingContext.resolveLink(NamingContext.java:1346)
org.jnp.interfaces.NamingContext.lookup(NamingContext.java:817)
org.jnp.interfaces.NamingContext.lookup(NamingContext.java:833)
org.jnp.interfaces.NamingContext.lookup(NamingContext.java:686)
javax.naming.InitialContext.lookup(InitialContext.java:392)

........
 


Locked synchronizers : 
- locked <0x59cfe529> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
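
To make the pattern in these two traces concrete, here is a minimal sketch of it in isolation: one thread takes a ReentrantLock and then blocks in a timed wait for a reply, while another thread cannot get the lock until the first one returns. All names (HeldLockDemo, factoryLock, remoteReply) are hypothetical stand-ins, and this is not the Seam or JBoss code; it only assumes the lock is held across the blocking call, as the dump suggests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class HeldLockDemo {

    /** Returns { acquiredWhileHolderBlocked, acquiredAfterHolderReleased }. */
    public static boolean[] run() throws InterruptedException {
        // Hypothetical stand-in for the lock taken in the factory code path.
        final ReentrantLock factoryLock = new ReentrantLock();
        final CountDownLatch lockTaken = new CountDownLatch(1);
        // Hypothetical stand-in for the reply the cluster call is waiting on.
        final CountDownLatch remoteReply = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            factoryLock.lock();
            try {
                lockTaken.countDown();
                // Blocks while still holding the lock, like the lookup thread.
                remoteReply.await(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                factoryLock.unlock();
            }
        }, "holder");
        holder.start();

        // Wait until the holder really owns the lock.
        lockTaken.await();

        // Any other thread now fails to get the lock within its own timeout.
        boolean whileBlocked = factoryLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (whileBlocked) {
            factoryLock.unlock();
        }

        // Simulate the reply arriving; the holder then releases the lock.
        remoteReply.countDown();
        holder.join();

        boolean afterRelease = factoryLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (afterRelease) {
            factoryLock.unlock();
        }
        return new boolean[] { whileBlocked, afterRelease };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = run();
        System.out.println("acquired while holder blocked: " + result[0]);
        System.out.println("acquired after holder released: " + result[1]);
    }
}
```

In this sketch the waiting threads are stuck only for as long as the holder's blocking call lasts, so the question becomes why that call does not return.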

I thought about enabling deadlock detection on the HAPartition configuration, but I found that this is discouraged here: https://jira.jboss.org/browse/JBAS-5821

This JIRA entry suggests:

Use of properly sized thread pools does the same job

So should I increase the thread pool?

How do I go about avoiding this deadlock problem?

Any hints will be greatly appreciated.

TIA

1. Re: HAPartition deadlocked

galder.zamarreno Dec 21, 2010 9:36 AM (in response to lexsoto)

Hmmm, you're only showing part of the problem there. Clearly, the lookup cannot be resolved locally, so it queries the cluster for the EJB binding. But what is the node that contains that binding doing? Did you get a thread dump from the other node? One workaround is to deploy all the EJBs on every node, so that lookups are resolved locally and no cluster-wide calls are made.
2. Re: HAPartition deadlocked

lexsoto Dec 21, 2010 9:59 AM (in response to galder.zamarreno)
Firstly, thank you for replying.

I am no longer experiencing this problem. I think the problem was somehow related to Seam component/factory scopes.

I changed the scope of the class containing factory methods from Application to Event scope. I was also caching the EJB references in the application scope; I am not doing that anymore.

That said, I can't explain why the deadlock occurred, and I am concerned that the problem may come back. The code in ClusterPartition.java seems to have a timeout on the wait call:

org.jboss.ha.framework.server.ClusterPartition$ThreadGate.await(ClusterPartition.java:2336)

I would expect this wait to be interrupted after some time, but it didn't; it got stuck at that point.
Can you confirm or deny whether this call is supposed to time out?

Thanks
3. Re: HAPartition deadlocked

galder.zamarreno Dec 22, 2010 11:48 AM (in response to lexsoto)

Alex, this is open source, so you can actually check this yourself:

http://anonsvn.jboss.org/repos/jbossas/tags/JBoss_5_1_0_GA/cluster/src/main/org/jboss/ha/framework/server/ClusterPartition.java

As you can see from the code, the wait is timed.
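
For readers unfamiliar with the idiom, a timed wait on a gate generally looks something like the sketch below. This is a generic illustration with hypothetical names (TimedGate, open, await), not the actual ClusterPartition.ThreadGate source; check the link above for the real implementation.

```java
import java.util.concurrent.TimeUnit;

public class TimedGate {
    private boolean open;

    /** Opens the gate and wakes up every waiting thread. */
    public synchronized void open() {
        open = true;
        notifyAll();
    }

    /**
     * Waits until the gate is opened or the timeout elapses.
     * Returns true if the gate is open, false if the wait timed out.
     */
    public synchronized boolean await(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        // Loop to guard against spurious wakeups.
        while (!open) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Round up by one millisecond: wait(0) would mean "wait forever".
            wait(TimeUnit.NANOSECONDS.toMillis(remainingNanos) + 1);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        TimedGate gate = new TimedGate();
        System.out.println("opened before timeout: " + gate.await(100));
        gate.open();
        System.out.println("opened before timeout: " + gate.await(100));
    }
}
```

A thread dump taken while a thread is inside such a timed wait still shows it waiting on the gate object, so a single dump cannot tell a timed wait that is about to expire from one that is repeatedly re-entered by a caller.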