4 Replies Latest reply on Sep 13, 2006 12:31 PM by brian.stansberry

Jgroups jboss cache deadlock

lmouton Sep 12, 2006 7:20 AM

We are running JGroups 2.3 and JBoss Cache 1.2.4.SP2 on Red Hat Linux Enterprise Application Server 4r3. Our JGroups stack is set up with UDP, and TreeCache runs in synchronous replicated mode. Sometimes the cache stops responding when we access it, on both put and get. At the same time, all requests on our own JGroups channels also time out. The JVM thread dumps below were taken while this was happening. This is a high-transaction production environment, and we would appreciate any assistance.

Dump 1
"http-0.0.0.0-8906-10" daemon prio=1 tid=0x0000002c5e0c7a40 nid=0x7b41 in Object.wait() [0x0000000044622000..0x0000000044623db0]
    at java.lang.Object.wait(Native Method)
    - waiting on <0x0000002bdf0898c8> (a java.util.HashMap)
    at org.jgroups.blocks.GroupRequest.doExecute(GroupRequest.java:492)
    - locked <0x0000002bdf0898c8> (a java.util.HashMap)
    at org.jgroups.blocks.GroupRequest.execute(GroupRequest.java:188)
    at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:417)
    at org.jgroups.blocks.RpcDispatcher.callRemoteMethods(RpcDispatcher.java:165)
    at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3327)
    at org.jboss.cache.TreeCache.callRemoteMethods(TreeCache.java:3357)
    at org.jboss.cache.interceptors.ReplicationInterceptor.handleReplicatedMethod(ReplicationInterceptor.java:122)
    at org.jboss.cache.interceptors.ReplicationInterceptor.invoke(ReplicationInterceptor.java:87)
    at org.jboss.cache.TreeCache.invokeMethod(TreeCache.java:4172)
    at org.jboss.cache.TreeCache.put(TreeCache.java:2914)
    at org.jboss.cache.TreeCache.put(TreeCache.java:2855)
    at sun.reflect.GeneratedMethodAccessor162.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:141)
    at org.jboss.mx.server.Invocation.dispatch(Invocation.java:80)
    at org.jboss.mx.server.Invocation.invoke(Invocation.java:72)
    at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:245)
    at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:644)
    at mammoth.cache.TreeCacheAOP.put(TreeCacheAOP.java:83)
    ... (our application code starts here)

Dump 2
"http-0.0.0.0-8906-10" daemon prio=1 tid=0x0000002c5e0c7a40 nid=0x7b41 waiting for monitor entry [0x0000000044622000..0x0000000044623db0]
    at mammoth.cache.TreeCacheAOP.put(TreeCacheAOP.java:81)
    - waiting to lock <0x0000002ae6454350> (a mammoth.cache.TreeCacheAOP)

1. Re: Jgroups jboss cache deadlock

brian.stansberry Sep 12, 2006 6:21 PM (in response to lmouton)

Sorry, these thread dumps don't tell me anything.

The first shows a thread that's replicating a put() call waiting for responses from other nodes in the cluster. Whether there is anything meaningful there depends on how long it waits.
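One way to tell a normal replication wait from a stuck one is to take several snapshots a few seconds apart and check which threads are still in the same place. The sketch below does that in-process with the standard Thread.getAllStackTraces() call. The class and method names (StuckThreadSampler, persistentlyIn) are hypothetical, and the sample count and interval are arbitrary choices, not JBoss recommendations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds threads that sit in the same method across several stack samples. */
public class StuckThreadSampler {

    /** True if any frame of the stack is className.methodName. */
    static boolean inFrame(StackTraceElement[] stack, String className, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Takes {@code samples} snapshots, {@code intervalMillis} apart, and returns the names of
     * threads found inside className.methodName in every one of them.
     */
    public static Set<String> persistentlyIn(String className, String methodName,
                                             int samples, long intervalMillis)
            throws InterruptedException {
        Map<Thread, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                if (inFrame(e.getValue(), className, methodName)) {
                    hits.merge(e.getKey(), 1, Integer::sum); // count samples this thread appeared in
                }
            }
            if (i < samples - 1) {
                Thread.sleep(intervalMillis);
            }
        }
        Set<String> stuck = new TreeSet<>();
        for (Map.Entry<Thread, Integer> e : hits.entrySet()) {
            if (e.getValue() == samples) { // present in every snapshot
                stuck.add(e.getKey().getName());
            }
        }
        return stuck;
    }
}
```

For example, persistentlyIn("org.jgroups.blocks.GroupRequest", "doExecute", 5, 2000) would list threads that stayed in GroupRequest.doExecute() across all five samples. Successive external dumps (kill -3 or jstack) give the same information from outside the JVM.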

The second dump shows it's no longer waiting for responses; it's blocked in your code, presumably on a synchronized (this) block or a synchronized method. From its name it looks like your class is a wrapper around TreeCacheAOP; perhaps synchronizing on it is causing you problems?
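To illustrate the effect described above: a synchronized method holds the wrapper's monitor for the whole call, including the time spent waiting for cluster responses, so every other caller queues up in the "waiting to lock" state seen in Dump 2. The class below is a hypothetical stand-in for such a wrapper; a CountDownLatch simulates a slow synchronous replication.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a synchronized cache wrapper such as the one in this thread. */
public class SynchronizedWrapperDemo {

    // Simulates cluster responses that have not arrived yet.
    private final CountDownLatch replicationDone = new CountDownLatch(1);

    /** A put() whose replication is slow: it keeps this object's monitor while it waits. */
    public synchronized void slowPut() throws InterruptedException {
        replicationDone.await(); // waiting here does NOT release the wrapper's monitor
    }

    /** Any other put() on the same wrapper cannot enter until slowPut() returns. */
    public synchronized void fastPut() {
        // would normally hand the value to the cache
    }

    /** Simulates the responses finally arriving. */
    public void finishReplication() {
        replicationDone.countDown();
    }
}
```

Every thread that calls fastPut() while slowPut() is waiting shows up as BLOCKED on the wrapper, even though only one thread is actually talking to the cluster.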
2. Re: Jgroups jboss cache deadlock

ben.wang Sep 12, 2006 8:25 PM (in response to lmouton)

If you guys are using TreeCacheAop, are you mixing put and putObject?

3. Re: Jgroups jboss cache deadlock

lmouton Sep 13, 2006 2:03 AM (in response to lmouton)

We only use put.

We do call the cache from a synchronized method. This is how we call the cache:

public synchronized Object put(String fqn, Object key, Object value) throws CacheImplementationException {
    if (isCacheActive()) {
        try {
            StringBuffer buffer = new StringBuffer();
            buffer.append(fqn);
            buffer.append(FORWARD_SLASH);
            buffer.append(key.toString());

            HashMap<String, MarshalledValue> map = new HashMap<String, MarshalledValue>(1, 1.0f);
            map.put(key.toString(), new MarshalledValue(value));

            return server.invoke(cacheService, "put",
                    new Object[] { buffer.toString(), map },
                    new String[] { String.class.getName(), Map.class.getName() });
        } catch (Exception e) {
            e.printStackTrace();
            throw new CacheImplementationException(e);
        }
    }
    else {
        return null;
    }
}

Does this need to be synchronized? In our TreeCache configuration, SyncReplTimeout is set to 10000 ms.

The first dump we posted looks like it is blocking in

org.jgroups.blocks.GroupRequest.doExecute(GroupRequest.java:492)

which is inside JGroups itself. Are we reading this right?

4. Re: Jgroups jboss cache deadlock

brian.stansberry Sep 13, 2006 12:31 PM (in response to lmouton)

In general, you don't want to synchronize on the cache to do a put. I don't know your use case in enough detail (i.e. who's putting what where when) to absolutely say you shouldn't, but JBC already provides concurrency control at the individual Node level.

I don't know what your isCacheActive() call does; perhaps some synchronization is needed there.
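A minimal sketch of the same wrapper without the method-level lock, assuming the cache's own node-level locking is sufficient for this use case. CacheBackend is a hypothetical interface standing in for the MBean server call in the original code. The volatile flag is one possible answer to the isCacheActive() question: it makes a change to the flag visible to all threads without a lock, though it does not stop the flag from changing between the check and the put.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical unsynchronized version of the wrapper posted above. */
public class UnsynchronizedCacheWrapper {

    /** Hypothetical minimal view of the cache; the original goes through MBeanServer.invoke(). */
    public interface CacheBackend {
        Object put(String fqn, Map<String, Object> data) throws Exception;
    }

    private final CacheBackend backend;

    // volatile: a change made by one thread (e.g. on shutdown) is seen by all callers without locking
    private volatile boolean active = true;

    public UnsynchronizedCacheWrapper(CacheBackend backend) {
        this.backend = backend;
    }

    public void setActive(boolean active) {
        this.active = active;
    }

    public boolean isCacheActive() {
        return active;
    }

    /** Not synchronized: concurrent callers rely on the cache's own locking, not one shared monitor. */
    public Object put(String fqn, Object key, Object value) throws Exception {
        if (!isCacheActive()) {
            return null;
        }
        Map<String, Object> data = new HashMap<>(2);
        data.put(key.toString(), value);
        return backend.put(fqn + "/" + key, data); // same node path as the original: fqn/key
    }
}
```

With no shared monitor, a put that is waiting on replication no longer holds up unrelated puts made through the same wrapper.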

Blocking in the GroupRequest for a while during cache replication is perfectly normal. The caller thread sends the RPC call out to the cluster and then blocks, waiting for responses that arrive separately from each cluster node. A single thread dump showing a thread blocked there means nothing by itself. Seeing the thread blocked there continually, for a long time across more than one thread dump, tells you that for some reason responses are slow to come back. The question then becomes why responses are taking so long, and there are many possible reasons.
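The wait described above can be modeled as a caller thread blocking on a latch until every member has replied or a timeout expires. This is a simplified, hypothetical model for intuition, not JGroups' GroupRequest implementation; in this thread's setup the timeout would presumably correspond to the 10000 ms SyncReplTimeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a synchronous group RPC wait; not the JGroups API. */
public class ResponseCollector {

    private final CountDownLatch pending;
    private final Map<String, Object> responses = new ConcurrentHashMap<>();

    public ResponseCollector(int members) {
        pending = new CountDownLatch(members);
    }

    /** Called on the receiving side as each member's reply arrives (reply must be non-null). */
    public void receive(String member, Object reply) {
        if (responses.putIfAbsent(member, reply) == null) {
            pending.countDown(); // only the first reply from each member counts
        }
    }

    /** The caller blocks here until all members replied (true) or the timeout expired (false). */
    public boolean awaitAll(long timeoutMillis) throws InterruptedException {
        return pending.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public Map<String, Object> responses() {
        return responses;
    }
}
```

In this model, a caller that keeps showing up in awaitAll() across several dumps is waiting on a member that has not replied, which points the investigation at that member or at the network rather than at the caller.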
