7 Replies Latest reply on Jul 17, 2011 10:55 AM by davewebb

Possible concurrency problem ... leads to cluster crash

davewebb Dec 22, 2009 2:37 PM

I recently upgraded to JBoss 5.1.0 using JDK 1.6.0_17. Previously my application ran without clustering issues on 4.2.3/1.5.0_16 for almost a year.

Since my upgrade, I see warnings in the log like the ones below.

2009-12-22 13:37:16,534 WARN [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-6,192.168.1.50:33638) Possible concurrency problem: Replicated version id 10 is less than or equal to in-memory version for session T23FueOPl97HKZdI22n7cg__

2009-12-22 13:37:43,220 WARN [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-6,192.168.1.50:33638) Possible concurrency problem: Replicated version id 12 is less than or equal to in-memory version for session T23FueOPl97HKZdI22n7cg__

2009-12-22 13:37:55,171 WARN [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-3,192.168.1.50:33638) Possible concurrency problem: Replicated version id 172 is less than or equal to in-memory version for session nZOCiXSQ5U3HwJlmMvz+ZA__

2009-12-22 13:38:26,091 WARN [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-7,192.168.1.50:33638) Possible concurrency problem: Replicated version id 57 is less than or equal to in-memory version for session 7g3PCvqKFeDseo9s7QxAng__

2009-12-22 13:38:26,704 WARN [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-1,192.168.1.50:33638) Possible concurrency problem: Replicated version id 255 is less than or equal to in-memory version for session TRKySkT5WGJoX2j-IHertg__
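
To see whether the warnings cluster around a few sessions, the lines can be tallied per session ID. This is only a rough sketch: it assumes the message format shown in the excerpts above, and the function name `count_warnings_by_session` is made up for illustration.

```python
import re
from collections import Counter

# Matches the CacheListener warning text as it appears in the excerpts above.
# Group 1 is the replicated version id, group 2 is the session id.
PATTERN = re.compile(
    r"Possible concurrency problem: Replicated version id (\d+) "
    r"is less than or equal to in-memory version for session (\S+)"
)


def count_warnings_by_session(lines):
    """Return a Counter mapping session id -> number of warnings seen."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```

Passing it the lines of a node's server log and calling `most_common()` on the result lists the most affected sessions first.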

Shortly thereafter, all the nodes in the cluster start logging warnings and errors such as:

[96386 : 96388 (96388) (size=2, missing=0, highest stability=96386)]
2009-12-22 14:07:12,065 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-18162,192.168.1.92:55020) (requester=192.168.1.50:33638, local_addr=192.168.1.92:55020) message 192.168.1.92:55020::66124 not found in retransmission table of 192.168.1.92:55020:
[96386 : 96388 (96388) (size=2, missing=0, highest stability=96386)]
2009-12-22 14:07:12,065 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-18163,192.168.1.92:55020) (requester=192.168.1.50:33638, local_addr=192.168.1.92:55020) message 192.168.1.92:55020::66118 not found in retransmission table of 192.168.1.92:55020:
[96386 : 96388 (96388) (size=2, missing=0, highest stability=96386)]
2009-12-22 14:07:12,065 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-18094,192.168.1.92:55020) (requester=192.168.1.50:33638, local_addr=192.168.1.92:55020) message 192.168.1.92:55020::18575 not found in retransmission table of 192.168.1.92:55020:

The entire cluster crashes and becomes unresponsive. mod_jk sees the nodes in ERR (all at the same time) and stops routing traffic to any nodes.

This kind of defeats the purpose of setting up the cluster, since all nodes go bad at once. Any help here is appreciated since this is a production system. Also, before recommending support: I have filled out the form to request a support quote 3 times in the last week and no one from JBoss or RedHat has contacted me. I realize you may need more info. I have the logs archived from 2 different crashes and can provide any information required to assist me with this. Thank you in advance!

1. Re: Possible concurrency problem ... leads to cluster crash

brian.stansberry Dec 22, 2009 4:23 PM (in response to davewebb)

You can send me logs at bstansberry at jboss dot com. Please include all nodes in the cluster, preferably both the part showing the startup of the nodes and the period when the problems occurred.

Please send your mod_jk configuration as well.
2. Re: Possible concurrency problem ... leads to cluster crash

davewebb Dec 22, 2009 6:43 PM (in response to brian.stansberry)

Sent requested artifacts via email. Thank you very much for the assistance.
3. Re: Possible concurrency problem ... leads to cluster crash

kingluke Jan 4, 2011 12:54 PM (in response to davewebb)

Hi David,

Did you ever figure this problem out? I think I may have the same issue as you.

Thanks,
Will
4. Re: Possible concurrency problem ... leads to cluster crash

augustsimonelli Jul 15, 2011 12:33 AM (in response to davewebb)

Me too ... anyone else know more?
5. Re: Possible concurrency problem ... leads to cluster crash

davewebb Jul 15, 2011 10:04 AM (in response to davewebb)

Guys,

I solved this issue by moving my platform from virtual servers to physical servers.

Once we went to 100% physical nodes and a physical mod_jk machine, everything ran smoothly!

Hope that helps.

Dave
6. Re: Possible concurrency problem ... leads to cluster crash

augustsimonelli Jul 17, 2011 6:12 AM (in response to davewebb)

Thanks David ... I wound up raising it with JBoss support (we are lucky enough to be using EAP), and basically the word from them was that multiple requests for the same session (from AJAX, CSS, etc.) essentially get the sessions out of whack. The advice was to use sticky sessions ... this seems like a workaround in a way, but also a decent way of keeping things in order. So far, so good. :-)
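
For anyone wondering why sticky sessions would matter here, this is a toy model of the idea, not JBoss's actual replication code. It assumes each node bumps a per-session version when it serves a request, replicates that version to the other node asynchronously, and flags any incoming version that is not newer than what it already holds. The `Node` class and `simulate` function are invented for the illustration.

```python
from collections import deque


class Node:
    """Hypothetical cluster node holding an in-memory version per session."""

    def __init__(self, name):
        self.name = name
        self.versions = {}   # session id -> in-memory version
        self.warnings = []   # (session id, replicated version, local version)

    def handle_request(self, session_id):
        # Serving a request bumps the local session version.
        version = self.versions.get(session_id, 0) + 1
        self.versions[session_id] = version
        return version

    def receive_replication(self, session_id, version):
        # Flag updates that are not newer than the local copy.
        current = self.versions.get(session_id, 0)
        if version <= current:
            self.warnings.append((session_id, version, current))
        else:
            self.versions[session_id] = version


def simulate(routing, session_id="s1"):
    """Route a burst of requests for one session; return total warnings.

    `routing` is a list of node names ("a" or "b"), one per request.
    Replication is delivered only after the whole burst has been served,
    standing in for parallel AJAX/CSS requests that finish before the
    asynchronous updates arrive.
    """
    nodes = {"a": Node("a"), "b": Node("b")}
    in_flight = deque()
    for target in routing:
        version = nodes[target].handle_request(session_id)
        for name, node in nodes.items():
            if name != target:
                in_flight.append((node, version))
    while in_flight:
        node, version = in_flight.popleft()
        node.receive_replication(session_id, version)
    return sum(len(node.warnings) for node in nodes.values())


sticky = simulate(["a", "a", "a", "a"])        # every request pinned to node a
round_robin = simulate(["a", "b", "a", "b"])   # same session bounced between nodes
print(sticky, round_robin)
```

In this model the sticky run produces no warnings, because the second node only ever receives newer versions, while the round-robin run has both nodes advancing the same session independently and rejecting each other's updates.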
7. Re: Possible concurrency problem ... leads to cluster crash

davewebb Jul 17, 2011 10:55 AM (in response to augustsimonelli)

August,

Sounds like good advice. We also use sticky sessions with the load balancer for the same reason. Good luck, and let me know if you run into anything else.

Dave