7 Replies Latest reply on Jul 17, 2011 10:55 AM by davewebb

    Possible concurrency problem ... leads to cluster crash

    davewebb

      I recently upgraded to JBoss 5.1.0 with JDK 1.6.0_17. Before that, my application ran on JBoss 4.2.3 with JDK 1.5.0_16 for almost a year without any clustering issues.

      Since my upgrade, I see warnings in the log like the ones below.

      2009-12-22 13:37:16,534 WARN  [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-6,192.168.1.50:33638) Possible concurrency problem: Replicated version id 10 is less than or equal to in-memory version for session T23FueOPl97HKZdI22n7cg__

      2009-12-22 13:37:43,220 WARN  [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-6,192.168.1.50:33638) Possible concurrency problem: Replicated version id 12 is less than or equal to in-memory version for session T23FueOPl97HKZdI22n7cg__

      2009-12-22 13:37:55,171 WARN  [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-3,192.168.1.50:33638) Possible concurrency problem: Replicated version id 172 is less than or equal to in-memory version for session nZOCiXSQ5U3HwJlmMvz+ZA__

      2009-12-22 13:38:26,091 WARN  [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-7,192.168.1.50:33638) Possible concurrency problem: Replicated version id 57 is less than or equal to in-memory version for session 7g3PCvqKFeDseo9s7QxAng__

      2009-12-22 13:38:26,704 WARN  [org.jboss.web.tomcat.service.session.distributedcache.impl.jbc.CacheListener] (Incoming-1,192.168.1.50:33638) Possible concurrency problem: Replicated version id 255 is less than or equal to in-memory version for session TRKySkT5WGJoX2j-IHertg__

      Shortly afterwards, every node in the cluster starts logging warnings and errors such as:

      [96386 : 96388 (96388) (size=2, missing=0, highest stability=96386)]
      2009-12-22 14:07:12,065 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-18162,192.168.1.92:55020) (requester=192.168.1.50:33638, local_addr=192.168.1.92:55020) message 192.168.1.92:55020::66124 not found in retransmission table of 192.168.1.92:55020:
      [96386 : 96388 (96388) (size=2, missing=0, highest stability=96386)]
      2009-12-22 14:07:12,065 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-18163,192.168.1.92:55020) (requester=192.168.1.50:33638, local_addr=192.168.1.92:55020) message 192.168.1.92:55020::66118 not found in retransmission table of 192.168.1.92:55020:
      [96386 : 96388 (96388) (size=2, missing=0, highest stability=96386)]
      2009-12-22 14:07:12,065 WARN  [org.jgroups.protocols.pbcast.NAKACK] (OOB-18094,192.168.1.92:55020) (requester=192.168.1.50:33638, local_addr=192.168.1.92:55020) message 192.168.1.92:55020::18575 not found in retransmission table of 192.168.1.92:55020:

      The entire cluster then crashes and becomes unresponsive. mod_jk marks all of the nodes as ERR at the same moment and stops routing traffic to any of them.

      This rather defeats the purpose of running a cluster, since all the nodes fail at once. Any help is appreciated, because this is a production system. Also, before anyone recommends paid support: I have filled out the form to request a support quote three times in the last week, and no one from JBoss or Red Hat has contacted me. I realize you may need more information. I have archived the logs from two separate crashes and can provide whatever details would help. Thank you in advance!