1 Reply · Latest reply: Apr 26, 2012 4:53 AM by Wolf-Dieter Fink

    Compacting garbage collections cause timeouts and break the cluster in JBoss 5.1

    Armin Haaf Newbie

      We have a JBoss 5.1 cluster with 3 nodes, each with a 2 GB heap. The JVMs run with "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC", which works without problems most of the time.
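
      For reference, here is roughly the full GC option set we start the JVMs with, plus the GC logging and CMS tuning we are considering adding to diagnose the pauses (the occupancy value of 70 is a guess on our side, not something we have measured):

        # current settings: 2 GB heap, ParNew + CMS
        JAVA_OPTS="-Xms2g -Xmx2g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"

        # GC logging, to see whether the long pauses are concurrent mode /
        # promotion failures that force CMS into a stop-the-world compaction
        JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log"

        # start the concurrent cycle earlier so the old generation does not
        # fill up before CMS finishes (70 is a guess, not a measured value)
        JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly"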

       

      However, sometimes a JVM runs a compacting (stop-the-world) garbage collection, which pauses it for 50-80 seconds. During this pause the node is suspected by the other nodes.

      After the node becomes responsive again, it logs:

       

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,385 WARN  [org.jgroups.protocols.FD] [T:125798] I was suspected by 10.199.18.13:39310; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,386 WARN  [org.jgroups.protocols.FD] [T:125798] I was suspected by 10.199.18.13:39310; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,386 WARN  [org.jgroups.protocols.FD] [T:125798] I was suspected by 10.199.18.13:39310; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,387 DEBUG [org.jgroups.protocols.pbcast.FLUSH] [T:127] Received START_FLUSH at 10.199.18.11:45393 but I am not flush participant, not responding

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,387 DEBUG [org.jgroups.protocols.pbcast.FLUSH] [T:127] Received START_FLUSH at 10.199.18.11:45393 but I am not flush participant, not responding

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,387 DEBUG [org.jgroups.protocols.pbcast.FLUSH] [T:127] Received START_FLUSH at 10.199.18.11:45393 but I am not flush participant, not responding

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,388 DEBUG [org.jgroups.protocols.pbcast.GMS] [T:127] view=[10.199.18.12:39800|9] [10.199.18.12:39800, 10.199.18.13:39310]

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,388 DEBUG [org.jgroups.protocols.pbcast.GMS] [T:127] [local_addr=10.199.18.11:45393] view is [10.199.18.12:39800|9] [10.199.18.12:39800, 10.199.18.13:39310]

      server.log.2012-04-24_08-28-37:2012-04-24 08:10:11,388 WARN  [org.jgroups.protocols.pbcast.GMS] [T:127] I (10.199.18.11:45393) am not a member of view [10.199.18.12:39800|9] [10.199.18.12:39800, 10.199.18.13:39310], shunning myself and leaving the group (prev_members are [10.199.18.12:34166, 10.199.18.13:60923, 10.199.18.11:45393, 10.199.18.12:39800, 10.199.18.13:39310], current view is [10.199.18.11:45393|8] [10.199.18.11:45393, 10.199.18.12:39800, 10.199.18.13:39310])

       

      After this the cluster is broken: at least the node that did the compacting GC must be restarted, and sometimes the whole cluster must be restarted.

       

      Is there a configuration that avoids such problems?
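
      For example, would it help to raise the failure detection timeouts in the JGroups stack (in deploy/cluster/jgroups-channelfactory.sar/META-INF/jgroups-channelfactory-stacks.xml), along these lines? The values below are only a guess on our side, sized so that timeout * max_tries outlasts an 80 second pause:

        <!-- FD: heartbeat-based failure detection. timeout * max_tries should
             exceed the worst GC pause: 10s * 10 = 100s > 80s (guessed values) -->
        <FD timeout="10000" max_tries="10" shun="true"/>
        <!-- VERIFY_SUSPECT gives a suspected member one more chance to reply
             before it is excluded from the view -->
        <VERIFY_SUSPECT timeout="3000"/>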