2 Replies Latest reply on Mar 24, 2011 11:36 AM by Andrew DuFour

    Odd issue with JBoss 4.2.3GA, JGroups and Clustering

    Andrew DuFour Newbie

      My environment:


      4 clustered nodes running:


      RHEL 5.5

      JBoss 4.2.3GA

      Java(TM) SE Runtime Environment (build 1.6.0_20-b02)

      Java HotSpot(TM) Server VM (build 16.3-b01, mixed mode)

      JGroups 1.4.1SP-4


      Yesterday I had an extremely strange issue.


      Consider the following scenario:


      Node 1 - Up

      Node 2 - Up

      Node 3 - Up <- coordinator

      Node 4 - Up

      For some reason the JVM for Node 3 hung: it would not respond to the shutdown command, and the container still appeared to be up, but requests to it never rendered anything. I issued a kill command to bring Node 3 down and then performed a standard JBoss restart, but on restart the log file showed the following:



      GMS Address: x.x.x.x:xxxx



      Then, every few seconds, this log line appeared:


      WARN [org.jgroups.protocols.pbcast.GMS] Join(x.x.x.x:xxxx) to <address of coordinator that was killed> timed out, retrying


      I decided to restart Node 1 with JGroups logging set to TRACE in log4j to see whether it would show the same symptoms, and it did. I also set JGroups to TRACE on Node 2. I could then see Node 1 start up, send the appropriate REQs, and get a response back from Node 2 naming the dead coordinator as the still-active coordinator. Node 1 obviously could not contact the dead coordinator, so it never joined the cluster, never retrieved state, and its container never started.
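
      To make the failure mode concrete, here is a toy model of the behaviour described above. It is not JGroups code: the function names (coordinator_of, try_join, evict_dead) and the node labels are made up for illustration, and it assumes only that the coordinator is the first member of the current view and that surviving members keep reporting that view until the dead member is removed from it.

```python
from typing import List, Optional, Set


def coordinator_of(view: List[str]) -> str:
    """Assumption: the coordinator is the first member of the view."""
    return view[0]


def try_join(joiner: str, view: List[str], alive: Set[str],
             max_attempts: int = 3) -> Optional[List[str]]:
    """Return the new view if the join succeeds, or None if every attempt times out."""
    responders = [m for m in view if m in alive]
    if not responders:
        # Nobody answered discovery, so the joiner would form its own group.
        return [joiner]
    # Every responder reports the head of its (possibly stale) view.
    coordinator = coordinator_of(view)
    for attempt in range(1, max_attempts + 1):
        if coordinator in alive:
            return view + [joiner]
        # The join request goes to a dead address and never gets answered.
        print(f"Join({joiner}) to {coordinator} timed out, retrying ({attempt})")
    return None


def evict_dead(view: List[str], alive: Set[str]) -> List[str]:
    """What failure detection is expected to do: drop members that are gone."""
    return [m for m in view if m in alive]


# Hypothetical labels standing in for the four nodes in this thread.
stale_view = ["node3", "node2", "node4"]  # node3 was killed but is still listed first
alive = {"node2", "node4"}

print(try_join("node1", stale_view, alive))                     # stale view: join never completes
print(try_join("node1", evict_dead(stale_view, alive), alive))  # cleaned view: join succeeds
```

      In this model the join only succeeds once the dead coordinator has been removed from the view held by the surviving members, which matches what I saw: Node 2 kept handing out Node 3 as coordinator, so the joiner kept retrying against an address that could no longer answer.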


      Now I have:


      Node 1 - Down

      Node 3 - Down (dead coordinator)


      Node 2 - Up -> reporting coord_addr as Node 3

      Node 4 - Up -> I'd imagine reporting the same


      I had to bring the whole cluster down and start it fresh to get it working again.


      Luckily this was a QA environment, but I'm curious whether anyone has run into this before. Is it a known issue with JGroups 1.4.1, a config issue, etc.?


      Thanks for your assistance!