6 Replies Latest reply on Mar 26, 2015 7:46 AM by wdfink

    JBoss EAP 5.2: whole cluster hung

    sabir_mustafa

      Hi Everyone:

      A few days ago I ran into a strange problem in our existing infrastructure, which runs JBoss EAP 5.2 with 12 JBoss instances in a single cluster spanning 6 machines. The whole application stopped responding. While investigating, I found the following message appearing in the "server.log" file:

       

      5 WARN  [NAKACK] 192.168.3.10:55200] discarded message from non-member 192.168.3.14:55200, my view is [192.168.3.2:55200|12] [192.168.3.2:55200, 192.168.3.10:55200, 192.168.3.12:55200, 192.168.3.13:55200, 192.168.3.15:55200, 192.168.3.16:55200, 192.168.3.17:55200, 192.168.3.18:55200, 192.168.3.19:55200, 192.168.3.20:55200, 192.168.3.21:55200]
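
      To double-check which node had dropped out of the view, the warning can be parsed mechanically. This is only a minimal sketch, assuming the message format shown in the log line above; the helper name `parse_nakack_warning` and the returned field names are hypothetical and not part of JBoss or JGroups.

```python
import re

# Hypothetical helper: assumes the NAKACK warning format seen in server.log above,
# i.e. "... discarded message from non-member <sender>, my view is [<coord>|<id>] [<members>]".
LINE_RE = re.compile(
    r"discarded message from non-member (?P<sender>\S+), "
    r"my view is \[(?P<coord>[^|\]]+)\|(?P<view_id>\d+)\] \[(?P<members>[^\]]*)\]"
)

def parse_nakack_warning(line):
    """Return the sender, view id, and member list from one warning line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    members = [addr.strip() for addr in m.group("members").split(",") if addr.strip()]
    sender = m.group("sender")
    return {
        "sender": sender,
        "coordinator": m.group("coord"),
        "view_id": int(m.group("view_id")),
        "members": members,
        # False means the sender is not part of the receiver's current view.
        "sender_in_view": sender in members,
    }
```

      Running this over every node's server.log and comparing the view ids and member lists would show whether all nodes agreed that 192.168.3.14 had left, or whether only some of them did.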

       

      Strangely, only one node dropped out of the cluster, yet the whole application stopped responding. I first stopped the JBoss instance on the affected machine, and the application resumed normal service. I then restarted the affected instance, and everything returned to normal.

       

      Could anyone please advise how to avoid such problems in the future, and where I should look to fix this?

       

       

      Thanks