0 Replies Latest reply on Jan 6, 2009 7:00 AM by rana24

    Cluster node stops processing when other node goes down

      Hi,
      We are running JBoss AS 4.2.3 with JBM 1.4.0 SP3, with 2 members forming the cluster. All the nodes receive a high number of messages; each message starts an application that in turn relies heavily on messaging for its business logic.
      The problem we face: when one cluster node goes down because of an OutOfMemoryError, the other cluster node also stops processing. It seems the nodes try to establish some communication with each other and it fails.
      Is this expected behaviour? Ideally, the other node should keep processing, right?

      I hope I am asking the right group of people.

      Please find below the console log from a 3-node cluster.

      Thank you in advance.


      
      //10.31.3.22/AC70Mod16.pdf
      15:28:14,071 WARN [FD] I was suspected by 10.31.2.85:1516; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      15:28:14,071 WARN [FD] I was suspected by 10.31.2.85:1509; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      15:28:14,071 WARN [FD] I was suspected by 10.31.2.85:1521; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      15:28:14,086 WARN [FD] I was suspected by 10.31.2.85:1503; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
      15:28:18,681 WARN [GMS] I (10.31.4.242:1755) am not a member of view [10.31.2.85:1521|13] [10.31.2.85:1521, 10.31.2.11:3017], shunning myself and leaving the group (prev_members are [10.31.2.85:1521 10.31.2.11:1253 10.31.4.242:1755 10.31.2.11:3017 ], current view is [10.31.2.85:1521|12] [10.31.2.85:1521, 10.31.4.242:1755, 10.31.2.11:3017])
      15:28:18,681 WARN [GMS] I (10.31.4.242:1752) am not a member of view [10.31.2.85:1509|13] [10.31.2.85:1509, 10.31.2.11:3025], shunning myself and leaving the group (prev_members are [10.31.2.85:1509 10.31.2.11:1254 10.31.4.242:1752 10.31.2.11:3025 ], current view is [10.31.2.85:1509|12] [10.31.2.85:1509, 10.31.4.242:1752, 10.31.2.11:3025])
      15:28:18,681 WARN [GMS] I (10.31.4.242:1754) am not a member of view [10.31.2.85:1503|13] [10.31.2.85:1503, 10.31.2.11:3023], shunning myself and leaving the group (prev_members are [10.31.2.85:1503 10.31.2.11:1252 10.31.4.242:1754 10.31.2.11:3023 ], current view is [10.31.2.85:1503|12] [10.31.2.85:1503, 10.31.4.242:1754, 10.31.2.11:3023])
      15:28:18,696 WARN [GMS] I (10.31.4.242:1753) am not a member of view [10.31.2.85:1516|13] [10.31.2.85:1516, 10.31.2.11:3027], shunning myself and leaving the group (prev_members are [10.31.2.85:1516 10.31.2.11:1255 10.31.4.242:1753 10.31.2.11:3027 ], current view is [10.31.2.85:1516|12] [10.31.2.85:1516, 10.31.4.242:1753, 10.31.2.11:3027])
      15:28:19,493 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 10.31.4.242:4655
      -------------------------------------------------------
      15:28:19,493 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 10.31.4.242:4654
      -------------------------------------------------------
      15:28:19,509 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 10.31.4.242:4652
      -------------------------------------------------------
      15:28:19,509 INFO [STDOUT]
      -------------------------------------------------------
      GMS: address is 10.31.4.242:4653
      -------------------------------------------------------
      15:28:21,525 INFO [TreeCache] viewAccepted(): [10.31.4.242:4654|0] [10.31.4.242:4654]
      15:28:21,525 INFO [EXA] New cluster view for partition EXA (id: 0, delta: -2) : [10.31.4.242:1099]
      15:28:21,525 INFO [TreeCache] viewAccepted(): [10.31.4.242:4653|0] [10.31.4.242:4653]
      15:28:21,525 INFO [TreeCache] viewAccepted(): [10.31.4.242:4655|0] [10.31.4.242:4655]
      15:28:21,525 INFO [EXA] I am (10.31.4.242:1099) received membershipChanged event:
      15:28:21,525 INFO [EXA] Dead members: 2 ([10.31.2.85:1099, 10.31.2.11:1099])
      15:28:21,525 INFO [EXA] New Members : 0 ([])
      15:28:21,525 INFO [EXA] All Members : 1 ([10.31.4.242:1099])
      15:28:44,060 INFO [TreeCache] viewAccepted(): MergeView::[10.31.2.11:3017|14] [10.31.2.11:3017, 10.31.2.85:1521, 10.31.4.242:4653], subgroups=[[10.31.2.85:1521|13] [10.31.2.85:1521, 10.31.2.11:3017], [10.31.4.242:4653|0] [10.31.4.242:4653]]
      15:28:46,701 INFO [TreeCache] viewAccepted(): MergeView::[10.31.2.11:3027|14] [10.31.2.11:3027, 10.31.2.85:1516, 10.31.4.242:4655], subgroups=[[10.31.2.85:1516|13] [10.31.2.85:1516, 10.31.2.11:3027], [10.31.4.242:4655|0] [10.31.4.242:4655]]
      15:28:49,436 WARN [NAKACK] 10.31.4.242:4652] discarded message from non-member 10.31.2.85:1509, my view is [10.31.4.242:4652|0] [10.31.4.242:4652]
      15:28:52,343 WARN [NAKACK] 10.31.4.242:4654] discarded message from non-member 10.31.2.85:1503, my view is [10.31.4.242:4654|0] [10.31.4.242:4654]
      15:28:52,343 INFO [TreeCache] viewAccepted(): MergeView::[10.31.2.11:3023|14] [10.31.2.11:3023, 10.31.2.85:1503, 10.31.4.242:4654], subgroups=[[10.31.2.85:1503|13] [10.31.2.85:1503, 10.31.2.11:3023], [10.31.4.242:4654|0] [10.31.4.242:4654]]
      15:28:59,078 INFO [EXA] New cluster view for partition EXA: 14 ([10.31.2.11:1099, 10.31.2.85:1099, 10.31.4.242:1099] delta: 2)
      15:28:59,078 INFO [EXA] Merging partitions...
      15:28:59,078 INFO [EXA] Dead members: 0
      15:28:59,094 INFO [EXA] Originating groups: [[10.31.2.85:1509|13] [10.31.2.85:1509, 10.31.2.11:3025], [10.31.4.242:4652|0] [10.31.4.242:4652]]
      15:29:57,932 INFO [TreeCache] viewAccepted(): [10.31.2.85:1521|15] [10.31.2.85:1521, 10.31.4.242:4653]
      15:30:03,901 INFO [TreeCache] viewAccepted(): [10.31.2.85:1516|15] [10.31.2.85:1516, 10.31.4.242:4655]
      15:30:10,606 INFO [EXA] Suspected member: 10.31.2.11:3025
      15:30:13,028 INFO [TreeCache] viewAccepted(): [10.31.2.85:1503|15] [10.31.2.85:1503, 10.31.4.242:4654]
      15:30:15,747 INFO [EXA] New cluster view for partition EXA: 15 ([10.31.2.85:1099, 10.31.4.242:1099] delta: -1)
      15:30:15,747 INFO [EXA] I am (10.31.4.242:1099) received membershipChanged event:
      15:30:15,747 INFO [EXA] Dead members: 1 ([10.31.2.11:1099])
      15:30:15,747 INFO [EXA] New Members : 0 ([])
      15:30:15,747 INFO [EXA] All Members : 2 ([10.31.2.85:1099, 10.31.4.242:1099])
      15:32:38,349 WARN [NAKACK] 10.31.4.242:4653] discarded message from non-member 10.31.2.11:3017, my view is [10.31.2.85:1521|15] [10.31.2.85:1521, 10.31.4.242:4653]
      15:32:38,349 WARN [NAKACK] 10.31.4.242:4652] discarded message from non-member 10.31.2.11:3025, my view is [10.31.2.85:1509|15] [10.31.2.85:1509, 10.31.4.242:4652]
      15:32:38,396 WARN [NAKACK] 10.31.4.242:4654] discarded message from non-member 10.31.2.11:3023, my view is [10.31.2.85:1503|15] [10.31.2.85:1503, 10.31.4.242:4654]
      15:32:38,412 WARN [NAKACK] 10.31.4.242:4655] discarded message from non-member 10.31.2.11:3027, my view is [10.31.2.85:1516|15] [10.31.2.85:1516, 10.31.4.242:4655]
      15:32:39,834 INFO [EXA] Suspected member: 10.31.2.85:1509
      15:32:39,974 INFO [EXA] New cluster view for partition EXA (id: 16, delta: -1) : [10.31.4.242:1099]
      15:32:39,990 INFO [EXA] I am (10.31.4.242:1099) received membershipChanged event:
      15:32:39,990 INFO [EXA] Dead members: 1 ([10.31.2.85:1099])
      15:32:39,990 INFO [EXA] New Members : 0 ([])
      15:32:39,990 INFO [EXA] All Members : 1 ([10.31.4.242:1099])
      15:32:39,990 INFO [TreeCache] viewAccepted(): [10.31.4.242:4653|16] [10.31.4.242:4653]
      15:32:40,021 INFO [TreeCache] viewAccepted(): [10.31.4.242:4654|16] [10.31.4.242:4654]
      15:32:40,037 INFO [TreeCache] viewAccepted(): [10.31.4.242:4655|16] [10.31.4.242:4655]
      15:32:43,506 WARN [NAKACK] 10.31.4.242:4655] discarded message from non-member 10.31.2.85:1516, my view is [10.31.4.242:4655|16] [10.31.4.242:4655]
      15:32:52,633 INFO [EXA] New cluster view for partition EXA (id: 17, delta: 1) : [10.31.4.242:1099, 10.31.2.11:1099]
      15:32:52,633 INFO [EXA] I am (10.31.4.242:1099) received membershipChanged event:
      15:32:52,633 INFO [EXA] Dead members: 0 ([])
      15:32:52,633 INFO [EXA] New Members : 1 ([10.31.2.11:1099])
      15:32:52,633 INFO [EXA] All Members : 2 ([10.31.4.242:1099, 10.31.2.11:1099])
      15:32:52,680 INFO [TreeCache] viewAccepted(): [10.31.4.242:4655|17] [10.31.4.242:4655, 10.31.2.11:4851]
      15:32:52,680 INFO [TreeCache] viewAccepted(): [10.31.4.242:4654|17] [10.31.4.242:4654, 10.31.2.11:4857]
      15:32:56,477 INFO [TreeCache] locking the subtree at / to transfer state
      15:32:56,555 WARN [NAKACK] 10.31.4.242:4653] discarded message from non-member 10.31.2.11:4852, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
      15:32:58,477 INFO [StateTransferGenerator_140] returning the state for tree rooted in /(1024 bytes)
      15:32:58,477 WARN [NAKACK] 10.31.4.242:4653] discarded message from non-member 10.31.2.11:4852, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
      15:33:07,604 INFO [TreeCache] viewAccepted(): [10.31.4.242:4654|18] [10.31.4.242:4654, 10.31.2.11:4857, 10.31.2.85:13405]
      15:33:07,635 INFO [TreeCache] viewAccepted(): [10.31.4.242:4655|18] [10.31.4.242:4655, 10.31.2.11:4851, 10.31.2.85:13406]
      15:33:09,792 WARN [NAKACK] 10.31.4.242:4653] discarded message from non-member 10.31.2.11:4852, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
      15:33:12,574 WARN [GMS] failed to collect all ACKs (2) for view [10.31.4.242:4654|18] [10.31.4.242:4654, 10.31.2.11:4857, 10.31.2.85:13405] after 5000ms, missing ACKs from [10.31.4.242:4654, 10.31.2.11:4857] (received=[]), local_addr=10.31.4.242:4654
      15:33:12,605 WARN [GMS] failed to collect all ACKs (2) for view [10.31.4.242:4655|18] [10.31.4.242:4655, 10.31.2.11:4851, 10.31.2.85:13406] after 5000ms, missing ACKs from [10.31.4.242:4655, 10.31.2.11:4851] (received=[]), local_addr=10.31.4.242:4655
      15:33:23,372 WARN [NAKACK] 10.31.4.242:4653] discarded message from non-member 10.31.2.85:13407, my view is [10.31.4.242:4653|16] [10.31.4.242:4653]
      15:33:52,877 INFO [TreeCache] viewAccepted(): MergeView::[10.31.2.11:4857|19] [10.31.2.11:4857, 10.31.2.85:13405, 10.31.4.242:4654], subgroups=[[10.31.2.85:13405|0] [10.31.2.85:13405], [10.31.4.242:4654|18] [10.31.4.242:4654, 10.31.2.11:4857, 10.31.2.85:13405]]
      15:33:55,706 INFO [TreeCache] viewAccepted(): MergeView::[10.31.2.11:4851|19] [10.31.2.11:4851, 10.31.2.85:13406, 10.31.4.242:4655], subgroups=[[10.31.2.85:13406|0] [10.31.2.85:13406], [10.31.4.242:4655|18] [10.31.4.242:4655, 10.31.2.11:4851, 10.31.2.85:13406]]
      15:33:57,831 INFO [TreeCache] viewAccepted(): MergeView::[10.31.2.11:4852|17] [10.31.2.11:4852, 10.31.2.85:13407, 10.31.4.242:4653], subgroups=[[10.31.2.11:4852|1] [10.31.2.11:4852, 10.31.2.85:13407], [10.31.4.242:4653|16] [10.31.4.242:4653]]
      15:33:57,847 WARN [GMS] failed to collect all ACKs (3) for view MergeView::[10.31.2.11:4857|19] [10.31.2.11:4857, 10.31.2.85:13405, 10.31.4.242:4654], subgroups=[[10.31.2.85:13405|0] [10.31.2.85:13405], [10.31.4.242:4654|18] [10.31.4.242:4654, 10.31.2.11:4857, 10.31.2.85:13405]] after 5000ms, missing ACKs from [10.31.2.85:13405] (received=[10.31.4.242:4654, 10.31.2.11:4857]), local_addr=10.31.4.242:4654
      15:34:00,675 WARN [GMS] failed to collect all ACKs (3) for view MergeView::[10.31.2.11:4851|19] [10.31.2.11:4851, 10.31.2.85:13406, 10.31.4.242:4655], subgroups=[[10.31.2.85:13406|0] [10.31.2.85:13406], [10.31.4.242:4655|18] [10.31.4.242:4655, 10.31.2.11:4851, 10.31.2.85:13406]] after 5000ms, missing ACKs from [10.31.2.85:13406] (received=[10.31.4.242:4655, 10.31.2.11:4851]), local_addr=10.31.4.242:4655
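
In case it helps anyone reading the log above, here is a rough Python sketch for summarising it. It is not part of JBoss or JGroups; the function names (`fold_records`, `count_by_category`, `member_counts`) and the `SAMPLE` list are made up for illustration. It assumes each record starts with a `HH:MM:SS,mmm LEVEL [Category]` header, as in this log, and that any line without such a header is a continuation of the previous record (console output is often hard-wrapped at a fixed width).

```python
import re
from collections import Counter

# A record starts with a timestamp such as "15:28:14,071", then a level and [Category].
HEADER = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+\[(\w+)\]")
# Matches the partition summary lines, e.g. "All Members : 2 ([...])".
ALL_MEMBERS = re.compile(r"All Members : (\d+)")

# A few lines copied from the log, wrapped the way a narrow console would wrap them.
SAMPLE = [
    "15:28:14,071 WARN [FD] I was suspected by 10.31.2.85:1516; ignoring the SUSPECT",
    " message and sending back a HEARTBEAT_ACK",
    "15:28:21,525 INFO [EXA] All Members : 1 ([10.31.4.242:1099])",
    "15:30:15,747 INFO [EXA] All Members : 2 ([10.31.2.85:1099, 10.31.4.242:1099])",
    "15:32:38,349 WARN [NAKACK] 10.31.4.242:4653] discarded message from non-member",
    "10.31.2.11:3017, my view is [10.31.2.85:1521|15] [10.31.2.85:1521, 10.31.4.242:4",
    "653]",
]


def fold_records(lines, indent=0):
    """Fold continuation lines back into the record they belong to.

    `indent` is the number of leading spaces the paste added to every line.
    Lines seen before the first header are dropped. A space that sat exactly
    at the wrap point is lost in the paste, so a few tokens may end up glued.
    """
    prefix = " " * indent
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if prefix and line.startswith(prefix):
            line = line[indent:]
        if HEADER.match(line):
            records.append(line)
        elif records and line:
            records[-1] += line
    return records


def count_by_category(records):
    """Count records per (level, category) pair, e.g. ("WARN", "NAKACK")."""
    counts = Counter()
    for record in records:
        match = HEADER.match(record)
        counts[(match.group(2), match.group(3))] += 1
    return counts


def member_counts(records):
    """Return (timestamp, member count) for every 'All Members' line, in order."""
    result = []
    for record in records:
        found = ALL_MEMBERS.search(record)
        if found:
            result.append((HEADER.match(record).group(1), int(found.group(1))))
    return result
```

Running `member_counts(fold_records(open(path), indent=6))` on a saved copy of the log (with `path` being wherever you saved it) should give a timeline of how the partition size changed, which makes it easier to line up the view changes with the moment the node went down.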