2 Replies Latest reply on Jul 26, 2013 10:10 AM by kristjan273

    Strange election result in HA Singleton Service

    kristjan273

      Using as7 (7.2.0.Final with added patch for HA Singleton Service quorum, ISPN 5.2.7 and jgroups 3.2.10) I am noticing some strange behavior on master election.

      In the log from one (out of four) node:

       

       

      14:25:33,158 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 52) ISPN000078: Starting JGroups Channel

      14:25:33,169 INFO  [stdout] (ServerService Thread Pool -- 52)

      14:25:33,169 INFO  [stdout] (ServerService Thread Pool -- 52) -------------------------------------------------------------------

      14:25:33,169 INFO  [stdout] (ServerService Thread Pool -- 52) GMS: address=host-one:server-iv/singleton, cluster=singleton, physical address=10.10.10.46:7900

      14:25:33,169 INFO  [stdout] (ServerService Thread Pool -- 52) -------------------------------------------------------------------

       

      14:25:36,173 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 52) election results: {}

      14:25:36,683 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 52) election results: {host-one:server-iii/singleton=1}

      14:25:36,684 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 52) sending JOIN(host-one:server-iv/singleton) to host-one:server-iii/singleton

      14:25:36,804 DEBUG [org.jgroups.protocols.pbcast.NAKACK2] (ServerService Thread Pool -- 52)

      [host-one:server-iv/singleton setDigest()]

      existing digest:  []

      new digest:       host-one:server-iii/singleton: [0 (0)], host-one:server-one/singleton: [0 (0)], host-one:server-iv/singleton: [0 (0)], host-one:server-two/singleton: [0 (0)]

      resulting digest: host-one:server-one/singleton: [0 (0)], host-one:server-iii/singleton: [0 (0)], host-one:server-iv/singleton: [0 (0)], host-one:server-two/singleton: [0 (0)]

      14:25:36,804 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 52) host-one:server-iv/singleton: installing view [host-one:server-iii/singleton|1] [host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,806 DEBUG [org.jgroups.protocols.FD_SOCK] (ServerService Thread Pool -- 52) VIEW_CHANGE received: [host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,810 DEBUG [org.jgroups.protocols.pbcast.STABLE] (ServerService Thread Pool -- 52) [ergonomics] setting max_bytes to 1,6MB (4 members)

      14:25:36,874 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 52) ISPN000094: Received new cluster view: [host-one:server-iii/singleton|1] [host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,876 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,singleton,host-one:server-iv/singleton) ping_dest is host-one:server-two/singleton, pingable_mbrs=[host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,942 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 52) ISPN000079: Cache local address is host-one:server-iv/singleton, physical addresses are [10.10.10.46:7900]

      14:25:36,944 INFO  [org.infinispan.factories.GlobalComponentRegistry] (ServerService Thread Pool -- 52) ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.7.Final

      14:25:36,966 INFO  [org.infinispan.factories.TransactionManagerFactory] (ServerService Thread Pool -- 52) ISPN000161: Using a batchMode transaction manager

      14:25:37,050 INFO  [org.infinispan.jmx.CacheJmxRegistration] (ServerService Thread Pool -- 52) ISPN000031: MBeans were successfully registered to the platform MBean server.

      14:25:37,118 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 52) JBAS010281: Started default cache from singleton container

      14:25:37,120 DEBUG [org.jboss.as.clustering.infinispan.subsystem] (ServerService Thread Pool -- 52) default cache started

      14:25:37,128 DEBUG [org.jboss.as.clustering] (MSC service thread 1-8) ViewAccepted: initial members set for partition singleton: 1 (org.jboss.as.clustering.impl.CoreGroupCommunicationService$GroupView@2fbc0cbb)

      14:25:37,128 INFO  [org.jboss.as.clustering] (MSC service thread 1-8) JBAS010238: Number of cluster members: 4

      14:25:37,129 DEBUG [org.jboss.as.clustering] (MSC service thread 1-8) host-one:server-iii/singleton

      14:25:37,129 DEBUG [org.jboss.as.clustering] (MSC service thread 1-8) host-one:server-one/singleton

      14:25:37,129 DEBUG [org.jboss.as.clustering] (MSC service thread 1-8) host-one:server-iv/singleton

      14:25:37,129 DEBUG [org.jboss.as.clustering] (MSC service thread 1-8) host-one:server-two/singleton

      14:25:37,129 DEBUG [org.jboss.as.clustering.impl.CoreGroupCommunicationService$ViewChangeEventProcessor.AsynchViewChangeHandler] (AsynchViewChangeHandler Thread) Begin AsynchViewChangeHandler Thread

      14:25:37,147 ERROR [org.jboss.as.clustering.singleton] (ServerService Thread Pool -- 52) JBAS010346: Just reached required quorum of 2 for jboss.beesmart.ha.singleton.manager service. If this cluster loses another member, no node will be chosen to provide this service.

      14:25:37,148 INFO  [org.jboss.as.clustering.singleton] (ServerService Thread Pool -- 52) JBAS010342: host-one:server-iv/singleton elected as the singleton provider of the jboss.beesmart.ha.singleton.manager service

      14:25:37,148 INFO  [org.jboss.as.clustering.singleton] (ServerService Thread Pool -- 52) JBAS010340: This node will now operate as the singleton provider of the jboss.beesmart.ha.singleton.manager service

      14:25:37,149 ERROR [org.jboss.as.clustering.singleton] (notification-thread-0) JBAS010346: Just reached required quorum of 2 for jboss.beesmart.ha.singleton.manager service. If this cluster loses another member, no node will be chosen to provide this service.

      14:25:37,149 INFO  [org.jboss.as.clustering.singleton] (notification-thread-0) JBAS010342: host-one:server-one/singleton elected as the singleton provider of the jboss.beesmart.ha.singleton.manager service

      14:25:37,186 INFO  [com.acme.cluster.HAManagerService] (MSC service thread 1-7) Start HA Singleton manager service 'com.acme.cluster.HAManagerService'

       

      IMO Jgroups event was correct, server-iii was the oldest, and properly first in the member list. Weird is the line where singleton service takes over the election and 'ServerService thread' claims it elected (wrong one!) server-iv. Even more weird is the 'notification thread' which claims that server-one was elected. I am willing to put some effort and help fix this bug, but need some opinion from the code author (please ... ).

       

      Helpful info is the log from another node which IMO went proper election procedure:

       

      14:25:33,194 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 51) ISPN000078: Starting JGroups Channel

      14:25:33,203 INFO  [stdout] (ServerService Thread Pool -- 51)

      14:25:33,203 INFO  [stdout] (ServerService Thread Pool -- 51) -------------------------------------------------------------------

      14:25:33,203 INFO  [stdout] (ServerService Thread Pool -- 51) GMS: address=host-one:server-iii/singleton, cluster=singleton, physical address=10.10.10.46:7800

      14:25:33,203 INFO  [stdout] (ServerService Thread Pool -- 51) -------------------------------------------------------------------

       

      14:25:36,207 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) election results: {}

      14:25:36,209 DEBUG [org.jgroups.protocols.pbcast.NAKACK2] (ServerService Thread Pool -- 51)

      [host-one:server-iii/singleton setDigest()]

      existing digest:  []

      new digest:       host-one:server-iii/singleton: [0 (0)]

      resulting digest: host-one:server-iii/singleton: [0 (0)]

      14:25:36,210 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) host-one:server-iii/singleton: installing view [host-one:server-iii/singleton|0] [host-one:server-iii/singleton]

      14:25:36,211 DEBUG [org.jgroups.protocols.pbcast.STABLE] (ServerService Thread Pool -- 51) resuming message garbage collection

      14:25:36,212 DEBUG [org.jgroups.protocols.FD_SOCK] (ServerService Thread Pool -- 51) VIEW_CHANGE received: [host-one:server-iii/singleton]

      14:25:36,217 DEBUG [org.jgroups.protocols.pbcast.STABLE] (ServerService Thread Pool -- 51) [ergonomics] setting max_bytes to 400KB (1 members)

      14:25:36,219 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 51) ISPN000094: Received new cluster view: [host-one:server-iii/singleton|0] [host-one:server-iii/singleton]

      14:25:36,219 DEBUG [org.jgroups.protocols.pbcast.STABLE] (ServerService Thread Pool -- 51) resuming message garbage collection

      14:25:36,220 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 51) created group (first member). My view is [host-one:server-iii/singleton|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl

      14:25:36,271 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 51) ISPN000079: Cache local address is host-one:server-iii/singleton, physical addresses are [10.10.10.46:7800]

      14:25:36,275 INFO  [org.infinispan.factories.GlobalComponentRegistry] (ServerService Thread Pool -- 51) ISPN000128: Infinispan version: Infinispan 'Delirium' 5.2.7.Final

      14:25:36,296 INFO  [org.infinispan.factories.TransactionManagerFactory] (ServerService Thread Pool -- 51) ISPN000161: Using a batchMode transaction manager

      14:25:36,365 INFO  [org.infinispan.jmx.CacheJmxRegistration] (ServerService Thread Pool -- 51) ISPN000031: MBeans were successfully registered to the platform MBean server.

      14:25:36,370 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 51) JBAS010281: Started default cache from singleton container

      14:25:36,372 DEBUG [org.jboss.as.clustering.infinispan.subsystem] (ServerService Thread Pool -- 51) default cache started

      14:25:36,377 DEBUG [org.jboss.as.clustering] (MSC service thread 1-4) ViewAccepted: initial members set for partition singleton: 0 (org.jboss.as.clustering.impl.CoreGroupCommunicationService$GroupView@68c10853)

      14:25:36,377 INFO  [org.jboss.as.clustering] (MSC service thread 1-4) JBAS010238: Number of cluster members: 1

      14:25:36,377 DEBUG [org.jboss.as.clustering] (MSC service thread 1-4) host-one:server-iii/singleton

      14:25:36,378 DEBUG [org.jboss.as.clustering.impl.CoreGroupCommunicationService$ViewChangeEventProcessor.AsynchViewChangeHandler] (AsynchViewChangeHandler Thread) Begin AsynchViewChangeHandler Thread

      14:25:36,394 ERROR [org.jboss.as.clustering.singleton] (ServerService Thread Pool -- 51) JBAS010345: Failed to reach quorum of 2 for jboss.acme.ha.singleton.manager service. No singleton master will be elected.

      14:25:36,736 DEBUG [org.jgroups.protocols.pbcast.STABLE] (ViewHandler,singleton,host-one:server-iii/singleton) suspending message garbage collection

      14:25:36,737 DEBUG [org.jgroups.protocols.pbcast.STABLE] (ViewHandler,singleton,host-one:server-iii/singleton) resume task started, max_suspend_time=33000

      14:25:36,770 DEBUG [org.jgroups.protocols.pbcast.NAKACK2] (Incoming-1,shared=tcpab)

      [host-one:server-iii/singleton setDigest()]

      existing digest:  host-one:server-iii/singleton: [1 (1)]

      new digest:       host-one:server-iii/singleton: [0 (0)], host-one:server-one/singleton: [0 (0)], host-one:server-iv/singleton: [0 (0)], host-one:server-two/singleton: [0 (0)]

      resulting digest: host-one:server-one/singleton: [0 (0)], host-one:server-iii/singleton: [1 (1)], host-one:server-iv/singleton: [0 (0)], host-one:server-two/singleton: [0 (0)]

      14:25:36,771 DEBUG [org.jgroups.protocols.pbcast.GMS] (Incoming-1,shared=tcpab) host-one:server-iii/singleton: installing view [host-one:server-iii/singleton|1] [host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,772 DEBUG [org.jgroups.protocols.FD_SOCK] (Incoming-1,shared=tcpab) VIEW_CHANGE received: [host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,774 DEBUG [org.jgroups.protocols.pbcast.STABLE] (Incoming-1,shared=tcpab) [ergonomics] setting max_bytes to 1,6MB (4 members)

      14:25:36,774 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,singleton,host-one:server-iii/singleton) ping_dest is host-one:server-one/singleton, pingable_mbrs=[host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,775 INFO  [org.jboss.as.clustering] (Incoming-1,shared=tcpab) JBAS010225: New cluster view for partition singleton (id: 1, delta: 3, merge: false) : [host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,776 DEBUG [org.jboss.as.clustering] (Incoming-1,shared=tcpab) dead members: []

      14:25:36,776 DEBUG [org.jboss.as.clustering] (Incoming-1,shared=tcpab) membership changed from 1 to 4

      14:25:36,776 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-1,shared=tcpab) ISPN000094: Received new cluster view: [host-one:server-iii/singleton|1] [host-one:server-iii/singleton, host-one:server-one/singleton, host-one:server-iv/singleton, host-one:server-two/singleton]

      14:25:36,777 DEBUG [org.jboss.as.clustering] (AsynchViewChangeHandler Thread) Begin notifyListeners, viewID: 1

      14:25:36,781 DEBUG [org.jboss.as.clustering] (AsynchViewChangeHandler Thread) End notifyListeners, viewID: 1

      14:25:36,876 DEBUG [org.jgroups.protocols.pbcast.STABLE] (ViewHandler,singleton,host-one:server-iii/singleton) resuming message garbage collection

      14:25:37,117 ERROR [org.jboss.as.clustering.singleton] (notification-thread-0) JBAS010346: Just reached required quorum of 2 for jboss.acme.ha.singleton.manager service. If this cluster loses another member, no node will be chosen to provide this service.

      14:25:37,118 INFO  [org.jboss.as.clustering.singleton] (notification-thread-0) JBAS010342: host-one:server-iii/singleton elected as the singleton provider of the jboss.acme.ha.singleton.manager service

      14:25:37,118 INFO  [org.jboss.as.clustering.singleton] (notification-thread-0) JBAS010340: This node will now operate as the singleton provider of the jboss.acme.ha.singleton.manager service

      14:25:37,186 INFO  [com.acme.cluster.HAManagerService] (MSC service thread 1-1) Start HA Singleton manager service 'com.acme.cluster.HAManagerService'

       

      But interesting as there are just 'notification thread' and no server service thread.