JRMPProxyFactory in deploy-hasingleton
tylerblack Mar 27, 2006 1:42 PM
JBoss 4.0.2 (build: CVSTag=JBoss_4_0_2 date=200505022023)
We have a partition with 3 nodes (Node1, Node2, Node3). There is a singleton mbean running within the cluster that acts like a session cache. All nodes in the cluster service requests from clients, and all requests must be validated against the cache.
The singleton mbean is deployed in the /farm directory on all nodes. The proxy is deployed in the /deploy-hasingleton directory on all nodes.
A couple of days ago, Node1 was running as the master node. Something happened that caused it to be shunned by the cluster, and the singleton started on Node2. Node2 and Node3 reconnected to the new Proxy running on Node2 without skipping a beat. However, Node1, after it had left and rejoined the cluster, was still trying (and failing) to connect to itself. That's bad.
We had to shut down JBoss on Node1 manually and restart it. As it joined the cluster, it found the Proxy running on Node2 and all was well again.
How can we ensure that the services deployed in deploy-hasingleton get stopped when the master node changes? Here is the jboss-service.xml file for the JRMPProxyFactory service:
"presenceProxy.sar" wrote:
<server>
  <mbean code="org.jboss.invocation.jrmp.server.JRMPProxyFactory"
         name="jboss.jmx:type=adaptor,name=SingletonInvoker,protocol=jrmp,service=serverSessionProxyFactory">
    <depends>h2st:service=ServerSession</depends>
    <depends optional-attribute-name="InvokerName">jboss:service=invoker,type=jrmp</depends>
    <depends optional-attribute-name="TargetName">h2st:service=ServerSession</depends>
    <attribute name="JndiName">jmx/invoker/ServerSessionSingletonRMIAdaptor</attribute>
    <attribute name="InvokeTargetMethod">true</attribute>
    <attribute name="ExportedInterfaces">com.how2share.pixposerver.mbean.ServerSessionMBean</attribute>
    <attribute name="ClientInterceptors">
      <interceptors>
        <interceptor>org.jboss.proxy.ClientMethodInterceptor</interceptor>
        <interceptor>org.jboss.invocation.InvokerInterceptor</interceptor>
      </interceptors>
    </attribute>
  </mbean>
</server>
Below, I've listed the sequence of events as seen by the logfiles. Continue reading if you get off on combing through log snippets.
Node1 detects that it is being shunned and leaves then rejoins the cluster...
"Node1" wrote:
2006-03-26 04:22:55,535 WARN [org.jgroups.protocols.FD] I was suspected, but will not remove myself from membership (waiting for EXIT message)
2006-03-26 04:22:58,045 WARN [org.jgroups.protocols.FD] I was suspected, but will not remove myself from membership (waiting for EXIT message)
2006-03-26 04:22:58,552 WARN [org.jgroups.protocols.pbcast.CoordGmsImpl] I am the coord and I'm being am suspected -- will probably leave shortly
2006-03-26 04:22:58,552 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.PixpoPresencePartition] Suspected member: Node1:34438 (additional data: 19 bytes)
2006-03-26 04:22:58,736 WARN [org.jgroups.protocols.pbcast.GMS] checkSelfInclusion() failed, Node1:34438 (additional data: 19 bytes) is not a member of view [Node2:46014 (additional data: 19 bytes)|101] [Node2:46014 (additional data: 19 bytes), Node3:33003 (additional data: 19 bytes)]; discarding view
2006-03-26 04:22:58,737 WARN [org.jgroups.protocols.pbcast.GMS] I (Node1:34438 (additional data: 19 bytes)) am being shunned, will leave and rejoin group (prev_members are [Node3:32981 (additional data: 19 bytes) Node2:45898 (additional data: 19 bytes) Node1:34438 (additional data: 19 bytes) Node2:46014 (additional data: 19 bytes) Node3:32998 (additional data: 19 bytes) GTBiznode03:33003 (additional data: 19 bytes) ])
Our singleton stops
"Node1" wrote:
2006-03-26 04:23:02,017 INFO [com.how2share.pixposerver.mbean.ServerSession] Stopped singleton.
Node2 takes over as the master. (We log this as an error so we notice a fail-over). It receives a request and reconnects to the new cache.
"Node2" wrote:
2006-03-26 04:22:58,661 INFO [org.jboss.ha.framework.interfaces.HAPartition.PixpoPresencePartition] Suspected member: Node1:34438 (additional data: 19 bytes)
2006-03-26 04:22:58,676 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.PixpoPresencePartition] New cluster view for partition PixpoPresencePartition (id: 101, delta: -1) : [Node2:1099, Node3:1099]
2006-03-26 04:22:58,895 ERROR [com.how2share.pixposerver.mbean.ServerSession] ServerSession Singleton started.
2006-03-26 04:23:01,870 INFO [org.jboss.ha.framework.interfaces.HAPartition.lifecycle.PixpoPresencePartition] New cluster view for partition PixpoPresencePartition (id: 102, delta: 1) : [Node2:1099, Node3:1099, Node1:1099]
...
2006-03-26 04:23:02,229 WARN [com.how2share.pixposerver.web.MSHServlet] Exception caught in getSessionFromCache(): null object name
2006-03-26 04:23:02,229 INFO [com.how2share.pixposerver.web.MSHServlet] Getting a fresh instance of serverSessionMBean
2006-03-26 04:23:02,229 INFO [com.how2share.pixposerver.web.MSHServlet] Getting a fresh instance of presenceContext
2006-03-26 04:23:02,230 WARN [com.how2share.pixposerver.web.MSHServlet] Reconnected to cache.
Node3 receives a request and reconnects to the new cache on Node2.
"Node3" wrote:
2006-03-26 04:23:12,069 WARN [com.how2share.pixposerver.web.MSHServlet] Exception caught in getSessionFromCache(): null object name
2006-03-26 04:23:12,069 INFO [com.how2share.pixposerver.web.MSHServlet] Getting a fresh instance of serverSessionMBean
2006-03-26 04:23:12,069 INFO [com.how2share.pixposerver.web.MSHServlet] Getting a fresh instance of presenceContext
2006-03-26 04:23:12,097 WARN [com.how2share.pixposerver.web.MSHServlet] Reconnected to cache.
Node1 continues trying to connect via the old JRMPProxy. This is indicated by the following series of messages, which repeat until we manually shut down the server.
"Node1" wrote:
2006-03-26 04:23:09,920 WARN [com.how2share.pixposerver.web.MSHServlet] Exception caught reconnecting to cache(): null
2006-03-26 04:23:09,920 INFO [com.how2share.pixposerver.web.MSHServlet] Getting a fresh instance of serverSessionMBean
2006-03-26 04:23:09,921 INFO [com.how2share.pixposerver.web.MSHServlet] Getting a fresh instance of presenceContext
2006-03-26 04:23:09,925 WARN [com.how2share.pixposerver.web.MSHServlet] Exception caught reconnecting to cache(): null
2006-03-26 04:23:09,926 ERROR [com.how2share.pixposerver.web.MSHServlet] Could not reconnect to cache.
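For readers following the logs, the reconnect sequence they show (a call fails, the cached reference is thrown away, a fresh instance is fetched, and the call is retried) can be sketched roughly as below. This is only an illustration of the pattern, not the actual MSHServlet code: the class `ReconnectingProxy`, its `call` method, and the `maxAttempts` parameter are hypothetical names, and the lookup is passed in as a plain `Supplier` that stands in for whatever JNDI lookup returns the proxy.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Caches a remote proxy and replaces it with a fresh lookup when a call fails.
 * Hypothetical helper for illustration only; not part of JBoss.
 */
final class ReconnectingProxy<T> {
    private final Supplier<T> lookup;  // stands in for the JNDI lookup of the proxy
    private final int maxAttempts;     // total tries, including the first one
    private T cached;                  // last proxy obtained, or null if none yet

    ReconnectingProxy(Supplier<T> lookup, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.lookup = lookup;
        this.maxAttempts = maxAttempts;
    }

    /** Runs the action against the proxy, re-fetching the proxy after each failure. */
    synchronized <R> R call(Function<T, R> action) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (cached == null) {
                    cached = lookup.get();  // "Getting a fresh instance"
                }
                return action.apply(cached);
            } catch (RuntimeException e) {
                cached = null;              // drop the possibly stale proxy
                last = e;
            }
        }
        throw last;                         // "Could not reconnect"
    }
}
```

A retry like this only helps when the fresh lookup returns a working proxy. On Node2 and Node3 it did; on Node1 the fresh lookup kept failing, which is why the question is about the deploy-hasingleton services rather than the retry logic.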