Wildfly 8.1 failover occasionally fails, indicated by JGRP000032 messages
scarpent Mar 25, 2015 3:57 PM
I'm using Wildfly 8.1 with standalone HA clustering, and a JGroups config like so:
<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp"/>
    <protocol type="TCPPING">
        <property name="initial_hosts">${jgroups.tcpping.initial_hosts}</property>
        <property name="port_range">1</property>
        <property name="num_initial_members">3</property>
    </protocol>
    <protocol type="MERGE2"/>
    <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
    <protocol type="FD"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2">
        <property name="use_mcast_xmit">false</property>
        <property name="use_mcast_xmit_req">false</property>
    </protocol>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS"/>
    <protocol type="MFC"/>
    <protocol type="FRAG2"/>
    <protocol type="RSVP"/>
</stack>
(I'm using TCP because multicast is not available in AWS.)
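For context, the `${jgroups.tcpping.initial_hosts}` expression in the stack is resolved from a system property. A minimal sketch of how that property might be defined in the server configuration is below. The host names and the port 7600 are illustrative assumptions (the hosts are borrowed from the node names in the logs, and 7600 is assumed to be the port of the `jgroups-tcp` socket binding); they are not my actual values.

```xml
<!-- Hypothetical example: host names and port are assumptions, not real values. -->
<system-properties>
    <!-- TCPPING expects a comma-separated list of host[port] entries -->
    <property name="jgroups.tcpping.initial_hosts"
              value="node-admin-app1[7600],node-admin-app2[7600]"/>
</system-properties>
```

The same value could instead be passed on the command line as `-Djgroups.tcpping.initial_hosts=...` when starting the server.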
I have two servers in my dev environment, and normally things work as expected. When I bring up the second server, I see this in the log of the first:
2015-03-25 13:01:34,367 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-18,shared=tcp) ISPN000094: Received new cluster view: [node-admin-app1/web|5] (2) [node-admin-app1/web, node-admin-app2/web]
And when shutting down a server I'll see something like this in the log of the remaining one:
2015-03-25 13:02:56,796 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-8,shared=tcp) ISPN000094: Received new cluster view: [node-admin-app2/web|6] (1) [node-admin-app2/web]
And with that I've confirmed in my application that the session properly failed over.
Occasionally when shutting down the server, I won't see the "new cluster view" entry, and instead will get a bunch of messages like this:
2015-03-25 12:54:33,269 WARN [org.jgroups.protocols.TCP] (Timer-2,shared=tcp) JGRP000032: null: no physical address for node-admin-app1/web, dropping message
The warnings eventually stop. But when I bring the other server back up, I again don't see the expected cluster view message, and failover does not work the next time a server is shut down. The session is borked.
Please let me know if more information will help. Thank you!