Wildfly 8.1 failover occasionally fails, indicated by JGRP000032 messages
scarpent Mar 25, 2015 3:57 PM
I'm using WildFly 8.1 with standalone HA clustering and a JGroups config like so:
<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp"/>
    <protocol type="TCPPING">
        <property name="initial_hosts">${jgroups.tcpping.initial_hosts}</property>
        <property name="port_range">1</property>
        <property name="num_initial_members">3</property>
    </protocol>
    <protocol type="MERGE2"/>
    <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
    <protocol type="FD"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2">
        <property name="use_mcast_xmit">false</property>
        <property name="use_mcast_xmit_req">false</property>
    </protocol>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS"/>
    <protocol type="MFC"/>
    <protocol type="FRAG2"/>
    <protocol type="RSVP"/>
</stack>
(I'm using TCP because multicast is not available in AWS.)
I have two servers in my dev environment, and normally things work as expected. When I bring up the second server, I see this in the log of the first:
2015-03-25 13:01:34,367 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-18,shared=tcp) ISPN000094: Received new cluster view: [node-admin-app1/web|5] (2) [node-admin-app1/web, node-admin-app2/web]
And when shutting down a server I'll see something like this in the log of the remaining one:
2015-03-25 13:02:56,796 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-8,shared=tcp) ISPN000094: Received new cluster view: [node-admin-app2/web|6] (1) [node-admin-app2/web]
And with that I've confirmed in my application that the session properly failed over.
Occasionally when shutting down the server, I won't see the "new cluster view" entry, and instead will get a bunch of messages like this:
2015-03-25 12:54:33,269 WARN [org.jgroups.protocols.TCP] (Timer-2,shared=tcp) JGRP000032: null: no physical address for node-admin-app1/web, dropping message
They eventually stop. But when I bring the other server back up, I again don't see the expected cluster view message, and failover does not work the next time a server is shut down. The session is borked.
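To tell the good and bad cases apart in a long server log, something like the following could be used. This is only a rough sketch: the function name `summarize` and both regular expressions are my own assumptions, written against the log lines quoted above, and are not part of WildFly, Infinispan, or JGroups.

```python
import re

# Assumed shape of the ISPN000094 view-change line, taken from the excerpts above:
#   ... ISPN000094: Received new cluster view: [coordinator|view_id] (size) [members]
VIEW_RE = re.compile(
    r"ISPN000094: Received new cluster view: "
    r"\[(?P<coord>[^|\]]+)\|(?P<view_id>\d+)\] "
    r"\((?P<size>\d+)\) \[(?P<members>[^\]]*)\]"
)

# Assumed shape of the JGRP000032 warning, taken from the excerpt above.
DROP_RE = re.compile(
    r"JGRP000032: .*no physical address for (?P<node>\S+), dropping message"
)


def summarize(lines):
    """Return (views, dropped) for an iterable of log lines.

    views   -- list of (view_id, coordinator, [members]) in log order
    dropped -- dict mapping node name to its JGRP000032 warning count
    """
    views = []
    dropped = {}
    for line in lines:
        match = VIEW_RE.search(line)
        if match:
            members = [m.strip() for m in match.group("members").split(",") if m.strip()]
            views.append((int(match.group("view_id")), match.group("coord"), members))
            continue
        match = DROP_RE.search(line)
        if match:
            node = match.group("node")
            dropped[node] = dropped.get(node, 0) + 1
    return views, dropped
```

Running `summarize(open("server.log"))` on each node (the file name is a placeholder) would show whether a shutdown produced a new view or only JGRP000032 warnings for the departed node.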
Please let me know if more information will help. Thank you!