2 Replies Latest reply on Mar 25, 2015 3:54 PM by scarpent

Wildfly 8.1 failover occasionally fails, indicated by JGRP000032 messages

scarpent Mar 25, 2015 3:57 PM

I'm using Wildfly 8.1 with standalone HA clustering, and jgroups config like so:

<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp"/>
    <protocol type="TCPPING">
        <property name="initial_hosts">${jgroups.tcpping.initial_hosts}</property>
        <property name="port_range">1</property>
        <property name="num_initial_members">3</property>
    </protocol>
    <protocol type="MERGE2"/>
    <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
    <protocol type="FD"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2">
        <property name="use_mcast_xmit">false</property>
        <property name="use_mcast_xmit_req">false</property>
    </protocol>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS"/>
    <protocol type="MFC"/>
    <protocol type="FRAG2"/>
    <protocol type="RSVP"/>
</stack>

(Using TCP since multicast not available in AWS.)

I have two servers in my dev environment, and normally things work as expected. When I bring up the second server, I see this in the log of the first:

2015-03-25 13:01:34,367 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-18,shared=tcp) ISP

N000094: Received new cluster view: [node-admin-app1/web|5] (2) [node-admin-app1/web, node-admin-app2/web]

And when shutting down a server I'll see something like this in the log of the remaining one:

2015-03-25 13:02:56,796 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (Incoming-8,shared=tcp) ISPN

000094: Received new cluster view: [node-admin-app2/web|6] (1) [node-admin-app2/web]

And with that I've confirmed in my application that the session properly failed over.

Occasionally when shutting down the server, I won't see the "new cluster view" entry, and instead will get a bunch of messages like this:

2015-03-25 12:54:33,269 WARN [org.jgroups.protocols.TCP] (Timer-2,shared=tcp) JGRP000032: null: no physical address for node-admin-app1/web, dropping message

They eventually stop. But when I bring up the other server, I again don't see the expected cluster view message, and failover does not work when again a server is shutdown. The session is borked.

Please let me know if more information will help. Thank you!

1. Re: Wildfly 8.1 failover occasionally fails, indicated by JGRP000032 messages

pferraro Mar 25, 2015 2:14 PM (in response to scarpent)

This was addressed since JGroups 3.5 - which can be found in WF 8.2 and WF9.
See https://issues.jboss.org/browse/JGRP-1814
Actions
2. Re: Wildfly 8.1 failover occasionally fails, indicated by JGRP000032 messages

scarpent Mar 25, 2015 3:54 PM (in response to pferraro)

Thank you, Paul! Is there anything that can be done to mitigate this in 8.1?

(We've had a prolonged upgrade from JBoss 5.1 to WF8.1 and Richfaces 3.3 to 4.3, and last I tried it, WF8.2 caused a problem with Richfaces that is then resolved in 4.5, and we'd really like to deploy this thing before another round of upgrades.)

Thanks again!
Actions

Go to original post