1 Reply Latest reply on Oct 22, 2019 9:29 AM by rhusar

JBoss HA 3-node cluster issue – 2 coordinators after restart

ct333083 Sep 29, 2019 5:01 PM

We are running Red Hat Single Sign-On 7.3.1.GA (WildFly Core 6.0.12.Final-redhat-00001) on an OpenShift cluster. It is set up to run as a deployment with 3 replicas.

When containers restart at random or are forced to restart by the operator, the cluster sometimes (not always) ends up in a state where there are two JGroups subgroups with 2 coordinators.

In the case I am documenting, I have the following instances running:

tstsso-rhsso-5d4b857989-x2jlb

tstsso-rhsso-5d4b857989-nkk2m

tstsso-rhsso-5d4b857989-vvjjg

All the instances were restarted, which resulted in the creation of new Kubernetes pods with new instance names. The start times of the individual instances varied by a couple of seconds.

I am attaching logs from those instances as well as the configuration file.

The way I interpret those logs, the nodes established 2 coordinators, which formed the following 2 JGroups subgroups:

-rhsso-5d4b857989-x2jlb, -rhsso-5d4b857989-nkk2m, -rhsso-5d4b857989-vvjjg

and

-rhsso-5d4b857989-nkk2m, -rhsso-5d4b857989-vvjjg

Once the cluster gets into such a state, it never recovers from it.

When a similar state was observed in other environments, users reported having issues with Red Hat SSO. I believe that is the case with this cluster as well; I just don't have any clients using it. As I noted, when a pod is deleted and re-created, its name changes. I am not sure whether that is the source of the problem, so I would rather mention it.

I would like to understand whether the issue is related to my configuration or is some sort of bug.

Any help, direction, or suggestions on how to investigate the issue further would be appreciated.

tstsso-rhsso-5d4b857989-nkk2m.log.zip 30.3 KB
tstsso-rhsso-5d4b857989-vvjjg.log.zip 8.6 KB
tstsso-rhsso-5d4b857989-x2jlb.log.zip 47.3 KB
standalone-openshift.xml.zip 6.5 KB

1. Re: JBoss HA 3-node cluster issue – 2 coordinators after restart

rhusar Oct 22, 2019 9:29 AM (in response to ct333083)
It looks as though your service is not configured to tolerate unready endpoints. Can you post the definition of the Kubernetes service you use for discovery?

service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
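
For illustration only, here is a minimal sketch of where that annotation would sit in a headless discovery service. It assumes DNS-based discovery; the service name, selector label, and port below are hypothetical placeholders and are not taken from the attached configuration. Newer Kubernetes versions also offer the `spec.publishNotReadyAddresses` field for the same purpose, so the sketch sets both.

```yaml
# Sketch of a headless service used only for JGroups member discovery.
# All names, labels, and ports are assumed placeholders; adjust to your deployment.
apiVersion: v1
kind: Service
metadata:
  name: tstsso-rhsso-ping            # hypothetical service name
  annotations:
    # Publish pod addresses before the pods pass their readiness probe,
    # so members that are still starting can find each other.
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None                    # headless: DNS returns the pod IPs directly
  publishNotReadyAddresses: true     # field-based equivalent of the annotation
  selector:
    app: tstsso-rhsso                # hypothetical pod label
  ports:
    - name: ping
      port: 8888                     # assumed discovery port
      targetPort: 8888
```

Without one of these settings, a pod that has not yet become ready is not listed by the service, so nodes that start within a few seconds of each other may not see one another and may each form their own group.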