5 Replies Latest reply on Sep 22, 2016 2:20 AM by mnovak

Trouble getting Wildfly 10 HA Config to work!!

jbrios Sep 21, 2016 7:36 AM

(We reproduced these issues in both WildFly 10.0.0.Final and 10.1.0.Final.)

Hello,

After racking my brains for almost two weeks, I'm finally giving up. The clustering/HA implementation in WildFly simply doesn't work for enterprise solutions, although it works fantastically with the simple examples provided in the documentation (get.jsp/put.jsp). After digging through the internet and WildFly's documentation and forums, I still couldn't get our EAR working in a distributed environment using a domain configuration (that really sucks, and I'm really frustrated by it; it works great in standalone mode). That doesn't mean I don't appreciate the work of the thousands of good developers who got this version and product going, just that I think this solution needs to mature a little more before it goes into corporate environments. Anyway, as a last resort, in case some good spirits out there have already run into the issues we had, here are a few details about our environment.

The EAR package structure is as follows:

General architecture of my environment:

Each WildFly server has just one server node configured (in host.xml), and we use the "full-ha" profile with the "full-ha-sockets" socket binding group. The JMS queues are all defined on the domain controller; the other servers just point to it through a JMS bridge (works like a charm). The first problem came with mod_cluster: CentOS 7 doesn't provide the symbols required by the mod_cluster.so 1.2.x libraries described in the documentation ("Clustering and Domain Setup Walkthrough - WildFly 10"). Also, the Apache directive given there, "LoadModule slotmem_module modules/mod_slotmem.so", is incorrect: the module has to be declared as cluster_slotmem_module, and it doesn't work otherwise. So we followed a few steps, downloaded a newer version (1.3.1) and installed it successfully in our environment. We ran the HA examples from the documentation plus a few of our own. Everything went fine: the cluster worked and sessions were replicated between the server nodes as expected. This is the Apache configuration we used to get the cluster working:

Listen 0.0.0.0:10001
ManagerBalancerName sigma_cluster
#MOD_CLUSTER CONFIGURATION !!!
Require all granted
ProxyPass balancer://sigma_cluster/
ProxyPassReverse balancer://sigma_cluster/
</Location>
ProxyPreserveHost On
KeepAliveTimeout 300
MaxKeepAliveRequests 0
AdvertiseFrequency 5
EnableMCPMReceive
ServerAdvertise on http://<local_ip>:10001
</VirtualHost>

--> "sigma_cluster" is the name of our server group configuration. Each server group in the configuration has only one server node definition.

Well, after some stress testing of the environment, we deployed our actual EAR, and that's when the problems started (we run the same EAR on our QA servers, and the only difference between the production and QA environments is that QA uses a standalone configuration). I've summarized the issues we found:

- For some reason, the Artemis broker doesn't seem to connect the EAR's MDBs to the configured queues, e.g. ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST message=AMQ119017: Queue jms.queue.PaymentQueue does not exist]. This happens on all the servers, including the one running on the domain controller.
   1. The queue is there (profile: full-ha): <jms-queue name="PaymentQ" entries="queue/PaymentQueue java:jboss/exported/jms/queues/PaymentQueue"/>
   2. This error fills up the console and doesn't stop until we shut the server down (we run WildFly as an OS service under systemd).
- For some reason, some resources sometimes return an HTTP 503 error and sometimes don't. This is very erratic and affects several resources, which makes the environment very unstable.
- Sessions are not replicated between the server nodes defined in the server group (yes, we added the <distributable/> tag to the web.xml of every WAR project), and we disabled the "sticky-session" option in the domain.xml configuration.
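The naming mismatch in the first item can be made concrete with a small sketch that reads the <jms-queue> definition quoted above and derives the core queue name from it. It assumes the "jms.queue." prefix visible in the AMQ119017 message is applied to every JMS queue name (the Artemis 1.x convention used by WildFly 10); JmsQueueNames and its methods are hypothetical helpers written only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class JmsQueueNames {
    // Assumption: Artemis 1.x (as bundled with WildFly 10) prefixes core names of JMS queues with "jms.queue."
    static final String CORE_QUEUE_PREFIX = "jms.queue.";

    /** Parses a single XML element such as a <jms-queue .../> line copied from domain.xml. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse: " + xml, e);
        }
    }

    /** Core queue name implied by the "name" attribute (the queue the broker actually creates). */
    static String coreName(Element jmsQueue) {
        return CORE_QUEUE_PREFIX + jmsQueue.getAttribute("name");
    }

    /** JNDI names from the space-separated "entries" attribute (lookup aliases, not queue names). */
    static List<String> jndiEntries(Element jmsQueue) {
        return Arrays.asList(jmsQueue.getAttribute("entries").trim().split("\\s+"));
    }

    public static void main(String[] args) {
        Element queue = parse("<jms-queue name=\"PaymentQ\""
                + " entries=\"queue/PaymentQueue java:jboss/exported/jms/queues/PaymentQueue\"/>");
        String reported = "jms.queue.PaymentQueue"; // name in the AMQ119017 message
        System.out.println("core queue:   " + coreName(queue));
        System.out.println("JNDI entries: " + jndiEntries(queue));
        System.out.println("same as AMQ119017 name? " + coreName(queue).equals(reported));
    }
}
```

Run against the definition above, it reports the core queue as jms.queue.PaymentQ rather than jms.queue.PaymentQueue, which suggests that whatever creates the failing consumer is asking for a queue named "PaymentQueue" instead of the defined "PaymentQ".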

If someone could help, that would be great; either way, I hope this helps someone in the future pick up where we left off.

Best to all, keep on coding!

1. Re: Wildfly 10 HA Config Doesn't work!!

jbertram Sep 20, 2016 12:52 PM (in response to jbrios)

I can only address issues related to Artemis since I don't work on the application server itself. I believe your issue is that the JNDI configuration/lookup doesn't contain the proper prefix. Try adding "java:" to it. Alternatively, you can simply reference the actual name of the queue (i.e. "PaymentQ") in the MDB's activation configuration properties.
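To illustrate why either suggestion could make a difference, here is a toy model of how an MDB "destination" value might turn into a core queue name. It is not the Artemis resource adapter code: the JNDI bindings are a plain map, the binding name "java:/queue/PaymentQueue" is an assumption, and the fallback (use the text after the last "/" as the queue name when the lookup fails) is an assumed behavior, chosen because it reproduces the name in the AMQ119017 message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of MDB destination resolution (not the real resource adapter). */
class DestinationResolution {
    static final String CORE_QUEUE_PREFIX = "jms.queue."; // Artemis 1.x prefix for JMS queues

    static String resolveCoreQueue(String destination, Map<String, String> jndiBindings) {
        String queueName = jndiBindings.get(destination); // simulated JNDI lookup
        if (queueName == null) {
            // Assumed fallback: treat the last path segment as a literal JMS queue name,
            // so "queue/PaymentQueue" becomes a queue named "PaymentQueue"
            queueName = destination.substring(destination.lastIndexOf('/') + 1);
        }
        return CORE_QUEUE_PREFIX + queueName;
    }

    public static void main(String[] args) {
        Map<String, String> jndi = new HashMap<>();
        // Hypothetical binding names for the entries of <jms-queue name="PaymentQ" .../>
        jndi.put("java:/queue/PaymentQueue", "PaymentQ");
        jndi.put("java:jboss/exported/jms/queues/PaymentQueue", "PaymentQ");

        String[] destinations = {"queue/PaymentQueue", "java:/queue/PaymentQueue", "PaymentQ"};
        for (String destination : destinations) {
            System.out.println(destination + " -> " + resolveCoreQueue(destination, jndi));
        }
    }
}
```

Under these assumptions, "queue/PaymentQueue" misses the map and falls back to jms.queue.PaymentQueue, while both "java:/queue/PaymentQueue" and the plain name "PaymentQ" resolve to the defined queue, jms.queue.PaymentQ.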
2. Re: Wildfly 10 HA Config Doesn't work!!

jbrios Sep 20, 2016 9:02 PM (in response to jbertram)

Hey Justin, thank you for your feedback. I made the change you suggested (changed the queue name in the MDB reference to "PaymentQ"), but the error persists. What's really strange is that the error is logged as a DEBUG entry followed by a full stack trace, so I really don't know whether it's something to be concerned about:

[Server:slave-server-one] 2016-09-19 00:49:05 DEBUG server:506 - Sending exception to client
[Server:slave-server-one] ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST message=AMQ119017: Queue jms.queue.PaymentQueue does not exist]
[Server:slave-server-one] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.createConsumer(ServerSessionImpl.java:408)
[Server:slave-server-one] at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.createConsumer(ServerSessionImpl.java:396)
[Server:slave-server-one] at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.handlePacket(ServerSessionPacketHandler.java:208)
[Server:slave-server-one] at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:567)
[Server:slave-server-one] at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:349)
[Server:slave-server-one] at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:331)
[Server:slave-server-one] at org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl$DelegatingBufferHandler.bufferReceived(RemotingServiceImpl.java:605)
[Server:slave-server-one] at org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnection$1.run(InVMConnection.java:171)
[Server:slave-server-one] at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:100)
[Server:slave-server-one] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[Server:slave-server-one] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[Server:slave-server-one] at java.lang.Thread.run(Thread.java:745)

The other errors continue...
3. Re: Wildfly 10 HA Config Doesn't work!!

jbertram Sep 20, 2016 10:12 PM (in response to jbrios)
A couple of things:
1. Anything logged at DEBUG level is, by definition, not an error (even if it includes a stack trace).
2. The exception indicates that something is attempting to create a consumer on the queue "jms.queue.PaymentQueue". Assuming you changed all your MDBs to use "PaymentQ" as their "destination", that indicates to me that some other component or application is still trying to use "PaymentQueue" somewhere.
3. Do your MDBs receive messages sent to PaymentQ now?
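One way to follow up on the second point is to search the deployed archive for the old name. The sketch below (a hypothetical helper, not a WildFly tool) walks an EAR, descends into nested JAR/WAR/RAR modules, and lists every entry whose bytes contain a given string such as "PaymentQueue". It only catches literal occurrences (string constants in .class files, deployment descriptors, properties files); names assembled at runtime or configured outside the EAR will not show up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class StaleNameFinder {

    /** Lists "outer!/inner!/entry" paths of archive entries whose bytes contain needle. */
    static List<String> find(Path archive, String needle) {
        try (InputStream in = Files.newInputStream(archive)) {
            return find(in, archive.getFileName().toString(), needle);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Same search on an archive that is already open as a stream. */
    static List<String> find(InputStream archive, String label, String needle) {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            scan(zip, label, needle.getBytes(StandardCharsets.UTF_8), hits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    private static void scan(ZipInputStream zip, String prefix, byte[] needle, List<String> hits)
            throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String path = prefix + "!/" + entry.getName();
            byte[] data = readEntry(zip);
            String lower = entry.getName().toLowerCase(Locale.ROOT);
            if (lower.endsWith(".jar") || lower.endsWith(".war") || lower.endsWith(".rar")) {
                // Nested module or library: search its entries instead of its compressed bytes
                try (ZipInputStream nested = new ZipInputStream(new ByteArrayInputStream(data))) {
                    scan(nested, path, needle, hits);
                }
            } else if (indexOf(data, needle) >= 0) {
                hits.add(path);
            }
        }
    }

    /** Reads the remaining (decompressed) bytes of the current ZIP entry. */
    private static byte[] readEntry(ZipInputStream zip) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = zip.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Naive byte-sequence search; good enough for a one-off diagnostic. */
    private static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Arguments: path to the EAR, then the string to look for
        for (String hit : find(Paths.get(args[0]), args[1])) {
            System.out.println(hit);
        }
    }
}
```

Run with the EAR path and the search string as arguments (for example app.ear PaymentQueue, both placeholders), it prints one line per matching entry in the form outer!/inner!/entry, which points at the module that still references the old name.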
4. Re: Wildfly 10 HA Config Doesn't work!!

jbrios Sep 21, 2016 7:30 AM (in response to jbertram)

Hey Justin, thank you for your feedback. The answer is yes and no. We have another test project that correctly connects to the MDB and "puts" messages to and "gets" messages from it (no errors in the console). With our actual project (the EAR described above) we couldn't get past loading any page (because of the errors described above), so no, we haven't tested it. The messaging (MOM) architecture in the EAR is very simple: we have only two MDBs, one for processing backend/batch payments and the other for sending emails, so nothing really complicated. We removed the DEBUG level from the logging configuration and the error disappeared, so I assume it just points at something I don't understand yet.

Thanks,
5. Re: Trouble getting Wildfly 10 HA Config to work!!

mnovak Sep 22, 2016 2:20 AM (in response to jbrios)

Hi Joao,

Could you share the configuration of the messaging-activemq subsystem from the "full-ha" profile? AFAIK, by default the "destination" activation config property should contain the JNDI name of the queue.

If the MDB is working, then Justin is right: something else seems to be trying to create a consumer on the queue.

Thanks,
Mirek