18 Replies. Latest reply on May 24, 2017 9:20 AM by Miroslav Novak

    Some messages are not accessible to backup server with message configuration through data replication

    Wayne Wang Novice

      Hi,

       

      I was testing both the shared store and data replication configurations for wildfly (active) - wildfly (active). Each wildfly instance has a live-backup pair.

       

      Test scenario:

      start up wildfly instance #1

      start up wildfly instance #2

      send requests to wildfly instance #1; observe both instance #1 and instance #2 working to process requests/messages sent to instance #1.

      shut down wildfly instance #1

      check message processing to find out whether all messages were completely processed (for example, queued messages not processed by instance #1 were processed by instance #2)

       

      This test was working fine with the shared store approach.

      However, with the data replication approach, I found that some messages were not accessible to instance #2 once instance #1 was down. When I restarted instance #1, the remaining messages were processed by instance #1. Is this an issue with the overall configuration (live-backup), or is there some additional parameter needed to ensure messages get replicated during shutdown of instance #1?

       

      Thanks,

       

      Wayne

        • 2. Re: Some messages are not accessible to backup server with message configuration through data replication
          Wayne Wang Novice

          Hi Mirek,

           

          I am setting up wildfly (active) - wildfly (active) and each wildfly instance has a live-backup pair. The configuration was mostly based on the following link

          Chapter 29. High Availability - Red Hat Customer Portal

           

          Note: The differences are highlighted

           

          The configuration of the first wildfly instance:

           

                  <subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">

                      <server name="default">

                          <security enabled="false"/>

                          <cluster password="password"/>

                          <replication-master check-for-live-server="true" group-name="group1" cluster-name="my-cluster"/>

                          <security-setting name="#">

                              <role name="guest" delete-non-durable-queue="true" create-non-durable-queue="true" consume="true" send="true"/>

                          </security-setting>

                          <address-setting name="#" redistribution-delay="1000" message-counter-history-day-limit="10" page-size-bytes="2097152" max-size-bytes="10485760" expiry-address="jms.queue.ExpiryQueue" dead-letter-address="jms.queue.DLQ"/>

                          <http-connector name="http-connector" endpoint="http-acceptor" socket-binding="http"/>

                          <http-connector name="http-connector-throughput" endpoint="http-acceptor-throughput" socket-binding="http">

                              <param name="batch-delay" value="50"/>

                          </http-connector>

                          <remote-connector name="netty" socket-binding="messaging">

                              <param name="use-nio" value="true"/>

                              <param name="use-nio-global-worker-pool" value="true"/>

                          </remote-connector>

                          <in-vm-connector name="in-vm" server-id="0"/>

                          <http-acceptor name="http-acceptor" http-listener="default"/>

                          <http-acceptor name="http-acceptor-throughput" http-listener="default">

                              <param name="batch-delay" value="50"/>

                              <param name="direct-deliver" value="false"/>

                          </http-acceptor>

                          <remote-acceptor name="netty" socket-binding="messaging">

                              <param name="use-nio" value="true"/>

                          </remote-acceptor>

                          <in-vm-acceptor name="in-vm" server-id="0"/>

                          <broadcast-group name="bg-group1" connectors="netty" jgroups-channel="activemq-cluster"/>

                          <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                          <cluster-connection name="my-cluster" discovery-group="dg-group1" retry-interval="1000" connector-name="netty" address="jms"/>

                          <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>

                          <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>

                          <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>

                          <connection-factory name="RemoteConnectionFactory" reconnect-attempts="-1" retry-interval-multiplier="1.0" retry-interval="1000" block-on-acknowledge="true" ha="true" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="netty"/>

                          <pooled-connection-factory name="activemq-ra" transaction="xa" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm"/>

                      </server>

                      <server name="backup">

                          <cluster password="password"/>

                          <replication-slave group-name="group2" cluster-name="my-cluster"/>

                          <bindings-directory path="../../../activemq/bindings-A"/>

                          <journal-directory path="../../../activemq/journal-A"/>

                          <large-messages-directory path="../../../activemq/largemessages-A"/>

                          <paging-directory path="../../../activemq/paging-A"/>

                          <remote-connector name="netty" socket-binding="messaging-backup"/>

                          <remote-acceptor name="netty" socket-binding="messaging-backup"/>

                          <broadcast-group name="bg-group1" connectors="netty" jgroups-channel="activemq-cluster"/>

                          <discovery-group name="dg-group-backup" jgroups-channel="activemq-cluster"/>

                          <cluster-connection name="my-cluster" retry-interval="1000" connector-name="netty" address="jms"/>

                      </server>

                  </subsystem>

           

          The configuration of the second wildfly instance:

           

                  <subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">

                      <server name="default">

                          <security enabled="false"/>

                          <cluster password="password"/>

                          <replication-master check-for-live-server="true" group-name="group2" cluster-name="my-cluster"/>

                          <security-setting name="#">

                              <role name="guest" delete-non-durable-queue="true" create-non-durable-queue="true" consume="true" send="true"/>

                          </security-setting>

                          <address-setting name="#" redistribution-delay="1000" message-counter-history-day-limit="10" page-size-bytes="2097152" max-size-bytes="10485760" expiry-address="jms.queue.ExpiryQueue" dead-letter-address="jms.queue.DLQ"/>

                          <http-connector name="http-connector" endpoint="http-acceptor" socket-binding="http"/>

                          <http-connector name="http-connector-throughput" endpoint="http-acceptor-throughput" socket-binding="http">

                              <param name="batch-delay" value="50"/>

                          </http-connector>

                          <remote-connector name="netty" socket-binding="messaging">

                              <param name="use-nio" value="true"/>

                              <param name="use-nio-global-worker-pool" value="true"/>

                          </remote-connector>

                          <in-vm-connector name="in-vm" server-id="0"/>

                          <http-acceptor name="http-acceptor" http-listener="default"/>

                          <http-acceptor name="http-acceptor-throughput" http-listener="default">

                              <param name="batch-delay" value="50"/>

                              <param name="direct-deliver" value="false"/>

                          </http-acceptor>

                          <remote-acceptor name="netty" socket-binding="messaging">

                              <param name="use-nio" value="true"/>

                          </remote-acceptor>

                          <in-vm-acceptor name="in-vm" server-id="0"/>

                          <broadcast-group name="bg-group1" connectors="netty" jgroups-channel="activemq-cluster"/>

                          <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                          <cluster-connection name="my-cluster" discovery-group="dg-group1" retry-interval="1000" connector-name="netty" address="jms"/>

                          <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>

                          <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>

                          <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>

                          <connection-factory name="RemoteConnectionFactory" reconnect-attempts="-1" retry-interval-multiplier="1.0" retry-interval="1000" block-on-acknowledge="true" ha="true" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="netty"/>

                          <pooled-connection-factory name="activemq-ra" transaction="xa" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm"/>

                      </server>

                      <server name="backup">

                          <cluster password="password"/>

                          <replication-slave group-name="group1" cluster-name="my-cluster"/>

                          <bindings-directory path="../../../activemq/bindings-B"/>

                          <journal-directory path="../../../activemq/journal-B"/>

                          <large-messages-directory path="../../../activemq/largemessages-B"/>

                          <paging-directory path="../../../activemq/paging-B"/>

                          <remote-connector name="netty" socket-binding="messaging-backup"/>

                          <remote-acceptor name="netty" socket-binding="messaging-backup"/>

                          <broadcast-group name="bg-group1" connectors="netty" jgroups-channel="activemq-cluster"/>

                          <discovery-group name="dg-group-backup" jgroups-channel="activemq-cluster"/>

                          <cluster-connection name="my-cluster" connector-name="netty" address="jms" retry-interval="1000"/>

                      </server>

                  </subsystem>

          • 3. Re: Some messages are not accessible to backup server with message configuration through data replication
            Miroslav Novak Expert

            I don't see a problem in your configuration.

             

            What might happen is that when messages are redistributed from WF1 -> WF2, they are first sent to the sf.cluster... (store-and-forward) queue on WF1, which then forwards them to WF2. Those messages are in an "in-flight" state, and I suspect that the backup on WF2 will not deliver them when it is activated. It might also be a bug.
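            For reference, this redistribution path is governed by the redistribution-delay attribute of the address-setting; a minimal sketch of the relevant fragment, with values copied from the configuration posted earlier in this thread (a value of -1 would disable redistribution entirely):

```xml
<!-- redistribution-delay is how long (in ms) the broker waits for a new
     local consumer before redistributing messages to another cluster node -->
<address-setting name="#" redistribution-delay="1000" expiry-address="jms.queue.ExpiryQueue" dead-letter-address="jms.queue.DLQ"/>
```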

            • 4. Re: Some messages are not accessible to backup server with message configuration through data replication
              Wayne Wang Novice

              Hi Mirek,

               

              The following was what I observed for shared-store approach:

               

              start up wildfly instance #1

              start up wildfly instance #2

              send requests to wildfly instance #1, observe both instance#1 and instance#2 working to process requests/messages sent to instance#1.

              shut down wildfly instance #1

              instance #2 continued to process a few messages after instance #1 was down completely

              check the server log files from both servers and the number of requests sent to instance #1. The sum of the messages processed by both matched the total number of requests sent to instance #1

               

              The following was what I observed for replication approach:

               

              start up wildfly instance #1

              start up wildfly instance #2

              send requests to wildfly instance #1, observe both instance#1 and instance#2 working to process requests/messages sent to instance#1.

              shut down wildfly instance #1

              instance #2 DID NOT continue to process ANY messages after instance #1 was down completely

              check the server log files from both servers and the number of requests sent to instance #1. The sum of the messages processed by both DID NOT match the total number of requests sent to instance #1

               

              Is it possible that the queued messages did not get replicated to the backup server of instance #1?

              or

              Is it possible that the queued messages got replicated to the backup server of instance #1, but were not accessible to instance #2?

               

              Thanks,

               

              Wayne

              • 5. Re: Some messages are not accessible to backup server with message configuration through data replication
                Miroslav Novak Expert

                Sure, attach the logs. Also, how are you processing messages/requests on instance #1 and instance #2? Is there an MDB on the queue/topic?

                • 6. Re: Some messages are not accessible to backup server with message configuration through data replication
                  Wayne Wang Novice

                  Hi Mirek,

                   

                  The attached are:

                  (1) server.log.1 (wildfly 10 instance#1)

                  (2) server.log.2 (wildfly 10 instance#2)

                  (3) screenshot (wildfly-helloworld-mdb_HelloWorldMDBservletclient.png)

                  (4) server.log.1.restart (restart wildfly 10 instance #1)

                   

                  This is the out-of-the-box (quickstart) sample for testing sending messages to a queue on which an MDB is listening.

                   

                  Modifications:

                   

                  (1) I added some log messages to show the number of messages processed.

                  (2) I also modified the program so that it sends 10000 messages instead of the default 5 messages.

                  (3) I also deployed it to two instances of a cluster so that I can send requests to one instance and see how messages are processed by the two instances once I set up HA for messaging.

                   

                  Test:

                  send requests to instance #1

                   

                  Test results:

                  (1) a total of 2672 messages were created and sent to instance #1

                  (2) server.log.1 showed 1301 messages processed (search for string "Received Message from queue")

                  (3) server.log.2 showed 1358 messages processed (same as above)

                   

                  The number of missing messages: 2672 - (1301 + 1358) = 13

                   

                  Once I restarted instance #1, the 13 missing messages were processed, and the total number of messages processed by instance #1 became 1314.

                   

                  You can check the following links for the logs and the image:

                   

                  https://developer.jboss.org/wiki/Wildfly-helloworld-mdbHelloWorldMDBServletClientpng

                  https://developer.jboss.org/wiki/Serverlog1

                  https://developer.jboss.org/wiki/Serverlog2

                  https://developer.jboss.org/wiki/Serverlog1restart

                  • 7. Re: Some messages are not accessible to backup server with message configuration through data replication
                    Miroslav Novak Expert

                    Thanks Wayne! I wrote a test for this but could not reproduce the issue with the latest WF11 (nightly build). I'll try with WF10.

                    • 9. Re: Some messages are not accessible to backup server with message configuration through data replication
                      Miroslav Novak Expert

                      Looks like I'm doing something differently. I cannot reproduce it with WF10. Could you describe your use case in detail? When instance #1 is shutting down, is the servlet still sending messages? I think I'm missing something in the test scenario.

                      • 10. Re: Some messages are not accessible to backup server with message configuration through data replication
                        Wayne Wang Novice

                        Hi Mirek,

                         

                        I sent requests directly to instance #1.

                         

                        The requests through the servlet are converted to messages and sent to the queue (through a loop of 10000 messages). When instance #1 goes down in the middle of sending messages, the servlet (hosted on instance #1) is no longer available, so only a portion of the 10000 messages will be created and sent. This is expected.

                         

                        However, I was expecting instance #2 to pick up the unprocessed messages in the cluster queue and continue to process them even when instance #1 is completely down. I am not expecting more messages to come in since instance #1 is already down.

                         

                        I created this use case to verify that messages fail over to another instance that can access the cluster queue. If I send requests to a load balancer, it is less obvious to spot the problem. When I send requests to one specific instance, I can see the issue every time with the data replication approach, but not with the shared store approach.

                         

                        It looks like some of the messages are either not replicated to the backup server, or the backup server is not accessible to the other instance.

                         

                        Note: while requests were being sent to instance #1, I could see both instance #1 and instance #2 processing the messages while instance #1 was live.

                         

                        Feel free to let me know if there is any confusion.

                         

                        Thanks,

                         

                        Wayne

                        • 11. Re: Some messages are not accessible to backup server with message configuration through data replication
                          Wayne Wang Novice

                          Hi Mirek,

                           

                          I understand that in a cluster configuration (active-active), we would never send requests to a single wildfly instance directly; we send requests to a load balancer.

                           

                          In this case, if a wildfly instance is down, the request will be redirected to another live wildfly instance, and we should not have a scenario where a request is sent to a wildfly instance that is in the middle of shutting down.

                           

                          In all my tests with requests sent to the load balancer, shutting down a wildfly instance does not lead to missing message processing. However, I did observe the following consistently:

                           

                          (1) Configuring both wildfly (active) - wildfly (active) with the shared-store approach:

                          sending requests to a specific wildfly instance and shutting it down, all messages left unprocessed by this instance will be picked up by the other live instance

                           

                          (2) Configuring both wildfly (active) - wildfly (active) with the data replication approach:

                          sending requests to a specific wildfly instance and shutting it down, all messages left unprocessed by this instance will NOT be picked up by the other live instance, and will be processed only once the instance that was down is live again.

                           

                          The tests were for wildfly 10.0.0.

                           

                          The question is whether this is the expected behaviour of the data replication approach.

                           

                          Thanks,

                           

                          Wayne

                          • 12. Re: Some messages are not accessible to backup server with message configuration through data replication
                            Wayne Wang Novice

                            Hi Mirek,

                             

                            I made a simple change in the configuration, and I was able to get the unprocessed messages processed by the other server.

                             

                            The configuration change is on the backup server, which previously did not define a discovery-group within its cluster-connection. I think this is why the backup server did not get the messages replicated.

                             

                                            <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                                            <cluster-connection name="my-cluster" discovery-group="dg-group1" connector-name="netty" address="jms"/>

                             

                            I tested many times; I am pretty sure it is working now.

                             

                            However, there is one thing that is problematic:

                            When I restart the wildfly instance, the instance cannot start up properly, and requests fail with a 404 - Not Found error

                             

                            12:31:53,910 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "wildfly-helloworld-mdb.war")]) - failure description: {"WFLYCTL0180: Services with missing/unavailable dependencies" => [

                                "jboss.deployment.unit.\"wildfly-helloworld-mdb.war\".component.HelloWorldQueueMDB.CREATE is missing [jboss.ra.activemq-ra]",

                                "jboss.naming.context.java.module.wildfly-helloworld-mdb.wildfly-helloworld-mdb.DefaultJMSConnectionFactory is missing [jboss.naming.context.java.jboss.DefaultJMSConnectionFactory]",

                                "jboss.deployment.unit.\"wildfly-helloworld-mdb.war\".component.HelloWorldQTopicMDB.CREATE is missing [jboss.ra.activemq-ra]"

                            ]}

                             

                             

                            Is it possible that the backup messaging server in instance #2 was not shut down once instance #1 came back up, and that this is why its messaging server has an issue with the deployment of the application? Is there a way to find out whether the backup server has shut down?

                             

                            Note: if I shut down wildfly instance #2, I have no problem starting up wildfly instance #1. However, that is not the right way to manage wildfly instances in production.

                             

                            Thanks,

                             

                            Wayne

                            • 13. Re: Some messages are not accessible to backup server with message configuration through data replication
                              Miroslav Novak Expert

                              It is a bug that a cluster-connection can be created without a discovery-group or static-connectors; one of them must be defined. Nice catch! Do you want to file a WFLY jira for this?
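                              For reference, the two valid forms of the backup server's cluster-connection might look like the following sketch. The names are taken from the configurations posted earlier in this thread, except "netty-other", which is a hypothetical remote-connector that would have to be defined to point at the peer server's acceptor:

```xml
<!-- Form 1: dynamic discovery via a discovery-group -->
<discovery-group name="dg-group-backup" jgroups-channel="activemq-cluster"/>
<cluster-connection name="my-cluster" discovery-group="dg-group-backup" retry-interval="1000" connector-name="netty" address="jms"/>

<!-- Form 2 (alternative): a static list of connectors; "netty-other" is
     hypothetical and must be defined as a remote-connector on this server -->
<cluster-connection name="my-cluster" static-connectors="netty-other" retry-interval="1000" connector-name="netty" address="jms"/>
```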

                               

                              The 2nd thing is more problematic. I discussed it in another forum:

                              Wildfly 10.1 fails to deploy app during Artemis failback

                               

                              Basically, failback takes some time because the backup (on instance #2) must send its journal to the live server (on instance #1) before the live server activates. Once the live server activates, it deploys all connection factories/destinations to JNDI. The problem is that the deployment does not check for this. There are 2 jiras for this:

                              [WFLY-7395] Allow to provide deployment dependency to JNDI resource - JBoss Issue Tracker

                              [WFCORE-1912] Redeploy deployment if all missing dependencies for deployment are corrected - JBoss Issue Tracker

                               

                              Neither is resolved. I'll try to push them. The workaround seems to be to deploy your war/jar like:

                              deploy ~/tmp/luckywinner.ear --unmanaged --headers={rollback-on-runtime-failure=false}

                              • 14. Re: Some messages are not accessible to backup server with message configuration through data replication
                                Wayne Wang Novice

                                Hi Miroslav,

                                 

                                I tested the data replication approach with a live-backup pair defined in both active wildfly instances [ wildfly #1 (live #1, backup #2) + wildfly #2 (live #2, backup #1) ] in the following wildfly versions:

                                 

                                wildfly 10.0.0.Final

                                wildfly 10.1.0.Final

                                wildfly 11.0.0.Alpha1

                                 

                                The issue of restarting the wildfly instance that was shut down remains in all wildfly versions.

                                 

                                I can see that the discussion (Wildfly 10.1 fails to deploy app during Artemis failback) indicated that this (a live/backup pair within a wildfly instance) was not a supported configuration for version 10.1.0. Is there any plan to support this configuration in future releases?

                                 

                                Thanks,

                                 

                                Wayne
