8 Replies Latest reply on Jul 22, 2016 9:14 AM by Wayne Wang

    How to make messages in cluster queue replicated in real time

    Wayne Wang Apprentice

      I set up a cluster of two instances with the HA profile. If I send 10 requests, I can see that 5 were processed by instance #1 and the other 5 by instance #2. This is good.

       

      When I tested shutting down instance #1 in the middle of message processing, I expected all messages in the cluster queue not yet processed by instance #1 to become available to instance #2. So all remaining messages created by, but not processed by, instance #1 should be available to instance #2 and get processed there.

       

      However, this does not seem to be the case. The statistics are as follows.

       

      (1) Send messages through the web server and shut down instance #1 in the middle

      Instance #1 created 171 messages in its queue, and apparently none of them were successfully processed (based on the log file). This is due to the shutdown of instance #1.

      Instance #2 created 171 messages in its queue, and successfully processed 185 messages after instance #1 was completely down. Apparently, instance #2 was able to access 185 - 171 = 14 messages that were created by instance #1.

       

      (2) restart instance #1

      After I restarted instance #1, I could see messages being processed by instance #2 (but not by instance #1). In the end, 342 messages were processed successfully by instance #2 (based on the log file).

       

      Is this the normal behavior with the default HA profile? Is there any way to make messages on instance #1 available to instance #2 in real time, or at least before instance #1 is completely shut down?

        • 1. Re: How to make messages in cluster queue replicated in real time
          Miroslav Novak Master

          Hi Wayne,

           

          unfortunately the HA profile does not provide HA for messaging, even if you configure a messaging cluster. I'll try to explain how HA for messaging works.

           

          Messaging in Wildfly uses an active-passive philosophy for HA. We call the active server the "live" server and the passive one the "backup" server. If you configure a cluster between 2 messaging servers, then you have configured 2 "live" servers in a cluster. There are no backups that would activate when a live server is shut down/killed.

           

          So when you shut down server #1, all the messages on #1 cannot be processed until #1 is started again. You will need to configure a backup server for each live server. Which version of Wildfly are you using?

           

          Cheers,

          Mirek

          • 2. Re: How to make messages in cluster queue replicated in real time
            Wayne Wang Apprentice

            Hi Mirek,

             

            I am on Wildfly 10.

             

            I was trying to use the "HA singleton" feature (the feature that was in JBoss AS 6 and was brought back in Wildfly 10) to set up a cluster of two servers. This allows us to shut down the current HA singleton provider to do maintenance work while the other server becomes the new HA singleton provider, so we have very short downtime. In another scenario, if the current HA singleton goes down for any reason, the other server should automatically take over, providing HA.

             

            However, I found out it does not provide HA for messages once a server goes down in the middle of message processing.

             

            My goal is not to set up a cluster of two live servers, so I think setting up a live-backup configuration would meet our requirements. This setup provides one active server at any time, and once the active server goes down, the backup server comes up. I am hoping the new active server has access to all the messages.

             

            However, I have not found any documentation on how to set up a live-backup configuration. Do you have any information to share?

             

            Thanks

             

            Wayne

            • 3. Re: How to make messages in cluster queue replicated in real time
              Miroslav Novak Master

              Hi Wayne,

               

              thanks for the update! If I remember correctly, JBoss AS 6 used JBoss Messaging, which used an active-active HA philosophy. All nodes in a JBoss Messaging cluster had a shared database containing all the messages. So if any node went down, all its messages could be "failed over" to another node in the cluster by a simple SQL UPDATE command.

               

              Wildfly 10 uses ActiveMQ Artemis for messaging, which has nothing like that. There is no database; it can only use the file system to store its messaging journal. Also, each Artemis live server must have its own journal, which cannot be shared with other live servers. This makes the situation a little complicated, but it is possible to create something like an "active-active" configuration for Artemis as well. It's called the collocated HA topology.

               

              The idea of the collocated HA topology is the following. There are two Wildfly 10 servers. By default, each of them contains an Artemis server configured as "live". What we will do is configure one more backup server in each Wildfly server; it will be the backup for the live server from the other Wildfly 10 server. So in the end there will be 2 live-backup pairs. It will look like Wildfly1(Live1/Backup2) <- Artemis cluster -> Wildfly2(Live2/Backup1). All of live1, live2, backup1, and backup2 are in the same Artemis cluster.

               

              In case Wildfly1 is shut down or killed, Backup1 in Wildfly2 will activate, so live2 and backup1 will be active in Wildfly2 at the same time. Furthermore, once backup1 activates, it will form an Artemis cluster with live2. So it will be possible to consume from live2 in Wildfly2 all the messages which were previously on live1, because they'll be redistributed from backup1 to live2.

               

              There are 2 ways a live-backup pair can synchronize messages with each other. The first is "replicated journal" and the second is "shared store". A replicated journal is basically replication over the network. Unfortunately, this is very buggy in Wildfly 10.0.0, so I strongly recommend choosing "shared store".

               

              Shared store uses a shared directory to store the journal for a live-backup pair. This shared directory must be accessible to both Wildfly 10 servers; it must be an NFSv4 or GFS2 file system mounted on both machines. It works in the following way: when the live server starts, it creates a file lock on the journal directory. The backup uses the same journal directory and checks whether the file lock created by the live server is still there. The backup will only activate and start using the journal once the live server's file lock disappears, which happens only when the live server is shut down or crashes.
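              The file-lock rule can be sketched in a few lines (illustration only -- Artemis implements this internally, and the `server.lock` file name here is made up; this assumes a POSIX system with `fcntl`):

```python
# Sketch of the shared-store activation rule: whoever holds an exclusive lock on
# the journal's lock file is "live"; the backup activates only when the lock frees.
# Illustration only -- Artemis implements this internally; "server.lock" is made up.
import fcntl
import os
import tempfile

journal_dir = tempfile.mkdtemp(prefix="journal-")
lock_path = os.path.join(journal_dir, "server.lock")

live = open(lock_path, "w")
fcntl.flock(live, fcntl.LOCK_EX | fcntl.LOCK_NB)  # live server takes the lock

backup = open(lock_path, "w")
try:
    fcntl.flock(backup, fcntl.LOCK_EX | fcntl.LOCK_NB)
    backup_active = True
except BlockingIOError:
    backup_active = False  # lock still held by live -> backup stays passive
print("backup active while live runs:", backup_active)

fcntl.flock(live, fcntl.LOCK_UN)  # live shuts down and releases the lock
fcntl.flock(backup, fcntl.LOCK_EX | fcntl.LOCK_NB)  # now the backup can activate
backup_active = True
print("backup active after live shutdown:", backup_active)
```

              This is also why the shared file system matters: the lock must behave correctly across both machines, which NFSv4 and GFS2 provide.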

               

              In case the live (Wildfly1) server is started again, the backup can be configured to automatically shut down/deactivate so the live server can take over again. This is called "fail-back", and I strongly recommend configuring it. I think it's disabled by default.
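              Fail-back is configured on the backup side of the pair. A minimal sketch against the Wildfly 10 `messaging-activemq` schema (the `allow-failback` attribute name is my recollection; please verify it against the subsystem schema):

```xml
<server name="backup">
    <!-- allow-failback="true": when the original live returns and re-acquires
         the journal lock, this backup deactivates so the live can resume. -->
    <shared-store-slave allow-failback="true" failover-on-server-shutdown="true"/>
</server>
```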

               

              Now the most important part: here is the messaging-activemq subsystem configuration for both Wildfly 10 servers. You will need to configure the path to the shared directory for each live-backup pair, so live1-backup1 has one shared directory for its journal and live2-backup2 has a different one.

               

              Wildfly 1:

               

               <subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">
                          <server name="default">
                              <security enabled="false"/>
                              <cluster password="${jboss.messaging.cluster.password:CHANGE ME!!}"/>
                              <journal compact-min-files="0" min-files="10"/>
                              <shared-store-master failover-on-server-shutdown="true"/>
                              <bindings-directory path="../../../../hornetq-journal-A/bindings"/>
                              <journal-directory path="../../../../hornetq-journal-A/journal"/>
                              <large-messages-directory path="../../../../hornetq-journal-A/largemessages"/>
                              <paging-directory path="../../../../hornetq-journal-A/paging"/>
                              <security-setting name="#">
                                  <role name="guest" delete-non-durable-queue="true" create-non-durable-queue="true" consume="true" send="true"/>
                              </security-setting>
                              <address-setting name="#" redistribution-delay="0" page-size-bytes="524288" max-size-bytes="1048576" max-delivery-attempts="200"/>
                              <http-connector name="http-connector" endpoint="http-acceptor" socket-binding="http"/>
                              <http-connector name="http-connector-throughput" endpoint="http-acceptor-throughput" socket-binding="http">
                                  <param name="batch-delay" value="50"/>
                              </http-connector>
                              <remote-connector name="netty" socket-binding="messaging">
                                  <param name="use-nio" value="true"/>
                                  <param name="use-nio-global-worker-pool" value="true"/>
                              </remote-connector>
                              <in-vm-connector name="in-vm" server-id="0"/>
                              <http-acceptor name="http-acceptor" http-listener="default"/>
                              <http-acceptor name="http-acceptor-throughput" http-listener="default">
                                  <param name="batch-delay" value="50"/>
                                  <param name="direct-deliver" value="false"/>
                              </http-acceptor>
                              <remote-acceptor name="netty" socket-binding="messaging">
                                  <param name="use-nio" value="true"/>
                              </remote-acceptor>
                              <in-vm-acceptor name="in-vm" server-id="0"/>
                              <broadcast-group name="bg-group1" connectors="netty" broadcast-period="1000" jgroups-channel="activemq-cluster"/>
                              <discovery-group name="dg-group1" refresh-timeout="1000" jgroups-channel="activemq-cluster"/>
                              <cluster-connection name="my-cluster" discovery-group="dg-group1" retry-interval="1000" connector-name="netty" address="jms"/>
                              <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>
                              <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>
                              <jms-queue name="testQueue0" entries="jms/queue/testQueue0 java:jboss/exported/jms/queue/testQueue0"/>
                              <jms-queue name="InQueue" entries="jms/queue/InQueue java:jboss/exported/jms/queue/InQueue"/>
                              <jms-queue name="OutQueue" entries="jms/queue/OutQueue java:jboss/exported/jms/queue/OutQueue"/>
                              <jms-topic name="testTopic0" entries="jms/topic/testTopic0 java:jboss/exported/jms/topic/testTopic0"/>
                              <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>
                              <connection-factory name="RemoteConnectionFactory" reconnect-attempts="-1" retry-interval-multiplier="1.0" retry-interval="1000" block-on-acknowledge="true" ha="true" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="netty"/>
                              <pooled-connection-factory name="activemq-ra" transaction="xa" reconnect-attempts="-1" block-on-acknowledge="true" ha="true" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm"/>
                          </server>
                          <server name="backup">
                              <security enabled="false"/>
                              <cluster password="CHANGE_ME!!!"/>
                              <shared-store-slave failover-on-server-shutdown="true"/>
                              <bindings-directory path="../../../../hornetq-journal-B/bindings"/>
                              <journal-directory path="../../../../hornetq-journal-B/journal"/>
                              <large-messages-directory path="../../../../hornetq-journal-B/largemessages"/>
                              <paging-directory path="../../../../hornetq-journal-B/paging"/>
                              <security-setting name="#">
                                  <role name="guest" manage="true" delete-non-durable-queue="true" create-non-durable-queue="true" delete-durable-queue="true" create-durable-queue="true" consume="true" send="true"/>
                              </security-setting>
                              <address-setting name="#" redistribution-delay="0" page-size-bytes="524288" max-size-bytes="1048576" max-delivery-attempts="200"/>
                              <remote-connector name="netty-backup" socket-binding="messaging-backup"/>
                              <in-vm-connector name="in-vm" server-id="0"/>
                              <remote-acceptor name="netty-backup" socket-binding="messaging-backup"/>
                              <broadcast-group name="bg-group-backup" connectors="netty-backup" broadcast-period="1000" jgroups-channel="activemq-cluster"/>
                              <discovery-group name="dg-group-backup" refresh-timeout="1000" jgroups-channel="activemq-cluster"/>
                              <cluster-connection name="my-cluster" discovery-group="dg-group-backup" retry-interval="1000" connector-name="netty-backup" address="jms"/>
                          </server>
                      </subsystem>
                  ...
                  <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
                   ...       
                      <socket-binding name="messaging-group" port="0" multicast-address="${jboss.messaging.group.address:231.7.7.7}" multicast-port="${jboss.messaging.group.port:9876}"/>
                      <socket-binding name="messaging" port="5445"/>
                      <socket-binding name="messaging-backup" port="5446"/>
                      ...
                  </socket-binding-group>
              

               

              Wildfly 2:

               

              <subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">
                          <server name="default">
                              <security enabled="false"/>
                              <cluster password="${jboss.messaging.cluster.password:CHANGE ME!!}"/>
                              <journal compact-min-files="0" min-files="10"/>
                              <shared-store-master failover-on-server-shutdown="true"/>
                              <bindings-directory path="../../../../hornetq-journal-B/bindings"/>
                              <journal-directory path="../../../../hornetq-journal-B/journal"/>
                              <large-messages-directory path="../../../../hornetq-journal-B/largemessages"/>
                              <paging-directory path="../../../../hornetq-journal-B/paging"/>
                              <security-setting name="#">
                                  <role name="guest" delete-non-durable-queue="true" create-non-durable-queue="true" consume="true" send="true"/>
                              </security-setting>
                              <address-setting name="#" redistribution-delay="0" page-size-bytes="524288" max-size-bytes="1048576" max-delivery-attempts="200"/>
                              <http-connector name="http-connector" endpoint="http-acceptor" socket-binding="http"/>
                              <http-connector name="http-connector-throughput" endpoint="http-acceptor-throughput" socket-binding="http">
                                  <param name="batch-delay" value="50"/>
                              </http-connector>
                              <remote-connector name="netty" socket-binding="messaging">
                                  <param name="use-nio" value="true"/>
                                  <param name="use-nio-global-worker-pool" value="true"/>
                              </remote-connector>
                              <in-vm-connector name="in-vm" server-id="0"/>
                              <http-acceptor name="http-acceptor" http-listener="default"/>
                              <http-acceptor name="http-acceptor-throughput" http-listener="default">
                                  <param name="batch-delay" value="50"/>
                                  <param name="direct-deliver" value="false"/>
                              </http-acceptor>
                              <remote-acceptor name="netty" socket-binding="messaging">
                                  <param name="use-nio" value="true"/>
                              </remote-acceptor>
                              <in-vm-acceptor name="in-vm" server-id="0"/>
                              <broadcast-group name="bg-group1" connectors="netty" broadcast-period="1000" jgroups-channel="activemq-cluster"/>
                              <discovery-group name="dg-group1" refresh-timeout="1000" jgroups-channel="activemq-cluster"/>
                              <cluster-connection name="my-cluster" discovery-group="dg-group1" retry-interval="1000" connector-name="netty" address="jms"/>
                              <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>
                              <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>
                              <jms-queue name="testQueue0" entries="jms/queue/testQueue0 java:jboss/exported/jms/queue/testQueue0"/>
                              <jms-queue name="InQueue" entries="jms/queue/InQueue java:jboss/exported/jms/queue/InQueue"/>
                              <jms-queue name="OutQueue" entries="jms/queue/OutQueue java:jboss/exported/jms/queue/OutQueue"/>
                              <jms-topic name="testTopic0" entries="jms/topic/testTopic0 java:jboss/exported/jms/topic/testTopic0"/>
                              <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>
                              <connection-factory name="RemoteConnectionFactory" reconnect-attempts="-1" retry-interval-multiplier="1.0" retry-interval="1000" block-on-acknowledge="true" ha="true" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="netty"/>
                              <pooled-connection-factory name="activemq-ra" transaction="xa" reconnect-attempts="-1" block-on-acknowledge="true" ha="true" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm"/>
                          </server>
                          <server name="backup">
                              <security enabled="false"/>
                              <cluster password="CHANGE_ME!!!"/>
                              <shared-store-slave failover-on-server-shutdown="true"/>
                              <bindings-directory path="../../../../hornetq-journal-A/bindings"/>
                              <journal-directory path="../../../../hornetq-journal-A/journal"/>
                              <large-messages-directory path="../../../../hornetq-journal-A/largemessages"/>
                              <paging-directory path="../../../../hornetq-journal-A/paging"/>
                              <security-setting name="#">
                                  <role name="guest" manage="true" delete-non-durable-queue="true" create-non-durable-queue="true" delete-durable-queue="true" create-durable-queue="true" consume="true" send="true"/>
                              </security-setting>
                              <address-setting name="#" redistribution-delay="0" page-size-bytes="524288" max-size-bytes="1048576" max-delivery-attempts="200"/>
                              <remote-connector name="netty-backup" socket-binding="messaging-backup"/>
                              <in-vm-connector name="in-vm" server-id="0"/>
                              <remote-acceptor name="netty-backup" socket-binding="messaging-backup"/>
                              <broadcast-group name="bg-group-backup" connectors="netty-backup" broadcast-period="1000" jgroups-channel="activemq-cluster"/>
                              <discovery-group name="dg-group-backup" refresh-timeout="1000" jgroups-channel="activemq-cluster"/>
                              <cluster-connection name="my-cluster" discovery-group="dg-group-backup" retry-interval="1000" connector-name="netty-backup" address="jms"/>
                          </server>
                      </subsystem>
                     
              
                  <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
                      ... 
                      <socket-binding name="messaging-group" port="0" multicast-address="${jboss.messaging.group.address:231.7.7.7}" multicast-port="${jboss.messaging.group.port:9876}"/>
                      <socket-binding name="messaging" port="5445"/>
                      <socket-binding name="messaging-backup" port="5446"/>
                       ...
                  </socket-binding-group>
              

               

               

              Cheers,

              Mirek

              • 4. Re: How to make messages in cluster queue replicated in real time
                Wayne Wang Apprentice

                Hi Mirek,

                 

                Thank you very much for sharing the information.

                 

                Quick question:

                What I really need is an active-passive cluster configuration with HA for messages. Our application is currently not ready for an active-active configuration at the Wildfly server level. We would like only one Wildfly server to receive/process requests at any time.


                Is it possible to make some additional changes in the configuration file so that only one Wildfly server actually receives/processes requests?

                Is it possible to automatically bring the passive Wildfly server alive after the currently live server is shut down?

                 

                Another option: I am not sure if it is possible to use the HA singleton feature in Wildfly 10 to control the active-passive configuration of the two Wildfly servers. But then I am not sure if the configuration you described works with the HA singleton feature while preserving the HA-for-messages capability.

                 

                Thanks,

                 

                Wayne

                • 5. Re: How to make messages in cluster queue replicated in real time
                  Miroslav Novak Master

                  Hi Wayne,

                   

                  thanks for the update! You can easily modify the configuration to have just one live-backup pair by removing the unwanted Artemis servers. I suggest removing the "backup" server configuration from both servers and changing the configuration of the live server in Wildfly 2 so it behaves as a backup. Remove the following lines:

                  1. <shared-store-master failover-on-server-shutdown="true"/> 
                  2. <bindings-directory path="../../../../hornetq-journal-B/bindings"/> 
                  3. <journal-directory path="../../../../hornetq-journal-B/journal"/> 
                  4. <large-messages-directory path="../../../../hornetq-journal-B/largemessages"/> 
                  5. <paging-directory path="../../../../hornetq-journal-B/paging"/>


                  And add: 

                  1. <shared-store-slave failover-on-server-shutdown="true"/> 
                  2. <bindings-directory path="../../../../hornetq-journal-A/bindings"/> 
                  3. <journal-directory path="../../../../hornetq-journal-A/journal"/> 
                  4. <large-messages-directory path="../../../../hornetq-journal-A/largemessages"/> 
                  5. <paging-directory path="../../../../hornetq-journal-A/paging"/> 
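
                  Putting the removals and additions together, the beginning of the `<server name="default">` element on Wildfly 2 would then read like this (everything after the paging directory stays as before):

```xml
<server name="default">
    <security enabled="false"/>
    <cluster password="${jboss.messaging.cluster.password:CHANGE ME!!}"/>
    <journal compact-min-files="0" min-files="10"/>
    <!-- this server is now the shared-store backup for the live on Wildfly 1,
         so it points at the same journal-A shared directory -->
    <shared-store-slave failover-on-server-shutdown="true"/>
    <bindings-directory path="../../../../hornetq-journal-A/bindings"/>
    <journal-directory path="../../../../hornetq-journal-A/journal"/>
    <large-messages-directory path="../../../../hornetq-journal-A/largemessages"/>
    <paging-directory path="../../../../hornetq-journal-A/paging"/>
    ...
</server>
```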


                  I don't know how to configure a deployment as an HA singleton. Is it an MDB, a Servlet, or some EJB? For an MDB it should be possible to do so.


                  To your questions:

                  > Is it possible to make some  additional changes in the configuration file so that only one wildfly server is actually receiving / processing requests?

                  With the changes described above, it will be so.


                  > Is it possible to automatically turn the passive wildfly server alive after the currently live server is shut down?

                  With the changes described above, it will be exactly like this. When Wildfly 1 is shut down, the backup in Wildfly 2 will activate. Once Wildfly 1 is started again, the live server will automatically start and the backup will stop.

                  • 6. Re: How to make messages in cluster queue replicated in real time
                    Wayne Wang Apprentice

                    Hi Mirek,

                     

                    Thank you very much for sharing the information so promptly.

                     

                    I tried your first approach (configuring live-backup for the two Wildfly servers) and made these two Wildfly servers run with the HA singleton configuration. It worked! The second approach made the deployment on the second Wildfly server fail; maybe some additional configuration is required.

                     

                    This is the shared-store approach. Is there any way to configure a replication approach? Are there any pros and cons to these two approaches?

                     

                    Regards,

                     

                    Wayne

                    • 7. Re: How to make messages in cluster queue replicated in real time
                      Miroslav Novak Master

                      Hi Wayne,

                       

                      I'm happy you made it work. The 2nd approach has the drawback that the backup in Wildfly 2 is passive, meaning it does not accept any connections; it basically does nothing until the live server is shut down. I've also realized it's not possible to say which HA singleton will be running at a given time. So if you start Wildfly 1 after the singleton on Wildfly 2 was activated, the live server on Wildfly 1 will activate but the HA singleton in Wildfly 2 will still be active. Since the HA singleton in Wildfly 2 was connected to the backup, and the backup stops because the live server on Wildfly 1 started, the HA singleton in Wildfly 2 will lose its connection. This cannot happen with the first approach, so it should be OK for your use case.

                       

                      I strongly recommend avoiding the replication approach for now. It's buggy in Wildfly 10.0.0, and there are design issues with the collocated topology for a replicated journal; I could explain, but it would take a long text to read :-) A replicated journal is also slower compared to shared store, because every message ack or session commit must be replicated to the backup's journal before the response is sent back to the client. This takes a network round trip and degrades performance. From my experience, shared store on NFSv4 does not suffer from this problem and is much faster. It's also much more stable.

                       

                      There is ongoing work to stabilize it and it will be fixed in later Wildfly releases.

                       

                      Thanks,

                      Mirek

                       
