13 Replies Latest reply on May 16, 2018 1:30 PM by yprajapa

    Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)

    wwang2016

      Hi,

       

      I was investigating wildfly cluster for messaging high availability.

       

      There are two WildFly instances, instance #1 and instance #2, both configured with the shared-store approach.

       

      I took the helloworld-mdb sample and used it to test messaging high availability.

       

      The application quickly generates 3000 messages and sends them to a queue. Once all 3000 messages had been generated, I shut down instance #1, and I could see instance #2 continue to process them. However, I found the following issues:

      (1) Instance #2 did not process all the remaining messages. Some messages somehow stayed local to the instance that was down.

      (2) A paging directory was created by the system.

      (3) Once I restarted instance #1, the remaining messages were processed by instance #1.

       

      Note: if the number of generated messages is small, no paging directory is created and the issue does not occur.

       

      Is there any way to resolve this issue? (All messages should be processed by the second instance.)

       

      Thanks

        • 1. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
          mnovak

          Hi Wayne :-), could you share your configuration for both of the servers?

           

          Thanks,

          Mirek

          • 2. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
            wwang2016

            Hi Mirek,

             

            I was using the shared-store approach for wildfly 10.1.0.Final

             

            Test#1:

            (1) Start up WildFly instance #1 and WildFly instance #2 (I have a demo application to generate and consume messages).

            (2) Access WildFly instance #1 to quickly generate and send up to 5000 messages to a queue. NO paging directory was created.

            (3) shut down wildfly instance #1 after the messages were fully generated, instance #2 continued to process the messages until completion.

             

            Test#2

            (1) start up wildfly instance #1 and wildfly instance #2

            (2) Access WildFly instance #1 to quickly generate and send up to 10000 messages to a queue. A paging directory WAS created.

            Note: if I increase max-size-bytes for the queue, the paging directory is not created. However, it is important to test the scenario in which a paging directory is created.

            (3) Shut down WildFly instance #1 after the messages were fully generated. Instance #2 continued to process the messages, but not all messages were processed.

            (4) Restart instance #1. All remaining messages were processed by instance #1.
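
            As a minimal sketch of the note in step (2), raising max-size-bytes on the queue's address-setting would look roughly like the fragment below. The value 104857600 (100 MB) is an illustrative assumption, not a setting taken from these tests; the other attributes are copied from the configuration posted further down.

            ```xml
            <!-- Sketch only: a larger max-size-bytes keeps more messages in memory before paging starts. -->
            <!-- 104857600 (100 MB) is an assumed example value, not one used in the tests. -->
            <address-setting name="jms.queue.HELLOWORLDMDBQueue"
                             redelivery-delay="30000"
                             page-size-bytes="1048576"
                             max-delivery-attempts="100"
                             max-size-bytes="104857600"/>
            ```

            This only postpones paging. It does not exercise failover with a paging directory present, which is the scenario test #2 is meant to cover.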

             

            Note: the paging file could not be deleted in test #2. If I shut down instance #2 before restarting instance #1, the paging file is deleted after message processing completes. It looks like instance #2 was accessing the paging file but did not process the messages.

             

            Also, I gave the user who started the servers full control of the shared root folder.

             

             

            The following are the configurations of the two wildfly instances

            WildFly #1

             

                    <subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">

                        <server name="default">

                            <cluster password="password"/>

                            <shared-store-master failover-on-server-shutdown="true"/>

                            <bindings-directory path="../../../shared/bindings-A"/>

                            <journal-directory path="../../../shared/journal-A"/>

                            <large-messages-directory path="../../../shared/largemessages-A"/>

                            <paging-directory path="../../../shared/paging-A"/>

                            <security-setting name="#">

                                <role name="guest" send="true" consume="true" create-non-durable-queue="true" delete-non-durable-queue="true"/>

                            </security-setting>

                            <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>

                            <address-setting name="jms.queue.HELLOWORLDMDBQueue" redelivery-delay="30000" page-size-bytes="1048576" max-delivery-attempts="100" max-size-bytes="10485760"/>

                            <http-connector name="http-connector" socket-binding="http" endpoint="http-acceptor"/>

                            <http-connector name="http-connector-throughput" socket-binding="http" endpoint="http-acceptor-throughput">

                                <param name="batch-delay" value="50"/>

                            </http-connector>

                            <remote-connector name="netty" socket-binding="messaging"/>

                            <in-vm-connector name="in-vm" server-id="0"/>

                            <http-acceptor name="http-acceptor" http-listener="default"/>

                            <http-acceptor name="http-acceptor-throughput" http-listener="default">

                                <param name="batch-delay" value="50"/>

                                <param name="direct-deliver" value="false"/>

                            </http-acceptor>

                            <remote-acceptor name="netty" socket-binding="messaging"/>

                            <in-vm-acceptor name="in-vm" server-id="0"/>

                            <broadcast-group name="bg-group1" jgroups-channel="activemq-cluster" connectors="netty"/>

                            <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                            <cluster-connection name="my-cluster" address="jms" connector-name="netty" discovery-group="dg-group1"/>

                            <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>

                            <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>

                            <jms-queue name="HELLOWORLDMDBQueue" entries="java:jboss/queue/HELLOWORLDMDBQueue"/>

                            <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>

                            <connection-factory name="RemoteConnectionFactory" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="netty" ha="true" block-on-acknowledge="true" reconnect-attempts="-1"/>

                            <pooled-connection-factory name="activemq-ra" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm" transaction="xa"/>

                        </server>

                        <server name="backup">

                            <cluster password="password"/>

                            <shared-store-slave failover-on-server-shutdown="true"/>

                            <bindings-directory path="../../../shared/bindings-B"/>

                            <journal-directory path="../../../shared/journal-B"/>

                            <large-messages-directory path="../../../shared/largemessages-B"/>

                            <paging-directory path="../../../shared/paging-B"/>

                            <address-setting name="#" redistribution-delay="0"/>

                            <remote-connector name="netty" socket-binding="messaging-backup"/>

                            <remote-acceptor name="netty" socket-binding="messaging-backup"/>

                            <broadcast-group name="bg-group1" jgroups-channel="activemq-cluster" connectors="netty"/>

                            <discovery-group name="dg-group-backup" jgroups-channel="activemq-cluster"/>

                            <cluster-connection name="my-cluster" address="jms" connector-name="netty" discovery-group="dg-group-backup"/>

                        </server>

                    </subsystem>

             

             

            WildFly #2

             

                    <subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">

                        <server name="default">

                            <cluster password="password"/>

                            <shared-store-master failover-on-server-shutdown="true"/>

                            <bindings-directory path="../../../shared/bindings-B"/>

                            <journal-directory path="../../../shared/journal-B"/>

                            <large-messages-directory path="../../../shared/largemessages-B"/>

                            <paging-directory path="../../../shared/paging-B"/>

                            <security-setting name="#">

                                <role name="guest" send="true" consume="true" create-non-durable-queue="true" delete-non-durable-queue="true"/>

                            </security-setting>

                            <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>

                            <address-setting name="jms.queue.HELLOWORLDMDBQueue" redelivery-delay="30000" page-size-bytes="1048576" max-delivery-attempts="100" max-size-bytes="10485760"/>

                            <http-connector name="http-connector" socket-binding="http" endpoint="http-acceptor"/>

                            <http-connector name="http-connector-throughput" socket-binding="http" endpoint="http-acceptor-throughput">

                                <param name="batch-delay" value="50"/>

                            </http-connector>

                            <remote-connector name="netty" socket-binding="messaging"/>

                            <in-vm-connector name="in-vm" server-id="0"/>

                            <http-acceptor name="http-acceptor" http-listener="default"/>

                            <http-acceptor name="http-acceptor-throughput" http-listener="default">

                                <param name="batch-delay" value="50"/>

                                <param name="direct-deliver" value="false"/>

                            </http-acceptor>

                            <remote-acceptor name="netty" socket-binding="messaging"/>

                            <in-vm-acceptor name="in-vm" server-id="0"/>

                            <broadcast-group name="bg-group1" jgroups-channel="activemq-cluster" connectors="netty"/>

                            <discovery-group name="dg-group1" jgroups-channel="activemq-cluster"/>

                            <cluster-connection name="my-cluster" address="jms" connector-name="netty" discovery-group="dg-group1"/>

                            <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>

                            <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>

                            <jms-queue name="HELLOWORLDMDBQueue" entries="java:jboss/queue/HELLOWORLDMDBQueue"/>

                            <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>

                            <connection-factory name="RemoteConnectionFactory" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="netty" ha="true" block-on-acknowledge="true" reconnect-attempts="-1"/>

                            <pooled-connection-factory name="activemq-ra" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm" transaction="xa"/>

                        </server>

                        <server name="backup">

                            <cluster password="password"/>

                            <shared-store-slave failover-on-server-shutdown="true"/>

                            <bindings-directory path="../../../shared/bindings-A"/>

                            <journal-directory path="../../../shared/journal-A"/>

                            <large-messages-directory path="../../../shared/largemessages-A"/>

                            <paging-directory path="../../../shared/paging-A"/>

                            <address-setting name="#" redistribution-delay="0"/>

                            <remote-connector name="netty" socket-binding="messaging-backup"/>

                            <remote-acceptor name="netty" socket-binding="messaging-backup"/>

                            <broadcast-group name="bg-group1" jgroups-channel="activemq-cluster" connectors="netty"/>

                            <discovery-group name="dg-group-backup" jgroups-channel="activemq-cluster"/>

                            <cluster-connection name="my-cluster" address="jms" connector-name="netty" discovery-group="dg-group-backup"/>

                        </server>

                    </subsystem>

            • 3. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
              mnovak

              I don't see any issue in the configuration. How many messages are stuck when you shut down instance #1? Could you also check in the CLI that the backup server on instance #2 is active:

              [standalone@localhost:9990 /] /subsystem=messaging-activemq/server=backup:read-attribute(name=active)

               

              Thanks,

              Mirek

              • 4. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                wwang2016

                Hi Mirek,

                 

                I checked the live instance (#2); the backup server became active once instance #1 was down.

                 

                [standalone@localhost:10090 /] /subsystem=messaging-activemq/server=backup:read-attribute(name=active)

                {

                    "outcome" => "success",

                    "result" => true

                }

                 

                Out of 10000 messages, 1555 were not processed. They seem to have been placed in the paging file, which has a size of 616 KB. Once I restarted instance #1, I could see the messages being processed by instance #1, and the paging file size dropped to zero.

                 

                My tests showed that as long as I set max-size-bytes to a larger number so that no paging file was created, the remaining live instance processed all the messages without issue. The problem occurred only when the paging directory (including its sub-directory) and a paging file were created.

                 

                I assume ActiveMQ should allow messages in the paging directory to be processed by the backup server. In your experience, do messages in the paging directory get fully processed by the remaining instances? Or is this a WildFly 10.1.0.Final issue?

                • 5. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                  mnovak

                  I think I have an idea what is going on here :-) Could you check that the backup has address-full-policy=PAGE in its address-settings? If not, set it to PAGE and try again. If it has address-full-policy=BLOCK, the backup will not access the paging directory.

                   

                  Thanks,

                  Mirek

                  • 6. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                    wwang2016

                    I checked the configuration, and it does not have this attribute. I manually added this attribute, and it still has the same issue.

                     

                    In backup server configuration:

                    <address-setting name="#" redistribution-delay="0" address-full-policy="PAGE"/>

                     

                    However, I then focused on the address-setting in the backup server configuration and copied the existing definition from the default server to the backup server. It does not look like anything special.

                     

                    <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>

                                   

                    However, the issue appeared to be gone in two separate tests. The paging file was created, and then the messages were consumed (the file size dropped to zero).

                     

                    Note:

                    The max-size-bytes in this definition is the same as in the address-setting for the queue in the default server:

                    <address-setting name="jms.queue.HELLOWORLDMDBQueue" redelivery-delay="0" page-size-bytes="1048576" max-delivery-attempts="100" max-size-bytes="10485760"/>

                     

                    The fix came from copying the address-setting name="#" from the default server. It is puzzling. Do you have any idea why?

                    • 7. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                      wwang2016

                      Hi Mirek,

                       

                      Just want to share the latest findings:

                       

                      I increased the number of messages to 50000 (from 10000) and let instance #1 create the messages.

                      I was surprised to see both paging-A and paging-B created, since I expected only the paging-A folder. Many paging files were generated in both paging folders.

                      After I shut down instance #1, instance #2 continued to process the messages until completion. Each paging directory now contains only one file, with size 0.

                       

                      I am confident the issue will not recur even with a larger number of messages. It would be interesting to figure out why copying the address-setting from the default server solved the issue.

                      • 8. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                        wwang2016

                        It looks like the complete definition of address-setting is required to fix the issue:

                        <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>

                         

                        It did not work without the redistribution-delay attribute. Note: setting redistribution-delay to 0 also worked.

                         

                        However, the following, with only redistribution-delay defined, did not work either:

                         

                        <address-setting name="#" redistribution-delay="0"/>

                         

                        So it looks like all the settings from default server are required.

                         

                        Note: I have found that the actual default is sometimes not what the documentation says. It is safer to set values explicitly.

                        • 9. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                          mnovak

                          Hi Wayne, looking at the description of the address settings, if redistribution-delay is not specified, the default value is "-1", so redistribution of messages is disabled. I guess this is the source of the problem: when the backup activated on instance #2, it could not redistribute messages to the live server on instance #2.

                          • 10. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                            wwang2016

                            Hi Mirek,

                             

                            I think it will not work without a definition of redistribution-delay. However, I also tested the following earlier, and it did not work either:

                             

                            <address-setting name="#" redistribution-delay="0"/>

                             

                            There is another twist: the behavior on Linux seems to be a bit different. The solution that works on Windows somehow has an issue on Linux. This may be due to NFS on Linux, whereas in the working setup I simply defined a local folder accessed by two local instances running on different sets of ports.

                             

                            I will continue to test it in the Linux environment and share the results with you.

                            • 11. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                              wwang2016

                              Hi Mirek,

                               

                              I finally fixed the issue in the Linux environment. It was due to a firewall setting and was not related to the configuration in WildFly's standalone-full-ha.xml.

                               

                              Back to original question of address-setting, I would like to share with you the following interesting findings:

                               

                              (1) All three attributes need to be defined

                               

                              The default values below are from the ActiveMQ Artemis documentation (activemq-artemis-1.1.0.pdf):

                              max-size-bytes (default is infinite)

                              page-size-bytes (default is 10485760)

                              redistribution-delay (default is -1, meaning not redistributed)

                               

                              (2) Defining a value different from that of the default server seemed to work fine.

                               

                              (3) It does not work if any one of the attributes is missing from the configuration. This does not make much sense, but I think it is likely that the default values in WildFly may not be the same as the documented ones.
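
                              A minimal sketch of finding (1), with all three attributes spelled out on the backup server's catch-all address-setting. The values are the ones used on the default server earlier in this thread; whether they suit another environment is an assumption.

                              ```xml
                              <!-- Sketch: goes inside <server name="backup">. -->
                              <!-- All three attributes are set explicitly instead of relying on defaults. -->
                              <address-setting name="#"
                                               max-size-bytes="10485760"
                                               page-size-bytes="2097152"
                                               redistribution-delay="1000"/>
                              ```

                              Setting the values explicitly avoids depending on whichever defaults the running version actually applies.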

                              • 12. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                                mnovak

                                Hi Wayne,

                                 

                                The messaging integration layer in WF10 overrides some Artemis default values: max-size-bytes should be set to 10 MB by default, and redistribution-delay is 1000 ms.

                                 

                                Good that you made it work :-)

                                 

                                Thanks,

                                Mirek

                                • 13. Re: Paging directory not accessible to cluster queue (wildfly 10.1.0.Final)
                                  yprajapa

                                  Hi Wayne,

                                   

                                  I found this post while searching for remote paging directory settings. I understand that you configured your shared paging directory as a relative path and that it worked for you... that's great!

                                   

                                  I noticed your backup settings, especially in the "shared" context, and wonder how this works. Shouldn't every instance's backup (slave) point to the same shared folder as its master server? That way, when the master server goes down, the backup server takes over and accesses the same shared folder. With the current configuration the backup server will be accessing another folder, so I wonder how this fails over.

                                   

                                  Cross-referencing shared folders between two different instances sounds like you are trying to achieve a colocated effect, but there is a separate configuration construct, "shared-store-colocated", for that. Even in the "shared-store-colocated" case you would use the same shared folder for the master/slave within the same WildFly instance.
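
                                  For reference, the master/slave pairing in the configuration posted in reply 2 can be condensed as in the sketch below. It is abridged: only the HA policy and the journal directory are shown, and the comments are added for illustration.

                                  ```xml
                                  <!-- Instance #1 (abridged): the live server uses the *-A directories. -->
                                  <server name="default">
                                      <shared-store-master failover-on-server-shutdown="true"/>
                                      <journal-directory path="../../../shared/journal-A"/>
                                  </server>

                                  <!-- Instance #2 (abridged): its backup server points at the same *-A directories. -->
                                  <server name="backup">
                                      <shared-store-slave failover-on-server-shutdown="true"/>
                                      <journal-directory path="../../../shared/journal-A"/>
                                  </server>
                                  ```

                                  Read this way, the backup on instance #2 shares the *-A directories with the master on instance #1, and the backup on instance #1 shares the *-B directories with the master on instance #2.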

                                   

                                  Thanks

                                  Yogesh