17 Replies · Latest reply on Sep 23, 2014 10:15 AM by jbertram

    2.5.x HA policy integration into WildFly

    jmesnil

      Hi, I'm integrating the HornetQ 2.5.x branch into WildFly master and have some issues with the new HA policy in this version.

       

       

      First, I don't understand what the different policies are.

      Could you explain them in a few sentences (esp. the COLOCATED ones)?

       

       

      I have some questions about their XML representation, their WildFly definitions and the HornetQ HAPolicy API.

       

       

      Let's start with the HAPolicy API

       

       

      I understand the idea is that we start from one of the defined types and are able to override the properties on top of that.

      However, it's very hard to understand which properties are affected by which type. If I define a BACKUP_SHARED_STORE, what's the use of a replication-cluster-name property?

       

       

      It'd be a good thing to move away from a flat API with dozens of unrelated properties to a well-defined API.

      E.g., all scale-down properties look related to the SCALE_DOWN strategy; this should be reflected by the API. If I use a FULL backup strategy, why are the scale-down properties even exposed?

       

       

      Could you also make the API follow a fluent builder pattern? That'd prevent breaking the integration every time a new property is added to the API, and it'd make the code much more readable (and the public API well defined).
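
      To illustrate, here is a rough sketch of such a builder (all class and method names below are hypothetical, not the actual HornetQ API; the idea is one builder per policy type, each exposing only the properties that apply to it):

      // Hypothetical sketch, not the actual HornetQ API.
      public final class ReplicatedLivePolicyBuilder {

         private String clusterName;
         private String backupGroupName;
         private boolean checkForLiveServer;

         public ReplicatedLivePolicyBuilder clusterName(String name) {
            this.clusterName = name;
            return this;
         }

         public ReplicatedLivePolicyBuilder backupGroupName(String name) {
            this.backupGroupName = name;
            return this;
         }

         public ReplicatedLivePolicyBuilder checkForLiveServer(boolean check) {
            this.checkForLiveServer = check;
            return this;
         }

         // Integration code chains only the setters it needs; a new property
         // is just a new method, so existing callers keep compiling.
         public static void main(String[] args) {
            ReplicatedLivePolicyBuilder policy = new ReplicatedLivePolicyBuilder()
                  .clusterName("clusterA")
                  .backupGroupName("group1")
                  .checkForLiveServer(true);
            System.out.println("replicated live policy for " + policy.clusterName);
         }
      }

      With one builder per policy type, a BACKUP_SHARED_STORE builder would simply have no replication-cluster-name setter, which answers my question above by construction.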

       

       

      At first glance, it looks like some configurations are mutually exclusive:

       

       

      live or backup

      shared store or replication

      full or scale down strategy

      remote or colocated backups

       

       

      Having all the configuration at a single level makes for an unmaintainable API.

      I don't want to put validation code in WildFly to check whether the configuration makes sense. As much as possible, it's the API's job to allow only "valid" configurations.
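
      To make that concrete, here is a sketch (again with hypothetical names, not the actual HornetQ API) of how each mutually exclusive choice could be encoded as its own type, so that an invalid combination cannot even be expressed:

      // Hypothetical sketch: each mutually exclusive alternative is a distinct
      // subtype, so shared-store settings and replication settings can never
      // end up on the same object.
      public abstract class HAPolicy {

         HAPolicy() {
            // package-private constructor: the nested subtypes below are the
            // only policies that can be created outside this package
         }

         public static final class SharedStoreBackup extends HAPolicy {
            public boolean allowFailback;
            public long failbackDelay;
            // note: no replication-cluster-name field, because it does not apply
         }

         public static final class ReplicatedBackup extends HAPolicy {
            public String replicationClusterName;
            public String backupGroupName;
            public int maxSavedReplicatedJournals;
         }
      }

      WildFly would then hand HornetQ exactly one of these concrete types, and the invalid combinations listed above become unrepresentable instead of needing validation code.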

       

       

      For WildFly integration, we provide two ways to configure HornetQ: through the management API (via the CLI) and through XML.

      Please note that the canonical one is the management API (the XML only generates management operations that use this API).

      It's a good idea to try to build a hornetq-server using the CLI, to test whether the management API is usable or not.

       

       

      But let's start with XML first.

      I want to move away from a flat list of unrelated elements to a well-structured tree (one that mirrors the fluent API I talked about above).

       

       

      The current XSD lists all the ha-policy attributes, but it's too hard to understand how they relate to each other (and it does not help that I don't understand the different policy types in the first place).

       

       

      Guys, could you define a few use cases that are covered by the ha-policy?

      Such as:

       

       

      1. no HA

      2. backup server with shared store

      3. backup server with replication

      4. colocated backup server with shared store

      5. etc.

       

       

      And what their XML looks like.

       

       

      As an example, shared store and replication are mutually exclusive; this should be reflected by the schema. For example, I could have:

       

       

      <hornetq-server name="B">

        <ha-policy>

          <shared-store />

        </ha-policy>

      </hornetq-server>

       

       

      or

       

       

      <hornetq-server name="B">

        <ha-policy>

          <replication>

            <group-name>group1</group-name>

            <clustername>clusterA</clustername>

          </replication>

        </ha-policy>

      </hornetq-server>

       

       

      But it makes no sense to have a replication-cluster-name with a shared store.

       

       

      If the XSD stipulated that I can have either a <shared-store> or a <replication>, it would be much simpler to write the configuration.

       

       

      But like I wrote, the XML is not the canonical way to build the configuration; the WildFly management API is. As an exercise, you can start WildFly with --admin-only so that there are no runtime operations, just enough to build the configuration.

      As you will see, it's harder than it should be (e.g. why is the policy-type an attribute of the ha-policy if we use the policy-type resource name when we build the HAPolicy?).

       

       

      I can deal with updating the WildFly XML and management API, but for that I need a better understanding of the different use cases covered by the HA policy stuff.

      And if you could use a fluent builder API that reflects the configuration, it'd be much simpler to integrate and maintain.

       

       

      As it stands now, I consider the ha-policy integration broken, and we need to improve it before merging it into WildFly master.

        • 1. Re: 2.5.x HA policy integration into WildFly
          ataylor

          Could you explain them in a few sentences (esp. the COLOCATED ones)?

           

                NONE - no policy

                REPLICATED - a replicated live server

                SHARED_STORE - a shared store live server

                BACKUP_REPLICATED - a replicated backup server

                BACKUP_SHARED_STORE - a shared store backup server

                COLOCATED_REPLICATED - a live server that can also maintain replicated backups

                COLOCATED_SHARED_STORE - a live server that can also maintain shared store backups

           

          At first glance, it looks like some configurations are mutually exclusive:

           

           

          live or backup

          shared store or replication

          full or scale down strategy

          remote or colocated backups

          Yes Jeff, you are correct.

           

          Having all the configuration at a single level makes for an unmaintainable API.

          I don't want to put validation code in WildFly to check whether the configuration makes sense. As much as possible, it's the API's job to allow only "valid" configurations.

           

           

          For WildFly integration, we provide two ways to configure HornetQ: through the management API (via the CLI) and through XML.

          Please note that the canonical one is the management API (the XML only generates management operations that use this API).

          It's a good idea to try to build a hornetq-server using the CLI, to test whether the management API is usable or not.

           

           

          But let's start with XML first.

          I want to move away from a flat list of unrelated elements to a well-structured tree (one that mirrors the fluent API I talked about above).

           

           

          The current XSD lists all the ha-policy attributes, but it's too hard to understand how they relate to each other (and it does not help that I don't understand the different policy types in the first place).

          Actually Jeff, I agree. TBH, lots of this stuff was just at the top level anyway, so it was simply moved into its own element; the template attribute was there to make configuration easy.

           

          I think your suggested alternatives are the best way to go. TBH, we didn't really spend much time thinking about it; we just grouped all the existing ones along with the new ones.

           

          Let me change this next week and we can improve things all round.

          • 2. Re: 2.5.x HA policy integration into WildFly
            ataylor

            I've come up with a better configuration for HA policy. I've also added a few TODOs we may fix; could everyone comment, please?

             

            <ha-policy>

                     <!--only one of the following-->

                     <!--on server shutdown scale down to another live server-->

                     <scale-down>

                        <!--a grouping of servers that can be scaled down to-->

                        <group-name>boo!</group-name>

                        <!--either a discovery group-->

                        <discovery-group>wahey!</discovery-group>

                        <!--or some connectors-->

                        <connectors>

                           <connector-ref>sd-connector1</connector-ref>

                           <connector-ref>sd-connector2</connector-ref>

                        </connectors>

                     </scale-down>

                     <!--a live server that can be replicated-->

                     <replicated>

                        <!--check when starting that its replica isn't running-->

                        <check-for-live-server>true</check-for-live-server>

                        <!--whether or not the live server can failback; if a backup is live then this server will start as a replica-->

                        <allow-failback>false</allow-failback>

                        <!-- if we need to failback then we need to wait a while after announcing as a backup so it has time to propagate

                        around the cluster-->

                        <failback-delay>10000</failback-delay>

                        <!--only replicas within this group can replicate-->

                        <!-- todo rename to replica-group maybe?-->

                        <backup-group-name>backupGroupName</backup-group-name>

                     </replicated>

                     <replica>

                        <!--only replicate from a live server in this group-->

                        <backup-group-name>backupGroupName</backup-group-name>

                        <!--use the discovery configuration from this cluster connection-->

                        <replication-clustername>replicationClustername</replication-clustername>

                        <!--max number of backups -->

                        <max-saved-replicated-journals-size>3</max-saved-replicated-journals-size>

                        <!--rather than becoming live on failover, scale down; same as the scale-down config above-->

                        <scale-down/>

                     </replica>

                     <shared-store-master>

                        <!--whether or not this shared store backup will allow a restarting live server to become live-->

                        <allow-failback>false</allow-failback>

                        <!-- if we need to failback then we need to wait a while after announcing as a backup so it has time to propagate

                        around the cluster-->

                        <failback-delay>10000</failback-delay>

                        <!--whether or not to allow the backup to become live-->

                        <!--todo: this gets used for replication too, which makes no sense; fix-->

                        <failover-on-shutdown>true</failover-on-shutdown>

                     </shared-store-master>

                     <shared-store-slave>

                        <!-- if we have stopped because of a live server failing back, wait this long before restarting-->

                        <!--todo maybe rename restart-delay-->

                        <failback-delay>10000</failback-delay>

                        <!--whether or not to allow the backup to become live if we have failed over and are stopping-->

                        <failover-on-shutdown>true</failover-on-shutdown>

                        <!--rather than becoming live on failover, scale down; same as the scale-down config above-->

                        <scale-down/>

                     </shared-store-slave>

                     <!--a live server that can automatically create backups and ask for backups-->

                     <colocated-replicated>

                        <!--should I request a backup to be started for me-->

                        <request-backup>true</request-backup>

                        <!--how many times to try and acquire a backup-->

                        <backup-request-retries>33</backup-request-retries>

                        <!--how long between retries-->

                        <backup-request-retry-interval>1234</backup-request-retry-interval>

                        <!--how many backups I can create-->

                        <max-backups>12</max-backups>

                        <backups>

                           <backup-port-offset>1002</backup-port-offset>

                           <remote-connectors>

                              <connector-ref>remote-connector1</connector-ref>

                              <connector-ref>remote-connector2</connector-ref>

                           </remote-connectors>

                           <backup-group-name>backupGroupName</backup-group-name>

                           <!--rather than creating backups that start up, create them to scale down; same as the scale-down config above-->

                           <!--todo, maybe we should allow scale down only, does full colocated make sense?-->

                           <scale-down/>

                           <!--the cluster connection discovery to use-->

                           <replication-clustername>replicationClustername</replication-clustername>

                        </backups>

                     </colocated-replicated>

                     <!--same as colocated-replicated without replication-clustername-->

                     <colocated-shared-store/>

             

             

             

            Also, I'm not 100% sure about the two colocated configurations. We could merge them into one, as they are currently quite similar, and have a flag to distinguish between them; maybe something like:

             

            <colocated>

                        <!--should I request a backup to be started for me-->

                        <request-backup>true</request-backup>

                        <!--how many times to try and acquire a backup-->

                        <backup-request-retries>33</backup-request-retries>

                        <!--how long between retries-->

                        <backup-request-retry-interval>1234</backup-request-retry-interval>

                        <!--how many backups I can create-->

                        <max-backups>12</max-backups>

                        <backups>

                           <backup-port-offset>1002</backup-port-offset>

                           <remote-connectors>

                              <connector-ref>remote-connector1</connector-ref>

                              <connector-ref>remote-connector2</connector-ref>

                           </remote-connectors>

                           <backup-group-name>backupGroupName</backup-group-name>

                           <!--rather than creating backups that start up, create them to scale down; same as the scale-down config above-->

                           <!--todo, maybe we should allow scale down only, does full colocated make sense?-->

                           <scale-down/>

                           <!--either-->

                           <replica/>

                           <!--or-->

                           <shared-store-slave/>

                        </backups>

                     </colocated>

            • 3. Re: Re: 2.5.x HA policy integration into WildFly
              martyn-taylor

              Hi Andy,

               

              It seems that there are 3 different types of policy defined above:

              • Replicated
              • Shared Store
              • Co-located

              Within each of these policy types there are two roles that a server can take; essentially the two roles are "master" and "slave" (with slightly different semantics depending on the policy type).

               

              I'd suggest grouping by policy type at level 1, then grouping by role at level 2.  Example:

               

              <ha-policy>

                  <shared-store>

                    <!-- might be useful: -->

                    <role name="master">

                        <!-- master specific config here -->

                    </role>

                    <!-- Config that applies to both master/slave here -->

                  </shared-store>

              </ha-policy>

               

              or

               

              <ha-policy>

                  <shared-store>

                    <master>

                        <!-- master specific config here -->

                    </master>

                    <!-- Config that applies to both master/slave here -->

                  </shared-store>

              </ha-policy>


              I'd move the <scale-down> element away from the top level and only allow it within the specific policy configuration.  If we do want to allow scale-down to be defined for servers without an HA policy configured, then I'd recommend creating a new HA policy type, None, that can take scale-down config.  For example:

               

              <none>

                <scale-down>...</scale-down>

              </none>

               

              Once you have this in place, I don't see any reason to prefix element names with the policy name, i.e.

               

              <replication-clustername>replicationClustername</replication-clustername>


              could just be:


              <ha-policy><replicated><cluster-name>...


              I can't comment on the individual parameters, since I don't understand what they are used for and in what circumstances.  However, I think it should be possible to group them according to the format above.  Other than the comments above, I think this looks good.


              Cheers

              Martyn

              • 4. Re: 2.5.x HA policy integration into WildFly
                ataylor

                Hi Andy,

                 

                It seems that there are 3 different types of policy defined above:

                • Replicated
                • Shared Store
                • Co-located

                Within each of these policy types there are two roles that a server can take; essentially the two roles are "master" and "slave" (with slightly different semantics depending on the policy type).

                 

                I'd suggest grouping by policy type at level 1, then grouping by role at level 2.  Example:

                 

                <ha-policy>

                    <shared-store>

                      <!-- might be useful: -->

                      <role name="master">

                          <!-- master specific config here -->

                      </role>

                      <!-- Config that applies to both master/slave here -->

                    </shared-store>

                </ha-policy>

                 

                or

                 

                <ha-policy>

                    <shared-store>

                      <master>

                          <!-- master specific config here -->

                      </master>

                      <!-- Config that applies to both master/slave here -->

                    </shared-store>

                </ha-policy>


                You couldn't do exactly that, because this needs defining in a schema and the schema differs depending on the role; this is the flattened config we are trying to move away from. What you could do is this, though:

                 

                <master>

                    <!-- master specific config here -->

                </master>


                I'm not sure how this would work in colocated though, as it is actually both master and slave.


                I'd move the <scale-down> element away from the top level and only allow it within the specific policy configuration.  If we do want to allow scale-down to be defined for servers without an HA policy configured, then I'd recommend creating a new HA policy type, None, that can take scale-down config.  For example:

                 

                <none>

                  <scale-down>...</scale-down>

                </none>

                Yeah, I thought of that but couldn't decide; I'll go with whatever the majority wants.

                Once you have this in place, I don't see any reason to prefix element names with the policy name, i.e.

                 

                <replication-clustername>replicationClustername</replication-clustername>


                could just be:


                <ha-policy><replicated><cluster-name>...

                +1, I actually have changed most of them to be like that.

                • 5. Re: 2.5.x HA policy integration into WildFly
                  martyn-taylor

                  Andy Taylor wrote:

                   

                  You couldn't do exactly that, because this needs defining in a schema and the schema differs depending on the role; this is the flattened config we are trying to move away from. What you could do is this, though:

                   

                  <master>

                      <!-- master specific config here -->

                  </master>

                   

                  I'm not sure how this would work in colocated though, as it is actually both master and slave.

                   

                  I think you missed my second example.



                  • 6. Re: 2.5.x HA policy integration into WildFly
                    ataylor

                    Yes, I did.

                    • 7. Re: 2.5.x HA policy integration into WildFly
                      martyn-taylor

                      You couldn't do exactly that, because this needs defining in a schema and the schema differs depending on the role; this is the flattened config we are trying to move away from. What you could do is this, though:

                       

                      <master>

                          <!-- master specific config here -->

                      </master>


                      I'm not sure how this would work in colocated though, as it is actually both master and slave.

                      Sure, I can't see any issue with having both master and slave in the same config for co-located, if that is what makes sense.

                      • 8. Re: 2.5.x HA policy integration into WildFly
                        ataylor

                        I'd move the <scale-down> element away from the top level and only allow it within the specific policy configuration.  If we do want to allow scale-down to be defined for servers without an HA policy configured, then I'd recommend creating a new HA policy type, None, that can take scale-down config.  For example:

                         

                        <none>

                          <scale-down>...</scale-down>

                        </none>

                        Actually, the policy is scale-down, not none, so I think I prefer how I have it.

                        • 9. Re: 2.5.x HA policy integration into WildFly
                          martyn-taylor

                          I got the impression that scale-down was a behavioural option given to other HA policies?  TBH I'm not sure that scale-down makes sense as an HA policy, since it doesn't add any HA.  To me, scale-down seems more like cluster configuration, i.e. how to dynamically remove servers from the cluster without losing data.

                          • 10. Re: 2.5.x HA policy integration into WildFly
                            ataylor

                            I got the impression that scale-down was a behavioural option given to other HA policies?  TBH I'm not sure that scale-down makes sense as an HA policy, since it doesn't add any HA.  To me, scale-down seems more like cluster configuration, i.e. how to dynamically remove servers from the cluster without losing data.

                            Well, it does provide a way of making a journal highly available in a non-HA cluster, but I see your point. I'm not sure it belongs in the cluster configuration either, though.

                            • 11. Re: 2.5.x HA policy integration into WildFly
                              jmesnil

                              +1 for the modifications; they make it easier to understand and figure out how to set up HornetQ for different use cases.

                              • 12. Re: 2.5.x HA policy integration into WildFly
                                ataylor

                                So I am very near to completing the refactoring and want to run the final configuration past everyone to make sure all are happy. I will explain each one:

                                 

                                <ha-policy>

                                      <!--only one of the following-->

                                      <!--on server shutdown scale down to another live server-->

                                      <live-only>

                                         <scale-down>

                                            <!--a grouping of servers that can be scaled down to-->

                                            <group-name>boo!</group-name>

                                            <!--either a discovery group-->

                                            <discovery-group>wahey</discovery-group>

                                         </scale-down>

                                      </live-only>

                                   </ha-policy>

                                This is a live-only policy: no HA, but it can support scale-down of a live server.

                                 

                                <ha-policy>

                                      <shared-store-master>

                                         <failback-delay>3456</failback-delay>

                                         <failover-on-shutdown>false</failover-on-shutdown>

                                      </shared-store-master>

                                   </ha-policy>

                                A shared store live server.

                                 

                                <ha-policy>

                                      <shared-store-slave>

                                         <failback-delay>9876</failback-delay>

                                         <failover-on-shutdown>false</failover-on-shutdown>

                                         <restart-backup>false</restart-backup>

                                         <scale-down>

                                            <!--a grouping of servers that can be scaled down to-->

                                            <group-name>boo!</group-name>

                                            <!--either a discovery group-->

                                            <discovery-group>wahey</discovery-group>

                                         </scale-down>

                                      </shared-store-slave>

                                   </ha-policy>

                                A shared store backup server.

                                 

                                <ha-policy>

                                      <replicated>

                                         <allow-failback>true</allow-failback>

                                         <group-name>purple</group-name>

                                         <check-for-live-server>true</check-for-live-server>

                                         <failback-delay>1111</failback-delay>

                                         <clustername>abcdefg</clustername>

                                      </replicated>

                                   </ha-policy>

                                A replicated live server.

                                 

                                <ha-policy>

                                      <replica>

                                         <group-name>tiddles</group-name>

                                         <max-saved-replicated-journals-size>22</max-saved-replicated-journals-size>

                                         <clustername>33rrrrr</clustername>

                                         <restart-backup>false</restart-backup>

                                         <allow-failback>true</allow-failback>

                                         <failback-delay>444</failback-delay>

                                         <scale-down>

                                            <!--a grouping of servers that can be scaled down to-->

                                            <group-name>boo!</group-name>

                                            <!--either a discovery group-->

                                            <discovery-group>wahey</discovery-group>

                                         </scale-down>

                                      </replica>

                                   </ha-policy>

                                A replicated backup server.

                                <ha-policy>

                                      <colocated>

                                         <backup-request-retries>44</backup-request-retries>

                                         <backup-request-retry-interval>33</backup-request-retry-interval>

                                         <max-backups>3</max-backups>

                                         <request-backup>false</request-backup>

                                         <backup-port-offset>33</backup-port-offset>

                                         <replication>

                                            <replicated>

                                               <allow-failback>true</allow-failback>

                                               <group-name>purple</group-name>

                                               <check-for-live-server>true</check-for-live-server>

                                               <failback-delay>1111</failback-delay>

                                               <clustername>abcdefg</clustername>

                                            </replicated>

                                            <replica>

                                               <group-name>tiddles</group-name>

                                               <max-saved-replicated-journals-size>22</max-saved-replicated-journals-size>

                                               <clustername>33rrrrr</clustername>

                                               <restart-backup>false</restart-backup>

                                               <scale-down>

                                                  <!--a grouping of servers that can be scaled down to-->

                                                  <group-name>boo!</group-name>

                                                  <!--either a discovery group-->

                                                  <discovery-group>wahey</discovery-group>

                                               </scale-down>

                                            </replica>

                                         </replication>

                                      </colocated>

                                   </ha-policy>

                                A colocated server that uses replication.

                                 

                                <ha-policy>

                                      <colocated>

                                         <backup-request-retries>44</backup-request-retries>

                                         <backup-request-retry-interval>33</backup-request-retry-interval>

                                         <max-backups>3</max-backups>

                                         <request-backup>false</request-backup>

                                         <backup-port-offset>33</backup-port-offset>

                                         <shared-store>

                                            <shared-store-master>

                                               <failback-delay>1234</failback-delay>

                                               <failover-on-shutdown>false</failover-on-shutdown>

                                            </shared-store-master>

                                            <shared-store-slave>

                                               <failback-delay>44</failback-delay>

                                               <failover-on-shutdown>false</failover-on-shutdown>

                                               <restart-backup>false</restart-backup>

                                               <scale-down/>

                                            </shared-store-slave>

                                         </shared-store>

                                      </colocated>

                                   </ha-policy>

                                A colocated server that uses shared store.

                                 

                                Martyn, you had some ideas on a slightly different approach; could you post some sample config so we can decide which is best?

                                • 13. Re: 2.5.x HA policy integration into WildFly
                                  gaohoward

                                  I think maybe it's a good idea to put some common concepts into one group, like:

                                   

                                      <server-capabilities>

                                          <live-only ... />

                                          <scale-down ... />

                                          <slave ... />

                                          <master ... />

                                          <colocated ... />

                                      </server-capabilities>

                                   

                                      <discovery>

                                      </discovery>

                                   

                                      <data-synchronization>

                                        <shared-store ... />

                                        <replicated .../>

                                      </data-synchronization>

                                   

                                  It would make it easy to change/edit. For example, if you want to change a server from live-only to master, you only need to change <server-capabilities>; the other parts remain intact.

                                   

                                  Just my 2 cents; I haven't thought it through fully, however.

                                  • 14. Re: 2.5.x HA policy integration into WildFly
                                    martyn-taylor

                                    Hi Andy,

                                     

                                    The only change I'd suggest is to split the HA policy type, and the server role within that policy type, out into separate elements.

                                     

                                    For example:


                                    <ha-policy>

                                      <replicated> <!-- HA Policy Type -->

                                        <replicated-master> <!-- Replicated policy server role -->

                                     

                                    I think splitting the policy type and the server role out is more declarative.  It allows us to separate any configuration associated with the policy type from that associated with the particular server role within the policy element.  I realise that there are currently only one or two places where this applies, but keeping the policy/role separate gives us more flexibility moving forward (when adding new ha-policies) and also keeps the option open for potential features like cluster role negotiation.  I could imagine something like this:

                                     

                                    <replicated>

                                      <min-replicas>2</min-replicas>

                                    </replicated>

                                     

                                    The servers in the cluster then figure out how best to set up the topology, using some voting system.

                                     

                                    In short, I think there are two things we are defining/configuring.

                                     

                                    1. The HA policy type: this is akin to standard, well-known policy models like replication, shared-store, etc.

                                    2. The role that the server will take within that HA policy type, i.e. how each server is configured to realise the policy: replication-slave, shared-store-master, etc.

                                     

                                    I think that by grouping these two things into one configuration element we are adding complexity and restricting ourselves for future enhancements.

                                     

                                    This is the only suggestion I have.  Other than this, I think the new configuration structure is great.

                                     

                                    Thanks

                                    Martyn
