
    Server side HA and failover

    timfox

      I would like to use this thread for discussing the implementation of server side failover and HA. (As opposed to the other thread which concentrates more on the client side).

      I am particularly interested in the following questions:

      1) How much of the AS server-side failover code can we re-use? (Brian?)

      2) How is this going to work in detail? How can we maintain a consistent mapping between nodes and their failover nodes? Is this maintained in the JGroups group state, or can we just use the JGroups view?

      3) How can we ensure that the client fails over to the same server that the failed server's state fails over to?

      4) How do we ensure that the client doesn't start sending requests to its failover server after the client has failed over but before the server-side failover has completed?

      5) We need to make sure we tackle any other race conditions and edge cases.

        • 1. Re: Server side HA and failover
          brian.stansberry

          Very quick reply; I'll have to think more...

          1) Probably relatively little of the *existing* AS server side code. The AS code is based around the DistributedReplicantManager interface, the implementation of which uses HAPartition, which is built on an RpcDispatcher. In AS 5 it may use JBoss Cache. Neither matches what you're doing very well. The interface and some concepts probably should be reused, but we'd need a new implementation.

          The stuff that should be reused is the interceptor model. The AS currently has two interceptor models: the old pre-AOP model used by most clustered services, and the Remoting- and AOP-based EJB3 model. Tom Elrod is working on a unification of the two, based on the EJB3 one. Clearly it's best if Messaging can use the same stuff. I'll look at the EJB3 stuff this weekend and reply with thoughts on how to move forward.

          2) The JGroups view can't be used directly, as a JGroups Address is not suitable for use as a Remoting InvokerLocator. At minimum it would use a different port, and quite possibly a different IP address (e.g. if client connections use a different NIC than clustering traffic). I think the thing to do is something like adding another Request type through which a server publishes its InvokerLocator(s); the rest of the cluster maintains a map of that data. That's basically the DistributedReplicantManager concept.
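          Something like the following could hold that map on each node. This is only a minimal sketch with made-up names (LocatorRegistry, onLocatorPublished); in reality the publish would arrive as a new Request type multicast over the JGroups channel, not a local call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: each node keeps a map of nodeId -> client-facing
// InvokerLocator URL, updated whenever a node (re)publishes its locator.
public class LocatorRegistry
{
   private final Map<String, String> locatorsByNode =
      new ConcurrentHashMap<String, String>();

   // Invoked when a PUBLISH_LOCATOR-style request arrives from a node
   public void onLocatorPublished(String nodeId, String invokerLocatorUrl)
   {
      locatorsByNode.put(nodeId, invokerLocatorUrl);
   }

   // Invoked when the JGroups view changes and a node has left
   public void onNodeLeft(String nodeId)
   {
      locatorsByNode.remove(nodeId);
   }

   // Snapshot for sending topology to clients
   public Map<String, String> snapshot()
   {
      return new HashMap<String, String>(locatorsByNode);
   }
}
```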

          3) That's tricky. The AS' model for telling the client about the cluster topology is to piggy-back any new topology on the response to a client request. So the client doesn't know the new topology until it successfully makes a request, possibly after a failover. So we can't count on the client and server making the same decision (e.g. failover to the next node in the list), because the "list" may be different on the client and the server.

          Perhaps an approach would work where the client fails over to any node, but if that server isn't the appropriate one, it communicates that back to the client, along with the updated topology and info as to who the failover server should be.
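          A minimal sketch of that redirect check, assuming a failover map (failed nodeId -> designated failover nodeId) kept consistent across the cluster; all names here are hypothetical:

```java
import java.util.Map;

// Hypothetical sketch of the "connect anywhere, get redirected" idea.
public class FailoverRedirector
{
   private final Map<String, String> failoverMap;
   private final String localNodeId;

   public FailoverRedirector(Map<String, String> failoverMap, String localNodeId)
   {
      this.failoverMap = failoverMap;
      this.localNodeId = localNodeId;
   }

   // Returns null if this node is the right failover target for the
   // failed node; otherwise the nodeId the client should reconnect to,
   // which would be sent back along with the updated topology.
   public String checkFailover(String failedNodeId)
   {
      String designated = failoverMap.get(failedNodeId);
      return localNodeId.equals(designated) ? null : designated;
   }
}
```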

          Stupid question: why does the client need to fail over to a particular server? The failover server needs to do work to recover persistent messages from the store, but once it's done that, aren't they available to a client connecting to any server?

          • 2. Re: Server side HA and failover
            brian.stansberry

            I had a look at the DRM code and was able to refactor it into an abstract version that can easily be subclassed with a version meant to integrate with your ClusteredPostOffice.

            So there are some useful classes from the AS for use on the server side. I created some UML for this, but the free version of the plugin wouldn't export an image :(

            1) (impl of) DRM. DRM basically maintains a distributed version of a Map<String key, Map<String nodeId, Object replicant>>, plus registration/notification of listeners when the data in the map changes. The replicant in the map is whatever you need; in the Remoting case it would be an InvokerLocator. DRM also calculates a hash of the replicants associated with a key; this hash is meant to be passed back and forth between the client and server; an out-of-date hash from the client tells the server it needs to send an updated cluster topology.

            2) An HATarget instance listens to the DRM for changes to a given key and exposes the current hash and list of replicants.

            3) ReplicantsManagerInterceptor is a server-side AOP interceptor that compares the hash sent by the client to the one held by the HATarget. If they don't match, it gets the current replicant list from the target and packages it with the response.
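            To make the hash mechanics concrete, here's a rough, illustrative-only sketch of the comparison; the real interceptor lives in the AS AOP stack, and none of the class names below come from it:

```java
import java.util.Map;

// Illustrative stand-in for the DRM hash check described above.
public class TopologyCheck
{
   // key -> (nodeId -> replicant), as maintained by the DRM
   private final Map<String, Map<String, Object>> replicants;

   public TopologyCheck(Map<String, Map<String, Object>> replicants)
   {
      this.replicants = replicants;
   }

   // Simple stand-in for DRM's hash over the replicant list
   public int currentHash(String key)
   {
      Map<String, Object> perNode = replicants.get(key);
      return perNode == null ? 0 : perNode.keySet().hashCode();
   }

   // If the client's hash is stale, return the fresh topology so it can
   // be piggy-backed on the response; null means "client is up to date"
   public Map<String, Object> topologyIfStale(String key, int clientHash)
   {
      return clientHash == currentHash(key) ? null : replicants.get(key);
   }
}
```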

            • 3. Re: Server side HA and failover
              brian.stansberry

              Question:

              What is the 'cluster topology' the client needs to track? Is it the set of JBoss Messaging servers, i.e. there's only one topology? Or does it vary by service (e.g. queue/topic), i.e. the topology for Queue A is different from Queue B because A is deployed on some nodes and B on others?

              • 4. Re: Server side HA and failover
                timfox

                 

                "bstansberry@jboss.com" wrote:


                The stuff that should be reused is the interceptor model.



                This is the client side interceptor model?



                2) The JGroups view can't be used directly, as a JGroups Address is not suitable for use as a Remoting InvokerLocator. At minimum it would use a different port, and quite possibly a different IP address (e.g. if client connections use a different NIC than clustering traffic). I think the thing to do is something like adding another Request type through which a server publishes its InvokerLocator(s); the rest of the cluster maintains a map of that data. That's basically the DistributedReplicantManager concept.



                I think we would need to replicate both the Remoting InvokerLocator and the JGroups address.
                The client needs to know the locator list, but a particular server also needs to know who its failover server(s) are, so it can do things like cast messages to its failover servers when we do in-memory message replication. A JGroups address would be ideal for this.
                We should be able to support multiple failover servers for a particular server.
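                So the replicant published per node might pair the two, something like the sketch below. NodeAddressInfo is a made-up name, and plain Strings stand in for the real InvokerLocator and JGroups Address types:

```java
import java.io.Serializable;

// Hypothetical replicant type pairing the two addresses a node would
// publish: the Remoting locator for clients and the JGroups address for
// intra-cluster casts (e.g. in-memory message replication).
public class NodeAddressInfo implements Serializable
{
   private static final long serialVersionUID = 1L;

   private final String invokerLocatorUrl; // what clients connect to
   private final String jgroupsAddress;    // what failover nodes cast to

   public NodeAddressInfo(String invokerLocatorUrl, String jgroupsAddress)
   {
      this.invokerLocatorUrl = invokerLocatorUrl;
      this.jgroupsAddress = jgroupsAddress;
   }

   public String getInvokerLocatorUrl() { return invokerLocatorUrl; }

   public String getJGroupsAddress() { return jgroupsAddress; }
}
```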



                Perhaps an approach would work where the client fails over to any node, but if that server isn't the appropriate one, it communicates that back to the client, along with the updated topology and info as to who the failover server should be.


                I like this idea.



                Stupid question: why does the client need to fail over to a particular server? The failover server needs to do work to recover persistent messages from the store, but once it's done that, aren't they available to a client connecting to any server?


                Each server maintains its own set of queues, each containing a different set of messages.
                So when a node fails, the failover node needs to take over responsibility for the queues on the failed node.
                This means creating those queues in memory and loading any persistent messages from storage into those queues (if in-memory replication is being used, the messages will already be there, so there's no need to load from storage).
                This means clients need to reconnect to the node which now hosts those queues, so they can continue as if nothing happened.
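                A hedged sketch of that take-over step, with illustrative interfaces standing in for the real storage and queue deployment code (none of these are actual JBM classes):

```java
import java.util.Collections;
import java.util.List;

// Illustrative-only sketch: activate the failed node's queues locally,
// loading from storage only when in-memory replication hasn't already
// populated them.
public class FailoverActivator
{
   public interface Storage
   {
      List<Object> loadPersistentMessages(String queueName);
   }

   public interface QueueFactory
   {
      void deployQueue(String queueName, List<Object> initialMessages);
   }

   public void takeOver(List<String> failedNodeQueues, boolean inMemoryReplication,
                        Storage storage, QueueFactory factory)
   {
      for (String queueName : failedNodeQueues)
      {
         // With in-memory replication the messages are already resident,
         // so nothing needs to be loaded from storage
         List<Object> messages = inMemoryReplication
            ? Collections.<Object>emptyList()
            : storage.loadPersistentMessages(queueName);

         factory.deployQueue(queueName, messages);
      }
   }
}
```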

                As an added complication we also need to decide what happens when the failed node comes back to life.

                Does it take over responsibility for its original queues?

                Actually, I don't think this is possible unless there are no clients using those queues on the failover node, since we would have to disconnect the connections and reconnect them to the resurrected node, which would mean losing any persistent messages delivered in those sessions. That would be unacceptable.


                • 5. Re: Server side HA and failover
                  timfox

                  Hmmm... come to think of it, we could probably allow movement of queues as long as there are no unacked messages in any delivery lists for those queues; then we could safely move them.
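                  A tiny sketch of that safety condition (Delivery is an illustrative interface, not a real JBM class):

```java
import java.util.Collection;

// A queue may move back to the resurrected node only if none of its
// current deliveries is still awaiting an acknowledgement.
public class QueueMoveCheck
{
   public interface Delivery
   {
      boolean isAcked();
   }

   public static boolean safeToMove(Collection<Delivery> deliveryList)
   {
      for (Delivery d : deliveryList)
      {
         if (!d.isAcked())
         {
            return false; // an in-flight delivery pins the queue
         }
      }
      return true;
   }
}
```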

                  • 6. Re: Server side HA and failover
                    timfox

                     

                    "bstansberry@jboss.com" wrote:
                    Question:

                    What is the 'cluster topology' the client needs to track? Is it the set of JBoss Messaging servers, i.e. there's only one topology? Or does it vary by service (e.g. queue/topic), i.e. the topology for Queue A is different from Queue B because A is deployed on some nodes and B on others?


                    This is an interesting question.

                    Currently all queues and topics share the same topology.

                    I'm not sure how much value there is in supporting multiple topologies. What do others think?

                    • 7. Re: Server side HA and failover
                      brian.stansberry

                       


                      "brian .stansberry" wrote:

                      Stupid question: why does the client need to fail over to a particular server? The failover server needs to do work to recover persistent messages from the store, but once it's done that, aren't they available to a client connecting to any server?


                      Each server maintains it's own set of queues each containing different sets of messages.
                      So when a node fails the failover node needs to take over responsibility for the queues on the node that failed.
                      This means creating those queues in memory and loading any persistent messages from storage into those queues (if in memory replication is being used the messages will already be there so no need to load from storage).
                      This means clients need to reconnect the node which now hosts those queues so they can continue as if nothing happened.


                      I'm still not sure I understand. I completely understand what the failover server needs to do; I'm just not certain why it matters which node the client connects to.

                      Is it because the recovered messages were part of a session, and to properly maintain the guaranteed order of delivery of messages from a particular JMS session, the client has to connect to the correct server?

                      • 8. Re: Server side HA and failover
                        timfox

                         

                        "bstansberry@jboss.com" wrote:


                        I'm still not sure I understand. I completely understand what the failover server needs to do; I'm just not certain why it matters which node the client connects to.

                        Is it because the recovered messages were part of a session, and to properly maintain the guaranteed order of delivery of messages from a particular JMS session, the client has to connect to the correct server?


                        Because the queues the client is consuming from will only exist on one node (the node that the original node failed over to), so the client must connect to that one.

                        • 9. Re: Server side HA and failover
                          brian.stansberry

                          OK; I knew I didn't understand something fundamental. I thought the same queues existed on every node, based on this in the NewClusteringDesign wiki:

                          This gives us the freedom to implement the distributed queue as a set of partial queues, one on each node of the cluster.



                          If the queue doesn't exist on each node in the cluster, then this implies that there *is* a separate cluster topology for each queue/topic.

                          • 10. Re: Server side HA and failover
                            timfox

                            It is implemented as a set of partial queues, one on each node.

                            Let's discuss this in the meeting :)

                            • 11. Re: Server side HA and failover
                              clebert.suconic

                              I have *tried* to drive failover from the client side.

                              Last week I managed to do a state transfer from one node to another, moving all the messages from a failed queue to a new queue.

                              There are implications to doing it this way, as we could end up with more than one local queue when a failover occurs.

                              Say you have two nodes A and B, each node with a client holding a durable subscription.

                              Now say node A fails. The clients from node A will be redirected to node B, and the routers will have to treat those connections as if they were still on node A, even though they are now local to node B.

                              Also, any producer that existed on node A will arrive at node B still preferring to send messages to the former node A.

                              These are all implications of the partial queue implementation in our cluster. I'm currently looking at the code to see how we could deal with this.


                              --Clebert

                              • 12. Re: Server side HA and failover
                                timfox

                                 

                                "clebert.suconic@jboss.com" wrote:
                                I have *tried* to drive failover from the client side.

                                Last week I managed to do a state transfer from one node to another, moving all the messages from a failed queue to a new queue.


                                As already discussed, moving messages is not an option, since there may be tens of millions of messages in each partial queue.

                                Also, fully client-driven failover won't work, since we need to preserve JMS semantics for durable subscriptions, especially when doing in-memory persistent message replication. We have already discussed this too.

                                "Clebert" wrote:

                                There are implications to doing it this way, as we could end up with more than one local queue when a failover occurs.

                                Say you have two nodes A and B, each node with a client holding a durable subscription.

                                Now say node A fails. The clients from node A will be redirected to node B, and the routers will have to treat those connections as if they were still on node A, even though they are now local to node B.


                                Why is this a problem?


                                • 13. Re: Server side HA and failover
                                  clebert.suconic

                                   

                                  Tim wrote:
                                  As already discussed, moving messages is not an option, since there may be tens of millions of messages in each partial queue.


                                  I"m just pointing out what we discuseed on the other thread.

                                  But anyway, If you have 10s of millions of messages you still need to load these messages anyway.

                                  Tim wrote:
                                  Why is this a problem?


                                  It's what we already discussed about the PostOffice only accepting one binding per name, identified by a String. We could use the nodeID in the channel name, but node B would still have to create a new channel, which would require a state transfer from one node to the other.



                                  • 14. Re: Server side HA and failover
                                    timfox

                                     

                                    "clebert.suconic@jboss.com" wrote:


                                    But anyway, if you have tens of millions of messages, you still need to load these messages anyway.



                                    Actually, no.

                                    If you have tens of millions of messages, then 99.99% of them will probably be paged in storage.

                                    You only need to load the few thousand or so that will fit in memory.

                                    This is a big difference.

                                    See http://labs.jboss.com/file-access/default/members/jbossmessaging/freezone/docs/guide-1.0.1.GA/html/configuration.html#conf.destination.paging
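                                    A rough sketch of what that means for failover loading, assuming a fullSize-style in-memory limit as in the paging docs linked above (the Storage interface is made up, not a real JBM class):

```java
import java.util.List;

// Hedged sketch: on failover, only the first page of persistent message
// references is loaded; the rest stays paged in storage.
public class PagedLoader
{
   public interface Storage
   {
      // Load at most maxCount persistent message references for a queue
      List<Object> loadMessageRefs(String queueName, int maxCount);
   }

   private final int fullSize; // max message refs held in memory per queue

   public PagedLoader(int fullSize)
   {
      this.fullSize = fullSize;
   }

   public List<Object> loadFirstPage(Storage storage, String queueName)
   {
      // Even with tens of millions of messages in the queue, only the
      // first fullSize refs are loaded into memory
      return storage.loadMessageRefs(queueName, fullSize);
   }
}
```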

                                    "Clebert" wrote:

                                    Tim wrote:
                                    Why is this a problem?


                                    It's what we already discussed about the PostOffice only accepting one binding per name, identified by a String. We could use the nodeID in the channel name, but node B would still have to create a new channel, which would require a state transfer from one node to the other.



                                    We just need to extend the model slightly to cope with this.
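                                    One hypothetical way to extend it, sketched below: key each binding by a (nodeID, channel name) pair instead of by channel name alone, so the failover node can host the failed node's channel alongside its own partial queue. None of these names come from the actual PostOffice code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch of node-qualified bindings.
public class MultiNodeBindings
{
   private final ConcurrentMap<String, Object> bindings =
      new ConcurrentHashMap<String, Object>();

   private static String key(String nodeId, String channelName)
   {
      return nodeId + "/" + channelName;
   }

   public void addBinding(String nodeId, String channelName, Object queue)
   {
      if (bindings.putIfAbsent(key(nodeId, channelName), queue) != null)
      {
         throw new IllegalStateException(
            "Binding already exists for " + channelName + " on node " + nodeId);
      }
   }

   public Object getBinding(String nodeId, String channelName)
   {
      return bindings.get(key(nodeId, channelName));
   }
}
```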
