52 Replies Latest reply on Nov 27, 2006 10:12 AM by clebert.suconic
      • 45. Re: Server side HA and failover
        timfox

         

        "clebert.suconic@jboss.com" wrote:
        I've created a new class called LeaveClusterRequest.
        This request is sent over the cluster when PostOffice.stop is called.


        Yep, that's the way to do it for now, as discussed.

        While chatting with Bela at JBW, he said he's going to look at extending the JGroups API so we can have the information of whether the node crashed or not directly.

        This won't be available for a while, though.

        • 46. Re: Server side HA and failover
          timfox

           

          "clebert.suconic@jboss.com" wrote:

          Now... viewAccept will perform a failOver if a view is changed. I still need to add some logic to have only one server accepting the failOver but that is pretty easy.


          The most trivial way of doing this is to define the failover node F of a failed node A as the node to the right (or left, it doesn't matter) of the failed node in the jgroups view. We also need to think of the view as a "ring", so the last node wraps around to the first.

          E.g. if the jgroups view has addresses:

          A
          B
          C
          D
          E

          then node A fails over to node B, node B fails over to node C, node C fails over to node D, node D fails over to node E, and node E fails over to node A.
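
A minimal sketch of the ring rule above, assuming the view from before the failure is available as an ordered list of addresses. The class and method names (RingFailoverPolicy, failoverNodeFor, shouldTakeOver) are made up for illustration; they are not part of JGroups or of the current messaging code.

```java
import java.util.Arrays;
import java.util.List;

public class RingFailoverPolicy {

    /**
     * Returns the member that takes over for failedNode: the next entry in
     * the view, wrapping from the last entry back to the first.
     * oldView is assumed to be the membership list from before the failure.
     */
    public static <T> T failoverNodeFor(List<T> oldView, T failedNode) {
        int index = oldView.indexOf(failedNode);
        if (index < 0) {
            throw new IllegalArgumentException("Node not in view: " + failedNode);
        }
        if (oldView.size() < 2) {
            throw new IllegalStateException("No other node to fail over to");
        }
        return oldView.get((index + 1) % oldView.size());
    }

    /** True only on the single node that should perform the failover. */
    public static <T> boolean shouldTakeOver(List<T> oldView, T failedNode, T localNode) {
        return failoverNodeFor(oldView, failedNode).equals(localNode);
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C", "D", "E");
        for (String node : view) {
            System.out.println(node + " fails over to " + failoverNodeFor(view, node));
        }
    }
}
```

Running main prints the same A to B, B to C, ... E to A mapping listed above. If every node evaluates shouldTakeOver against the same old view, exactly one node returns true, which would cover the "only one server accepting the failOver" requirement.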

          Questions:

          Is this sufficient for our needs?

          Is there any sense in supporting multiple failover nodes for a single node? Or does that make no sense?

          Should the policy really be pluggable? (Probably yes)



          • 47. Re: Server side HA and failover
            timfox

            Ok, so it looks like good progress is being made on failover :)

            What is there left to do?

            My understanding is the remaining pieces are:

            Load balancing policy for determining which server to initially connect to. (Re-use from remoting).

            Mechanism for propagating changes in the client side server list from the server to the client when a jgroups view change occurs. I.e. when a new node joins / leaves. (Do we re-use from remoting here too?)

            "Valve" functionality to stall any activity on connections when server failover is occurring.

            Replaying of delivered messages to the ServerConsumerEndpoint so the delivery list can be recreated.

            Anything else?



            • 48. Re: Server side HA and failover
              timfox

               

              "clebert" wrote:

              We might need a design session to discuss what we have accomplished though.


              Yes, we should definitely do this ASAP so we can evaluate where we are and what remains to be done.

              When does America come back after Thanksgiving?

              • 49. Re: Server side HA and failover
                clebert.suconic

                 

                then node A fails over to node B, node B fails over to node C, node C fails onto node D, node D fails onto node E, node E fails onto node A

                Questions:

                Is this sufficient for our needs?


                The logic you described above is what I had in mind for the failover logic.

                Is there any sense in supporting multiple failover nodes for a single node? Or does that make no sense?


                As the implementation stands now, I guess it's not possible to have multiple nodes taking care of a single failure.

                Should the policy really be pluggable? (Probably yes)


                I can implement it through an interface/abstract class, however I don't see other policies being implemented.


                • 50. Re: Server side HA and failover
                  timfox

                   

                  "clebert.suconic@jboss.com" wrote:

                  As the implementation stands now, I guess it's not possible to have multiple nodes taking care of a single failure.


                  Fine. But when we do in-memory message replication this becomes more important. We will want to be able to replicate messages to more than one other server, so that if the failover server fails too, we still have the messages on the second failover node. (This is similar to buddy replication groups in JBossCache.)


                  I can implement it through an interface/abstract class, however I don't see other policies being implemented.


                  See previous comment.
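
To make the "pluggable policy" and "more than one failover node" points concrete, here is a rough sketch under the assumption that a policy simply maps a node to an ordered list of failover candidates. FailoverPolicy and RingPolicy are hypothetical names, not existing classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailoverPolicies {

    /** Hypothetical pluggable policy: ordered failover candidates for a node. */
    public interface FailoverPolicy<T> {
        List<T> failoverNodesFor(List<T> view, T node);
    }

    /** Picks the next `count` nodes to the right of the node, treating the view as a ring. */
    public static class RingPolicy<T> implements FailoverPolicy<T> {
        private final int count;

        public RingPolicy(int count) {
            if (count < 1) {
                throw new IllegalArgumentException("count must be >= 1");
            }
            this.count = count;
        }

        public List<T> failoverNodesFor(List<T> view, T node) {
            int index = view.indexOf(node);
            if (index < 0) {
                throw new IllegalArgumentException("Node not in view: " + node);
            }
            // Never return the node itself, even when count >= view size.
            int limit = Math.min(count, view.size() - 1);
            List<T> result = new ArrayList<T>();
            for (int i = 1; i <= limit; i++) {
                result.add(view.get((index + i) % view.size()));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C", "D", "E");
        FailoverPolicy<String> policy = new RingPolicy<String>(2);
        System.out.println("D replicates to " + policy.failoverNodesFor(view, "D"));
    }
}
```

With count set to 1 this reduces to the single-node ring rule; with count set to 2, node D would get E as its first failover node and A as its second, which is roughly the buddy-group shape described above.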

                  • 51. Re: Server side HA and failover
                    clebert.suconic

                     

                    Tim wrote:
                    Ok, so it looks like good progress is being made on failover :)

                    What is there left to do?

                    ...

                    Load balancing policy for determining which server to initially connect to. (Re-use from remoting).


                    At this point, there is an HAConnectionFactory, registered in JNDI, that will load-balance connections on createConnection.

                    Tim wrote:
                    Mechanism for propagating changes in the client side server list from the server to the client when a jgroups view change occurs. I.e. when a new node joins / leaves. (Do we re-use from remoting here too?)


                    There is one thing I've done in this direction. If you do a lookup on HAConnectionFactory, the newly de-serialized HAConnectionFactory will have the updated list. I don't know yet how we could update existing instances.


                    Tim wrote:
                    "Valve" functionality to stall any activity on connections when server failover is occurring.


                    I'm not sure... but I guess we are locking write on failOver. Isn't that enough? We shall test it anyway.

                    Tim wrote:
                    Replaying of delivered messages to the ServerConsumerEndpoint so the delivery list can be recreated.


                    What do you mean? I didn't understand.

                    Tim wrote:
                    Anything else?


                    - Finish implementing the FailOverPolicy (Next node assumes the previous node)
                    - Establish a relationship between HAConnectionFactories and specific ConnectionFactories. (In case there is more than one Connection Factory, e.g. HTTP, JMS...)
                    - Failure handling on HA at this point is only done when ConnectionListener receives an event, which is pretty much done only at leasing. We should also change the interceptors to react when exceptions occur, with actions such as retry... retry... failOver
                    - We should have a MAP of failed nodes. (When a node assumes another node)
                    - The redirect protocol. Are you ready for this node? (What would be a good method name BTW?)
                    - Go over the current design on HAConnectionFactories.
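
One possible reading of the "MAP of failed nodes" item, as a sketch only: keep a map from each failed node id to the node that assumed it, and follow the chain when a node that took over has itself failed. FailedNodeMap and its methods are hypothetical, and integer node ids are an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class FailedNodeMap {

    // failed node id -> id of the node that assumed it
    private final Map<Integer, Integer> takenOverBy = new HashMap<Integer, Integer>();

    /** Records that failoverNodeId has assumed failedNodeId. */
    public synchronized void recordFailover(int failedNodeId, int failoverNodeId) {
        if (failedNodeId == failoverNodeId) {
            throw new IllegalArgumentException("A node cannot fail over to itself");
        }
        takenOverBy.put(failedNodeId, failoverNodeId);
    }

    /**
     * Returns the node currently responsible for nodeId, following the chain
     * if the failover node has itself failed. Returns nodeId if it never failed.
     */
    public synchronized int resolve(int nodeId) {
        int current = nodeId;
        int hops = 0;
        while (takenOverBy.containsKey(current)) {
            current = takenOverBy.get(current);
            if (++hops > takenOverBy.size()) {
                throw new IllegalStateException("Cycle in failover map at node " + current);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        FailedNodeMap map = new FailedNodeMap();
        map.recordFailover(1, 2); // node 2 assumes node 1
        map.recordFailover(2, 3); // later node 2 fails and node 3 assumes it
        System.out.println("Node 1 is now served by node " + map.resolve(1));
    }
}
```

A client holding a connection to node 1 could ask for resolve(1) to find out where to reconnect, which may also be a starting point for the redirect protocol item.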

                    • 52. Re: Server side HA and failover
                      clebert.suconic

                       

                      Tim wrote:
                      Fine. But when we do in-memory message replication this becomes more important. We will want to be able to replicate messages to more than one other server, so that if the failover server fails too, we still have the messages on the second failover node. (This is similar to buddy replication groups in JBossCache.)


                      We (or simply I) will need to understand how the local queue will be affected once we have that implemented. That's the only open point on failover with multiple nodes, as the queue is transferred to the failedNode.
