36 Replies Latest reply on Jan 22, 2007 12:14 PM by clebert.suconic

    JBMESSAGING-674 - Propagating changes on ClusteredConnection

    clebert.suconic

      While working on updating ConnectionFactories (CFs), I'm thinking about using the CallbackManager and updating the ConnectionFactory for active connections only (as I have discussed with Ovidiu).

      The way it should work: during a Clustered::createConnection, ClusteredAspect will register the CF with the CallbackManager for the active connection. When a CF is updated, the server will call back on all active connections, also sending the uniqueId of the CF (configured on the CF MBean). I will also add the uniqueId to CFs.

      The side effect is that if a ConnectionFactory has two or more active connections, it will be updated several times (once per active connection). To fix this we would need a singleton CallbackManager (we could either refactor the CallbackManager to be a singleton per VM, or create an extra CallbackManager just for CFs). I will start coding without changing anything in CallbackManager for now, and we can refactor later if required.

        • 1. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
          clebert.suconic


          Another possibility we discussed would be to make delegates and failoverMaps static (somehow in a HashMap keyed by uniqueId) and intercept serialization calls (like write/readExternal) to inject the CF into a CallbackManager, but we would still need the singleton CallbackManager for this... so I'm not going to use this idea, but I wanted to record it here on the forum as a brainstorming possibility.

          • 2. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
            timfox

            Using a static map seems a lot simpler and clearer to me.

            AFAIK this is how JBoss AS failover works too.
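
For illustration only, the static-map idea might look roughly like the sketch below. The class name `FailoverMapRegistry` and its methods are hypothetical and do not come from the JBoss Messaging code base; the sketch only assumes that each CF has a String uniqueId and that a failover map maps node IDs to node IDs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one failover map per ConnectionFactory uniqueId, shared VM-wide.
final class FailoverMapRegistry {
    private static final Map<String, Map<Integer, Integer>> MAPS = new ConcurrentHashMap<>();

    private FailoverMapRegistry() {}

    // Replaces the map stored for one CF. Repeating the same update is harmless,
    // so it does not matter how many active connections trigger it.
    static void update(String cfUniqueId, Map<Integer, Integer> newFailoverMap) {
        MAPS.put(cfUniqueId, Collections.unmodifiableMap(new HashMap<>(newFailoverMap)));
    }

    // Returns the current map for the CF, or an empty map if none was registered.
    static Map<Integer, Integer> lookup(String cfUniqueId) {
        Map<Integer, Integer> map = MAPS.get(cfUniqueId);
        return map != null ? map : Collections.<Integer, Integer>emptyMap();
    }
}
```

Because the map is keyed by uniqueId, every delegate that was deserialized from the same CF would see the same, latest failover map.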

            • 3. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
              clebert.suconic

              I'm thinking about how to fix this race condition:


              I - ClientClusteredConnectionFactoryDelegate has the following failoverMap:

              Server 1->Server2->Server3->Server 1

              II - Server1 is killed

              III - Now the CF::failoverMap is updated to:
              Server2->Server3->Server2

              IV - A failure is detected on a connection that was directed to Server1. The failoverMap no longer has any information about Server1.


              I will keep coding, and maybe we will need to keep the failoverMap on the Connection instead of the ConnectionFactory, with failover updating these fields on the Connection only when a failure happens.

              As I said I am thinking about how to do it. I wanted to post this just to keep the team informed.
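
The four steps above can be reproduced with plain maps. This is only a sketch of the lookup problem, not the actual client code; `FailoverMapRace` and its method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the stale-lookup problem described in steps I-IV.
class FailoverMapRace {
    // Step I: 1->2, 2->3, 3->1
    static Map<Integer, Integer> beforeNode1Dies() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 2);
        map.put(2, 3);
        map.put(3, 1);
        return map;
    }

    // Step III: node 1 has been removed from the view, leaving 2->3, 3->2
    static Map<Integer, Integer> afterNode1Dies() {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(2, 3);
        map.put(3, 2);
        return map;
    }

    // Step IV: returns null when the failed node is no longer a key in the map.
    static Integer failoverFor(Map<Integer, Integer> failoverMap, int failedNodeId) {
        return failoverMap.get(failedNodeId);
    }
}
```

With the first map, `failoverFor(map, 1)` yields 2. With the updated map it yields null, and any caller that unboxes or dereferences the result without a check fails.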


              • 4. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                clebert.suconic

                I wanted to have a static Collection/Map on the server side for all active connections. I could navigate through JMSDispatcher for this, but I wanted to capture ServerConnectionFactoryEndpoint::createConnection and ServerConnectionEndpoint::close to keep a list of these active connections.

                This is to send the notification to all active connections so they update their respective CFs. I was going to add this to JMSDispatcher, but it didn't look like the proper place.

                Any ideas where I could put such a static map?

                • 5. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                  clebert.suconic

                  Never mind my last post... I will put this into ServerConnectionFactoryEndpoint... a ServerConnectionFactoryEndpoint will know the connections it has created.
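
A rough sketch of that bookkeeping, assuming the factory endpoint records each connection it creates and forgets it on close. `FactoryEndpointSketch`, the String connection IDs, and `connectionsToNotify` are hypothetical stand-ins, not the real ServerConnectionFactoryEndpoint API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a server-side factory endpoint that tracks its connections.
class FactoryEndpointSketch {
    private final Set<String> activeConnectionIds = ConcurrentHashMap.newKeySet();
    private int nextId = 0;

    // Called where the real endpoint would create a connection.
    synchronized String createConnection() {
        String id = "conn-" + (nextId++);
        activeConnectionIds.add(id);
        return id;
    }

    // Called where the real connection endpoint would be closed.
    void closeConnection(String connectionId) {
        activeConnectionIds.remove(connectionId);
    }

    // Snapshot of the connections that would receive the "CF updated" callback.
    List<String> connectionsToNotify() {
        return new ArrayList<>(activeConnectionIds);
    }
}
```

Keeping the set per factory endpoint avoids a global static map, since each endpoint only needs to notify the connections it created.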

                  • 6. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                    clebert.suconic

                    The way I found to avoid the race condition was to guess the failoverId based on the order of the failoverMap.

                    So, in case we can't find the nodeID in the failoverMap (meaning the CF was already updated), I will use a method that I'm calling guessFailoverID:

                    
                    class ClusteringAspect...


                     public static Integer guessFailoverID(Map failoverMap, Integer nodeID)
                     {
                        Integer failoverNodeID = null;
                        Integer[] nodes = (Integer[])failoverMap.keySet().toArray(new Integer[failoverMap.size()]);
                        // We need to sort the array first
                        Arrays.sort(nodes);
                        // Pick the first node ID greater than the failed node's ID
                        for (int i = 0; i < nodes.length; i++)
                        {
                           if (nodeID.intValue() < nodes[i].intValue())
                           {
                              failoverNodeID = nodes[i];
                              break;
                           }
                        }
                        // if still null use the first node...
                        if (failoverNodeID==null)
                        {
                           failoverNodeID = nodes[0];
                        }
                        return failoverNodeID;
                     }
                    
                    



                    I want to keep this method public static just because, while writing it, I wrote a testcase for it... I haven't committed it yet:

                    org.jboss.test.messaging.jms.clustering.ClusteringAspectInternalTest


                    Any objections to a test for internal methods (not part of the API)?
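
As a sketch of what such a test might exercise, the standalone class below re-types the guessing logic with generics and runs it against three small maps. `GuessFailoverIdSketch` and `mapOf` are hypothetical helpers written for this example; this is not the uncommitted ClusteringAspectInternalTest. Like the method in the post, it assumes a non-empty failoverMap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone version of the guessing logic, for illustration only.
class GuessFailoverIdSketch {
    // Returns the smallest key greater than nodeID, wrapping to the smallest key.
    // Assumes failoverMap is not empty.
    static Integer guessFailoverID(Map<Integer, Integer> failoverMap, Integer nodeID) {
        Integer[] nodes = failoverMap.keySet().toArray(new Integer[0]);
        Arrays.sort(nodes);
        for (Integer candidate : nodes) {
            if (nodeID.intValue() < candidate.intValue()) {
                return candidate;
            }
        }
        return nodes[0];
    }

    // Builds a map from (key, value) pairs given as a flat list of ints.
    static Map<Integer, Integer> mapOf(int... pairs) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            map.put(pairs[i], pairs[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        // Node 1 was removed from 2->3, 3->2: the next ID in order is 2.
        System.out.println(guessFailoverID(mapOf(2, 3, 3, 2), 1)); // prints 2
        // Node 2 was removed from 1->3, 3->1: the next ID in order is 3.
        System.out.println(guessFailoverID(mapOf(1, 3, 3, 1), 2)); // prints 3
        // Node 3 was removed from 1->2, 2->1: nothing is greater, so wrap to 1.
        System.out.println(guessFailoverID(mapOf(1, 2, 2, 1), 3)); // prints 1
    }
}
```

The wrap-around case (the last one) is the one most worth covering, since it is the only path that reaches the fallback to the first node.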

                    • 7. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                      timfox

                       

                      "clebert.suconic@jboss.com" wrote:
                      I'm thinking about how to fix this race condition:


                      I - ClientClusteredConnectionFactoryDelegate has this following failoverMap:

                      Server 1->Server2->Server3->Server 1

                      II - Server1 is killed

                      III - Now the CF::failoverMap is updated to:
                      Server2->Server3->Server2



                      I can't see how this would happen.

                      When a server dies, the new failover map is created on the server according to the algorithm.

                      The algorithm would never create such a mapping.

                      So this is a non problem AFAICT.

                      Am I missing something?

                      • 8. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                        timfox

                        Also remember that the failover map on the client is never guaranteed to exactly mirror the server side map.

                        This is why we built in "server hopping", where if the client got the wrong server it would just be redirected to the correct server.

                        In the most degenerate case, even if you choose a random server it would still redirect to the correct server, so even if slightly inefficient the client would end up on the correct failover server eventually.
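
The hopping idea can be sketched as a small redirect loop. The thread does not show the real protocol, so everything here is an assumption: `ServerHoppingSketch`, `askServer`, and the rule that any contacted server answers with the authoritative failover node are all invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "server hopping": the client follows redirects until it
// reaches the node that the server side considers the correct failover node.
class ServerHoppingSketch {
    // Server-side view (authoritative): failed node ID -> node that took over.
    private final Map<Integer, Integer> serverSideFailoverMap = new HashMap<>();

    ServerHoppingSketch(Map<Integer, Integer> serverSideFailoverMap) {
        this.serverSideFailoverMap.putAll(serverSideFailoverMap);
    }

    // What a contacted server would answer: the node that really handles failedNodeId.
    // In this simplified model every server gives the same answer.
    int askServer(int contactedNode, int failedNodeId) {
        Integer correct = serverSideFailoverMap.get(failedNodeId);
        if (correct == null) {
            throw new IllegalStateException("no failover node known for " + failedNodeId);
        }
        return correct;
    }

    // Client side: start from any guess and hop until the contacted server is the right one.
    int resolve(int failedNodeId, int firstGuess, int maxHops) {
        int current = firstGuess;
        for (int hop = 0; hop <= maxHops; hop++) {
            int correct = askServer(current, failedNodeId);
            if (correct == current) {
                return current;
            }
            current = correct; // redirect and try again
        }
        throw new IllegalStateException("too many hops");
    }
}
```

In this model a wrong first guess costs one extra round trip, which matches the "slightly inefficient" remark above.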

                        • 9. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                          clebert.suconic

                          Well.. I could replicate this in a testcase.. so it is a problem.


                          You have Server1, Server2, and Server3

                          Then you have a map:

                          1->2
                          2->3
                          3->1

                          Now you kill server1:

                          Map now is:

                          2->3
                          3->2

                          You don't have 1 on the MAP.

                          When a connection on server1 fails.. it won't find an ID on failoverMAP. I could replicate this on a testcase.

                          • 10. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                            clebert.suconic

                            Also.. that's why I called this a race condition.

                            If the failure is detected after the ConnectionFactory has been updated for some reason... you would get a NullPointerException.

                            This is not a problem with the hopping... it's a matter of not finding anything in the failoverMap, which would cause an NPE. That's why I introduced the guessing routine, and I will rely on the hopping in case the guess doesn't find the right server (unlikely to happen... the guess routine seemed pretty good to me).

                            • 11. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                              timfox

                              Clebert - I am somewhat baffled about the approach you are taking here.

                              Can you try and describe it clearly (and slowly) so I can understand it?

                              Thanks.

                              • 12. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                                timfox

                                 

                                "clebert.suconic@jboss.com" wrote:
                                Well.. I could replicate this in a testcase.. so it is a problem.


                                You have Server1, Server2, and Server3

                                Then you have a map:

                                1->2
                                2->3
                                3->1

                                Now you kill server1:

                                Map now is:

                                2->3
                                3->2

                                You don't have 1 on the MAP.

                                When a connection on server1 fails.. it won't find an ID on failoverMAP. I could replicate this on a testcase.


                                Surely you shouldn't update the failover map after node failure until all connections have failed over from the old server?

                                Then you can prevent this "race condition" from happening in the first place?

                                • 13. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                                  clebert.suconic

                                   

                                  Then you can prevent this "race condition" from happening in the first place?


                                  The server side doesn't know about failed connections... it will send the callback as soon as the view changes.

                                  To replicate this race condition I had to disable the lease on the client side and have connections open on different servers.

                                  Maybe it wouldn't occur in regular circumstances, but I wanted to prevent such behavior.

                                  The guess routine can find the right failover server, fixing the race condition.

                                  Holding the update somewhere while waiting for failover to complete on all connections would require a lot of coding... while the simple guess routine could fix the problem.

                                  • 14. Re: JBMESSAGING-674 - Propagating changes on ClusteredConnec
                                    timfox

                                    No need to guess, just choose a random server and hopping will take care of the rest.
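
A minimal sketch of that alternative, assuming the client falls back to a random key of its (non-empty) failoverMap only when the failed node is missing from it. `RandomFailoverChoice` and `chooseFailoverNode` are hypothetical names, and the hopping that would correct a wrong pick is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical fallback: use the mapped node when present, otherwise any known node.
class RandomFailoverChoice {
    // Assumes failoverMap is not empty.
    static Integer chooseFailoverNode(Map<Integer, Integer> failoverMap,
                                      Integer failedNodeId,
                                      Random random) {
        Integer mapped = failoverMap.get(failedNodeId);
        if (mapped != null) {
            return mapped; // normal case: the map still knows the failed node
        }
        // The CF was already updated: pick any live node and let hopping redirect.
        List<Integer> candidates = new ArrayList<>(failoverMap.keySet());
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

Either approach avoids the null lookup; the random pick simply trades the ordering logic for a possible extra hop.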
