7 Replies Latest reply on Jan 9, 2007 4:38 AM by ovidiu.feodorov
      • 1. Re: Failover analysis
        clebert.suconic

        My comments regarding the WIKI page:

        >>2. Failure detected when sending a new invocation into the server

        For new connections, Remoting will mask any IOException as org.jboss.remoting.somewhere.CannotConnectException.

        >>3. Failure detected during an in-flight invocation

        In this case you do indeed get an IOException.


        >> Failure Handling

        "Ovidiu on Wiki Page" wrote:
        If there are active threads traversing the valve at the moment when "close" command arrives, those threads must be interrupted and put to wait until the valve opens again


        There is no way to interrupt those threads, but in the event of a failure, all of the in-flight invocations are going to fail at the same time, and all of them will catch the failure and try to close the valve at the same time.


        This scenario is already implemented on the Clebert_Third_Failover branch.

        There are slight differences in the way it is implemented:

        - I only have one valve, so there is no recursion when closing the valve.
        - There is no need for the "command center" I guess, since a call to performFailover on ConnectionDelegate is already equivalent to "a failure has happened".



        • 2. Re: Failover analysis
          ovidiu.feodorov

           

          Clebert wrote:

          Ovidiu on Wiki Page wrote:
          If there are active threads traversing the valve at the moment when "close" command arrives, those threads must be interrupted and put to wait until the valve opens again



          There is no way to interrupt those threads, but in the event of a failure, all of the in-flight invocations are going to fail at the same time, and all of them will catch the failure and try to close the valve at the same time.


          Not in a portable way, I agree. But then I said that one possibility would be to close the valve regardless of any active threads, since the active threads will have no choice but to fail shortly anyway.

          For that, we need to make sure that:
          1. We can close a valve with active threads traversing it.
          2. Once a valve is closed, it still correctly handles a downstream failure.
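
A minimal sketch of a valve that meets these two requirements, assuming a plain monitor-based design. The class name `NonBlockingCloseValve` and its methods are hypothetical and are not taken from the JBoss Messaging code base. `close()` never waits for threads that are already inside, and a late failure report that arrives after the valve is closed is simply told that someone else got there first.

```java
// Hypothetical sketch: names are illustrative, not JBoss Messaging API.
final class NonBlockingCloseValve {
    private boolean closed; // guarded by this object's monitor

    /** Returns at once when the valve is open; blocks while it is closed. */
    synchronized void enter() {
        boolean interrupted = false;
        while (closed) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag on exit
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Closes the valve without waiting for threads already past enter().
     * Returns true only for the caller that actually closed it, so that
     * exactly one thread goes on to perform the failover.
     */
    synchronized boolean close() {
        if (closed) {
            return false; // a downstream failure reported after the close
        }
        closed = true;
        return true;
    }

    /** Reopens the valve and wakes every thread blocked in enter(). */
    synchronized void open() {
        closed = false;
        notifyAll();
    }

    synchronized boolean isClosed() {
        return closed;
    }
}
```

In this sketch a thread whose invocation fails calls `close()`; if it gets `true` it runs the failover and then calls `open()`, otherwise it calls `enter()` again to wait for the reopened valve before retrying.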


          • 3. Re: Failover analysis
            ovidiu.feodorov

            Keeping a single centralized valve per connection creates a single bottleneck. All threads invoking into any delegate will have to acquire/release the synchronization element of that valve. This will lead to a lot of contention.

            Distributing the load across different valve instances will relieve some of this pressure, with no apparent drawback.

            • 4. Re: Failover analysis
              ovidiu.feodorov

              Even if we keep a single valve instance per connection, I see no reason to involve remotingConnection at such a low level, as you do in

              JMSRemotingConnection remotingConnection = null;

              try
              {
                 valve.enter();

                 // it's important to only retrieve the remotingConnection while inside the Valve, as we
                 // guarantee that no failover has happened yet
                 remotingConnection = connectionState.getRemotingConnection();
                 return invocation.invokeNext();
              }
              catch (CannotConnectException e)
              {
                 log.warn("We got a CannotConnectException and we are trying a failover", e);
                 ((ConnectionDelegate)connectionState.getDelegate()).performFailover(remotingConnection);
                 return invocation.invokeNext();
              }
              catch (IOException e)
              {
                 log.warn("We got an IOException and we are trying a failover", e);
                 ((ConnectionDelegate)connectionState.getDelegate()).performFailover(remotingConnection);
                 return invocation.invokeNext();
              }


              Why don't we just message the connection: "there's a failure, deal with it!"?

              The connection has access to the proper remoting connection instance, so why does it need to receive it as an argument of the call?
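
To make the two call shapes concrete, here is a small sketch that puts them side by side. Everything in it is hypothetical: `FailoverAwareConnection`, `failureDetected()` and the plain `Object` standing in for the remoting connection are illustrative, and the stale-report check is only an assumption about what an argument could be used for, not something the quoted code is known to do.

```java
// Hypothetical sketch: names and behaviour are illustrative only.
final class FailoverAwareConnection {
    private Object remotingConnection = new Object(); // stand-in for the real connection
    private int failovers;

    /** The connection a caller would capture before invoking. */
    synchronized Object current() {
        return remotingConnection;
    }

    /**
     * Argument-passing shape: the caller says which connection it saw fail.
     * Assumption: a report about a connection that has already been
     * replaced is treated as stale and ignored.
     */
    synchronized void performFailover(Object failedConnection) {
        if (failedConnection != remotingConnection) {
            return; // someone else already failed over from that connection
        }
        remotingConnection = new Object();
        failovers++;
    }

    /**
     * No-argument shape: "there's a failure, deal with it".
     * Without other coordination, every report triggers a failover.
     */
    synchronized void failureDetected() {
        remotingConnection = new Object();
        failovers++;
    }

    synchronized int failoverCount() {
        return failovers;
    }
}
```

With the no-argument shape, duplicate reports of the same failure have to be filtered somewhere else, for example by the valve itself.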

              • 5. Re: Failover analysis
                ovidiu.feodorov

                What's the use case for a re-entrant lock? If the valve is distributed among delegates, you probably don't need that, hence reduced complexity ...

                • 6. Re: Failover analysis
                  timfox

                   

                  "ovidiu.feodorov@jboss.com" wrote:
                  Keeping a single centralized valve per connection creates a single bottleneck. All threads invoking into any delegate will have to acquire/release the synchronization element of that valve. This will lead to a lot of contention.

                  Distributing the load across different valve instances will relieve some of this pressure, with no apparent drawback.


                  I don't really agree with this. You would only get a lot of contention if the threads were all attempting to get the same write lock, but in the normal case they would be getting the read lock, and multiple read locks can be obtained at any one time - this is kind of the whole point of read locks.

                  There may be a very small synchronized region in actually executing the call to get the read lock but this is probably insignificant.

                  If we can reduce the scope for deadlock and make the code simpler by using a single pair of locks I would prefer that solution.

                  • 7. Re: Failover analysis
                    ovidiu.feodorov

                    OK.

                    One pair of locks shared by all delegates belonging to one connection.
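
As a rough illustration of that conclusion, the sketch below wraps one read/write lock pair in a valve object that every delegate of a connection would share. It assumes `java.util.concurrent.locks.ReentrantReadWriteLock` purely for brevity; the class name `SharedValve` and its methods are hypothetical and do not come from the JBoss Messaging sources.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one instance per connection, shared by all its delegates.
final class SharedValve {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Normal invocation path: takes the read lock, so many threads can be inside at once. */
    void enter() {
        lock.readLock().lock();
    }

    /** Must be called by the thread that called enter(), typically in a finally block. */
    void leave() {
        lock.readLock().unlock();
    }

    /** Failover path: takes the write lock, waiting until all readers have left. */
    void close() {
        lock.writeLock().lock();
    }

    /** Must be called by the thread that called close(). */
    void open() {
        lock.writeLock().unlock();
    }

    /** Number of read-lock holds currently outstanding. */
    int activeInvocations() {
        return lock.getReadLockCount();
    }

    boolean isClosed() {
        return lock.isWriteLocked();
    }
}
```

One caveat of this particular lock class: a thread that still holds the read lock cannot upgrade to the write lock, so a thread that detects a failure would have to call `leave()` before calling `close()`.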