3 Replies Latest reply on Nov 27, 2006 9:37 AM by Jason Sicotte

    Slow DataSource Failover

    Jason Sicotte Newbie

      I am experiencing a slow failover when simulating a network failure. My setup consists of the following: Two servers, each with an instance of JBoss and MySQL running on them. I will refer to the servers as a "primary" and "backup". Both the primary and backup have their DataSources set to point to the MySQL instance running on the primary server. They are HA DataSources, so they also point to the backup server. I simulate a network failure by unplugging the primary from a network hub. Then the following happens:
      1) The backup server/node detects the failure.
      2) Several minutes go by after the cluster failure, and then our app is deployed.

      After playing with Log4j settings and some more testing, I have narrowed the issue down to a set of failures:

      2006-10-26 15:50:34,843 INFO [org.hibernate.connection.DatasourceConnectionProvider] Using datasource: java:/DefaultDS
      
      2006-10-26 15:51:06,343 WARN [org.jboss.resource.adapter.jdbc.local.HALocalManagedConnectionFactory] Destroying connection that is not valid
      
      2006-10-26 15:51:50,484 WARN [org.jboss.resource.adapter.jdbc.local.HALocalManagedConnectionFactory] Destroying connection that is not valid
      
      2006-10-26 15:52:28,437 WARN [org.jboss.resource.adapter.jdbc.local.HALocalManagedConnectionFactory] Destroying connection that is not valid
      
      2006-10-26 15:52:49,578 WARN [org.jboss.resource.adapter.jdbc.local.HALocalManagedConnectionFactory]
       Failed to create connection for jdbc:mysql://nscluster-3:4589/netsight: Communications link failure


      Since the DataSource connection pool is configured with a minimum size of three, it seems that the database is not considered dead until every pooled connection has timed out. The timeouts take a little over two minutes to complete, and if the minimum pool size were larger, the failover would take even longer. Is there any way to reduce the duration of the timeouts and/or speed up this process?
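The "little over two minutes" figure can be checked directly against the log excerpt above. The following is only a rough sketch: the class and method names are made up for illustration, the timestamps are copied from the five log lines, and the year on the third WARN line is assumed to be 2006.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JBoss: measures the delays between log entries.
final class FailoverLogGaps {

    // Matches the Log4j timestamp layout seen in the excerpt, e.g. "2006-10-26 15:50:34,843".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Timestamps of the five log lines, in order (year of the fourth entry assumed).
    static final List<String> TIMESTAMPS = Arrays.asList(
            "2006-10-26 15:50:34,843",   // Using datasource: java:/DefaultDS
            "2006-10-26 15:51:06,343",   // first "Destroying connection that is not valid"
            "2006-10-26 15:51:50,484",   // second
            "2006-10-26 15:52:28,437",   // third
            "2006-10-26 15:52:49,578");  // "Failed to create connection"

    /** Returns the delay between each pair of consecutive timestamps. */
    static List<Duration> gaps(List<String> timestamps) {
        List<Duration> result = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalDateTime previous = LocalDateTime.parse(timestamps.get(i - 1), FMT);
            LocalDateTime current = LocalDateTime.parse(timestamps.get(i), FMT);
            result.add(Duration.between(previous, current));
        }
        return result;
    }

    /** Returns the time elapsed between the first and last timestamp. */
    static Duration total(List<String> timestamps) {
        LocalDateTime first = LocalDateTime.parse(timestamps.get(0), FMT);
        LocalDateTime last = LocalDateTime.parse(timestamps.get(timestamps.size() - 1), FMT);
        return Duration.between(first, last);
    }

    public static void main(String[] args) {
        for (Duration gap : gaps(TIMESTAMPS)) {
            System.out.println(gap.toMillis() + " ms");
        }
        System.out.println("total: " + total(TIMESTAMPS).toMillis() + " ms");
    }
}
```

Each of the three invalid connections accounts for roughly 30 to 45 seconds on its own, which is why a larger minimum pool size would be expected to stretch the failover proportionally.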

        • 1. Re: Slow DataSource Failover
          Weston M. Price Master

           


          Since the DataSource connection pool is configured with a minimum size of three, it seems that the database is not considered dead until every pooled connection has timed out. The timeouts take a little over two minutes to complete, and if the minimum pool size were larger, the failover would take even longer. Is there any way to reduce the duration of the timeouts and/or speed up this process?


          What you are talking about is the notion of a 'purge policy' wherein pooled connections are destroyed based upon a certain condition without waiting for them to be 'validated'. I have a working experimental branch with this functionality. My plan is to have this in place for JBoss 4.2.



          • 2. Re: Slow DataSource Failover
            Weston M. Price Master

            Note that one of the reasons for the behavior you are seeing is that, prior to 4.0.5, JBoss/JCA validated a connection before removing it from the pool for *each* getConnection attempt. As a result, the entire pool of connections had to be exhausted before an attempt was made to obtain a new connection. This is the 'slow' failover you are seeing: every connection in the pool has to be checked first.

            With 4.0.5, background connection validation has been added, in which connection validation occurs in a background thread. As a result, connection validation for getConnection() can be disabled. Enabling background validation causes the validator to run periodically and remove invalid connections from the pool. While this does not address your problem specifically, choosing background validation gives you the option to disable the validate-on-match behavior.

            The issue with destroying the entire pool when a validation error occurs is that the condition may be temporary (e.g., a network glitch or a transient DB failure). Destroying the entire pool when this happens can ultimately become quite expensive.

            The purge policy discussed earlier will destroy the entire pool during a connection *failure*, not on validation. Have you considered setting the socketTimeout or experimenting with the other JDBC level properties to see if you can get the wait time down?
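As a starting point for that experiment, MySQL Connector/J accepts `connectTimeout` and `socketTimeout` as URL properties, both in milliseconds. The sketch below only builds such a URL; the helper class is hypothetical, the host, port, and database are taken from the log above, and the timeout values are placeholders rather than recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of JBoss or Connector/J.
final class ConnectorJTimeoutUrl {

    /**
     * Appends the Connector/J timeout properties (in milliseconds) to a JDBC URL.
     * connectTimeout bounds the initial socket connect; socketTimeout bounds
     * blocking network reads on an established connection.
     */
    static String withTimeouts(String baseUrl, int connectTimeoutMs, int socketTimeoutMs) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("connectTimeout", String.valueOf(connectTimeoutMs));
        props.put("socketTimeout", String.valueOf(socketTimeoutMs));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            query.add(entry.getKey() + "=" + entry.getValue());
        }

        // Start a query string, or extend one that is already present.
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + query;
    }

    public static void main(String[] args) {
        // Placeholder values: 5 s to connect, 10 s for socket reads.
        System.out.println(
                withTimeouts("jdbc:mysql://nscluster-3:4589/netsight", 5000, 10000));
    }
}
```

The resulting URL would presumably go wherever the datasource's connection URL is configured. Note that `socketTimeout` also cuts off any legitimate query that runs longer than the limit, so the value needs to be chosen with the application's slowest queries in mind.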

            • 3. Re: Slow DataSource Failover
              Jason Sicotte Newbie

               

              With 4.0.5 background connection validation has been added where connection validation occurs in a background thread.


              We are currently using 4.0.4; I will look into using 4.0.5.

              Have you considered setting the socketTimeout or experimenting with the other JDBC level properties to see if you can get the wait time down?


              I have not been able to find a DataSource option that configures TCP/IP timeout, but I did try setting <query-timeout> to 1. I am still looking for some possible Connector/J options to tweak.