10 Replies Latest reply on Aug 27, 2008 4:51 AM by jesper.pedersen

    [JBAS-5095] Race condition between connection.close() and tr

    jesper.pedersen

      I have been looking at subject for both 5.0.CR2 and 4.2.4.GA.

      As Adrian states there is a race condition between connection.close() and an asynchronous transaction completion (e.g. transaction timeout) that can lead to a leaked connection permit with eventual pool exhaustion when using track-connection-by-tx (i.e. all local resources).

      The fix (has he states) is to check that the number of handles are zero and the connection no longer is track-by-tx in an atomic way.

      I have introduced a reentrant lock that is used when isManagedConnectionFree() is used in combination with returnManagedConnection(). The lock is defined as an instance variable in TxConnectionManager - and the code is guarded using the lock() / unlock() methods.

      The overhead of the reentraant lock is minimal when operating with track-by-tx == false and the lock only comes into play in afterCompletion() when track-by-tx is true.

      The patch have been attached to the JIRA issue for review.

      Any feedback is welcomed !

        • 1. Re: [JBAS-5095] Race condition between connection.close() an

          There's no need for a reentrant lock, a plain synchronize would do.
          This code cannot recurse and fair queuing is irrelevant.

          The returnManagedConnection() needs to be outside the lock,
          otherwise you'll get all sorts of deadlocks with other locks, e.g. the permits in the pool.

          The ConnectionManager should not be locking anything when it invokes the pool
          (there's all sorts of bizarre paths/callbacks between the pool and CM).

          if the fix was as simple as this patch, then I'd have done it rather than
          introducing the workaround. :-)

          The fix is to replace isManagedConnectionFree() with wasManagedConnectionFreed().
          i.e. the managed connection might now be free, but it wasn't us that freed it
          => do it only once

          Most of the fix, is not the patch, it is some stress testing to make sure there's no hole
          in the logic, e..g connection leaks, double release, etc.

          Not just when racing with transaction timeouts, but also with "random"
          connection failures leading to destroing of connections,
          both from application calls and pool calls.

          • 2. Re: [JBAS-5095] Race condition between connection.close() an

             

            "adrian@jboss.org" wrote:

            The fix is to replace isManagedConnectionFree() with wasManagedConnectionFreed().
            i.e. the managed connection might now be free, but it wasn't us that freed it
            => do it only once


            Of course, that's only a logical description of what it should do.
            In practice the code is more complicated.
            But the key part is to have some synchronized block that maintains/checks
            the free state.

            e.g. something like
            synchronized boolean isFreed(...)
            {
             boolean wasFree = isManagedConnectionFree();
             // change state, e.g. #handles or track-by-tx
             if (wasFree)
             return false; // This shouldn't really happen now all the state is changed atomically
             else
             return isManagedConnectionFree(); // We just freed it
            }
            
            if (isFreed(...))
             returnManagedConnection();
            


            • 3. Re: [JBAS-5095] Race condition between connection.close() an
              dmlloyd

              The best tool for "do exactly once" types of tasks that I've found is AtomicBoolean:

               private final AtmoicBoolean freed = new AtomicBoolean(false);
              
               // ...then later, in your idempotent free() method:
               if (! freed.getAndSet(true)) {
               doTheOneTimeTask();
               }
              


              Of course that's JDK5+ only, so it might not be suitable for things that have to run on older JDKs.

              • 4. Re: [JBAS-5095] Race condition between connection.close() an

                The "freed" we are talking about is not a boolean state.

                It's an event that depends upon a set becoming empty and a boolean being/becoming false.

                • 5. Re: [JBAS-5095] Race condition between connection.close() an
                  jesper.pedersen

                  Thank you for your feedback, Adrian.

                  I'll look into your isFreed() solution.

                  • 6. Re: [JBAS-5095] Race condition between connection.close() an
                    jesper.pedersen

                    I'm looking at TxConnectionManager::getManagedConnection().

                    Shouldn't there always be a check against if an active transaction exists ?

                    Currently we have:

                    Transaction tx = tm.getTransaction();
                    if (tx != null && TxUtils.isActive(tx) == false)
                     throw new ResourceException("...");
                    


                    But in case of a transaction hasn't been started tm.getTransaction() will return null.

                    So code should read:
                    Transaction tx = tm.getTransaction();
                    if (tx == null || TxUtils.isActive(tx) == false)
                     throw new ResourceException("...");
                    


                    The transaction will only be passed to BaseConnectionManager2 if track-by-tx is enabled in any case.

                    TxConnectionManager::checkTransactionActive() also doesn't check for a null value - the JavaDoc states that a RollbackException should be thrown if the transaction is not active.

                    Or am I overlooking something ?

                    • 7. Re: [JBAS-5095] Race condition between connection.close() an

                      It's ok if there's no transaction associated with the thread.
                      The MC just won't be enlisted (yet? - lazy enlist).

                      But it's wrong if there is a transaction and its not ACTIVE.

                      This check is only an attempt to "fail early" anyway,
                      it could always race with a tx rollback.

                      The real definitive check is done later during tx.enlist(XAResource);

                      • 8. Re: [JBAS-5095] Race condition between connection.close() an
                        jesper.pedersen

                        Ok.

                        I'll skip that check then in my test case for TxConnectionManager.

                        Thanks, Adrian !

                        • 9. Re: [JBAS-5095] Race condition between connection.close() an
                          jesper.pedersen

                          I have added a new version of the patch - using the isFreed() method in connectionClosed() and afterCompletion().

                          The patch won't let the AS startup because managed connections aren't being returned within the blocking timeout (30 sec).

                          So I'm investigating why this happens -- delist() and returnManagedConnection() is outside the synchronized call.

                          • 10. Re: [JBAS-5095] Race condition between connection.close() an
                            jesper.pedersen

                            I'm currently looking at this trace

                            10:42:57,675 INFO [TxConnectionManager] afterCompletion(4) isTrackByTx=true for org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@6efac385[state=NORMAL mc=org.jboss.resource.adapter.jdbc.local.LocalManagedConnection@40f33dde handles=0 lastUse=1219826577640 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$PoolBySubject@2162db22 context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@6120a64d xaResource=org.jboss.resource.connectionmanager.TxConnectionManager$LocalXAResource@35f8a538 txSync=null]
                            
                            10:42:57,675 INFO [TxConnectionManager] returnManagedConnection=org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@6efac385[state=NORMAL mc=org.jboss.resource.adapter.jdbc.local.LocalManagedConnection@40f33dde handles=0 lastUse=1219826577640 permit=true trackByTx=false mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$PoolBySubject@2162db22 context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@6120a64d xaResource=org.jboss.resource.connectionmanager.TxConnectionManager$LocalXAResource@35f8a538 txSync=null]
                            
                            10:42:57,701 INFO [TxConnectionManager] afterCompletion(3) isTrackByTx=true for org.jboss.resource.connectionmanager.TxConnectionManager$TxConnectionEventListener@6efac385[state=NORMAL mc=org.jboss.resource.adapter.jdbc.local.LocalManagedConnection@40f33dde handles=1 lastUse=1219826577675 permit=true trackByTx=true mcp=org.jboss.resource.connectionmanager.JBossManagedConnectionPool$PoolBySubject@2162db22 context=org.jboss.resource.connectionmanager.InternalManagedConnectionPool@6120a64d xaResource=org.jboss.resource.connectionmanager.TxConnectionManager$LocalXAResource@35f8a538 txSync=null]
                            


                            An afterCompletion() coming in wih status of COMITTED and handles=1 on the same managed connection as a previously afterCompletion() with status of ROLLBACKED and handles = 0.

                            I've updated the patch in the JIRA issue.