5 Replies Latest reply on Jul 28, 2008 1:22 PM by Adrian Brock

    JBAS-5801 - Locking in 2PC

    Adrian Brock Master

      WRT: https://jira.jboss.org/jira/browse/JBAS-5801

      It occurs to me that there might not actually be any need to do the locking
      in the resource adapter during 2PC, i.e. prepare/commit/rollback?

      What we really want to do is make sure that end() blocks
      at tx timeout until the current user operation has completed.

      Is that correct?

      e.g. The case would be:

      User: Connection.createSession(); // allocates and enlists connection
      JCA: XAResource.start();

      Then later something like a competing

      User: sendMessage();
      TM: tx timeout => XAResource.end() before XAResource.rollback()

      We basically don't want the sendMessage() to occur during or after
      the rollback because that could potentially lead to inconsistencies
      depending upon how the underlying jms determines what to do.

      i.e. All we actually need to do is make sure the end() and sendMessage()
      (or other user ops) aren't racing with each other.

      If true that would make the fix for 5801 a lot simpler since we could just
      remove the unnecessary locking in 2PC callbacks.
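A minimal sketch of the idea in this post, not the actual JBoss JCA code: user operations and end() share one lock, while the 2PC callbacks take none. The class name `EndGuardSketch`, its methods, and the choice to refuse user operations once end() has run are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: only user operations and end() contend for the lock. */
public class EndGuardSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> log = Collections.synchronizedList(new ArrayList<String>());
    private boolean ended; // guarded by lock

    /** Stands in for a user call such as sendMessage(). */
    public void userOperation(String name) {
        lock.lock();
        try {
            if (ended) {
                // Assumption: work arriving after end() is refused rather than leaked.
                throw new IllegalStateException(name + " attempted after end()");
            }
            log.add(name);
        } finally {
            lock.unlock();
        }
    }

    /** Stands in for XAResource.end(): waits for any in-flight user operation. */
    public void end() {
        lock.lock();
        try {
            ended = true;
            log.add("end");
        } finally {
            lock.unlock();
        }
    }

    /** Stands in for XAResource.rollback(): takes no lock, as proposed above. */
    public void rollback() {
        log.add("rollback");
    }

    /** Returns a snapshot of the recorded call order. */
    public List<String> log() {
        return new ArrayList<String>(log);
    }
}
```

With this shape, a sendMessage() that is already in flight finishes before end() returns, and one that arrives afterwards fails instead of running outside the transaction.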

        • 1. Re: JBAS-5801 - Locking in 2PC
          Jonathan Halliday Master

          > What we really want to do is make sure that end() blocks
          > at tx timeout until the current user operation has completed.

          I'd argue that's exactly what we want to avoid. One use of a tx timeout is to ensure apps don't get stuck due to e.g. slow db queries. If I issue e.g. a SELECT inside a tx and that tx has a timeout, I want it to timeout promptly, not block on the db operation. Likewise for sending or receiving messages. That's part of the reason JBossTS uses a background thread for timeouts. Sure some resource managers have explicit support for configuring timeouts on potentially long running operations, but it's not required.

          I'd say it's up to the resource manager's driver code to decide how to handle the situation. If it's got one thread doing a business logic operation in a tx when another thread is trying to terminate that tx, it can either block internally (I've seen the MS SQL driver do that, queuing the end to go over the same TCP/IP connection when the SELECT eventually returns) or e.g. throw an exception on the business logic thread.

          That said, one option may be to make this configurable at the JCA level. Allow e.g. -ds.xml files to contain an option for how the JCA should behave - delegate to the driver as I describe, block as you suggest, or even explicitly interrupt the business logic and throw an exception as some users seem to expect. The last one of course is risky, it may mess up the underlying driver.

          • 2. Re: JBAS-5801 - Locking in 2PC
            Adrian Brock Master

             

            "jhalliday" wrote:
            > What we really want to do is make sure that end() blocks
            > at tx timeout until the current user operation has completed.

            > I'd argue that's exactly what we want to avoid. One use of a tx timeout is to ensure apps don't get stuck due to e.g. slow db queries. If I issue e.g. a SELECT inside a tx and that tx has a timeout, I want it to timeout promptly, not block on the db operation. Likewise for sending or receiving messages. That's part of the reason JBossTS uses a background thread for timeouts. Sure some resource managers have explicit support for configuring timeouts on potentially long running operations, but it's not required.


            It doesn't have to block indefinitely, the wait is configurable,
            <!-- Whether to use try locks in seconds
            
             The default is wait for a lock indefinitely
             e.g. 5 minutes
             <use-try-lock>300</use-try-lock>
            -->
            <!ELEMENT use-try-lock (#PCDATA)>
            


            but the important thing is that the user operation and end()
            don't go the wrong way around during normal operation, otherwise
            you end up leaking the work into the "next/no transaction" without error, e.g.

            Thread 1: does sql executeUpdate();
            // Here Thread 1 hits (or more accurately is just about to hit)
            // the real jdbc connection, e.g. oracle code which we don't control
            Thread 2: tx timeout
            Thread 2: end() and rollback()
            Thread 1: continues with update but now outside the transaction we initially intended
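The configurable wait mentioned in this reply can be pictured roughly as follows. This is a sketch under assumptions, not the JBoss implementation: the class `TryLockSketch`, the convention that a value of zero or less means "wait indefinitely", and the exception thrown on timeout are all hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of a lock whose wait is bounded by a use-try-lock style setting. */
public class TryLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final int useTryLockSeconds; // assumption: <= 0 means wait indefinitely

    public TryLockSketch(int useTryLockSeconds) {
        this.useTryLockSeconds = useTryLockSeconds;
    }

    /** Acquires the lock, giving up after the configured number of seconds. */
    public void lock() {
        if (useTryLockSeconds <= 0) {
            lock.lock(); // unbounded wait
            return;
        }
        try {
            if (!lock.tryLock(useTryLockSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException(
                        "Unable to obtain lock in " + useTryLockSeconds + " seconds");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("Interrupted while waiting for lock", e);
        }
    }

    public void unlock() {
        lock.unlock();
    }
}
```

If both the user operation and end() go through lock(), end() waits at most the configured time for the user operation instead of overtaking it.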


            > I'd say it's up to the resource manager's driver code to decide how to handle the situation. If it's got one thread doing a business logic operation in a tx when another thread is trying to terminate that tx, it can either block internally (I've seen the MS SQL driver do that, queuing the end to go over the same TCP/IP connection when the SELECT eventually returns) or e.g. throw an exception on the business logic thread.


            For jdbc the correct thing to do is statement.cancel()
            although that's not necessarily guaranteed to work depending upon the jdbc driver
            and it only covers calls into the driver that the statement itself executes.
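For JDBC, the cancel path might look like the sketch below. `Statement.cancel()` is the standard `java.sql` call, intended to be invoked from a different thread than the one executing the statement. The `CancelSketch` class, the `cancelAfter` helper and the timer are hypothetical, and a driver remains free to ignore or reject the request.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: ask the driver to cancel a statement from another thread. */
public class CancelSketch {
    /**
     * Schedules stmt.cancel() after the given delay. The returned handle lets the
     * caller disarm the timer if the statement completes in time.
     */
    public static ScheduledFuture<?> cancelAfter(
            ScheduledExecutorService timer, final Statement stmt, long delayMillis) {
        return timer.schedule(() -> {
            try {
                stmt.cancel(); // only effective if both the DBMS and the driver support it
            } catch (SQLException e) {
                // The driver could not cancel; this sketch has no further fallback.
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

A caller would typically invoke `cancel(false)` on the returned handle once the statement has finished, so that a late cancel does not hit the next use of the same statement.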


            > That said, one option may be to make this configurable at the JCA level. Allow e.g. -ds.xml files to contain an option for how the JCA should behave - delegate to the driver


            We can't do that; see the example above. Not because the driver
            doesn't do its own locking (it probably does) but because it
            might not understand the required semantics, e.g.

            session.sendMessage(); // context switched before it actually does it
            xaresource.rollback();
            session.sendMessage(); // really executes, but now outside the rolled back transaction


            > as I describe, block as you suggest,
            > or even explicitly interrupt the business logic and throw an exception as some users seem to expect. The last one of course is risky, it may mess up the underlying driver.


            That's the mechanism of the previous TM.
            In fact, what it did besides "interrupting the thread" was effectively
            destroy the connection (if the I/O was on the same thread and it really was in I/O)
            meaning the connection got closed so you couldn't leak into the next/non
            transaction.

            Maybe that is an alternate option?

            i.e. whenever we have a tx timeout, you provide a callback to JCA
            and then we close the associated connections - disallowing their use
            which may not be in a well defined state.

            We could do this on a per connection-factory case depending upon
            how well we know the underlying third-party code handles asynch rollbacks
            concurrently with normal requests.

            But my criticism of this is that the connection.close() is still
            racing with the normal operation outside the intended transaction
            so it could still complete before we close the connection to "cancel it".

            Of course, if we closed the connections before the now unnecessary rollback()
            then this wouldn't be a problem. ;-)
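The alternate option floated here could be sketched as below. Everything in it is hypothetical: no such TM-to-JCA timeout callback is described in this thread as existing, and `TimeoutDestroySketch`, `PhysicalConnection`, `onTransactionTimeout` and the `destroyOnTimeout` flag are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: destroy enlisted connections when the TM reports a tx timeout. */
public class TimeoutDestroySketch {
    /** Hypothetical stand-in for a physical connection owned by the pool. */
    public interface PhysicalConnection {
        void destroy();
    }

    private final boolean destroyOnTimeout; // hypothetical per-connection-factory option
    private final List<PhysicalConnection> enlisted = new ArrayList<PhysicalConnection>();

    public TimeoutDestroySketch(boolean destroyOnTimeout) {
        this.destroyOnTimeout = destroyOnTimeout;
    }

    /** Records a connection as part of the current transaction. */
    public synchronized void enlist(PhysicalConnection connection) {
        enlisted.add(connection);
    }

    /**
     * Hypothetical callback invoked by the TM on timeout.
     * Returns the number of connections that were destroyed.
     */
    public synchronized int onTransactionTimeout() {
        if (!destroyOnTimeout) {
            return 0; // this connection factory trusts its driver; leave connections alone
        }
        int count = 0;
        for (PhysicalConnection connection : enlisted) {
            connection.destroy();
            count++;
        }
        enlisted.clear();
        return count;
    }
}
```

As the criticism above points out, the destroy call in such a scheme would still race with the in-flight user operation unless it is ordered before the rollback.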

            BACK ON TOPIC:
            You didn't answer my question (even though I think I already know the answer :-)
            which is that we only need to protect the end() invoked by the TM
            to avoid the race, not the XAResource commit/rollback.
            Which would mean the transaction interleaving problem of 5801 is easier to solve.

            • 3. Re: JBAS-5801 - Locking in 2PC
              Jonathan Halliday Master

              yes, blocking end() should eliminate the race, at the cost of the rollback no longer actually 'killing' a long running operation.

              My concern with actually interrupting the business logic operation is related to the way connections are multiplexed. You may have one or more logical connections on a single TCP/IP connection. Killing the network level one may therefore kill unrelated business operations in other threads. Depending how the multiplexing is implemented by the driver, interrupting a logical connection handle may mess up the multiplexing, with much the same undesirable side effects. Potentially you have to throw away the entire connection pool, not just a single logical connection if you want to be safe.

              Add to that the fact that the XA control messages probably flow over the same TCP/IP connection and you wind up with a problem. Sure, presumed rollback means that the resource manager should abort the tx when the TCP/IP connection drops, but some (db2?) don't. You need to keep the connection intact if you want to be sure it gets terminated cleanly, or rely on the crash recovery to tidy it up.

              • 4. Re: JBAS-5801 - Locking in 2PC
                Adrian Brock Master

                 

                "jhalliday" wrote:
                > yes, blocking end() should eliminate the race, at the cost of the rollback no longer actually 'killing' a long running operation.


                Ok that's what I'm doing. Maybe I should change the try-lock to have
                a default value. Blocking forever is never a good idea.


                > My concern with actually interrupting the business logic operation is related to the way connections are multiplexed. You may have one or more logical connections on a single TCP/IP connection. Killing the network level one may therefore kill unrelated business operations in other threads. Depending how the multiplexing is implemented by the driver, interrupting a logical connection handle may mess up the multiplexing, with much the same undesirable side effects. Potentially you have to throw away the entire connection pool, not just a single logical connection if you want to be safe.


                No, I'm talking about doing ManagedConnection.destroy(),
                i.e. effectively invoking close() on all the physical connections
                that are part of the timed out transaction and where the connection-factory
                has been configured for this behaviour due to known or suspected
                broken/hanging drivers.

                Whether the underlying driver actually honours the request or is
                multiplexing underneath is another question.
                It's just a further option to try to recover from broken/blocking
                connections causing timeouts.

                Whether you also interrupt() the thread is a different issue,
                since in most cases a close() - MSSQL aside - should be allowed to proceed
                even with other concurrent requests.


                > Add to that the fact that the XA control messages probably flow over the same TCP/IP connection and you wind up with a problem. Sure, presumed rollback means that the resource manager should abort the tx when the TCP/IP connection drops, but some (db2?) don't. You need to keep the connection intact if you want to be sure it gets terminated cleanly, or rely on the crash recovery to tidy it up.


                Yes, Oracle is the same for local transactions. If you crash the connection it rolls back
                when auto-commit=false. If you explicitly close it without committing,
                it commits anyway - so much for auto-commit=false :-(
                So you have to do a rollback() before close(), just to be sure.
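The "rollback before close" rule for local transactions can be written as a small helper. This is a sketch only: `SafeCloseSketch` and `rollbackThenClose` are hypothetical names, while `getAutoCommit()`, `rollback()` and `close()` are the standard `java.sql.Connection` methods.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical sketch: never rely on close() to discard uncommitted work. */
public class SafeCloseSketch {
    /** Rolls back any open local transaction, then closes the connection. */
    public static void rollbackThenClose(Connection connection) throws SQLException {
        try {
            if (!connection.getAutoCommit()) {
                connection.rollback(); // discard uncommitted work explicitly
            }
        } finally {
            connection.close(); // always attempted, even if the rollback failed
        }
    }
}
```

The explicit rollback means the outcome no longer depends on how a particular driver treats close() with work still pending.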

                • 5. Re: JBAS-5801 - Locking in 2PC
                  Adrian Brock Master

                   

                  "adrian@jboss.org" wrote:
                  "jhalliday" wrote:
                  > yes, blocking end() should eliminate the race, at the cost of the rollback no longer actually 'killing' a long running operation.
                  >
                  > Ok that's what I'm doing. Maybe I should change the try-lock to have
                  > a default value. Blocking forever is never a good idea.


                  https://jira.jboss.org/jira/browse/JBAS-5805