5 Replies Latest reply on Jul 28, 2008 1:22 PM by adrian.brock

JBAS-5801 - Locking in 2PC

adrian.brock Jul 28, 2008 6:28 AM

WRT: https://jira.jboss.org/jira/browse/JBAS-5801

It occurs to me that there might not actually be any need to do the locking
in the resource adapter during 2PC, i.e. prepare/commit/rollback?

What we really want to do is make sure that end() blocks
at tx timeout until the current user operation has completed.

Is that correct?

e.g. The case would be:

User: Connection.createSession(); // allocates and enlists connection
JCA: XAResource.start();

Then later something like a competing

User: sendMessage();
TM: tx timeout => XAResource.end() before XAResource.rollback()

We basically don't want the sendMessage() to occur during or after
the rollback because that could potentially lead to inconsistencies
depending upon how the underlying jms determines what to do.

i.e. All we actually need to do is make sure the end() and sendMessage()
(or other user ops) aren't racing with each other.

If true that would make the fix for 5801 a lot simpler since we could just
remove the unnecessary locking in 2PC callbacks.

1. Re: JBAS-5801 - Locking in 2PC

jhalliday Jul 28, 2008 6:42 AM (in response to adrian.brock)

> What we really want to do is make sure that end() blocks
> at tx timeout until the current user operation has completed.

I'd argue that's exactly what we want to avoid. One use of a tx timeout is to ensure apps don't get stuck due to e.g. slow db queries. If I issue e.g. a SELECT inside a tx and that tx has a timeout, I want it to timeout promptly, not block on the db operation. Likewise for sending or receiving messages. That's part of the reason JBossTS uses a background thread for timeouts. Sure some resource managers have explicit support for configuring timeouts on potentially long running operations, but it's not required.

I'd say it's up to the resource manager's driver code to decide how to handle the situation. If it's got one thread doing a business logic operation in a tx when another thread is trying to terminate that tx, it can either block internally (I've seen the MS SQL driver do that, queuing the end to go over the same TCP/IP connection when the SELECT eventually returns) or e.g. throw an exception on the business logic thread.

That said, one option may be to make this configurable at the JCA level. Allow e.g. -ds.xml files to contain an option for how the JCA should behave - delegate to the driver as I describe, block as you suggest, or even explicitly interrupt the business logic and throw an exception as some users seem to expect. The last one of course is risky, it may mess up the underlying driver.
2. Re: JBAS-5801 - Locking in 2PC

adrian.brock Jul 28, 2008 7:18 AM (in response to adrian.brock)
"jhalliday" wrote:
> > What we really want to do is make sure that end() blocks
> > at tx timeout until the current user operation has completed.
>
> I'd argue that's exactly what we want to avoid. One use of a tx timeout is to ensure apps don't get stuck due to e.g. slow db queries. If I issue e.g. a SELECT inside a tx and that tx has a timeout, I want it to timeout promptly, not block on the db operation. Likewise for sending or receiving messages. That's part of the reason JBossTS uses a background thread for timeouts. Sure some resource managers have explicit support for configuring timeouts on potentially long running operations, but it's not required.

It doesn't have to block indefinitely; the wait is configurable:
 <!ELEMENT use-try-lock (#PCDATA)>

But the important thing is that they
don't go the wrong way around during normal operation, otherwise
you end up leaking the work into the "next/no transaction" without error, e.g.

Thread 1: does sql executeUpdate();
// Here Thread 1 hits (or more accurately is just about to hit)
// the real jdbc connection, e.g. oracle code which we don't control
Thread 2: tx timeout
Thread 2: end() and rollback()
Thread 1: continues with update but now outside the transaction we initially intended
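
To make the ordering concrete, here is a minimal sketch of the idea, not the actual JCA code. The class name TxGuard and its methods are made up for illustration, and the bounded wait only stands in for whatever use-try-lock is configured to. User operations and end() take the same lock, so end() waits for an in-flight operation, and anything arriving after end() is rejected instead of leaking outside the transaction.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical guard for illustration only; not the actual JBoss JCA code.
final class TxGuard {
    private final ReentrantLock lock = new ReentrantLock();
    private final long waitMillis; // bounded wait, in the spirit of use-try-lock
    private boolean ended;         // only read or written while holding the lock

    TxGuard(long waitMillis) {
        this.waitMillis = waitMillis;
    }

    // Try to take the lock within the bounded wait; false if it could not be taken.
    private boolean acquire() {
        try {
            return lock.tryLock(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Thread 1: run a user operation (e.g. executeUpdate or a send) under the lock.
    void userOperation(Runnable op) {
        if (!acquire()) {
            throw new IllegalStateException("unable to obtain lock in time");
        }
        try {
            if (ended) {
                // end() already ran, so refuse rather than work outside the tx
                throw new IllegalStateException("transaction already ended");
            }
            op.run();
        } finally {
            lock.unlock();
        }
    }

    // Thread 2: called at tx timeout before rollback; false if the wait expired.
    boolean end() {
        if (!acquire()) {
            return false;
        }
        try {
            ended = true;
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

What to do when end() returns false (the wait expired while the user operation is still running) is left open in this sketch; that is the case being argued about in this thread.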

> I'd say it's up to the resource manager's driver code to decide how to handle the situation. If it's got one thread doing a business logic operation in a tx when another thread is trying to terminate that tx, it can either block internally (I've seen the MS SQL driver do that, queuing the end to go over the same TCP/IP connection when the SELECT eventually returns) or e.g. throw an exception on the business logic thread.

For jdbc the correct thing to do is statement.cancel(),
although that's not guaranteed to work, depending upon the jdbc driver,
and it only affects calls into the driver that the statement is executing.
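
As a rough sketch of what that could look like from the timeout thread: java.sql.Statement.cancel() is standard JDBC, but the helper class and method name here are made up, and a true return value only means the driver accepted the request, not that the running call was really aborted.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration only.
final class StatementCanceller {
    // Ask the driver to cancel whatever the statement is currently executing.
    // Returns false if the driver does not support cancel() or the call fails.
    static boolean tryCancel(Statement statement) {
        try {
            statement.cancel();
            return true;
        } catch (SQLException e) {
            // Also covers SQLFeatureNotSupportedException, a subclass of SQLException
            return false;
        }
    }
}
```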

> That said, one option may be to make this configurable at the JCA level. Allow e.g. -ds.xml files to contain an option for how the JCA should behave - delegate to the driver

We can't do that; see the example above. Not because the driver
doesn't do its own locking (it probably does) but because it
might not understand the required semantics, e.g.

session.sendMessage(); // context switched before it actually does it
xaresource.rollback();
session.sendMessage(); // really executes, but now outside the rolled back transaction

> as I describe, block as you suggest,
> or even explicitly interrupt the business logic and throw an exception as some users seem to expect. The last one of course is risky, it may mess up the underlying driver.

That's the mechanism of the previous TM.
In fact, what it did besides "interrupting the thread" was effectively
destroy the connection (if the I/O was on the same thread and it really was in I/O),
meaning the connection got closed so you couldn't leak into the next/non
transaction.

Maybe that is an alternate option?

i.e. whenever we have a tx timeout, you provide a callback to JCA
and then we close the associated connections, disallowing further use of
connections that may not be in a well defined state.

We could do this on a per connection-factory basis depending upon
how well we know the underlying third-party code handles async rollbacks
concurrently with normal requests.

But my criticism of this is that the connection.close() is still
racing with the normal operation outside the intended transaction
so it could still complete before we close the connection to "cancel it".

Of course, if we closed the connections before the now unnecessary rollback()
then this wouldn't be a problem. ;-)

BACK ON TOPIC:
You didn't answer my question (even though I think I already know the answer :-),
which is whether we only need to protect the end() invoked by the TM
to avoid the race, not the XAResource commit/rollback.
If so, the transaction interleaving problem of 5801 is easier to solve.
3. Re: JBAS-5801 - Locking in 2PC

jhalliday Jul 28, 2008 12:09 PM (in response to adrian.brock)

yes, blocking end() should eliminate the race, at the cost of the rollback no longer actually 'killing' a long running operation.

My concern with actually interrupting the business logic operation is related to the way connections are multiplexed. You may have one or more logical connections on a single TCP/IP connection. Killing the network level one may therefore kill unrelated business operations in other threads. Depending how the multiplexing is implemented by the driver, interrupting a logical connection handle may mess up the multiplexing, with much the same undesirable side effects. Potentially you have to throw away the entire connection pool, not just a single logical connection if you want to be safe.

Add to that the fact that the XA control messages probably flow over the same TCP/IP connection and you wind up with a problem. Sure, presumed rollback means that the resource manager should abort the tx when the TCP/IP connection drops, but some (db2?) don't. You need to keep the connection intact if you want to be sure it gets terminated cleanly, or rely on the crash recovery to tidy it up.
4. Re: JBAS-5801 - Locking in 2PC

adrian.brock Jul 28, 2008 12:51 PM (in response to adrian.brock)

"jhalliday" wrote:
> yes, blocking end() should eliminate the race, at the cost of the rollback no longer actually 'killing' a long running operation.

Ok that's what I'm doing. Maybe I should change the try-lock to have
a default value. Blocking forever is never a good idea.

> My concern with actually interrupting the business logic operation is related to the way connections are multiplexed. You may have one or more logical connections on a single TCP/IP connection. Killing the network level one may therefore kill unrelated business operations in other threads. Depending how the multiplexing is implemented by the driver, interrupting a logical connection handle may mess up the multiplexing, with much the same undesirable side effects. Potentially you have to throw away the entire connection pool, not just a single logical connection if you want to be safe.

No, I'm talking about doing ManagedConnection.destroy(),
i.e. effectively invoking close() on all the physical connections
that are part of the timed out transaction and where the connection-factory
has been configured for this behaviour due to known or suspected
broken/hanging drivers.

Whether the underlying driver actually honours the request or is
multiplexing underneath is another question.
It's just a further option to try to recover from broken/blocking
connections causing timeouts.

Whether you also interrupt() the thread is a different issue,
since in most cases a close() - MSSQL aside - should be allowed to proceed
even with other concurrent requests.

> Add to that the fact that the XA control messages probably flow over the same TCP/IP connection and you wind up with a problem. Sure, presumed rollback means that the resource manager should abort the tx when the TCP/IP connection drops, but some (db2?) don't. You need to keep the connection intact if you want to be sure it gets terminated cleanly, or rely on the crash recovery to tidy it up.

Yes, Oracle is the same for local transactions. If you crash the connection it rolls back
when auto-commit=false. If you explicitly close it without committing,
it commits anyway - so much for auto-commit=false :-(
So you have to do a rollback() before close(), just to be sure.
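
A minimal sketch of that ordering for a plain JDBC connection. The class and method names are made up for illustration; the calls themselves (getAutoCommit(), rollback(), close()) are standard java.sql.Connection methods.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper for illustration only.
final class SafeClose {
    // Roll back any uncommitted local work before closing, so the outcome
    // does not depend on what the driver does with an open tx on close().
    static void rollbackThenClose(Connection connection) throws SQLException {
        try {
            // rollback() is only valid when auto-commit is off
            if (!connection.getAutoCommit()) {
                connection.rollback();
            }
        } finally {
            // Close even if the rollback itself failed
            connection.close();
        }
    }
}
```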
5. Re: JBAS-5801 - Locking in 2PC

adrian.brock Jul 28, 2008 1:22 PM (in response to adrian.brock)

"adrian@jboss.org" wrote:
"jhalliday" wrote:
> > yes, blocking end() should eliminate the race, at the cost of the rollback no longer actually 'killing' a long running operation.
>
> Ok that's what I'm doing. Maybe I should change the try-lock to have
> a default value. Blocking forever is never a good idea.

https://jira.jboss.org/jira/browse/JBAS-5805