I've figured out the root cause and it appears to all be JPA/Hibernate related.
Here is what is happening:
I have an entity called "Execution". The entity is queried for in the parent transaction (T1).
The Id is passed to the "assignExecutionToAgent" method which initiates T2 using REQUIRES_NEW.
Before doing so, hibernate executes a flush. The entity has a collection. Hibernate incorrectly detects that this collection is dirty due to this bug:
This results in an update to the entity row in oracle.
Now in T2, I am using optimistic locking. Under JBoss 5.1.0.GA, the optimistic locking is correctly implemented without using SELECT FOR UPDATE.
But under JBoss 5.1 EAP, for whatever reason, they use SELECT FOR UPDATE on the row. Because this row has been updated in T1, things hang at the Oracle level, where Oracle is waiting for T1 to complete.
Eventually, JBoss rolls back T1 for timeout, oracle allows T2 to continue and viola.
So two issues really:
1. Hibernate incorrectly detects a dirty collection.
2. Hibernate incorrectly uses SELECT FOR UPDATE for optimistic locking. Maybe there is a good reason, but I don't think this is expected behavior.
Does anyone have any idea if: a) there is a workaround for 1; and b) why hibernate would use SELECT FOR UPDATE for optimistic locking?
Ok, digging deeper, I'm not sure the collections are being marked as dirty--in fact, I'm almost certain they are not.
I registered my own hibernate listener to dump out debug data during the onFlushDirty event. In my debug code, I dump the old and new property. What I see is one of my updateTime Timestamp fields being updated:
2010-05-07 18:26:08,287 INFO [STDOUT] (http-0.0.0.0-9490-1) XDEBUG: Property updateTime was 2010-05-07 18:25:39.633541, is 2010-05-07 18:25:39.63791
I am definitely not updating this field in my code (it was updated using an @PreUpdate, but I commented that out).
Perhaps more detail will help. I have the following class hierarchy:
AbstractAuditEntity -- has updateTime and updateUser properties.
+--Execution -- derives from AbstractAuditEntity and is a parent class for various execution types.
+--RCEExecution -- derives from Execution.
Now, where we may have a problem is that we repeat the updateTime and updateUser fields in the RCEExecution entity. This is because we have a requirement to propogate these values to all the relational tables. Looking at my log entries, the onFlushDirty event is passed two updateTime properties, the second appears to be correct:
2010-05-07 18:26:08,289 INFO [STDOUT] (http-0.0.0.0-9490-1) XDEBUG: Property updateTime was 2010-05-07 18:25:39.63791, is 2010-05-07 18:25:39.63791
Given that the RCEExecution getter and setter just calls the super methods, I wonder how this could be? Where is the old value for the first iteration coming from?
Perhaps the code responsible for storing off the old values has a bug cloning the Date object correctly?
Any help is greatly appreciated.
The problem is indeed my lame attempt to propagate the audit fields to the RCEExecution table by reusing the values from the parent table. This, of course, was wrong. Essentially, hibernate was calling the setUpdateTime on the AbstractAuditEntity twice because I failed to override the setter and track the update time separately for the RCE Execution.
I'm not sure why the values would be different in my controlled test case, but I was obviously missing something. And I accept that they can and will be different in real world usage. So it's not correct for the child table (RCEExecution) to reuse the parent table's audit fields (parent being Execution, which gets the fields from AbstractAuditEntity). They are two different sets of values and needed to be kept as such.
So in the end, hibernate did it's dirty checking correctly, but couldn't account for my mistakes. The differences between JBoss 5.1.0.GA and 5.0.0 EAP in the optimistic locking is still concerning, but works none the less.
Hope this helps someone.