As proposed by Tom in [JBTM-1951] Deadlock in JBOSS Transaction Reaper - JBoss Issue Tracker, I am opening this discussion to analyse the deadlock in JBossTS 4.16.6.
The last two comments from the issue are pasted here for context:
Thanks for your comment to this (already closed) issue.
Unfortunately, I am not able to check this on WildFly, since this occurs in a production system that
a) needs a solution
b) cannot be migrated to WildFly at the moment.
Because of a), I will investigate further and try to find a solution based on the latest IronJacamar 1.0 version (1.0.21.Final at the moment).
I will post here when I find anything.
Thanks Rico, I can give you some pointers. Transaction timeouts are processed in the reaper. To roll back a transaction (whether reaped or normal), we need to lock the transaction. To roll back each resource, IronJacamar needs to lock that resource. For some reason, Thread-503 locked the resource before trying to roll back the transaction, so the lock order was reversed: the reaper held the transaction lock and waited for the resource lock, while Thread-503 held the resource lock and waited for the transaction lock, and the deadlock occurred.
What you need to find out is why Thread-503 held a lock on the resource before calling rollback on the transaction. I have looked through 1.0.21 and 1.0.9 and cannot see a scenario in which it would have held that lock.
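To make the lock-order inversion concrete, here is a minimal, self-contained Java sketch. It is an illustration under stated assumptions, not the actual reaper or IronJacamar code: `txLock` and `resourceLock` are hypothetical stand-ins for the transaction lock and the resource lock, and the `CountDownLatch` forces the bad interleaving so it happens on every run instead of only occasionally. The JDK's `ThreadMXBean.findDeadlockedThreads()` then reports the cycle (this assumes a JVM, such as HotSpot, that supports monitoring of ownable synchronizers).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {

    // Hypothetical stand-ins: one lock guarding the transaction, one guarding the resource.
    static final ReentrantLock txLock = new ReentrantLock();
    static final ReentrantLock resourceLock = new ReentrantLock();

    // Acquire 'first', wait until both threads hold their first lock, then try 'second'.
    static Thread worker(String name, ReentrantLock first, ReentrantLock second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            try {
                first.lockInterruptibly();
                try {
                    bothHoldFirst.countDown();
                    bothHoldFirst.await();          // both threads now hold their first lock
                    second.lockInterruptibly();     // blocks: the other thread already owns it
                    second.unlock();
                } finally {
                    first.unlock();
                }
            } catch (InterruptedException e) {
                // interrupted by demonstrateDeadlock() after the deadlock was detected
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    /** Returns true if the JVM reports the two workers as deadlocked. */
    static boolean demonstrateDeadlock() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);
        // Reaper order: transaction first, then resource.
        Thread reaper = worker("reaper", txLock, resourceLock, latch);
        // Reversed order, as suspected for Thread-503: resource first, then transaction.
        Thread other = worker("Thread-503", resourceLock, txLock, latch);
        reaper.start();
        other.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 50 && ids == null; i++) {   // poll for up to about 5 seconds
            Thread.sleep(100);
            ids = mx.findDeadlockedThreads();            // null while no deadlock exists
        }
        boolean deadlocked = ids != null;
        if (deadlocked) {
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                if (info != null) {
                    System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
            }
        }
        // Break the cycle so both threads release their locks and exit.
        reaper.interrupt();
        other.interrupt();
        reaper.join();
        other.join();
        return deadlocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + demonstrateDeadlock());
    }
}
```

The usual remedy for this pattern is a single global lock order (always the transaction before the resource); the open question in this thread is which code path made Thread-503 take the resource lock first.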
You could look in JIRA to see whether any IronJacamar issues have been fixed in this area:
These look interesting:
This one came up in my search, which suggests that the version ordering in the IJ project overlaps between the 1.1 and 1.0 version numbers. It sounded interesting because it appears to be trying to prevent this:
As Jesper mentions above, please raise any further discussion in the forums.
Unfortunately, my plan to use the latest 1.0.x version (1.0.21.Final) for the analysis is doomed, because the public API changed (contrary to what I expected from such a small version change), so it would have needed at least adaptations in jboss-as-connector as well, which means a lot of work.
The alternative plan is now to stay on 1.0.9, backport some of the fixes Tom pointed to, then retest and start the deadlock analysis. Fortunately, we have a system set up where this race condition can be observed regularly.