Recently, after an update of Jboss from version 4.2.2 to 4.2.3 we are experiencing some problems. Our application suddenly stopped responding. In the logging we found the following lines:
23:43:27,179 WARN [com.arjuna.ats.arjuna.logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.BasicAction_58] - Abort of action id a780806:1159:4b45c04c:d5063c invoked while multiple threads active within it.
23:43:27,179 WARN [com.arjuna.ats.arjuna.logging.arjLoggerI18N] [com.arjuna.ats.arjuna.coordinator.CheckedAction_2] - CheckedAction::check - atomic action a780806:1159:4b45c04c:d5063c aborting with 1 threads active!
And between 5 and 30 minutes later all database connections were in use. We discovered that we did not have a query timeout set and some query was blocking a table int he database. This meant the transaction was marked for rollback, but it never completed the query. Meanwhile, other threads kept locking until the server ran out of database connections. We set the query timeout to be the maximum remaining time of the transaction and updated the application with the problem. In our test this solved our problem and now the application logs a proper stacktrace when we reproduce this problem. We updated our application and hoped it would solve the problem. We also added logging of a full thread dump whenever the problem happens.
However, it did not solve our problem. Now, the full thread dump shows the following stack trace many times:
... remainder of stacktrace omitted - not relevant ...
This happens many many times. This means all threads wait in a lock that does not seem toever be released. I do not see 'Transaction is not active' exceptions from Arjuna. Because of the many many locks, this is a problem because after a while, all threads from the thread pool are in use and the applications stops responding as well. This means we simply changed the problem from database connections to threads. There are no deadlocks detected.
It seems as though we have a locking problem, but it is very hard to detect why - at least we know our CardBean is the object being locked (the PresenceLogBean object is a newly created object, the error occurs in the ejbPostCreate step). This bean however already has all its read-only methods marked as read-only and locking does not occur often. Also i have no easy way to monitor which thread locks which object. It also seems that after the transaction timeout the lock is not released, or perhaps a new lock is created before the old one is released.
So my questions are:
1. Is it possible that, although we have a 5 minute transaction timeout, that the transactions keep running in this case?
2. Is it possible to log information about which objects are being locked by which thread?
3. Have any of you seen similar behaviour and do you know of a possible cause or solution?
Our configuration in short
EJB 2.0 with CMP (and EJB 3.0/JPA in newer parts, this does not seem to have this problem so far)
Bean configuration is standard QueuedPessimisticLock with Commit option A, transaction timeout of 5 minutes