My development team is working on a fairly large EJB-based application (~30 EJBs, including entity, session, and message-driven beans). The application has been split into two archives. One archive contains common functionality, and the second archive depends on it. The application is used via a web interface that is hosted by Resin 2.0.1 on a Windows 2000 machine using Sun's 1.4.0 JVM. Multiple SQL Server databases on Windows 2000 are being used through non-XA configurations.
When there is only one user, the application functions as expected. However, when there are multiple users, the application inevitably hangs. We have been debugging this problem for nearly a month. We've tried JBoss 2.4.4 on both Windows 2000 and Linux using Sun's 1.4.0 JVM. We've also tried JBoss 3.0.2 on Windows 2000 using Sun's 1.4.0 JVM. We have thoroughly scoured the JBoss forums looking for answers. On a whim, we changed the transaction attributes of all of the EJBs to RequiresNew. For some reason, this appears to fix the problem. However, we don't know why this fixes the problem, and we feel that we really need to understand why. We would appreciate any insight. More information on the nature of the problem follows.
We are certain that the hanging is not caused by the database(s) for a number of reasons:
(1) We can cause the application to hang when performing operations that only perform reads.
(2) SQL Server detects deadlocks and resolves them by forcibly killing one of the transactions (designated as the victim). So we are certain that we do not have a database deadlock.
(3) Often, when debugging this problem, we have found a blocked database connection that is holding locks on various resources. (This causes other connections to block as well.) When looking at the process monitor for the database, there is no apparent reason for the connection to block. In other words, there are no database deadlock conditions evident from the information given by the monitor.
Also, in dealing with the databases, we've made sure that we don't keep any member or class variables that hold javax.sql.DataSource or java.sql.Connection instances.
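To illustrate what is meant by not holding connections in member or class variables, here is a minimal sketch. The class and method names are hypothetical and are not taken from our application; the point is only that the Connection is obtained and returned within a single call.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical helper; the names are illustrative, not from the real application.
class PerCallConnection {

    /** A unit of work that uses a connection valid for one call only. */
    interface Work<T> {
        T run(Connection c) throws SQLException;
    }

    /**
     * Borrows a connection for the duration of one call and always closes it,
     * even if the work throws. Nothing is stored in an instance or static field.
     */
    static <T> T withConnection(DataSource ds, Work<T> work) throws SQLException {
        try (Connection c = ds.getConnection()) {
            return work.run(c);
        }
    }
}
```

Whether the DataSource itself is looked up from JNDI on every call is left to the caller here; the sketch only shows that the Connection never outlives the method.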
As an important note, we know that our application will result in database deadlocks if we use an XA configuration to access the databases.
We've taken thread dumps of the JVM, hooked OptimizeIt's Thread Debugger up to the JVM, and walked through the execution of the code in real time using Together Control Center's JPDA debugger. When our application hangs, OptimizeIt indicates that the threads representing connections from Resin are not blocked on any monitors held by other threads. Rather, all of the Resin threads are waiting on monitors, that is, waiting to be notified. OptimizeIt and the thread dumps both indicate that the threads are waiting on various instances of org.jboss.ejb.plugins.lock.QueuedPessimisticEJBLock$TxLock. (On occasion, a Resin thread will be waiting on IO operations against the database. We interpret this as a consequence of one of the blocked connections discussed previously: the thread is blocked on IO because another thread, which is not blocked on IO, still holds locks on database resources.)
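To make the distinction above concrete, here is a minimal sketch in plain Java. It is not JBoss or Resin code, and the class, field, and method names are hypothetical. It shows a thread that is waiting to be notified rather than blocked on a monitor held by another thread. On a modern JVM such a thread reports state WAITING, and it stays there until some other thread notifies it.

```java
// Hypothetical demo; txLock merely stands in for a lock object such as a TxLock instance.
class WaitStateDemo {
    private final Object txLock = new Object();
    private boolean released = false;

    /** Starts a daemon thread that waits on txLock until release() is called. */
    Thread startWaiter() {
        Thread t = new Thread(() -> {
            synchronized (txLock) {
                while (!released) {
                    try {
                        txLock.wait(); // gives up the monitor while waiting
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }, "waiter");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Wakes the waiter. If nothing ever calls this, the waiter hangs forever. */
    void release() {
        synchronized (txLock) {
            released = true;
            txLock.notifyAll();
        }
    }

    /** Polls until the thread reaches the wanted state or the timeout elapses. */
    static boolean awaitState(Thread t, Thread.State wanted, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (t.getState() != wanted && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return t.getState() == wanted;
    }
}
```

In this sketch no thread holds txLock while the waiter is parked, which matches what we see: nothing is blocked on a held monitor, yet nothing makes progress because the notification never arrives.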
While the code is rather convoluted, we are quite certain that it is not reentrant. Before switching the transaction attributes to RequiresNew, we were using Required for session beans and Mandatory for entity beans.
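For readers less familiar with the three attributes mentioned above, the following sketch models which transaction a called method runs in. It is a hypothetical model written for this post, not container code; the enum, the TxIds helper, and the string transaction ids are all made up for illustration. In a real deployment the attributes are declared per method in ejb-jar.xml via trans-attribute.

```java
import java.util.function.Supplier;

// Hypothetical model of container-managed transaction attributes; not JBoss code.
enum TxAttribute {
    REQUIRED, REQUIRES_NEW, MANDATORY;

    /**
     * Returns the id of the transaction the called method runs in.
     * callerTx is the caller's transaction id, or null if the caller has none.
     */
    String resolve(String callerTx, Supplier<String> newTx) {
        switch (this) {
            case REQUIRED:
                // Join the caller's transaction, or start one if there is none.
                return callerTx != null ? callerTx : newTx.get();
            case REQUIRES_NEW:
                // Always start a new transaction; the caller's is suspended meanwhile.
                return newTx.get();
            case MANDATORY:
                // A real container rejects the call (TransactionRequiredException);
                // this model uses IllegalStateException as a stand-in.
                if (callerTx == null) {
                    throw new IllegalStateException("caller must supply a transaction");
                }
                return callerTx;
            default:
                throw new AssertionError(this);
        }
    }
}

// Hypothetical generator of transaction ids for the model above.
class TxIds {
    private int next = 1;

    String fresh() {
        return "tx-" + next++;
    }
}
```

With our original settings, a session bean call (Required) and every entity bean call beneath it (Mandatory) resolve to the same transaction. With RequiresNew everywhere, each call resolves to its own transaction. The sketch only illustrates that difference; it does not by itself explain the hang.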
We've also checked that reverse DNS lookup is working correctly between all of the machines involved.