We are using the JBossCache v.1.4.1.SP3. Recently we started experiencing a locking problem: lots of TimeoutExceptions: read/write lock for ... could not be acquired after a timeout. After analyzing the source coude we have noticed the following:
In the UnlockInterceptor there is a trace output:
if (trace) log.trace("Attempting to release locks on current thread. Lock table is " + lock_table);
The trace variable (log.isTraceEnabled()) is true when you use the log4j logging and configure your thresholds on appenders level (our case). So, the concatenation in the above code line does happen although the console shows no output.
The concatenation causes the toString() method to be called on all IdentityLock objects that are contained in the lock_table. The toString() logic there is quite heavy: it gets all reader owners (there can be lots of them), adds them one by one to an ArrayList, and then adds this list into a StringBuffer which in turn causes the toString() to be invoked on every reader. So this takes time.
Now, all of this happens before releasing a lock, so if another thread tries to acquire a lock, it'll have to wait until the current thread finishes its concatenation. And there is a snowball effect: the more threads are performing this, the less chances each individual thread (including the write lock owner) has to release its lock. And since the logic is in the finally block, even those threads that were unsuccessful in obtaining a lock, have to do this concatenation before exiting which provokes more and more threads to be slower and slower passing this bottleneck.
We've performed a little test: 300 threads, 1 node, each thread randomly puts or reads the data from this node 30 times. Without a concatenation (when the overall debug level is set to INFO) this finishes in less than a minute and all operations are successful. When the debug level is set to DEBUG, the test can be running several hours or even produce an out of memory error, and almost all of the read/write attempts fail because of the TimeoutException.
So we suggest to remove the above debug code from the future versions of the cache. Also, maybe it's a good idea to remove the heavy code from the toString() method to prevent such situations from occuring.