First off, we're running JBoss 3.2.6, Oracle 9i, and PostgreSQL 7.4.7 on RHEL3 in production and staging, and we're seeing the following problem:
JBoss is occasionally failing to connect to the Oracle servers. We recently set the kernel tcp_keepalive_time below the firewall timeout to fix our connectivity issues with PostgreSQL, but that caused connectivity issues with Oracle to appear. The exact Oracle error is:
Could not execute query|java.sql.SQLException: Io exception: Connection timed out
If the rest of the very long error message might help, I'd be glad to add it to a followup post.
That said, since setting tcp_keepalive_time to 10 minutes, we have been seeing these errors, and they are accompanied by some rather strange behaviour in the jmx-console. The PostgreSQL pooled connections seem to be working fine, but the Oracle ones aren't. Our maxsize is set to 15 in the Oracle-ds.xml file. The jmx-console ManagedConnectionPool view for the Oracle db reflects this, and shows 4 in use. However, it shows 26 connections as available! So I'm wondering if there's some issue popping up where JBoss thinks it has more Oracle connections available than it really does, and eventually runs out.
I'm in no way a JBoss or Java expert, more a system administrator and database admin. But, since this is a problem with connecting to the database, I'm assigned the job of hunting it down.
Now, the odd thing here is that when we set tcp_keepalive_time to more than 1 hour (the firewall's timeout), the Oracle problem goes away, and PostgreSQL starts throwing errors because it's not very forgiving of dropped connections.
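For anyone wanting to check the same setting on their own box, here is a rough sketch in Python that reads the kernel value and compares it to the firewall's idle timeout. The function names are made up for illustration, and the 1-hour figure is just the firewall timeout from this post. Keep in mind the kernel only sends keepalive probes on sockets that have SO_KEEPALIVE enabled, so this value alone does not tell you what a given driver does.

```python
from pathlib import Path

# Standard Linux location of the setting discussed above.
KEEPALIVE_PATH = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def read_keepalive_seconds(path=KEEPALIVE_PATH):
    """Return the idle time (seconds) before the kernel sends the first keepalive probe."""
    return int(Path(path).read_text().strip())


def probes_beat_firewall(keepalive_s, firewall_timeout_s):
    """True if the first probe goes out before the firewall would drop an idle connection."""
    return keepalive_s < firewall_timeout_s


if __name__ == "__main__":
    firewall_timeout_s = 60 * 60  # assumption: the 1-hour firewall timeout from this post
    if KEEPALIVE_PATH.exists():
        current = read_keepalive_seconds()
        print("tcp_keepalive_time:", current, "seconds")
        print("probes sent before firewall timeout:",
              probes_beat_firewall(current, firewall_timeout_s))
```

With our two settings, 10 minutes (600 s) comes in under the firewall timeout and anything over 1 hour does not.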
Thanks in advance for any ideas on this, and if it's a bug fixed in 3.2.7, I apologize in advance. I went through the bug database and didn't find anything related to this.
I found the answer here. We have NoTxSeparatePools set to true for the Oracle connections, so this is normal operation... sigh.
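In case the numbers above still look odd, here is the arithmetic as I understand it, written as a small Python sketch. It assumes that NoTxSeparatePools makes JBoss keep two sub-pools (one for transactional use, one for non-transactional), each with its own maxsize, and that the jmx-console adds them together when reporting available connections. The function name is hypothetical, not anything from JBoss.

```python
def available_connections(max_size, sub_pools, in_use):
    """Connections reported as available when each sub-pool has its own max_size.

    Assumption: the console reports the total across all sub-pools
    minus whatever is currently checked out.
    """
    return max_size * sub_pools - in_use


if __name__ == "__main__":
    # maxsize 15, two sub-pools, 4 in use: the figures from this thread
    print(available_connections(15, 2, 4))  # 26
```

So with maxsize at 15 and two sub-pools, 26 available plus 4 in use adds up to 30, which matches what the console showed.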