5 Replies Latest reply on Aug 25, 2017 5:03 AM by dode

    Broken connections in JMS pool?

    dode

      In a high-volume WildFly 8.2.1-Final installation we are experiencing degrading performance over the course of about a week, until a restart becomes necessary, along with increasingly frequent failures to send JMS messages because of transactions in state ABORT_ONLY:

       

      Caused by: javax.ejb.EJBException: javax.jms.JMSException: Could not create a session: IJ000457: Unchecked throwable in managedConnectionReconnected()
      Caused by: javax.resource.ResourceException: IJ000457: Unchecked throwable in managedConnectionReconnected() cl=org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@3915e409[state=NORMAL managed connection=org.hornetq.ra.HornetQRAManagedConnection@7d6bcd06 connection handles=0 lastUse=1494579251303 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.PoolByCri@691085ec mcp=SemaphoreArrayListManagedConnectionPool@66a127f5[pool=HornetQConnectionDefinition] xaResource=XAResourceWrapperImpl@56a9bb4[xaResource=org.hornetq.ra.HornetQRAXAResource@4ca920eb pad=false overrideRmValue=null productName=HornetQ productVersion=2.0 jndiName=java:/JmsXA] txSync=null]
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:780)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.allocateConnection(AbstractConnectionManager.java:518)
        at org.hornetq.ra.HornetQRASessionFactoryImpl.allocateConnection(HornetQRASessionFactoryImpl.java:948)
        ... 261 more
      Caused by: javax.resource.ResourceException: IJ000461: Could not enlist in transaction on entering meta-aware object
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:551)
        at org.jboss.jca.core.connectionmanager.AbstractConnectionManager.reconnectManagedConnection(AbstractConnectionManager.java:775)
        ... 263 more
      Caused by: javax.transaction.SystemException: IJ000356: Failed to enlist: java.lang.Throwable: Unabled to enlist resource, see the previous warnings. tx=TransactionImple < ac, BasicAction: 0:ffff0a0001c3:682851c6:5900737b:dd09939 status: ActionStatus.ABORT_ONLY >
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener$TransactionSynchronization.checkEnlisted(TxConnectionListener.java:848)
        at org.jboss.jca.core.connectionmanager.listener.TxConnectionListener.enlist(TxConnectionListener.java:383)
        at org.jboss.jca.core.connectionmanager.tx.TxConnectionManagerImpl.managedConnectionReconnected(TxConnectionManagerImpl.java:544)
        ... 264 more

       

      We are sending the messages from an EJB with CMT, using the pooled connection factory with transaction mode XA, and an Oracle XA database connection is also involved in the transaction. We take care to close all resources (the connection, the session and the producer) in a finally block, including null checks, and it does not look like we have a resource leak.
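
      For illustration, the send path looks roughly like this (a simplified sketch, not our exact code; the bean, queue and payload names are made up, but the close-in-finally pattern with null checks matches what we do):

        import javax.annotation.Resource;
        import javax.ejb.Stateless;
        import javax.jms.Connection;
        import javax.jms.ConnectionFactory;
        import javax.jms.JMSException;
        import javax.jms.MessageProducer;
        import javax.jms.Queue;
        import javax.jms.Session;

        @Stateless
        public class MessageSender {

            @Resource(mappedName = "java:/JmsXA")
            private ConnectionFactory connectionFactory;

            @Resource(mappedName = "java:/queue/ExampleQueue")
            private Queue queue;

            // CMT: the container starts the XA transaction; with java:/JmsXA the
            // session is enlisted automatically, so the createSession() arguments
            // are effectively ignored.
            public void send(String payload) throws JMSException {
                Connection connection = null;
                Session session = null;
                MessageProducer producer = null;
                try {
                    connection = connectionFactory.createConnection();
                    session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                    producer = session.createProducer(queue);
                    producer.send(session.createTextMessage(payload));
                } finally {
                    // close everything, guarded by null checks
                    if (producer != null) { producer.close(); }
                    if (session != null) { session.close(); }
                    if (connection != null) { connection.close(); }
                }
            }
        }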

       

      Also, for the time being, we have no idea why the transaction has status ABORT_ONLY. We never mark it rollback-only ourselves and, at least before the error occurs and in the same thread, there is nothing in the log about an EJBTransactionRolledbackException or any other exception that would cause the transaction to be rolled back or marked rollback-only.

       

      While the error occurred thousands of times in the log, it seems that only seven instances of HornetQRAManagedConnection are involved in total:

       

      connection=org.hornetq.ra.HornetQRAManagedConnection@15848f2c

      connection=org.hornetq.ra.HornetQRAManagedConnection@36f6dd60

      connection=org.hornetq.ra.HornetQRAManagedConnection@3bd394fc

      connection=org.hornetq.ra.HornetQRAManagedConnection@533c952a

      connection=org.hornetq.ra.HornetQRAManagedConnection@658df2e8

      connection=org.hornetq.ra.HornetQRAManagedConnection@6ea7af25

      connection=org.hornetq.ra.HornetQRAManagedConnection@7d6bcd06

       

      Is it possible that, over time, more and more connections somehow "break" with regard to the transaction and keep getting handed out by the pool? That would fit the observation that the error occurs more and more frequently over time.

       

      If so, is there a way to purge one or all connections from the pool so that it refills itself with brand-new connections?

       

      We have tried hard to reproduce the error in many ways, without success so far. By marking the transaction rollback-only before trying to send the message (see the sketch below), we get a somewhat similar error in the log, but it happens in TxConnectionManagerImpl.getManagedConnection, not in TxConnectionManagerImpl.managedConnectionReconnected as in the real error, which may be a significant difference.
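
      For reference, this is roughly how the reproduction attempt forces the rollback-only state (a sketch; the bean is hypothetical, but SessionContext.setRollbackOnly() is the standard way to mark a container-managed transaction):

        import javax.annotation.Resource;
        import javax.ejb.SessionContext;
        import javax.ejb.Stateless;

        @Stateless
        public class RollbackOnlyReproducer {

            @Resource
            private SessionContext sessionContext;

            public void sendWithAbortedTransaction() {
                // Mark the container-managed transaction rollback-only first ...
                sessionContext.setRollbackOnly();
                // ... then send the JMS message as usual (send path as sketched above).
                // Enlisting the session now fails, but the resulting error comes from
                // TxConnectionManagerImpl.getManagedConnection rather than from
                // managedConnectionReconnected as in the real error.
            }
        }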

        • 1. Re: Broken connections in JMS pool?
          dode

          We finally realized we should set the log level to WARN to see warnings about transactions, and the cause of the transaction being in status ABORT_ONLY is a previously logged warning:

           

          2017-05-30 11:33:15,977 WARN  [com.arjuna.ats.jta] (Thread-123) ARJUNA016086: TransactionImple.enlistResource setTransactionTimeout on XAResource < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffff0a0001c3:-7a698606:591fe1ff:82cea51, node_name=1, branch_uid=0:ffff0a0001c3:-7a698606:591fe1ff:830e516, subordinatenodename=null, eis_name=java:/JmsXA > threw: XAException.XAER_RMERR: javax.transaction.xa.XAException
          Caused by: HornetQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=HQ119014: Timed out waiting for response when sending packet 51]
            at org.hornetq.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:390)
            at org.hornetq.core.client.impl.ClientSessionImpl.start(ClientSessionImpl.java:1901)

           

          Searching for "HQ119014: Timed out waiting for response when sending packet", some posts suggest network problems, but since we are sending the messages locally and the in-VM connection is being used, I don't think this is the issue. Other posts suggest that, since we are sending persistent messages, the disk may not be fast enough to write the messaging journal.

           

          Since the timeouts are the cause it is certainly not the case that there are "broken" connections in the pool.

          • 2. Re: Broken connections in JMS pool?
            andey

             If you use JmsXA, JBoss will automatically enlist the call into the ongoing JTA transaction. The error seems to occur while enlisting the resource in the JTA transaction. Check whether your application classes are part of the JTA transaction. Also, could you try the behaviour with "java:/ConnectionFactory" instead of "java:/JmsXA" if the call is supposed to be non-transacted?
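
             To illustrate the difference (a sketch; the class and field names are made up, the JNDI names are the default WildFly bindings), java:/JmsXA goes through the resource adapter and enlists sessions in the JTA transaction, while java:/ConnectionFactory is the plain in-VM factory that does not:

               import javax.annotation.Resource;
               import javax.ejb.Stateless;
               import javax.jms.ConnectionFactory;

               @Stateless
               public class FactoryBindings {

                   // Pooled, XA-capable factory: sessions are enlisted in the ongoing JTA transaction.
                   @Resource(mappedName = "java:/JmsXA")
                   private ConnectionFactory pooledXaFactory;

                   // Plain in-VM factory: not managed by the resource adapter, does not
                   // participate in the container-managed transaction.
                   @Resource(mappedName = "java:/ConnectionFactory")
                   private ConnectionFactory plainFactory;
               }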

            • 3. Re: Broken connections in JMS pool?
              dode

              Using "java:/ConnectionFactory" doesn't work since these connections don't participate in the transaction started by the container. We are sending our messages from an EJB with CMT.

               

               Anyway, thanks to the WARN log level we now know that the transaction is in state ABORT_ONLY because of the XAException caused by "HQ119014: Timed out waiting for response when sending packet xx", and we are currently trying to find out why the connection timeouts occur.

              • 4. Re: Broken connections in JMS pool?
                dode

                 The packet types where the timeout occurs are 51 (SESS_XA_START) and 63 (SESS_XA_SET_TIMEOUT), so the problem does not seem to be actually sending the message but rather starting the session?

                 

                 This issue sounds very similar: https://developer.jboss.org/thread/267438

                • 5. Re: Broken connections in JMS pool?
                  dode

                  Following the suspicion that connections somehow break and stay in the pool, getting reused and causing consecutive errors, we tried to make sure these broken connections are removed from the pool by inserting a call to cl.getManagedConnection().destroy() at line 799 of AbstractConnectionManager#reconnectManagedConnection(ConnectionListener) in ironjacamar-core-impl 1.1.9, replaced the JAR and have let the system run for some weeks now.
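
                  Roughly, the change looks like this (a paraphrased sketch, not the actual IronJacamar source; the surrounding error handling is abbreviated and only the destroy() call marked below is what we added):

                    // inside AbstractConnectionManager#reconnectManagedConnection(ConnectionListener cl)
                    try
                    {
                       managedConnectionReconnected(cl);
                    }
                    catch (Throwable t)
                    {
                       // ADDED: destroy the broken physical connection so it cannot stay in
                       // the pool and be handed out again; destroy() can itself throw, so guard it.
                       try
                       {
                          cl.getManagedConnection().destroy();
                       }
                       catch (Throwable ignored)
                       {
                          // best effort, the connection is unusable anyway
                       }

                       // ... original handling: the throwable is rethrown as
                       // ResourceException IJ000457 "Unchecked throwable in managedConnectionReconnected()" ...
                    }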

                   

                  It has been running stably so far; it does not slow down, let alone hang.

                   

                  Some logging added to the same method records the pool statistics, and we can see that during this time 100 connections were destroyed. More importantly, each of the initial errors involves a different connection, so these broken connections are no longer reused.

                   

                  The initial error that causes a connection to break in the first place is still unclear. From the logging it seems that the same JMS connection is used by two threads at the same time, messing up the communication, causing the timeouts and eventually leaving the connection in a broken state. We double-checked to make sure that we create a new JMS connection and correctly close it each time we send a message.

                   

                  Another guess is that this is an issue in IronJacamar 1.1.9 that was fixed a long time ago and does not occur in later releases of WildFly.