We are getting an intermittent exception when our system processes a large number of JBossMQ messages in parallel. Whenever we get this exception, one of our Message Driven Beans (MDBs) becomes deadlocked and never processes any more messages.
Our scenario is as follows (JBoss 4.2.3.GA, EJB3):
- We have 5 separate MDBs processing messages from 5 different queues simultaneously.
- Each MDB is a singleton on its queue (i.e. maxSession = 1).
- Each MDB has a large backlog of messages to process, so they are all processing at once.
- During processing of a message, each MDB calls a SLSB to do some work which updates the database. This SLSB in turn calls another SLSB, which sends a message to a JBossMQ topic (e.g. notifying observers that something has changed in the database).
This will work fine for a short while, but after a few seconds (maybe 50-100 messages processed) we always get the following exception:
2008-10-31 11:47:50,440 563340 ERROR [org.jboss.resource.adapter.jms.inflow.JmsServerSession] (WorkManager(4)-124:) org.jboss.resource.adapter.jms.inflow.JmsServerSession@af2931 failed to commit/rollback
org.jboss.tm.JBossRollbackException: Unable to commit, tx=TransactionImpl:XidImpl[FormatId=257, GlobalId=vieo-ws01/7090, BranchQual=, localId=7090] status=STATUS_NO_TRANSACTION; - nested throwable: (java.lang.IllegalMonitorStateExcepti
Caused by: java.lang.IllegalMonitorStateException
... 7 more
At this point, the affected MDB becomes deadlocked and fails to process any more messages. If we leave the system running, then eventually more MDBs fail with the same exception.
Some observations:
1. If we only run ONE MDB, then we never get this exception! However as soon as we introduce even a second MDB, we start seeing these failures again.
2. The deadlock point seems to be when the code tries to return from the first SLSB call - i.e. execution never returns to the MDB onMessage() method context.
3. JBossMQ and the SLSB EntityManager use the same datasource (Postgres 8.2 as a local-tx-datasource).
4. This happens with both persistent and non-persistent messages.
5. If we mark the second, nested SLSB (the one that sends a message to the JMS topic) with TransactionAttributeType.NOT_SUPPORTED, then we don't get this exception! This is not a viable solution, however, as we need any messages sent during processing to be rolled back if an exception is thrown.
My guess is that this problem has something to do with the SLSB committing its transaction (at which point the nested JMS messages would also need to be sent?). As it is unpredictably intermittent and does not occur when running single-threaded, I think there must be a race condition between threads in the JMS locking mechanism.
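For context on the exception itself: in plain Java, IllegalMonitorStateException is thrown when a thread calls wait()/notify() on an object whose monitor it does not hold, or releases a ReentrantLock it does not own. The sketch below only illustrates that general condition with a made-up class (MonitorStateDemo is hypothetical and not part of JBoss or our application). Whether the JBoss transaction or JMS code actually hits one of these two cases is an assumption, not something confirmed by the stack trace.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class: shows two generic ways to trigger
// IllegalMonitorStateException. It does not reproduce the JBoss failure.
class MonitorStateDemo {

    // One thread acquires the lock, a different thread tries to release it.
    static boolean unlockFromNonOwnerThrows() {
        final ReentrantLock lock = new ReentrantLock();
        final AtomicBoolean thrown = new AtomicBoolean(false);

        lock.lock(); // the calling thread is now the owner
        try {
            Thread other = new Thread(() -> {
                try {
                    lock.unlock(); // this thread never acquired the lock
                } catch (IllegalMonitorStateException e) {
                    thrown.set(true);
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock(); // legitimate release by the owning thread
        }
        return thrown.get();
    }

    // Calling notify() without holding the object's monitor.
    static boolean notifyWithoutMonitorThrows() {
        Object monitor = new Object();
        try {
            monitor.notify(); // not inside synchronized (monitor) { ... }
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unlock by non-owner throws: " + unlockFromNonOwnerThrows());
        System.out.println("notify without monitor throws: " + notifyWithoutMonitorThrows());
    }
}
```

Both checks print true. If something similar is happening here, it would suggest that a lock taken on one thread is being released on another around commit time, which would fit the observation that the problem never appears with a single MDB.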
Any help here would be greatly appreciated!