Server hanging with ARJUNA012378: ReaperElement appears to be wedged
ianmacintyre Jul 18, 2017 5:08 AMHi folks,
We are having a strange issue with Transaction Reaper threads becoming wedged, which appears to lock up the entire server. A restart is then needed, however it only happens on some machines - not all. This is running Wildfly 10.0.
The initial symptom is a transaction timeout which the reaper attempts to abort. From the stacktrace we see it is happening in our code - waiting on the db to return (or a call to an external system). This part I can understand.
Then, within 2s I see a warning the Reaper element is wedged. The stack trace at that point seems to point to JBoss logging
2017-07-14 08:10:22,602 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012378: ReaperElement appears to be wedged: java.security.AccessController.doPrivileged(Native Method)
org.jboss.logmanager.formatters.Formatters$14.renderRaw(Formatters.java:638)
org.jboss.logmanager.formatters.Formatters$JustifyingFormatStep.render(Formatters.java:225)
org.jboss.logmanager.formatters.MultistepFormatter.format(MultistepFormatter.java:83)
org.jboss.logmanager.ExtFormatter.format(ExtFormatter.java:32)
org.jboss.logmanager.handlers.WriterHandler.doPublish(WriterHandler.java:46)
org.jboss.logmanager.ExtHandler.publish(ExtHandler.java:76)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:314)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)
org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)
org.jboss.logmanager.Logger.logRaw(Logger.java:850)
org.jboss.logmanager.Logger.log(Logger.java:802)
org.jboss.logging.JBossLogManagerLogger.doLog(JBossLogManagerLogger.java:44)
org.jboss.logging.Logger.logv(Logger.java:2256)
com.arjuna.ats.arjuna.logging.arjunaI18NLogger_$logger.warn_coordinator_CheckedAction_2(arjunaI18NLogger_$logger.java:775)
...more
(the above happens perhaps about 90% of the time - however sometimes the stack trace doesnt seem to reflect anything)
2017-07-14 08:08:15,779 WARN [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012378: ReaperElement appears to be wedged: com.arjuna.ats.arjuna.coordinator.CheckedAction.check(CheckedAction.java:67)
com.arjuna.ats.arjuna.coordinator.BasicAction.checkChildren(BasicAction.java:3387)
com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1651)
com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:127)
com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:215)
com.arjuna.ats.arjuna.coordinator.TransactionReaper.doCancellations(TransactionReaper.java:381)
com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:78)
jstack thread dumps are not showing anything obvious, and it is not happening on every machine. On my local dev box if I knock the transaction timeout down to 30s then simulate long running transactions which timeout I cannot get the same results - the transaction times out and the Reaper aborts it - but there is no subsequent wedged thread (of course I am just running single user where as the other servers are under load).
Anyone seem anything similar and has any pointers to debugging.
Thanks
Ian