8 Replies Latest reply on Sep 8, 2017 4:50 PM by rickyterzis

    Server hanging with ARJUNA012378: ReaperElement appears to be wedged

    ianmacintyre

      Hi folks,

       

      We are having a strange issue with Transaction Reaper threads becoming wedged, which appears to lock up the entire server.  A restart is then needed, however it only happens on some machines - not all.  This is running Wildfly 10.0.

       

      The initial symptom is a transaction timeout which the reaper attempts to abort.  From the stacktrace we see it is happening in our code - waiting on the db to return (or a call to an external system).  This part I can understand.

       

      Then, within 2s I see a warning the Reaper element is wedged.  The stack trace at that point seems to point to JBoss logging

       

      2017-07-14 08:10:22,602 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012378: ReaperElement appears to be wedged: java.security.AccessController.doPrivileged(Native Method)

      org.jboss.logmanager.formatters.Formatters$14.renderRaw(Formatters.java:638)

      org.jboss.logmanager.formatters.Formatters$JustifyingFormatStep.render(Formatters.java:225)

      org.jboss.logmanager.formatters.MultistepFormatter.format(MultistepFormatter.java:83)

      org.jboss.logmanager.ExtFormatter.format(ExtFormatter.java:32)

      org.jboss.logmanager.handlers.WriterHandler.doPublish(WriterHandler.java:46)

      org.jboss.logmanager.ExtHandler.publish(ExtHandler.java:76)

      org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:314)

      org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)

      org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)

      org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)

      org.jboss.logmanager.LoggerNode.publish(LoggerNode.java:322)

      org.jboss.logmanager.Logger.logRaw(Logger.java:850)

      org.jboss.logmanager.Logger.log(Logger.java:802)

      org.jboss.logging.JBossLogManagerLogger.doLog(JBossLogManagerLogger.java:44)

      org.jboss.logging.Logger.logv(Logger.java:2256)

      com.arjuna.ats.arjuna.logging.arjunaI18NLogger_$logger.warn_coordinator_CheckedAction_2(arjunaI18NLogger_$logger.java:775)

      ...more

       

      (the above happens perhaps about 90% of the time - however sometimes the stack trace doesnt seem to reflect anything)

       

      2017-07-14 08:08:15,779 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012378: ReaperElement appears to be wedged: com.arjuna.ats.arjuna.coordinator.CheckedAction.check(CheckedAction.java:67)

      com.arjuna.ats.arjuna.coordinator.BasicAction.checkChildren(BasicAction.java:3387)

      com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1651)

      com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:127)

      com.arjuna.ats.arjuna.AtomicAction.cancel(AtomicAction.java:215)

      com.arjuna.ats.arjuna.coordinator.TransactionReaper.doCancellations(TransactionReaper.java:381)

      com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.run(ReaperWorkerThread.java:78)

       

      jstack thread dumps are not showing anything obvious, and it is not happening on every machine.  On my local dev box if I knock the transaction timeout down to 30s then simulate long running transactions which timeout I cannot get the same results - the transaction times out and the Reaper aborts it - but there is no subsequent wedged thread (of course I am just running single user where as the other servers are under load).

       

      Anyone seem anything similar and has any pointers to debugging.

       

      Thanks

      Ian