JBoss EAP 6.2.3 Hangs with AJP Threads Blocked
jim-fan Jan 8, 2019 6:01 AM
Dear all,
This is my first post on JBoss Developer, so I will try to state the problem as accurately as I can.
Setup: 2 x load-balanced IIS on Windows Server 2012, forwarding requests to 2 x JBoss (on the same Windows servers) in a cluster with session replication.
Fault symptom: From the end user's perspective, the website becomes unresponsive. Service resumes only after both JBoss instances are restarted manually.
Troubleshooting this fault has been a very lengthy process, so allow me to skip ahead to the last two steps performed.
First, at the time of the incident, a thread dump was taken using the jboss-cli.bat that comes with JBoss. Findings:
- Many AJP threads blocked by older AJP threads:
{
"thread-id" => 991L,
"thread-name" => "ajp-{server name}-cluster/{IP address}:8009-175",
"thread-state" => "BLOCKED",
"blocked-time" => -1L,
"blocked-count" => 1L,
"waited-time" => -1L,
"waited-count" => 0L,
"lock-info" => {
"class-name" => "org.jboss.as.web.session.SessionBasedClusteredSession",
"identity-hash-code" => 1445846157
},
"lock-name" => "org.jboss.as.web.session.SessionBasedClusteredSession@562ddc8d",
"lock-owner-id" => 855L,
"lock-owner-name" => "ajp-{server name}-cluster/{IP address}:8009-80",
"stack-trace" => [
{
"file-name" => "DistributableSessionManager.java",
"line-number" => 858,
"class-name" => "org.jboss.as.web.session.DistributableSessionManager",
"method-name" => "storeSession",
"native-method" => false
},
{
"file-name" => "InstantSnapshotManager.java",
"line-number" => 47,
"class-name" => "org.jboss.as.web.session.InstantSnapshotManager",
"method-name" => "snapshot",
"native-method" => false
},
{
"file-name" => "ClusteredSessionValve.java",
"line-number" => 142,
"class-name" => "org.jboss.as.web.session.ClusteredSessionValve",
"method-name" => "handleRequest",
"native-method" => false
},
{
"file-name" => "ClusteredSessionValve.java",
"line-number" => 99,
"class-name" => "org.jboss.as.web.session.ClusteredSessionValve",
"method-name" => "invoke",
"native-method" => false
},
{
"file-name" => "JvmRouteValve.java",
"line-number" => 92,
"class-name" => "org.jboss.as.web.session.JvmRouteValve",
"method-name" => "invoke",
"native-method" => false
},
{
"file-name" => "LockingValve.java",
"line-number" => 64,
"class-name" => "org.jboss.as.web.session.LockingValve",
"method-name" => "invoke",
"native-method" => false
},
(stacktrace skipped)
- The blocking threads are themselves waiting for FlowControl credits:
{
"thread-id" => 855L,
"thread-name" => "ajp-{server name}-cluster/{IP address}:8009-80",
"thread-state" => "TIMED_WAITING",
"blocked-time" => -1L,
"blocked-count" => 1L,
"waited-time" => -1L,
"waited-count" => 182L,
"lock-info" => {
"class-name" => "org.jgroups.protocols.FlowControl$Credit",
"identity-hash-code" => 1691984096
},
"lock-name" => "org.jgroups.protocols.FlowControl$Credit@64d9a0e0",
"lock-owner-id" => -1L,
"lock-owner-name" => undefined,
"stack-trace" => [
{
"file-name" => "Object.java",
"line-number" => -2,
"class-name" => "java.lang.Object",
"method-name" => "wait",
"native-method" => true
},
{
"file-name" => "FlowControl.java",
"line-number" => 553,
"class-name" => "org.jgroups.protocols.FlowControl$Credit",
"method-name" => "decrementIfEnoughCredits",
"native-method" => false
},
{
"file-name" => "UFC.java",
"line-number" => 114,
"class-name" => "org.jgroups.protocols.UFC",
"method-name" => "handleDownMessage",
"native-method" => false
},
{
"file-name" => "FlowControl.java",
"line-number" => 341,
"class-name" => "org.jgroups.protocols.FlowControl",
"method-name" => "down",
"native-method" => false
},
{
"file-name" => "FlowControl.java",
"line-number" => 351,
"class-name" => "org.jgroups.protocols.FlowControl",
"method-name" => "down",
"native-method" => false
},
{
"file-name" => "FRAG2.java",
"line-number" => 247,
"class-name" => "org.jgroups.protocols.FRAG2",
"method-name" => "fragment",
"native-method" => false
},
{
"file-name" => "FRAG2.java",
"line-number" => 131,
"class-name" => "org.jgroups.protocols.FRAG2",
"method-name" => "down",
"native-method" => false
},
{
"file-name" => "RSVP.java",
"line-number" => 143,
"class-name" => "org.jgroups.protocols.RSVP",
"method-name" => "down",
"native-method" => false
},
{
"file-name" => "ProtocolStack.java",
"line-number" => 1030,
"class-name" => "org.jgroups.stack.ProtocolStack",
"method-name" => "down",
"native-method" => false
},
{
"file-name" => "JChannel.java",
"line-number" => 722,
"class-name" => "org.jgroups.JChannel",
"method-name" => "down",
"native-method" => false
},
(stacktrace skipped)
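The fields in the CLI output above (thread-state, lock-name, lock-owner-id, waited-count) appear to mirror Java's `java.lang.management.ThreadInfo`, so the same wait-for analysis (BLOCKED thread, then its lock owner, then that owner's state) can be scripted. The following is a minimal sketch, not a JBoss tool: the class `BlockedChainReport`, the thread names `ajp-80`/`ajp-175`, and the `demo()` scenario are hypothetical, and running it against a real server would require executing it inside the JBoss JVM or over remote JMX, which is not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockedChainReport {

    /** One line per BLOCKED thread, following lock owners: "blocked -> owner [STATE] -> ...". */
    static List<String> blockedChains() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Lock name and lock owner id are reported even without locked-monitor details.
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        Map<Long, ThreadInfo> byId = new HashMap<>();
        for (ThreadInfo ti : infos) {
            byId.put(ti.getThreadId(), ti);
        }
        List<String> chains = new ArrayList<>();
        for (ThreadInfo ti : infos) {
            if (ti.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StringBuilder sb = new StringBuilder(ti.getThreadName());
            ThreadInfo cur = ti;
            // Stop when a thread has no monitor owner (e.g. it is waiting, lock-owner-id -1)
            // or after 16 hops, which also guards against deadlock cycles.
            for (int hops = 0; hops < 16 && cur.getLockOwnerId() != -1; hops++) {
                ThreadInfo owner = byId.get(cur.getLockOwnerId());
                if (owner == null) {
                    break;
                }
                sb.append(" -> ").append(owner.getThreadName())
                  .append(" [").append(owner.getThreadState()).append(']');
                cur = owner;
            }
            chains.add(sb.toString());
        }
        return chains;
    }

    /** Hypothetical scenario: one thread holds a monitor while timed-waiting, another blocks on it. */
    static List<String> demo() throws InterruptedException {
        Object session = new Object();                 // stands in for the session monitor
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread owner = new Thread(() -> {
            synchronized (session) {
                held.countDown();
                try {
                    release.await(30, TimeUnit.SECONDS); // TIMED_WAITING while holding the monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "ajp-80");
        Thread waiter = new Thread(() -> {
            synchronized (session) {
                // the next request for the same session would run here
            }
        }, "ajp-175");
        owner.start();
        held.await();
        while (owner.getState() != Thread.State.TIMED_WAITING) {
            Thread.sleep(10);
        }
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        List<String> chains = blockedChains();
        release.countDown();                           // let both threads finish
        owner.join();
        waiter.join();
        return chains;
    }

    public static void main(String[] args) throws InterruptedException {
        demo().forEach(System.out::println);
    }
}
```

On the demo scenario this prints a line like `ajp-175 -> ajp-80 [TIMED_WAITING]`, which is the same shape as the chain in the dump: thread 175 blocked on a monitor owned by thread 80, and thread 80 itself waiting rather than running.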
After consulting the documentation, we found that session replication runs over JGroups, whose flow-control protocols (UFC in the trace above) are credit-based. We then tried to work around the problem by removing "<distributable />" from the web.xml of the web application deployed to both JBoss instances. The problem has not occurred since.
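To illustrate how a credit wait can freeze unrelated AJP threads, here is a simplified model. It is not JGroups code: `CreditModel` and `tryAcquire` are hypothetical stand-ins for the role played by `FlowControl$Credit.decrementIfEnoughCredits` in the trace. In credit-based flow control, a sender spends credits per byte sent and the receiver sends credits back after it has received enough data; when credits run out, the sender waits. The model assumes, as one possible cause not confirmed by the dump, that the peer has stopped returning credits (initial credit 0). The key point is that `Object.wait` releases only the monitor it is called on, so the thread keeps holding the session monitor while it waits, and every other request for that session blocks behind it.

```java
public class CreditConvoyDemo {

    /** Hypothetical, simplified credit counter (not the JGroups implementation). */
    static final class CreditModel {
        private long credits;

        CreditModel(long initial) {
            credits = initial;
        }

        /** Waits up to timeoutMs for enough credits; returns false on timeout. */
        synchronized boolean tryAcquire(long bytes, long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (credits < bytes) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false;
                }
                wait(left);    // releases only this monitor, not the session monitor
            }
            credits -= bytes;
            return true;
        }

        /** Models the receiver returning credits. */
        synchronized void replenish(long bytes) {
            credits += bytes;
            notifyAll();
        }
    }

    /** Returns the observed states of {replicating thread, next request thread}. */
    static Thread.State[] demo() throws InterruptedException {
        CreditModel credit = new CreditModel(0);     // assumption: peer has stopped granting credits
        Object session = new Object();               // stands in for the clustered session monitor
        Thread replicator = new Thread(() -> {
            synchronized (session) {                 // session monitor held during replication...
                try {
                    credit.tryAcquire(1024, 5_000);  // ...while waiting for flow-control credits
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "ajp-80");
        Thread request = new Thread(() -> {
            synchronized (session) {
                // the next request for the same session would run here
            }
        }, "ajp-175");
        replicator.start();
        while (replicator.getState() != Thread.State.TIMED_WAITING) {
            Thread.sleep(10);
        }
        request.start();
        while (request.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        Thread.State[] states = { replicator.getState(), request.getState() };
        credit.replenish(1024);                      // credits arrive: both threads can finish
        replicator.join();
        request.join();
        return states;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] s = demo();
        System.out.println("replicator=" + s[0] + ", request=" + s[1]);
    }
}
```

The observed states (TIMED_WAITING for the replicating thread, BLOCKED for the next request) match threads 80 and 175 in the dump. In this model the convoy clears as soon as credits return; if they never do, it lasts until the wait times out or the server is restarted.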
Even so, we would still very much like to find the root cause. After all, if it is caused by application code, the pitfall should be avoided at all costs. Or could it be a bug in JGroups, or even in JBoss? I can't rule that out either.
Appreciate any input.
Thanks,
Jim