Undertow doesn't process request sometimes
vigneshsm Jan 26, 2017 5:54 AMHi,
We are seeing this issue in undertow 1.4.8 where it fails to process a POST(HTTPS) request sent from the SOAPUI client sometimes.
In the below screenshot you can see that the POST request has been received at undertow server at 14:16:13.015876 and we could see a tcp acknowledgement being sent immediately after that.
As you can see the there is no further response sent and the client times out after 180 seconds and closes the connection.
From the undertow logs it looks like the default-task that handled the previous request(14:16:13:006359) was taking some 20ms to print the request dump after the response had been sent on wire.
The next request has been sent from the client before the previous task thread completes.
this is shown below:
Undertow logs:
2017-01-26 14:16:13,008 TRACE [default task-124]-[io.undertow.server.HttpServerExchange] Starting to write response for HttpServerExchange{ POST /soap/ErrorHandlerServlet request {Connection=[Keep-Alive], SOAPAction=[""], Accept-Encoding=[gzip,deflate], Transfer-Encoding=[chunked], Content-Type=[text/xml;charset=UTF-8], User-Agent=[Apache-HttpClient/4.1.1 (java 1.5)], Host=[192.168.92.235:8443]} response {Expires=[0], Connection=[keep-alive], WWW-Authenticate=[Basic realm="NetworkInspectorRealm"], Cache-Control=[no-cache, no-store, must-revalidate], Pragma=[no-cache], Content-Length=[71], Date=[Thu, 26 Jan 2017 07:16:13 GMT]}}
2017-01-26 14:16:13,026 INFO [default task-124]-[io.undertow.request.dump]
----------------------------REQUEST---------------------------
URI=/soap/services/ApcRemotePort/9.5
characterEncoding=null
contentLength=-1
We have added additional logs in XNIO as well and the logs show that the default task-124 resumes reading before it returns.
XNIO logs:
2017-01-26 14:16:13,008 TRACE [default I/O-12]-[org.xnio.nio.selector] Beginning select on sun.nio.ch.EPollSelectorImpl@2e90bec2 (with timeout)
2017-01-26 14:16:13,026 TRACE [default task-124]-[org.xnio.nio] resuming sun.nio.ch.SelectionKeyImpl@1f8b66a3 on java.nio.channels.SocketChannel[connected local=/192.168.92.235:8443 remote=192.168.72.51/192.168.72.51:58756] ops 1
2017-01-26 14:16:13,026 TRACE [default task-124]-[org.xnio.nio.selector] setOps key sun.nio.ch.SelectionKeyImpl@1f8b66a3 for java.nio.channels.SocketChannel[connected local=/192.168.92.235:8443 remote=192.168.72.51/192.168.72.51:58756] readyOps 1 interestOps 0 setOps 1
2017-01-26 14:16:13,026 TRACE [default task-124]-[org.xnio.nio.selector] setOps normal
Ideally after this point we expect the default I/O thread to select the SelectionKeyImpl@1f8b66a3 to start reading the request out from the channel, but this doesn't happen.
The selectionkey is selected only after 180 seconds only to get cancelled.
2017-01-26 14:19:13,016 TRACE [default I/O-12]-[org.xnio.nio.selector] Selected key sun.nio.ch.SelectionKeyImpl@1f8b66a3 for java.nio.channels.SocketChannel[connected local=/192.168.92.235:8443 remote=192.168.72.51/192.168.72.51:58756]
2017-01-26 14:19:13,021 TRACE [default I/O-12]-[org.xnio.nio] suspending sun.nio.ch.SelectionKeyImpl@1f8b66a3 on java.nio.channels.SocketChannel[connected local=/192.168.92.235:8443 remote=192.168.72.51/192.168.72.51:58756] ops 1
2017-01-26 14:19:13,021 TRACE [default I/O-12]-[org.xnio.nio.selector] clearOps key sun.nio.ch.SelectionKeyImpl@1f8b66a3 for java.nio.channels.SocketChannel[connected local=/192.168.92.235:8443 remote=192.168.72.51/192.168.72.51:58756] readyOps 1 interestOps 1 setOps 1
2017-01-26 14:19:13,021 TRACE [default I/O-12]-[org.xnio.nio.selector] clearOps in current thread
2017-01-26 14:19:13,022 TRACE [default I/O-12]-[org.xnio.nio] Cancelling key sun.nio.ch.SelectionKeyImpl@1f8b66a3 of java.nio.channels.SocketChannel[connected ishut local=/192.168.92.235:8443 remote=192.168.72.51/192.168.72.51:58756] (same thread)
Its not clear on why the task thread that handled the previous request took around 18ms to wind up after it sent a response, also strange is the selector not selecting the SelectionKeyImpl@1f8b66a3 to process the next request.
The default task thread usually takes between 1 to 2 ms to send the response back when the issue is not seen.
Any help on tackling the issue is appreciated.
Thanks,
Vignesh