WildFly 8.2.1, JDK 1.8
WF receives a lot of requests of different types:
- simple HTTP requests to servlets, some of which call EJB components
- web-service calls to EJB components
We also frequently start async work through TimerService.
All EJBs are stateless.
Some configuration details:
Undertow worker - io-threads="16" task-max-threads="200"
Added a pool for stateless EJBs with a max size of 50.
Thread pool for EJB - 128 threads.
With this configuration, on any load increase "too many open files" exceptions start and WF undeploys all deployed components.
We raised ulimit and added a request limit filter to Undertow:
max-concurrent-requests="200" queue-size="1000"
It helped with the "too many open files" exceptions, but calls still got stuck about once every 1-2 days, with no extra load on the server. Each hang lasted about 3-6 minutes at most.
In the logs we found errors like "javax.ejb.EJBException: JBAS014516: Failed to acquire a permit within 5 MINUTES" on the most "popular" EJBs.
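As far as we understand, this error means a caller waited for a free instance in a bounded pool and the wait timed out. A minimal Java sketch of that general mechanism, assuming the pool is guarded by a semaphore with one permit per instance (this is not WildFly's actual code; the class name PermitPoolSketch and the sizes and timeouts are made up for illustration):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a bounded instance pool: one permit per bean instance.
public class PermitPoolSketch {
    private final Semaphore permits;
    private final long timeout;
    private final TimeUnit unit;

    public PermitPoolSketch(int maxSize, long timeout, TimeUnit unit) {
        this.permits = new Semaphore(maxSize, true); // fair: waiters are served in arrival order
        this.timeout = timeout;
        this.unit = unit;
    }

    // Returns true if an instance became available within the timeout, false otherwise.
    public boolean acquire() {
        try {
            return permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            return false;
        }
    }

    // Must be called when the invocation finishes, or the permit is lost to other callers.
    public void release() {
        permits.release();
    }

    public static void main(String[] args) {
        PermitPoolSketch pool = new PermitPoolSketch(2, 100, TimeUnit.MILLISECONDS);
        pool.acquire();
        pool.acquire(); // both instances are now in use and not yet returned
        System.out.println("third caller got an instance: " + pool.acquire());
        pool.release(); // one invocation finishes
        System.out.println("after one release: " + pool.acquire());
    }
}
```

If that model is right, then whenever all instances are held by calls that do not return, every new caller is parked for the full timeout (5 minutes in our log) before failing, which would look like a hang with low CPU usage.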
So we decided to increase the EJB pool limit from 50 to 256. After that we got a much longer hang, and restarting WF didn't help (maybe because of a lot of stored timers?).
During this hang there was no extraordinary load on the server or DB. All EJB pools were maxed out at 256, the EJB3 thread pool was using all 128 threads, and its task queue kept growing. But no real work was being done; CPU and memory usage even dropped.
jstack shows no deadlocks. Thread statistics from the jstack output:
Total threads: 598
IN_NATIVE = 56
  org.hornetq.core.libaio.Native.internalPollEvents = 1
  sun.nio.ch.EPollArrayWrapper.epollWait = 20
  sun.nio.fs.LinuxWatchService.poll = 27
  java.net.SocketInputStream.socketRead0 = 6
  sun.nio.ch.PollArrayWrapper.poll0 = 2
IN_VM = 1
  java.lang.Class.forName0 = 1
BLOCKED = 540
  java.lang.Object.wait = 47
  java.lang.Thread.sleep = 1
  sun.misc.Unsafe.park = 490
IN_JAVA = 1
  org.jboss.marshalling.river.RiverUnmarshaller.doInitSerializable = 1
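The top frame alone (sun.misc.Unsafe.park for 490 threads) does not say what those threads are waiting for. One way to dig further would be to group threads by the first frame outside the JDK. Below is a rough Java sketch of that idea, assuming a dump in the standard text format of "jstack -l <pid>" (header lines starting with a quote, frames starting with "at "); the class name ParkedThreadTally is made up, and the parser is deliberately naive:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: counts threads by their first non-JDK stack frame.
public class ParkedThreadTally {
    static final String JDK_ONLY = "(JDK frames only)";

    public static Map<String, Integer> tally(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean inThread = false; // true once a thread header has been seen
        boolean counted = false;  // true once the current thread has been attributed
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // New thread header: close the previous thread if it had no non-JDK frame.
                if (inThread && !counted) counts.merge(JDK_ONLY, 1, Integer::sum);
                inThread = true;
                counted = false;
            } else if (inThread && !counted && line.startsWith("at ")) {
                String frame = line.substring(3);
                int paren = frame.indexOf('(');
                if (paren >= 0) frame = frame.substring(0, paren); // drop "(File.java:123)"
                if (!isJdkFrame(frame)) {
                    counts.merge(frame, 1, Integer::sum);
                    counted = true;
                }
            }
        }
        if (inThread && !counted) counts.merge(JDK_ONLY, 1, Integer::sum);
        return counts;
    }

    static boolean isJdkFrame(String frame) {
        return frame.startsWith("java.") || frame.startsWith("sun.") || frame.startsWith("jdk.");
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a file containing saved jstack output
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        tally(dump).forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

Run against a saved dump, this prints one line per distinct frame with its thread count. Threads that show only JDK frames are most likely idle pool workers; a large count on a single application or container frame would point at the place where calls pile up.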
We have no problems with the same code on the same server when it runs on JBoss 5.1.0.GA.
What is misconfigured? What should we look at?