OOME when using JBPM
peterj Mar 17, 2016 1:01 PMWildfly 8.1.0.Final
JBPM 6.1.0.Final
2GB heap
I ran into an OOME on heap space and I suspect one of our JBPM processes. The process is started by a timer. The timer used to be set to 1 day; after we changed the timer to 10 minutes we started to run out of heap space every 3 days. As a temp band-aid we changed the timer to 1/2 day and thus are avoiding the issue (though if my calculations are correct, in 7 months we should run into an OOME again).
We took a heap dump on one of the OOMEs and used the Eclipse Memory Analyzer to examine it. According to it, the culprit is:
One instance of "java.util.concurrent.LinkedBlockingQueue" loaded by "<system class loader>" occupies 1,800,720,424 (88.66%) bytes. The instance is referenced by org.jboss.threads.JBossThreadPoolExecutor @ 0x78364d9c8 , loaded by "org.jboss.modules.ModuleClassLoader @ 0x78320c870".
Looking at the linked list, every node in the linked list appears to contain a org.jboss.as.ejb3.component.interceptors.AsyncFutureInterceptorFactory$1$2 instance. Looking at a histogram, there are 2,103,657 instances of this class in the heap. I added some logging to that class to find out how it being created and found dozens of occurrences this call stack:
at org.jboss.as.ejb3.component.interceptors.AsyncFutureInterceptorFactory$1.processInvocation(AsyncFutureInterceptorFactory.java:71) [wildfly-ejb3-8.1.0.Final.jar:8.1.0.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
at org.jboss.as.ejb3.component.interceptors.LogDiagnosticContextStorageInterceptor.processInvocation(LogDiagnosticContextStorageInterceptor.java:67) [wildfly-ejb3-8.1.0.Final.jar:8.1.0.Final]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:309)
at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:73)
at org.jbpm.executor.impl.AvailableJobsExecutor$$$view6.executeJob(Unknown Source) [jbpm-executor-6.1.0.Final.jar:6.1.0.Final]
at org.jbpm.executor.impl.ExecutorRunnable.run(ExecutorRunnable.java:42) [jbpm-executor-6.1.0.Final.jar:6.1.0.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_75]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [rt.jar:1.7.0_75]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [rt.jar:1.7.0_75]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [rt.jar:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_75]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_75]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_75]
A little bit about the process in question: once started by the timer, the first step is a script task that reads about 3000 objects form a database, those objects in turn have an embedded object; the embedded objects are examined for possible state changes (there are 2 possible changes), and if the state did change, the object is added to a collection. The collection is then processed, one object at a time, by a subprocess. The overall process logs a starting and a stopping message, along with an examined object count and processed object count. Looking at the logs for when the time was 10 minutes, most of the time the process finds no state changes; when it does find state changes usually it is only a few objects; I don't think I ever saw more than 5 but it is possible. (Of course, when the timer is 1/2 day there might be 20-50 objects with state changes.)
So my questions:
1) Under what circumstance in the JBPM AvailableJobsExecutor is an AsyncFutureInterceptorFactory$1$2 object placed into the thread pool linked list, and under what circumstances would it later be removed? Because it appears that the condition under which it would normally be removed is not being met.
2) Is there a way that I can tell which of my defined processes AvailableJobsExecutor is working on when it performs the processing suggested by the above stack trace? In my environment, I have 38 process definitions and 8 deployments. It would be helpful if I could determine which processes were causing the various entries in the linked list that's taking up all the heap space.
3) What does AsyncFutureInterceptorFactory$1$2 do? Knowing what it does would help understand why it is in the linked list. And is there a way that I can determine, from the AsyncFutureInterceptorFactory$1$2, which of my objects it is going to "intercept'? I've dug through the heap analyzer histogram and cannot find any references to any of my objects, not even within the hashmaps that seem to be everywhere.