This looks like a ThreadLocal leak. We recently fixed a similar issue, but it was observed when using the Atmosphere Push Library on Tomcat. The cause was that a Weld listener responsible for the cleanup, which implements javax.servlet.ServletRequestListener.requestDestroyed(), was not invoked. Could you describe your application and the technologies used? As to lifecycle events: does your logout action include an explicit HTTP session invalidation, i.e. a call to javax.servlet.http.HttpSession.invalidate()? If not, the @PreDestroy callback is invoked only once the HttpSession times out. Either way, bean instances must never be shared or reused across different sessions.
Thanks for your quick reply. Regarding your questions:
Since it is a fairly simple web application, the list of technologies is short:
- JDK 1.7
- GlassFish 4.0
- Weld 2.0.5
- JSF 2.2.5
- OmniFaces 1.7
- Apache Commons
- Gson Library
Yes, if the users use the logout button, the logout action includes an explicit session invalidate call:
Do you use PrimeFaces Push or any other websockets/long-polling/streaming solution as well?
No, we don't use any of these techniques. It's in the traditional sense a simple web application.
OK, back to the beginning. You've mentioned there are "two independent instances of the same JSF 2.2 application on two machines". Does that mean there's no session replication, no load balancing, etc.? Are those machines isolated, so that users always send requests to the same machine?
Exactly. These two machines are literally on opposite sides of the globe, and the GlassFish instances are set up to run standalone, without communicating with each other. Users at one site only use their local server and never the other instance.
Well, I'm running out of ideas. It would be great if you could reproduce the problem locally, but I know that's difficult or even impossible. I would start by reducing the number of threads GF uses to handle HTTP requests (I still think it might be a ThreadLocal leak) and by looking for strange exceptions in the log. I would also try the GlassFish forums.
So are we. Am I understanding you correctly that a ThreadLocal leak would mean that, after a unit of work is done (e.g. a request was processed), the value of a ThreadLocal instance is not cleared and is therefore (in some rare situations) reused by other requests processed by the very same thread? If so, decreasing the number of threads in the domain's thread pool to 1 or 2 should increase the probability of the error occurring.
Yes, that's exactly what I mean. I'm not sure about which thread pool is used in GF to handle HTTP requests though.
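The stale-value scenario discussed above can be sketched in plain Java. This is only an illustration, not Weld or GlassFish code: the class ThreadLocalLeakDemo, the CURRENT_USER variable and the process() helper are hypothetical names, and a single-threaded ExecutorService stands in for the container's HTTP worker pool, which is assumed here to reuse its threads across requests.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalLeakDemo {

    // Hypothetical per-request state, standing in for whatever a framework binds to the thread.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<String>();

    /**
     * Simulates one request handled by a pooled worker thread.
     * Binds 'login' to the thread if it is non-null and returns the value the
     * thread sees. The value is removed afterwards only if 'cleanUp' is true.
     */
    static String process(ExecutorService pool, final String login, final boolean cleanUp) {
        try {
            return pool.submit(new Callable<String>() {
                @Override
                public String call() {
                    try {
                        if (login != null) {
                            CURRENT_USER.set(login);
                        }
                        return CURRENT_USER.get();
                    } finally {
                        if (cleanUp) {
                            CURRENT_USER.remove(); // the step a missed cleanup listener would skip
                        }
                    }
                }
            }).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One worker thread, so every simulated request runs on the same thread.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            process(pool, "alice", false);                 // request 1: cleanup is skipped
            System.out.println(process(pool, null, true)); // request 2: prints "alice" (stale value)
            System.out.println(process(pool, null, true)); // request 3: prints "null" (cleaned up)
        } finally {
            pool.shutdown();
        }
    }
}
```

With only one worker thread the stale value shows up on the very next request. With a larger pool it would only appear when a request happens to land on the affected thread, which is why shrinking the pool should make the problem easier to reproduce.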
Thanks for the lead. Finding the right thread pool is no big deal compared to the rest.