This is weird indeed.
Would it be possible for you to write a unit test that tries to simulate this? Since the issue only hits after 90 minutes and the scenario isn't that straightforward, it's really hard to reproduce otherwise.
That would be difficult. "After 90 minutes" in this case really means "under heavier load": in our load tests the load is increased gradually and continuously until it reaches 200% (expected peak user load for the current year x 2), but in this case the application failed because of caching before it even reached 100% (about 2000 concurrent users).
The issue has not come up again since we increased all the eviction values to suit the load environment. The concern is still there, though: it has occurred once, and it shouldn't happen in production.
We have encountered the same issue again. The following are the settings used.
org.jboss.cache.NodeNotValidException: Node /Products is not valid. Perhaps it has been moved or removed.
    at org.jboss.cache.invocation.NodeInvocationDelegate.assertValid(NodeInvocationDelegate.java:497)
    at org.jboss.cache.invocation.NodeInvocationDelegate.getChild(NodeInvocationDelegate.java:330)
    at com.pearson.commonbiz.app.appservices.caching.JBossCacheAdpater.getFromCache(JBossCacheAdpater.java:316)
    at com.pearson.commonbiz.app.appservices.ResourcesCacheHelper.getFromCache(ResourcesCacheHelper.java:232)
    at com.pearson.commonbiz.app.appservices.ResourcesCacheHelper.getResourceIDTypeListFromCache(ResourcesCacheHelper.java:156)
    at com.pearson.ph.ois.presentation.actions.OISCoreAction.isProductHasService(OISCoreAction.java:734)
    at com.pearson.ph.ois.presentation.actions.GetAllAssignmentsForClassAction.processRequest(GetAllAssignmentsForClassAction.java:319)
    at com.pearson.ph.ois.presentation.actions.OISCoreAction.execute(OISCoreAction.java:87)
    at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:425)
    at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:228)
    at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1913)
    at com.pearson.commonsys.dynaweb.uicontroller.ConcertActionServlet.process(ConcertActionServlet.java:230)
    at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:449)
As I mentioned earlier, we store the "Product" node in a hashtable, and the node is marked as resident at the beginning, right after it is created.
Lately, in our performance tests, the error has been showing up in the logs every few hours.
I am not sure how I can reproduce it, since it is very inconsistent.
Could you please help us a bit with this?
A couple of thoughts: If losing data is undesirable, then disable eviction. If eviction is required but losing data is undesirable, use a cache loader. If eviction is happening sooner than you expect, then don't set max nodes, and set a minimum time to live for nodes.
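For the last suggestion, here is a rough sketch of what the eviction section might look like. It assumes the JBoss Cache 3.x XML configuration format (2.x uses a different layout and different attribute names, such as seconds-based values), and the wake-up interval and time values are placeholders rather than recommendations. Please check the property names and defaults against the user guide for the version you are running.

```xml
<!-- Sketch only: assumes JBoss Cache 3.x configuration syntax. -->
<eviction wakeUpInterval="5000">
   <default algorithmClass="org.jboss.cache.eviction.LRUAlgorithm">
      <!-- maxNodes is deliberately left unset, as suggested above.
           What "unset" means (and which value denotes "no limit")
           differs between versions, so verify it for yours. -->

      <!-- Placeholder values, assumed to be in milliseconds (3.x). -->
      <property name="timeToLive" value="1800000" />
      <!-- Minimum time a node is kept after being accessed before
           it may be considered for eviction. -->
      <property name="minTimeToLive" value="600000" />
   </default>
</eviction>
```

With a setup along these lines, eviction should be driven by idle time rather than by node count, which may help if nodes are being evicted earlier than expected under load.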
We are facing the same issue. Has anybody resolved it?
All our region nodes are resident nodes, but we are still getting this issue.