Here is a follow-up to my earlier post...
After testing and experimenting with the configuration, we found that running REPL_ASYNC without the replication queue was causing the memory leak reported previously. The issue was resolved simply by changing jboss-cache-service.xml as follows:
<attribute name="CacheMode">REPL_ASYNC</attribute>
<attribute name="UseReplQueue">true</attribute>
<attribute name="ReplQueueInterval">100</attribute>
<attribute name="ReplQueueMaxElements">1000</attribute>
<attribute name="ClusterConfig">
<!-- Not sure if this is required but was changed in our -->
<!-- config, added the max_bytes attribute... -->
<pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" max_bytes="250000" />
After this change we were able to process 750 events/second per node, or 1500 events/second across 2 nodes, for 5+ days before stopping the test. Each event performs either an insert or a delete in 5 separate cache regions. Before making the configuration change we were only able to sustain this load for 15-20 hours before running out of memory.
Hopefully this is helpful to someone.
Thanks for the heads-up!
Were you using FC in your stack? FC, together with STABLE (with max_bytes set), would prevent an OOME too.
No, unfortunately I missed the Flow Control option completely. Given a pbcast.STABLE defined as:
<pbcast.STABLE desired_avg_gossip="20000" up_thread="false"
would you say that a FC element configured as follows would be appropriate?
Thanks for the input!
Yes, that's correct. I suggest taking a look at one of the configs shipped with JGroups (e.g. udp.xml) as an example.
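For anyone reading this later, here is a rough sketch of how FC might sit next to the STABLE element quoted earlier in a JGroups 2.x style stack. The max_credits and min_threshold values are assumptions chosen for illustration, not the values discussed in this thread, so check the udp.xml shipped with your JGroups version for the defaults and for where FC belongs in the stack.

```xml
<!-- STABLE as posted above: max_bytes triggers garbage collection of
     delivered messages by volume, not only on the gossip timer -->
<pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false" max_bytes="250000" />
<!-- Flow control: illustrative values only (assumed, not from this thread).
     max_credits is the number of bytes a sender may send before it must
     wait for receivers to replenish its credits; min_threshold is the
     fraction of max_credits at which receivers send more credits -->
<FC max_credits="2000000" min_threshold="0.10" up_thread="false" down_thread="false" />
```

The idea is that FC throttles a fast sender so slower receivers are not flooded, while STABLE with max_bytes lets the retransmission buffers be purged regularly, so neither side accumulates messages without bound.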