I have been evaluating Infinispan (5.3.0) in embedded mode as a distributed data caching solution for our system. Currently, we are using Ehcache in local mode and it works without any problems, but there is no clustering. So far, I have run into various issues with Infinispan, and I'm not sure it can meet our requirements. I'd appreciate it if someone more familiar with this product could provide more insight.
Our system consists of a set of servers that receive messages from a JMS queue and, for each message, do some processing that involves interaction with a database. The database contains both reference data (used to look up some IDs) and data that is updated while messages are being processed. For example, processing a message may involve multiple ID lookups and the insertion of some new records into the database.
For performance reasons, we are caching (using Ehcache) reference (read-only) data. Most of the data is cached lazily, whereas some mutable data is preloaded first (if the data set is big, i.e. 2 MM - 4 MM records) and then lazily cached as new messages are being processed. To optimize this further, we want to use a distributed cache in order to prevent unnecessary database lookups (i.e. when data has already been retrieved by one node, it should be available to the others).
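To make the caching pattern described above concrete, here is a minimal sketch of the lazy (cache-aside) lookup with an optional preload step. It uses a plain `ConcurrentHashMap` as a stand-in for the cache rather than the Ehcache or Infinispan API, and the class name `LazyReferenceCache` and the `loader` function (representing the database lookup) are hypothetical names chosen for illustration, not our actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Cache-aside lookup: check the cache first, fall back to the loader on a miss. */
class LazyReferenceCache<K, V> {
    // Stand-in for the real cache (assumption: any ConcurrentMap-like store).
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the database lookup.
    private final Function<K, V> loader;

    LazyReferenceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Bulk-loads a large data set up front, before message processing starts. */
    void preload(Map<K, V> records) {
        cache.putAll(records);
    }

    /** Returns the cached value, loading and caching it on the first miss. */
    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;              // cache hit: no database access
        }
        V loaded = loader.apply(key);   // cache miss: go to the database
        if (loaded == null) {
            return null;                // nothing to cache
        }
        // putIfAbsent keeps the first value if another thread loaded the key concurrently.
        V existing = cache.putIfAbsent(key, loaded);
        return existing != null ? existing : loaded;
    }
}
```

With a local cache, each node pays the database cost of a miss once per key; the goal of clustering is that a value loaded by one node would also be a hit on the others.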
If all data were read-only, we could technically preload all records at startup and store them in local caches. However, that is not always the case. Some additional data is written to the database as messages are being processed.
Given the above requirements and my understanding of Infinispan, I initially configured a cluster using asynchronous (UDP-based) replication with a replication queue. I chose this approach because all cluster members need to share the same data (there are 6 cluster members). Unfortunately, this doesn't seem to work: the system runs out of memory (no matter how big the heap size: 10 GB, 20 GB...) because of a very high replication rate, which according to JMX statistics is not justified (I described the problem in another thread - Replication count very high (5.3.0)). It appears that Infinispan enters an infinite replication spin, even when there are cache hits...
Having been unsuccessful with replication, I also tried distributed mode (which in my opinion doesn't fit this model, but I wanted to try it anyway). It didn't work either, because of deadlocks and a high rate of lock timeouts - lock timeouts were expected, as multiple nodes could put entries in the cache using the same keys at the same time (for that test I used 3 nodes and 2 owners).
Can someone confirm whether Infinispan is a good tool for the job and, if so, whether the clustered replication mode is the correct model for the above requirements? I can provide more details if necessary.