4 Replies Latest reply on Sep 9, 2010 9:59 PM by penkween

How to use Infinispan to implement distributed content servers

penkween Sep 6, 2010 3:16 AM

Hi,

      Basically, I have tried out the idea using a fast key-value store, Redis. But to store
a tera-scale number of files (each 500 KB to 2 MB in size), the memory cost is still huge, even when only
the keys are kept in memory and the values are swapped out to disk using virtual memory. So I need to find
another cache store solution that lets us keep hot items in memory and offload cold items to the file
store. If someone accesses a cold item, it becomes a hot item and is loaded into memory until it
turns cold again under an LRU eviction algorithm. That way, to increase overall performance later,
we just need to add RAM gradually to the server cluster, without having to put the whole set of
(keys+values) or (keys) into memory.
 

      Can we use some combination of Infinispan's features, such as GridFileSystem, eviction, and a
persistence strategy (e.g. FileCacheStore), to implement the above?

      There is an article at http://community.jboss.org/wiki/Infinispaninteractivetutorial showing that
we can combine eviction with FileCacheStore. Can we combine eviction with GridFileSystem too? If yes,
what should the configuration file look like?
 

Thanks.

1. Re: How to use Infinispan to implement distributed content servers

manik Sep 7, 2010 7:39 AM (in response to penkween)

I'm not sure if the grid filesystem is a good fit for you. Perhaps what you need is eviction + a cache store, + organise the cache such that you map keys to path names. E.g.,

cache.put("file1", "/home/blah/path/to/file1.txt");

A setup like that, along with using LIRS for eviction and a fast cache store impl, should work well for you.
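
To make the hot/cold behaviour concrete, here is a minimal plain-Java sketch of "eviction + a cache store". It is not Infinispan code and not how Infinispan implements it: the class name HotColdStore, its methods, and the one-file-per-key layout are all hypothetical, and it uses plain LRU (via LinkedHashMap access order) rather than LIRS. It assumes keys are safe to use as file names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: at most maxEntries hot items in memory, the rest on disk. */
class HotColdStore {
    private final int maxEntries;
    private final Path storeDir;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, byte[]> hot = new LinkedHashMap<>(16, 0.75f, true);

    HotColdStore(int maxEntries, Path storeDir) {
        this.maxEntries = maxEntries;
        this.storeDir = storeDir;
    }

    /** Convenience factory that keeps the cold items in a fresh temp directory. */
    static HotColdStore inTempDir(int maxEntries) {
        try {
            return new HotColdStore(maxEntries, Files.createTempDirectory("hotcold"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(String key, byte[] value) {
        hot.put(key, value);
        evictColdEntries();
    }

    /** Returns the value, reloading it from disk (making it hot again) if needed. */
    byte[] get(String key) {
        byte[] value = hot.get(key);
        if (value != null) {
            return value;
        }
        Path file = storeDir.resolve(key);
        if (!Files.exists(file)) {
            return null;
        }
        try {
            value = Files.readAllBytes(file);
            Files.delete(file); // keep a single copy: the item is hot again
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        put(key, value); // may push another item out to disk
        return value;
    }

    boolean isHot(String key) {
        return hot.containsKey(key); // containsKey does not change access order
    }

    boolean isOnDisk(String key) {
        return Files.exists(storeDir.resolve(key));
    }

    /** Writes least recently used entries to disk until the memory limit holds. */
    private void evictColdEntries() {
        Iterator<Map.Entry<String, byte[]>> it = hot.entrySet().iterator();
        while (hot.size() > maxEntries && it.hasNext()) {
            Map.Entry<String, byte[]> coldest = it.next();
            try {
                Files.write(storeDir.resolve(coldest.getKey()), coldest.getValue());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            it.remove();
        }
    }

    public static void main(String[] args) {
        HotColdStore store = HotColdStore.inTempDir(2);
        store.put("file1", new byte[] {1});
        store.put("file2", new byte[] {2});
        store.put("file3", new byte[] {3}); // pushes file1 out to disk
        System.out.println("file1 hot before access: " + store.isHot("file1"));
        store.get("file1");                 // reloads file1, pushes file2 out
        System.out.println("file1 hot after access: " + store.isHot("file1"));
    }
}
```

With a real cache, the same effect would come from configuration (an eviction limit plus a cache store) rather than hand-written code; the sketch only shows why adding RAM later just means raising the in-memory limit.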
2. Re: How to use Infinispan to implement distributed content servers

penkween Sep 8, 2010 12:13 AM (in response to manik)
         Thanks for your reply. We have tried something quite similar to the above suggestion, using [MemCache] + [FileSystem(ext3)], and we started to face a lot of invalidation problems when we tried to scale out to multiple servers (not very high-end ones). So we looked at whether we could combine those two components into a single [Cache + Store] solution. We tried Redis, since it can persist the cache to a file store (so there is no need for invalidation), but it requires us to keep the entire data set in memory or, at best, the entire key set. With a large key set (i.e. a large number of files), that is still neither feasible nor efficient, since not all keys are hot (i.e. not all files are accessed frequently). Now we are looking at Infinispan. Like Redis, it provides a persistent cache store (FileCacheStore), but it also lets us offload cold items to the filesystem via [Eviction + FileCacheStore]. Unlike Redis, an offloaded cold item is not gone/expired forever: it can be loaded back into memory and become a hot item again, depending on the memory policy (e.g. MaxEntries=), all handled automatically by Infinispan. I have tried the server clustering and memory federation; Infinispan is simply marvelous and elegant!

              So Infinispan already fits our requirements using [Eviction + FileCacheStore], but we would then have to design some additional file and metadata storage method. Then I stumbled upon Infinispan's GridFileSystem, which can store metadata in a replicated cache and file data in a distributed cache, all running in a cluster, and it looks GOOD and seems to fit us. So, rather than reinventing the wheel, we are investigating and testing whether GridFileSystem really fits, since we are dealing with files and metadata after all. GridFileSystem seems to fit us naturally.

      As highlighted by the GridFileSystem tutorial at http://community.jboss.org/wiki/GridFileSystem:

Cache<String,byte[]> data = cacheManager.getCache("distributed");
Cache<String,GridFile.Metadata> metadata = cacheManager.getCache("replicated");
GridFilesystem fs = new GridFilesystem(data, metadata);

        Of course, we can use GridFileSystem purely as an in-memory file store. But why not use GridFileSystem in memory, backed by FileCacheStore? In the code above, if the metadata (replicated) and data (distributed) caches are backed by FileCacheStore together with eviction, can I say that it already works like the previous [Eviction + FileCacheStore] usage scenario? So, instead of just a KV cache store, GridFileSystem can now offer us a higher-level file cache store, and together with the future DIST features, it is simply unbeatable.

      Hi Manik, do you have any idea where I can get the full configuration of the "distributed" and "replicated" caches used in the GridFileSystem tutorial code above, so that I can test it out?

Thanks.
3. Re: How to use Infinispan to implement distributed content servers

galder.zamarreno Sep 8, 2010 11:04 AM (in response to penkween)

Danny, you can have a replicated cache for metadata and a distributed cache for data independently of whether you use the grid file system.

Whether GridFS suits your use case or not depends on what kind of API you want to expose to your clients. If you want them to view it as a filesystem, it'd be correct. If you want them to see it as a k/v data source, use the normal cache API.
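
As a rough illustration of that split, the sketch below layers a tiny file-like API over two plain key/value maps: one for per-file metadata and one for file data cut into chunks. It is only an assumption-laden sketch, not GridFilesystem's actual implementation: the ChunkedFileStore and Meta names and the "path#index" chunk key scheme are hypothetical. java.util.Map stands in for the caches here, which is reasonable since Infinispan's Cache interface extends ConcurrentMap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a file-like view over a metadata map and a chunked data map. */
class ChunkedFileStore {
    /** Hypothetical metadata record: total length and chunk size of one file. */
    static final class Meta {
        final int length;
        final int chunkSize;

        Meta(int length, int chunkSize) {
            this.length = length;
            this.chunkSize = chunkSize;
        }
    }

    private final Map<String, Meta> metadata; // stand-in for the replicated cache
    private final Map<String, byte[]> data;   // stand-in for the distributed cache
    private final int chunkSize;

    ChunkedFileStore(Map<String, Meta> metadata, Map<String, byte[]> data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.metadata = metadata;
        this.data = data;
        this.chunkSize = chunkSize;
    }

    /** Hypothetical key scheme for the chunk at the given index of a file. */
    static String chunkKey(String path, int index) {
        return path + "#" + index;
    }

    /** Splits content into chunks, then records the metadata entry. */
    void write(String path, byte[] content) {
        int index = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, content.length);
            data.put(chunkKey(path, index++), Arrays.copyOfRange(content, offset, end));
        }
        // Note: overwriting with shorter content leaves stale chunks; ignored here.
        metadata.put(path, new Meta(content.length, chunkSize));
    }

    /** Reassembles a file from its chunks, or returns null if the path is unknown. */
    byte[] read(String path) {
        Meta meta = metadata.get(path);
        if (meta == null) {
            return null;
        }
        byte[] out = new byte[meta.length];
        int index = 0;
        for (int offset = 0; offset < meta.length; offset += meta.chunkSize) {
            byte[] chunk = data.get(chunkKey(path, index++));
            if (chunk == null) {
                throw new IllegalStateException("missing chunk for " + path);
            }
            System.arraycopy(chunk, 0, out, offset, chunk.length);
        }
        return out;
    }

    public static void main(String[] args) {
        ChunkedFileStore fs = new ChunkedFileStore(new HashMap<>(), new HashMap<>(), 4);
        fs.write("/docs/a.txt", new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        System.out.println("bytes read back: " + fs.read("/docs/a.txt").length);
    }
}
```

Clients that only need lookups by key could use the two maps directly; the wrapper matters only if they should see paths and whole files.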

Wrt the configuration, there's no specific configuration definition for it. If you take the all.xml shipped in Infinispan distros (http://anonsvn.jboss.org/repos/infinispan/tags/4.1.0.FINAL/core/src/main/resources/config-samples/all.xml), you can use the 'distributedCache' for the distributed one. You can use the default cache configuration in that file for the replicated one. To access the default config, you'd simply call cacheManager.getCache();
4. Re: How to use Infinispan to implement distributed content servers

penkween Sep 9, 2010 9:59 PM (in response to galder.zamarreno)

Hi Galder, thank you for your reply and for the pointers on GridFS. I shall give it a trial implementation. Thanks.