6 Replies Latest reply on Sep 16, 2010 2:39 AM by Danny C

    Is Infinispan GridFileSystem persistable ?

    Danny C Newbie

                While the in-memory GridFilesystem, using a replicated cache (for metadata) and a distributed cache (for data), is stable, when I try to persist the GridFilesystem's metadata using FileCacheStore as shown in the config XML below, some evicted entries cannot be retrieved by the following code:

       

       

      Test code used:

      ##########################################################################################

      Cache<String,byte[]> data = cacheManager.getCache("distributed");
      Cache<String,GridFile.Metadata> metadata = cacheManager.getCache("replicated");   // the current test only tries to persist the metadata

      GridFilesystem fs = new GridFilesystem(data, metadata);

      // After creating some directories and (empty) files, more than the eviction limit maxEntries="5":

      // Try to list the files
      File file = fs.getFile("/");
      File[] files = file.listFiles();
      System.out.println(java.util.Arrays.toString(files));   // some entries are missing

       

      ***** Although the metadata cache survives a restart using metadata.stop() & metadata.start(), BUT: *****

      ***** Not all directory or file entries are returned by "file.listFiles()". I guess the evicted ones have gone missing. *****

      ***** Not all entries are returned by metadata.keySet(), metadata.entrySet() or metadata.values(). *****

      ***** Without eviction, everything looks fine. *****
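
      A possible explanation, stated here as an assumption and not confirmed in this thread: if keySet(), entrySet() and values() only report entries currently held in memory, and the file listing is built from the metadata keys, then evicted entries would vanish from listings while still being loadable by key. The toy model below is plain Java and does not use any Infinispan API; the class EvictingCacheSketch and its methods are hypothetical names. Unlike the periodic eviction configured via wakeUpInterval, it evicts immediately on each write.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: NOT Infinispan code. Illustrates write-through plus FIFO eviction. */
public class EvictingCacheSketch {
    private final int maxEntries;
    // Insertion-ordered map: the first key in iteration order is the oldest (FIFO).
    private final Map<String, String> memory = new LinkedHashMap<>();
    // Stands in for a persistent store; with passivation=false every write goes here too.
    private final Map<String, String> store = new HashMap<>();

    public EvictingCacheSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    public void put(String key, String value) {
        store.put(key, value);   // write-through to the store
        memory.put(key, value);
        evictOldest();
    }

    /** Looks in memory first, then falls back to the store and reloads the entry. */
    public String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = store.get(key);
            if (value != null) {
                memory.put(key, value);
                evictOldest();
            }
        }
        return value;
    }

    /** Assumed behaviour of a key listing that ignores the store. */
    public Set<String> inMemoryKeys() {
        return new TreeSet<>(memory.keySet());
    }

    /** What a store-aware listing would have to return. */
    public Set<String> allKeys() {
        Set<String> keys = new TreeSet<>(store.keySet());
        keys.addAll(memory.keySet());
        return keys;
    }

    private void evictOldest() {
        Iterator<String> it = memory.keySet().iterator();
        while (memory.size() > maxEntries && it.hasNext()) {
            it.next();
            it.remove();   // drop the oldest in-memory entry; the store keeps its copy
        }
    }

    public static void main(String[] args) {
        EvictingCacheSketch cache = new EvictingCacheSketch(5);
        for (int i = 0; i < 8; i++) {
            cache.put("/dir" + i, "meta" + i);
        }
        System.out.println("in-memory keys: " + cache.inMemoryKeys());
        System.out.println("all keys:       " + cache.allKeys());
        System.out.println("get(/dir0):     " + cache.get("/dir0"));
    }
}
```

      In this model a lookup by key still finds an evicted entry, but a listing built only from the in-memory keys does not, which matches the symptoms described above. Whether the real GridFS listing behaves this way would need to be checked against the Infinispan source.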

       

       

      Does anybody have any idea whether Infinispan GridFS is persistable? And is it possible to make both its metadata and data persistable (with or without passivation) using FileCacheStore?

       

       

       

       

       

      Test config.xml used:

      ##########################################################################################

      <?xml version="1.0" encoding="UTF-8"?>

       

      <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:infinispan:config:4.0">

       

      <global>
          <transport clusterName="infinispan-cluster" distributedSyncTimeout="50000" nodeName="Jalapeno"/>
          <serialization marshallerClass="org.infinispan.marshall.VersionAwareMarshaller" version="1.0"/>
          <shutdown hookBehavior="DEFAULT"/>
      </global>

       

      <namedCache name="distributed">
          <clustering mode="distribution">
               <sync/>
               <hash numOwners="3" rehashWait="120000" rehashRpcTimeout="600000"/>
               <l1 enabled="true" lifespan="600000"/>
          </clustering>
      </namedCache>
         
      <namedCache name="replicated">
          <clustering mode="replication">
               <sync replTimeout="20000"/>
          </clustering>
          <loaders passivation="false" shared="false" preload="true">
              <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true"
                         ignoreModifications="false" purgeOnStartup="false">
                  <properties>
                      <property name="location" value="/tmp"/>
                  </properties>
              </loader>
          </loaders>
          <eviction wakeUpInterval="500" maxEntries="5" strategy="FIFO"/>
      </namedCache>

       

      </infinispan>

       

      ##########################################################################################

        • 1. Re: Is Infinispan GridFileSystem persistable ?
          Vladimir Blagojevic Master

          Why do you want to make it persistable? It kinda defeats the purpose of why GridFS was made to begin with!

           

          Cheers,

          Vladimir

          • 2. Re: Is Infinispan GridFileSystem persistable ?
            Danny C Newbie

                       Hi Vladimir, thanks for your reply. One of the great features of Infinispan is its ability to persist its cache to a file store using eviction. The reason for wanting persistence in the GridFS use case is highlighted in my previous post at http://community.jboss.org/thread/156207?tstart=0 . The simple thought is: since an Infinispan cache is persistable, GridFS, which uses the underlying caches (one replicated, one distributed), should be persistable as well.

             

                      If GridFS is indeed persistable, then we can use Infinispan as a combination (filesystem + cache) and spare ourselves the pain of running a separate distributed filesystem and implementing cumbersome cache invalidation. Plus, we get a fast cache (clustered and distributed) backed by a slower filesystem, where hot items are loaded into cache memory from the filesystem and cold items are offloaded from it, and we don't have to worry about running out of memory because all of this is managed by Infinispan's eviction and cache loader policies.

             

                     As far as I know, this really sets Infinispan apart from other open source caches like Memcache and Redis, and even Membase. I am not sure about the original purpose/direction of GridFS. From a user's point of view, the question is why GridFS should not support a persistent backing store, or whether it already does. That's why I am asking the question in this post.

             

            Thanks.

            • 3. Re: Is Infinispan GridFileSystem persistable ?
              Vladimir Blagojevic Master

              Hey Danny,

               

              File data is backed by the dist cache and metadata is in the replicated cache. Metadata is tiny memory-wise and I do not think there is a need for eviction in that cache. Also, I am not sure your configuration file makes sense. Why do you have maxEntries set to 5? If anything needs to be evicted/persisted, it is the real file data, which could potentially be huge. I am not sure how to optimize/configure that scenario, and you'd have to experiment on your own.

               

              Am I making more sense now?

              Vladimir

              • 4. Re: Is Infinispan GridFileSystem persistable ?
                Danny C Newbie

                Hi Vladimir,

                 

                    Thanks for taking the time to look at this. First of all, in our Infinispan deployment we expect to store at least approximately:

                100M users x 10,000 objects = 1 tera (10^12) objects

                 

                Metadata = 1 tera objects x 255 bytes  = 255 TB   (assuming an average metadata entry of 255 bytes)

                Data     = 1 tera objects x 500 KB     = 500 PB   (assuming an average object size of 500 KB, and aware that not every object has data)

                 

                OK, let's talk about metadata first. While 20 nodes or fewer can quite easily store 255 TB on hard disk, holding all of it in RAM would need roughly 1,330 nodes (each with 192 GB of RAM), at a huge cost.
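
                The sizing arithmetic above can be checked with a short sketch. It assumes decimal units (1 GB = 10^9 bytes), one copy of each entry, and that all of a node's RAM is available for metadata; the class and method names are made up for illustration.

```java
/** Back-of-envelope check of the storage estimate; all inputs come from the figures in the post. */
public class GridFsCapacityEstimate {
    static final long USERS = 100_000_000L;                   // 100M users
    static final long OBJECTS_PER_USER = 10_000L;
    static final long METADATA_BYTES_PER_OBJECT = 255L;       // assumed average
    static final long DATA_BYTES_PER_OBJECT = 500_000L;       // assumed average, 500 KB
    static final long RAM_BYTES_PER_NODE = 192_000_000_000L;  // 192 GB, decimal units (assumption)

    static long totalObjects() {
        return USERS * OBJECTS_PER_USER;
    }

    static long metadataBytes() {
        return totalObjects() * METADATA_BYTES_PER_OBJECT;
    }

    static long dataBytes() {
        return totalObjects() * DATA_BYTES_PER_OBJECT;        // 5 x 10^17, still fits in a long
    }

    /** Nodes needed to hold all metadata in RAM, rounded up. */
    static long nodesForMetadataInRam() {
        return (metadataBytes() + RAM_BYTES_PER_NODE - 1) / RAM_BYTES_PER_NODE;
    }

    public static void main(String[] args) {
        System.out.println("objects:        " + totalObjects());
        System.out.println("metadata (TB):  " + metadataBytes() / 1_000_000_000_000L);
        System.out.println("data (PB):      " + dataBytes() / 1_000_000_000_000_000L);
        System.out.println("nodes for RAM:  " + nodesForMetadataInRam());
    }
}
```

                Under these assumptions the node count comes out at 1,329, in the same range as the figure quoted above. Using binary units or reserving RAM for the JVM and OS would shift the result.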

                 

                     So we expect to keep the 1% of hot metadata items in RAM and evict the other 99% (most likely we will use passivation=false, since we want the full metadata set persisted on disk). At the end of the day, both metadata and data have to be distributed caches in an InfiniBand-networked Infinispan cluster.
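
                As a rough illustration of what the 1% hot set means for RAM, the sketch below computes the node count for a given hot fraction and number of in-memory copies. It assumes decimal units, that all 192 GB per node is usable, and (as an assumption beyond what the post states) that a distributed metadata cache would keep numOwners copies of each hot entry, as in the numOwners="3" of the test config. The names are hypothetical.

```java
/** Rough RAM sizing for a hot subset of the metadata; a sketch under the stated assumptions. */
public class HotSetEstimate {
    static final long METADATA_BYTES = 255_000_000_000_000L;  // 255 TB, from the estimate in the post
    static final long RAM_BYTES_PER_NODE = 192_000_000_000L;  // 192 GB, decimal units (assumption)

    /**
     * @param hotPercent percentage of metadata kept in RAM (e.g. 1 for 1%)
     * @param numOwners  copies of each hot entry held in memory across the cluster
     * @return number of nodes needed, rounded up
     */
    static long nodesForHotSet(long hotPercent, long numOwners) {
        long hotBytes = METADATA_BYTES * hotPercent / 100 * numOwners;
        return (hotBytes + RAM_BYTES_PER_NODE - 1) / RAM_BYTES_PER_NODE;
    }

    public static void main(String[] args) {
        System.out.println("1% hot, 1 copy:   " + nodesForHotSet(1, 1) + " nodes");
        System.out.println("1% hot, 3 copies: " + nodesForHotSet(1, 3) + " nodes");
    }
}
```

                With these inputs the hot set fits on 14 nodes with a single copy, or 40 nodes with three copies, which is why evicting cold metadata to disk changes the cost picture so much.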

                 

                     As for the configuration shown above, it is only a test config.xml. I set maxEntries to 5 just to test whether GridFS (the metadata part) can make use of a persisted cache. You are right that in real production the file data and maxEntries will be much larger. I hope it is a bit clearer now.

                 

                     I just want to know whether Infinispan GridFS is persistable using a disk-backed cache such as the "replicated" cache in the config above. From my experiment above, it seems it cannot, but perhaps the problem is with the config, and that's why I am asking in this post. Of course, I can try to write a custom GridFS for this usage scenario, but I want to avoid reinventing a wheel that already exists in Infinispan.

                 

                     I don't think Infinispan is only fit for small websites with megabytes or gigabytes of clustered RAM cache. I believe it can scale to terabytes or even petabytes of clustered RAM cache, don't you?

                 

                Thanks.

                • 5. Re: Is Infinispan GridFileSystem persistable ?
                  Vladimir Blagojevic Master

                  Danny,

                   

                  Seems like you are making a serious production deployment out of GridFS. I think you understand that we do not support GridFS as a product; it is just a proof of concept as of now. Most likely you will have to look into hacking GridFS a bit to reach those scalability targets. But hey, we can help you along as much as we can.

                   

                  As for the concrete problem you are seeing, let's start by raising maxEntries to 128 and do a bit of testing to see what happens. For the replicated cache, do you want each node to store all data on its own disk, or do you want all nodes to store data at one single location? See the configuration details at http://docs.jboss.org/infinispan/4.1/apidocs/config.html#ce_default_loaders

                   

                  Experiment a bit, I am sure we'll get this one figured out.

                   

                  Cheers,

                  Vladimir

                  • 6. Re: Is Infinispan GridFileSystem persistable ?
                    Danny C Newbie

                    Hi Vladimir,

                     

                         You are right, I have to go back to the source and hack GridFS a bit to optimize for this; that should be OK. Since not every item is stored in memory, I have to make sure the Query API can work on GridFS as smoothly as possible. For full-text search, luckily, Infinispan's integration with the Lucene directory is already there.

                     

                         For the replicated cache, since a distributed filesystem mount point is being used, I will most likely set shared="true" in the loaders configuration.

                     

                         In the future, when memory prices come down to par with disk, together with high-speed I/O, we will only need RAM chips and Infinispan clustering, and disk will become an archive medium like tape. So, to the Infinispan team: keep up the good work.

                     

                        Thanks.