2 Replies Latest reply on May 20, 2017 12:44 AM by Elias Ross

    Memory mapped cache store?

    Elias Ross Master

      I created a cache store that uses memory mapped files.


      The files themselves are graphic images.


      The interesting thing is that it works much like an "off heap" system, except that returning the data doesn't require copying it onto the heap. Basically, Infinispan keeps a reference to a ByteBuffer that is actually a MappedByteBuffer. The main restriction, of course, is that your data types need to either be ByteBuffers or "wrap" a ByteBuffer.
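
      For readers who want something concrete, here is a minimal Java sketch of the mapping step, using only the standard java.nio API. The class name MappedValues and the method name map are hypothetical; this is not the code from the post, just one way the idea could look.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: one value per file, returned as a mapped buffer.
final class MappedValues {
    private MappedValues() {}

    /**
     * Maps the whole file read-only. The contents are paged in by the OS
     * and are not copied into a byte[] on the Java heap.
     * Files larger than Integer.MAX_VALUE bytes cannot be mapped in one call.
     */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Per the FileChannel.map contract, the mapping stays valid
            // after the channel that created it is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```

      A value type that "wraps" a byte buffer would then simply hold the returned buffer as a field.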


      To get it to work, each value needs to be associated with a different file. Linux filesystems can handle thousands of files in a directory, or you can structure them the way 'git' does, where keys are hashed into separate directories. There are undoubtedly problems, such as metadata storage (I don't handle expiry) and concurrency, but I made a simple version that has a maximum number of such files and an LRU-type cache.
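
      A git-like layout could be sketched in Java roughly as follows. The choice of SHA-1, the two-character directory prefix, and the names KeyPaths and pathFor are all assumptions made for illustration; the post does not say which hash or layout was actually used.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that spreads entry files across subdirectories.
final class KeyPaths {
    private KeyPaths() {}

    /** Returns root/<first 2 hex chars>/<remaining 38 hex chars> for the key. */
    static Path pathFor(Path root, String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return root.resolve(hex.substring(0, 2)).resolve(hex.substring(2));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }
}
```

      With a two-character prefix the entries are spread over at most 256 subdirectories, which keeps any single directory small.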


      Anyway, just wanted to share this idea with you. It's a fairly simple idea but fills a pretty good number of use cases.

        • 1. Re: Memory mapped cache store?
          William Burns Expert

          Thanks Elias!


          First, do you have a repo where we could take a look at this?


          Unfortunately, a file per entry isn't very feasible in our case. Our old file-based store only used a file per segment, and we still ran into issues with too many open file descriptors. It would be even worse with a file per entry, so great care would have to be taken around this.


          Also, you mention that concurrency is a problem? Is this something that could be solved by adding some simple java.util.concurrent.locks.Lock instances?


          Either way we would love to see what you have done!

          • 2. Re: Memory mapped cache store?
            Elias Ross Master

            First of all, my code is in Scala. Secondly, it's owned by my company. I can check how to submit it, but really it's not general-purpose enough.


            Conceptually, it's really only about a day's coding to recreate. Basically, you have to limit the size of the cache to the number of available file descriptors. For my use cases, that is sufficient, as I don't actually need a high number of entries available simultaneously. Eviction is okay because reloading an entry simply means mapping the same file again.
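
            As a rough illustration of the bounded LRU idea, here is a Java sketch built on an access-ordered LinkedHashMap. The class MappedLruCache and its methods are hypothetical names, and a single coarse lock is assumed for simplicity; it is not the Scala code discussed in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache of mapped files with LRU eviction.
final class MappedLruCache {
    private final Map<Path, ByteBuffer> mapped;

    MappedLruCache(int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.mapped = new LinkedHashMap<Path, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, ByteBuffer> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns a view of the file's contents, mapping (or remapping) it on a miss. */
    synchronized ByteBuffer get(Path file) {
        ByteBuffer buf = mapped.get(file);
        if (buf == null) {
            buf = map(file);
            mapped.put(file, buf); // may evict the least recently used entry
        }
        // duplicate() shares the bytes but gives the caller its own position.
        return buf.duplicate();
    }

    synchronized boolean isMapped(Path file) {
        return mapped.containsKey(file); // does not count as an access
    }

    synchronized int size() {
        return mapped.size();
    }

    private static ByteBuffer map(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

            One caveat with this sketch: Java has no public API to unmap a buffer, so an evicted mapping is only released once the buffer is garbage collected.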


            I do use a striped lock based on the key, and files are written and then renamed into place. I suspect there are more things to watch out for, though, if entries are modified.
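
            The striped lock plus write/rename approach might look roughly like this in Java. This sketch assumes "write/rename" means writing to a temporary file and then moving it onto the final name; the class StripedFileWriter, the stripe count, and the temp-file naming are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical writer: a fixed array of locks shared by all keys.
final class StripedFileWriter {
    private final Lock[] stripes;

    StripedFileWriter(int stripeCount) {
        stripes = new Lock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Equal keys always map to the same lock; different keys may share one. */
    Lock stripeFor(Object key) {
        return stripes[Math.floorMod(key.hashCode(), stripes.length)];
    }

    /** Writes the value to a temp file, then renames it onto the target path. */
    void write(Object key, Path target, byte[] value) throws IOException {
        Lock lock = stripeFor(key);
        lock.lock();
        try {
            Path dir = target.toAbsolutePath().getParent();
            Files.createDirectories(dir);
            // Temp file in the same directory so the move stays on one filesystem.
            Path tmp = Files.createTempFile(dir, "entry", ".tmp");
            Files.write(tmp, value);
            // Whether ATOMIC_MOVE replaces an existing target is platform specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            lock.unlock();
        }
    }
}
```

            The point of the rename is that readers mapping the target path see either the old file or the complete new one, never a half-written file.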


            I found the off-heap cache mode interesting, but it doesn't really work for very large objects, since you end up copying to the heap on 'get'. It should be possible (I think) to return a byte buffer (not a byte array) that you can operate on directly. Sending it to a socket is a common use case, for example.
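
            To make the socket use case concrete, here is a small Java sketch that writes a ByteBuffer to any WritableByteChannel (a SocketChannel is one). The class BufferSender and the method send are hypothetical names, and a blocking channel is assumed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper for streaming a cached value out of a channel.
final class BufferSender {
    private BufferSender() {}

    /**
     * Writes all remaining bytes of the value to the channel and returns
     * the number of bytes written. Assumes a blocking channel; with a
     * non-blocking one this loop would spin while the channel is not writable.
     */
    static long send(ByteBuffer value, WritableByteChannel out) throws IOException {
        // Work on a duplicate so the cached buffer's position is left untouched.
        ByteBuffer view = value.duplicate();
        long written = 0;
        while (view.hasRemaining()) {
            written += out.write(view);
        }
        return written;
    }
}
```

            With a mapped or direct buffer and a SocketChannel, the application code never needs to materialize the value as a byte array on the heap.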