2 Replies Latest reply on Jun 21, 2016 1:15 PM by Vladimir Dzhuvinov

    Backup strategy for file-based cache stores?

    Vladimir Dzhuvinov Novice

      What would be a good backup strategy for Infinispan's single-file and soft-index file store?


      Is it possible and safe to do a hot backup on the files?


      I'm looking for a solution that would ideally permit hot backups and would not incur cluster downtime.





        • 1. Re: Backup strategy for file-based cache stores?
          Radim Vansa Master

          Generally speaking, these cache stores are not designed for hot backup. The file format is an implementation detail of Infinispan; if you want to back up cache contents, use cache.entrySet() or streams to iterate through all the values in your application (you could apply the CACHE_MODE_LOCAL flag and run the code on each node separately). Note that this does not give you a snapshot of the cache, just one of the potentially many states the cache passes through between starting and finishing the backup. Snapshot functionality is not implemented (users don't ask for it very often).
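          The iterate-and-copy approach described above might look roughly like this sketch. It uses a plain ConcurrentHashMap as a stand-in for the cache so it runs self-contained; with a real Infinispan Cache you would iterate cache.getAdvancedCache().withFlags(Flag.CACHE_MODE_LOCAL).entrySet() on each node instead. Everything here is illustrative, not Infinispan's own backup mechanism:

          ```java
          import java.util.ArrayList;
          import java.util.List;
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          public class CacheBackupSketch {

              // Snapshot-less backup: copies whatever entries are visible while
              // iterating. Entries modified concurrently may land in the backup
              // in either their old or their new state.
              static List<Map.Entry<String, String>> backup(Map<String, String> cache) {
                  List<Map.Entry<String, String>> copy = new ArrayList<>();
                  for (Map.Entry<String, String> e : cache.entrySet()) {
                      // Copy each entry so later cache mutations don't affect the backup
                      copy.add(Map.entry(e.getKey(), e.getValue()));
                  }
                  return copy;
              }

              public static void main(String[] args) {
                  Map<String, String> cache = new ConcurrentHashMap<>();
                  cache.put("user:1", "alice");
                  cache.put("user:2", "bob");
                  List<Map.Entry<String, String>> snapshot = backup(cache);
                  cache.put("user:1", "eve"); // mutation after the backup finished
                  System.out.println(snapshot.size()); // 2
              }
          }
          ```

          The key point is the one Radim makes: the copy is consistent only per entry, not across the cache as a whole.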


          Technically, the single-file store is one continually changing file with random-access writes. Hot backup is doomed here.


          The data in the soft-index file store are written to append-only files (two files are being appended to at runtime); finished files are only read, or removed as a whole. The index file is redundant, so you shouldn't back it up. Records in the data files contain sequence ids, so the ordering is well defined. That makes hot backup somewhat possible, though a half-finished write at the end of a file could cause a later load to fail (and you would need a tool to properly chop off the garbage at the end of the file). No one has ever tested hot backup, though.
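          The "chop off the garbage" step mentioned above could be sketched as follows. This assumes a hypothetical length-prefixed record layout ([4-byte big-endian length][payload]) purely for illustration; it is not the soft-index store's actual on-disk format, just the general technique of scanning for the last complete record and truncating after it:

          ```java
          import java.io.IOException;
          import java.io.RandomAccessFile;
          import java.nio.file.Path;

          public class ChopTail {

              // Scans length-prefixed records and truncates the file right after
              // the last complete record, discarding any half-written tail.
              // Returns the number of complete records kept.
              static int truncateIncompleteTail(Path file) throws IOException {
                  try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
                      long size = raf.length();
                      long pos = 0;
                      int complete = 0;
                      while (pos + 4 <= size) {
                          raf.seek(pos);
                          int len = raf.readInt();
                          if (len < 0 || pos + 4 + len > size) {
                              break; // half-written record: stop here
                          }
                          pos += 4 + len;
                          complete++;
                      }
                      raf.setLength(pos); // chop off the garbage tail
                      return complete;
                  }
              }
          }
          ```

          A real tool would also have to verify the sequence ids Radim mentions, so that a torn write in the middle of a record (not just at its end) is detected.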

          • 2. Re: Backup strategy for file-based cache stores?
            Vladimir Dzhuvinov Novice

            Thanks Radim, this is hugely informative. I really like the simplicity of the file-based stores, and they offer superb performance too. Benchmarking our service with the soft-index store showed 43% greater throughput compared to our LDAP store (results we published a while ago and mentioned in another thread). We just need to find a good way to handle backups now.


            Iterating over the entries and streaming them to JSON, perhaps on a lower-priority thread, would work for us.
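            A minimal sketch of that idea, writing one JSON object per line (JSON Lines) so the backup streams without holding everything in memory. The hand-rolled escaper is a stand-in for a real JSON library, and all names here are illustrative:

            ```java
            import java.io.IOException;
            import java.io.StringWriter;
            import java.io.Writer;
            import java.util.LinkedHashMap;
            import java.util.Map;

            public class JsonBackupSketch {

                // Streams each entry as one JSON object per line, so entries can
                // be written out as they are iterated.
                static void writeEntries(Map<String, String> entries, Writer out) throws IOException {
                    for (Map.Entry<String, String> e : entries.entrySet()) {
                        out.write("{\"key\":\"" + escape(e.getKey())
                                + "\",\"value\":\"" + escape(e.getValue()) + "\"}\n");
                    }
                }

                // Minimal JSON string escaping; a real backup tool would use a
                // proper JSON library instead.
                static String escape(String s) {
                    StringBuilder sb = new StringBuilder();
                    for (char c : s.toCharArray()) {
                        switch (c) {
                            case '"':  sb.append("\\\""); break;
                            case '\\': sb.append("\\\\"); break;
                            case '\n': sb.append("\\n");  break;
                            default:   sb.append(c);
                        }
                    }
                    return sb.toString();
                }

                public static void main(String[] args) throws IOException {
                    Map<String, String> entries = new LinkedHashMap<>();
                    entries.put("user:1", "alice");
                    StringWriter sw = new StringWriter();
                    writeEntries(entries, sw);
                    System.out.print(sw); // {"key":"user:1","value":"alice"}
                }
            }
            ```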


            I've also been considering implementing an LMDB cache store [1]. LMDB is a memory-mapped key-value DB, where making a hot backup is simply a matter of copying the single DB file. LMDB could also be used to implement off-heap storage, though I don't know how well that would work out in practice. Perhaps we should give it a try.


            [1] GitHub - deephacks/lmdbjni: LMDB for Java