3 Replies Latest reply on Aug 25, 2011 7:50 PM by sannegrinovero

    Infinispan - big table support?

    kapilnayar

      Can we store big-table-like structured data in Infinispan? How does Infinispan compare to open-source big-table solutions like Cassandra?

       

      Cassandra offers consistent hashing (CH) and also an in-memory configuration.

      Infinispan offers the transaction capability - which in case of Cassandra would be possible through Zookeeper.

       

      A related question: when should we use the Cassandra cache store?

      Wouldn't it be better to use Cassandra directly, since it does offer the in-memory option?

       

      Any comments?

        • 1. Re: Infinispan - big table support?
          sannegrinovero

          I'm not an expert on Cassandra, but have spoken with some to try understanding this myself; this is what I got out of it.

          The main difference is transactions and consistency. Eventual consistency is an option in Infinispan, but some applications need non-eventual consistency, and that is an option too. It is very nice to have on the day you realize a new feature can't be built on top of eventual consistency.

           

           

          Infinispan offers the transaction capability - which in case of Cassandra would be possible through Zookeeper.

          ZooKeeper might help to implement some locks, but that still seems far from giving you all the features of a transactional data store.

           

          A related question: when should we use the Cassandra cache store?

          Wouldn't it be better to use Cassandra directly, since it does offer the in-memory option?

          The short answer is that Infinispan is designed around data structures that are efficient in memory, as memory is its main target, though it can also use persistent storage. It should therefore be the better choice for situations that need data very quickly, served from memory.

          Cassandra, as you say, can also work in memory, but it is designed to be efficient on persistent storage, and I'm not sure how motivated the Cassandra developers are to win the "in memory" race.

           

          Of course both can do well in the other's area, but I think focus matters and makes a difference. The fact that Infinispan can write to a Cassandra cache store basically gives you the best of both worlds: consistency and transactions on top of the huge amount of data that Cassandra can store on disk.

           

          Finally, Infinispan can be used effectively as a cache, while Cassandra, whose reads are more expensive than its writes, seems a poor choice as a cache implementation.

          Having an Infinispan layer in front of a Cassandra cache store gives you a nice distributed cache and a simple API to program against. Developers won't need much training to understand how to use a Map.
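
To illustrate the "it's just a Map" point: Infinispan's `Cache` interface extends `java.util.concurrent.ConcurrentMap`, so application code can be written against the standard interface. The sketch below is only an illustration under that assumption. It uses a plain `ConcurrentHashMap` as a stand-in so it runs without Infinispan on the classpath, and the class name, the `lookupOrLoad` helper, and the `"loaded:"` placeholder value are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MapStyleCacheSketch {

    // Hypothetical helper: it only depends on ConcurrentMap, so the caller
    // decides whether the backing store is a local map or a distributed cache.
    static String lookupOrLoad(ConcurrentMap<String, String> cache, String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                       // cache hit
        }
        // Placeholder for an expensive load (for example, from a slow store).
        String loaded = "loaded:" + key;
        // putIfAbsent returns the existing value if another caller got there first.
        String existing = cache.putIfAbsent(key, loaded);
        return existing != null ? existing : loaded;
    }

    public static void main(String[] args) {
        // Stand-in for a cache obtained from a cache manager (assumption).
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<String, String>();
        System.out.println(lookupOrLoad(cache, "user:42"));   // miss, then stored
        System.out.println(lookupOrLoad(cache, "user:42"));   // hit
    }
}
```

Swapping the `ConcurrentHashMap` for a real cache instance should not require changes to `lookupOrLoad`, which is the practical meaning of "a simple API to program against".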

          • 2. Re: Infinispan - big table support?
            kapilnayar

            Looking at the APIs and reading through the documentation, it seems Cassandra is an AP system with consistency levels that can be configured optionally (ranging from eventual consistency up to strict consistency), although changing these options would inherently affect availability (and performance comparisons are unknown...).

            I like your statement - "Of course both can do well in the other's area, but I think focus matters and makes a difference".
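
The trade-off between consistency level and availability can be made concrete with the usual quorum rule for replicated stores: with N replicas, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below only encodes that arithmetic. It assumes a Dynamo-style replication model, and the class and method names are hypothetical, not part of any Cassandra or Infinispan API.

```java
class QuorumSketch {

    // Majority of the replica set, e.g. 2 out of 3.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // True when every read set must share at least one replica with every
    // acknowledged write set (R + W > N).
    static boolean readOverlapsWrite(int replicas, int writeAcks, int readReplicas) {
        return writeAcks + readReplicas > replicas;
    }

    // Number of replicas that may be down while writes still succeed.
    static int toleratedFailuresForWrites(int replicas, int writeAcks) {
        return replicas - writeAcks;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("quorum read + quorum write overlap: " + readOverlapsWrite(n, q, q));
        System.out.println("single-replica read + write overlap: " + readOverlapsWrite(n, 1, 1));
        System.out.println("write-to-all tolerates failures: " + toleratedFailuresForWrites(n, n));
    }
}
```

Raising W (or R) buys overlap at the cost of tolerating fewer unavailable replicas, which is the availability effect mentioned above.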

             

            Talking about big-table structure, though: Cassandra offers multidimensional maps (super columns and super column families, with a flexible number of columns in each row, for creation, searching, updates and deletion). This seems to be a big distinguishing feature off the shelf.

             

            Now, even though the Cassandra cache store allows an Infinispan cache to persist entries in Cassandra, these are simple map structures (columns in a column family). How does Infinispan address multidimensional map operations when storing to big-table DBs? Is it something that is available in 5.0.0, or on the roadmap?

            • 3. Re: Infinispan - big table support?
              sannegrinovero
              Looking at the APIs and reading through the documentation, it seems Cassandra is an AP system with consistency levels that can be configured optionally (ranging from eventual consistency up to strict consistency), although changing these options would inherently affect availability (and performance comparisons are unknown...).

              I had understood that "strict consistency" requires all writes to be basically serialized, so unless there is some magic going on, that will kill scalability. Assuming this is correct (I've only read some documentation), it is an example of the fact that Cassandra wasn't designed for consistency other than eventual.

               

              Talking about big-table structure, though: Cassandra offers multidimensional maps (super columns and super column families, with a flexible number of columns in each row, for creation, searching, updates and deletion). This seems to be a big distinguishing feature off the shelf.

              I agree, assuming the system can take some practical advantage of this organization. Infinispan has a Tree API module, but it is intended mainly as a compatibility API for people who used Infinispan's predecessor, JBoss Cache. Data is not organized in a clever form that can be queried efficiently by taking advantage of the tree; it's mainly API sugar, with one small optimization: it replicates only the needed fragment of the data, rather than everything that would be replicated when storing a Map directly in the cache.

              Now, even though the Cassandra cache store allows an Infinispan cache to persist entries in Cassandra, these are simple map structures (columns in a column family). How does Infinispan address multidimensional map operations when storing to big-table DBs? Is it something that is available in 5.0.0, or on the roadmap?

              It's not available, but we have talked about it. The problem is how we should take advantage of multidimensional storage when our data is not organized that way (other than when using the Tree module). If you have suggestions or interesting use cases, they are very welcome on the developers' mailing list, or you can open feature requests directly on JIRA if you already have a very clear idea of your need.
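
To make the mismatch concrete, here is a rough sketch of what emulating a super-column layout looks like on top of a flat key/value map: each row key maps to a nested `Map` of super column to columns. It is only an illustration, not how Infinispan or its Cassandra cache store organizes data. A `ConcurrentHashMap` stands in for the cache, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class NestedRowSketch {

    // Writes one column by copying the row, changing the copy, and putting
    // the whole row back. The map only sees "row key -> opaque value", so
    // the entire row is the unit that gets written. Not atomic: concurrent
    // writers to the same row could overwrite each other in this sketch.
    static void putColumn(ConcurrentMap<String, Map<String, Map<String, String>>> cache,
                          String rowKey, String superColumn, String column, String value) {
        Map<String, Map<String, String>> oldRow = cache.get(rowKey);
        Map<String, Map<String, String>> newRow = new HashMap<String, Map<String, String>>();
        if (oldRow != null) {
            for (Map.Entry<String, Map<String, String>> e : oldRow.entrySet()) {
                newRow.put(e.getKey(), new HashMap<String, String>(e.getValue()));
            }
        }
        Map<String, String> columns = newRow.get(superColumn);
        if (columns == null) {
            columns = new HashMap<String, String>();
            newRow.put(superColumn, columns);
        }
        columns.put(column, value);
        cache.put(rowKey, newRow);
    }

    // Reads one column; returns null when the row, super column or column is missing.
    static String getColumn(ConcurrentMap<String, Map<String, Map<String, String>>> cache,
                            String rowKey, String superColumn, String column) {
        Map<String, Map<String, String>> row = cache.get(rowKey);
        if (row == null) {
            return null;
        }
        Map<String, String> columns = row.get(superColumn);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Map<String, Map<String, String>>> cache =
                new ConcurrentHashMap<String, Map<String, Map<String, String>>>();
        putColumn(cache, "user:42", "address", "city", "Rome");
        putColumn(cache, "user:42", "address", "zip", "00100");
        System.out.println(getColumn(cache, "user:42", "address", "city"));
    }
}
```

Every single-column update here rewrites the whole row value, and there is no way to search by column without reading rows one by one, which is the kind of structure a multidimensional store could exploit and a flat map cannot.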