4 Replies Latest reply on Oct 14, 2014 1:55 PM by John Sanda

    Multi-tenancy

    John Sanda Apprentice

      In another forum discussion, I posted a link to a design doc that includes some discussion of multi-tenancy. A question was raised about whether it might make sense to have separate tables per tenant versus a single set of shared tables, where each table has a tenant_id column. With the latter approach, every request will have to implicitly filter on the tenant id. With Cassandra, the former would most easily be implemented with a separate keyspace per tenant. This is a suboptimal approach with Cassandra. First, there is some on-heap memory overhead for every table. As the number of tenants/tables increases, that overhead can become substantial. Second, we would need to maintain a set of prepared statements for each tenant. That could become costly too. The shared table approach is more common in Cassandra. In fact, some libraries like Hector offer "virtual keyspace" support. The Hector docs also explain the motivations for the shared table approach.
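To make the prepared-statement point concrete, here is a rough sketch that just counts the statements each approach would need. It does not talk to Cassandra. The keyspace, table, and column names (`metrics.data`, `tenant_id`, `metric`, `time`, `value`) and the two queries are made up for illustration and are not taken from the design doc.

```python
# All keyspace, table, and column names below are hypothetical.

# Shared-table approach: tenant_id is bound as a parameter in every statement.
SHARED_STATEMENTS = {
    "insert": "INSERT INTO metrics.data (tenant_id, metric, time, value) "
              "VALUES (?, ?, ?, ?)",
    "find": "SELECT time, value FROM metrics.data "
            "WHERE tenant_id = ? AND metric = ? AND time >= ? AND time < ?",
}

# Keyspace-per-tenant approach: same queries without tenant_id, but the
# keyspace name is baked into the CQL text, so it differs per tenant.
PER_TENANT_TEMPLATES = {
    "insert": "INSERT INTO {keyspace}.data (metric, time, value) "
              "VALUES (?, ?, ?)",
    "find": "SELECT time, value FROM {keyspace}.data "
            "WHERE metric = ? AND time >= ? AND time < ?",
}


def shared_table_statements():
    """Statements to prepare when all tenants share one table."""
    return dict(SHARED_STATEMENTS)


def per_tenant_statements(tenants):
    """Statements to prepare when each tenant has its own keyspace."""
    return {
        (tenant, name): cql.format(keyspace=tenant)
        for tenant in tenants
        for name, cql in PER_TENANT_TEMPLATES.items()
    }


tenants = [f"tenant_{i}" for i in range(1000)]
print(len(shared_table_statements()))       # fixed, independent of tenant count
print(len(per_tenant_statements(tenants)))  # grows linearly with tenant count
```

With the shared table, the statement set is prepared once and the tenant id is just another bound value. With a keyspace per tenant, the set has to be prepared again for every tenant.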

        • 1. Re: Multi-tenancy
          Thomas Heute Master

          OK, thanks for the info. If Cassandra scales with extremely large tables, then fine.

           

          I guess that in this case we can't use Cassandra's access control capabilities. It may not be a problem, but I would like to know what we gain/lose with the two approaches.

          • 2. Re: Multi-tenancy
            Thomas Heute Master

            I guess a single table has the drawback that data is more likely to span multiple storage nodes, whereas with the other approach we may be able to better segregate one tenant's data on a single node.

            • 3. Re: Multi-tenancy
              John Sanda Apprentice

              Thomas Heute wrote:

               

              OK, thanks for the info. If Cassandra scales with extremely large tables, then fine.

               

              I guess that in this case we can't use Cassandra's access control capabilities. It may not be a problem, but I would like to know what we gain/lose with the two approaches.

              It is my understanding that Cassandra does scale well with very large tables.

               

              Hmm... that's a good point/question about Cassandra's access control. I will look into this to see what options we might have. The authentication/authorization components are pluggable, so if nothing else, maybe we can explore the possibility of implementing our own.

              • 4. Re: Multi-tenancy
                John Sanda Apprentice

                Thomas Heute wrote:

                 

                I guess a single table has the drawback that data is more likely to span multiple storage nodes, whereas with the other approach we may be able to better segregate one tenant's data on a single node.

                This is not the case with Cassandra. Whether there is a single table or separate tables per tenant, the data will span multiple nodes, because rows are placed on nodes according to a hash of their partition key rather than according to which table or keyspace they belong to. What you are describing might be more easily achievable in systems that do sharding, like HBase, MongoDB, and even InfluxDB; however, sharding has its drawbacks as a strategy for partitioning data.
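A toy simulation of why the data spreads out either way. This is only a sketch under some assumptions: `md5` stands in for Cassandra's actual partitioner hash, a simple modulo stands in for token ranges, the four node names are invented, and the shared table is assumed to use a composite partition key of (tenant_id, metric) while the per-tenant table is partitioned by metric alone.

```python
import hashlib

# Hypothetical four-node cluster.
NODES = ["node1", "node2", "node3", "node4"]


def owner(partition_key: str) -> str:
    """Map a partition key to a node by hashing it.

    md5 and the modulo are stand-ins for the real partitioner and
    token ranges; only the idea of hash-based placement matters here.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def nodes_for_tenant(tenant: str, metrics, shared_table: bool):
    """Return the set of nodes that hold one tenant's partitions."""
    if shared_table:
        # Shared table: partition key assumed to be (tenant_id, metric).
        keys = [f"{tenant}:{metric}" for metric in metrics]
    else:
        # Keyspace per tenant: partition key assumed to be metric only.
        keys = list(metrics)
    return {owner(key) for key in keys}


metrics = [f"metric_{i}" for i in range(200)]
print(sorted(nodes_for_tenant("acme", metrics, shared_table=True)))
print(sorted(nodes_for_tenant("acme", metrics, shared_table=False)))
```

In both cases the tenant's partitions land on several nodes, since placement follows the hash of the partition key and not the table or keyspace the row lives in.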