
    Large-scale architecture best practices request

    apowney

      Background:

      Soon, we will have to deal with a large number of active user requests accessing a large dataset. Our RDBMS solution won’t scale that far. I’ve been looking at numerous solutions, and I’m satisfied Infinispan will play a major role in our next-generation solution. The rate at which we are having to scale is, frankly, overwhelming.

      We have two datacentres (at the moment), and I want to introduce rack-awareness. I like the idea of using Infinispan as a data store, perhaps persisting via the new disk store onto disk arrays inside each rack. I also like the idea of the write-behind strategy, as large numbers of writes invariably arrive in batches anyway.
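
      To make that concrete, here’s roughly the data-node configuration I have in mind – only a sketch, written against the Infinispan 6.x programmatic API as I understand it, and the cluster name, site/rack/machine IDs and store path are all placeholders of mine:

          import org.infinispan.configuration.cache.CacheMode;
          import org.infinispan.configuration.cache.Configuration;
          import org.infinispan.configuration.cache.ConfigurationBuilder;
          import org.infinispan.configuration.global.GlobalConfiguration;
          import org.infinispan.configuration.global.GlobalConfigurationBuilder;
          import org.infinispan.manager.DefaultCacheManager;

          public class DataNodeConfig {
             public static DefaultCacheManager build() {
                // Rack/site-aware transport: the topology-aware consistent hash
                // can then keep backup copies off the same rack and site.
                GlobalConfiguration global = GlobalConfigurationBuilder.defaultClusteredBuilder()
                      .transport()
                         .clusterName("prod-grid")  // placeholder
                         .siteId("DC1")             // datacentre
                         .rackId("rack-07")         // rack cabinet
                         .machineId("node-42")      // physical host
                      .build();

                // Distributed cache persisted to the rack’s disk array, with
                // write-behind so batched writes are flushed asynchronously.
                Configuration transactional = new ConfigurationBuilder()
                      .clustering().cacheMode(CacheMode.DIST_SYNC)
                         .hash().numOwners(2)
                      .persistence()
                         .addSingleFileStore()
                            .location("/san/infinispan/data")  // placeholder path
                            .async().enable()                  // write-behind
                      .build();

                DefaultCacheManager manager = new DefaultCacheManager(global);
                manager.defineConfiguration("transactional", transactional);
                return manager;
             }
          }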

      In a nutshell, we have two kinds of data: transactional data and configuration data. For illustration purposes, the transactional data comprises orders, items and parcels (items being packed into parcels) – it’s a relatively simple structure, but it accounts for the vast majority of the dataset. The configuration data is highly complex, currently spread over hundreds of tables. I like the possibility of using Hibernate here, possibly Hibernate OGM. The entire configuration a single request requires currently fits in memory on a single node, but that won’t be the case in the future. I need a more distributed model, and it has to be flexible enough to allow for change. I’m looking to move to a more document-like (and version-controlled) structure, but that is proving very hard – my problem. The configuration is read thousands of times per second, but changes rarely.
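
      To illustrate how I’m currently thinking of splitting the two kinds of data (again a sketch against the Infinispan 6.x API; the variable names are just mine): distribute both, but give the read-mostly configuration cache an L1 cache so repeated reads are served from local memory:

          import org.infinispan.configuration.cache.CacheMode;
          import org.infinispan.configuration.cache.Configuration;
          import org.infinispan.configuration.cache.ConfigurationBuilder;

          // Transactional data: far too big to replicate, so distribute it
          // across the grid with a fixed number of owners per entry.
          Configuration transactionalData = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                   .hash().numOwners(2)
                .build();

          // Configuration data: read thousands of times per second but
          // rarely written, and eventually too big to fit on one node, so
          // distribute it too and let L1 keep hot entries local.
          Configuration configurationData = new ConfigurationBuilder()
                .clustering().cacheMode(CacheMode.DIST_SYNC)
                   .l1().enable()
                .build();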

      The business logic we implement is complex, but is separable – individual worker nodes can be allocated to a specific kind of task. I want to separate it as much as possible from the data.

      I’d like to introduce orchestration based on logical partitions of the data, e.g. one set of users directed to one set of rack cabinets, and so on. But most importantly, I’d like our Infrastructure team to be able to simply provision a node of a particular type as and when necessary (without having to change any configuration).

      That pretty much covers my requirements, so I hope it gives you enough to understand what I’m after. Now to the questions:

       

      1) Is there an architecture guide showing an HA solution in a large-scale environment? The only documents I’ve seen are very general. My thinking is a rack cabinet containing a switch, NAS, load balancer and two sets of servers (Infinispan data nodes and worker nodes). In front of those rack cabinets, I’d have additional load balancers and other orchestration kit.

       

      2) Can the worker nodes be configured using RemoteCacheStore to locally cache specific kinds of data? I’m keen on caching as much of the configuration as possible on the workers, so that, wherever possible, we only hit the data nodes for transactional data.
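
      Something like this is what I’m imagining on the worker side – a sketch assuming the Infinispan 6.x persistence API (RemoteStoreConfigurationBuilder; I believe older releases spell this RemoteCacheStore), with the cache name and server address invented:

          import org.infinispan.configuration.cache.Configuration;
          import org.infinispan.configuration.cache.ConfigurationBuilder;
          import org.infinispan.eviction.EvictionStrategy;
          import org.infinispan.persistence.remote.configuration.RemoteStoreConfigurationBuilder;

          // Local cache on a worker node: entries fault in from the data
          // nodes over Hot Rod, and hot configuration stays in local heap.
          Configuration workerLocalCache = new ConfigurationBuilder()
                .eviction()
                   .strategy(EvictionStrategy.LRU)
                   .maxEntries(100_000)                 // bound local memory use
                .persistence()
                   .addStore(RemoteStoreConfigurationBuilder.class)
                      .remoteCacheName("configuration")  // placeholder name
                      .addServer()
                         .host("datagrid.internal")      // placeholder address
                         .port(11222)
                .build();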

      I fully intend to make use of distributed processing. Are there recommended best practices for the deployment of new code? I need 100% availability, so taking all nodes down at once is not an option – I need a managed method.

       

      3) Can I put a load balancer between a RemoteCacheStore client and the HotRod server? VM images for the nodes are important to me, so I’d like to keep node configuration consistent and manage traffic in the load balancers.
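
      For illustration, the client side would then be identical across VM images – a Hot Rod client sketch where the VIP hostname is a placeholder; what I don’t know is how a balancer in the middle interacts with the cluster topology the servers push back to clients, which is really what I’m asking:

          import org.infinispan.client.hotrod.RemoteCache;
          import org.infinispan.client.hotrod.RemoteCacheManager;
          import org.infinispan.client.hotrod.configuration.ConfigurationBuilder;

          // Every node’s client config points at the load balancer’s VIP,
          // so the VM image never needs a per-node server list.
          RemoteCacheManager remote = new RemoteCacheManager(
                new ConfigurationBuilder()
                      .addServer()
                         .host("hotrod-vip.internal")  // load balancer address
                         .port(11222)
                      .build());
          RemoteCache<String, byte[]> cache = remote.getCache("configuration");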

      Finally, while I like Hibernate ORM and Hibernate OGM (coupled with Lucene), I’ve yet to be convinced. I’m happy to sacrifice ease of development for performance. Does anyone have thoughts on that? Hibernate OGM & Lucene would certainly make development of the configuration side considerably easier…
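
      Just to show why OGM tempts me – the transactional mapping would be plain JPA (purely illustrative; Parcel, Item and the fields here are invented):

          import java.util.List;
          import javax.persistence.Entity;
          import javax.persistence.GeneratedValue;
          import javax.persistence.Id;
          import javax.persistence.OneToMany;

          // A plain JPA entity; with Hibernate OGM the same mapping would be
          // persisted into the grid instead of Oracle, with Hibernate
          // Search/Lucene handling the queries.
          @Entity
          public class Parcel {
             @Id @GeneratedValue
             private Long id;

             @OneToMany
             private List<Item> items;  // the items packed into this parcel

             // getters and setters omitted
          }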

       

      I’ve read all the documentation I can find, and there seem to be several options open to me. I guess I’m looking for some validation of my thought processes, so any advice would be gratefully received!

       

      (In case it's relevant: we expect to be handling 5,000,000 reads per second and 300,000 writes per second within 3 years. The dataset is expected to be around 20 PB, rising by 1–3 PB per year. We’re currently on Oracle, with around 50 billion rows.)