2012

[Introduction]

 

Some of you may be wondering: what is sharding? If you are new to the concept, please read http://en.wikipedia.org/wiki/Shard_(database_architecture). It is very useful when you need to handle a large data set with high throughput and a traditional single database cluster setup (Master-Slave, Master-Master, Slave-Slave, etc.) cannot survive that kind of situation. The concept itself is the familiar "Divide and Conquer". Note that it is quite different from the Partition technique you may know from some database vendors: partitioning is a database-layer-specific technique, while sharding can spread across every layer you can touch.

 

[Rolling in the deep - amazing voice of Adele]


For example, say you have a web product with only one page, and that page needs to load data from one table. Now imagine 1 billion rows persisted in that table on a single Oracle/MySQL/MS-SQL server, and a query against that table costing 2 seconds on average under a load of 1000 TPS. Not bad, you may say. But remember that is only the database cost; for a mature product you also need to consider data transformation, network transport (to be clear, I mean between the application and the end user), page rendering, etc. In the end the customer may pay 5 or 10 seconds on average for that page action each time. Your stakeholders want you to provide a solution for that issue.

 

You may come up with other solutions first, but here I can give you a very special one. If one table with 1 billion rows needs 2 seconds, how about we divide that table into 5 pieces, each holding only 200 million rows? Treating it as a simple math question, you would expect around 400 milliseconds, but the result may surprise you: << 400 milliseconds (why the two less-than signs? much less than). The reason is in the database internals: the primary data files become smaller, the index files become smaller, etc., which means the number of I/O rounds is significantly reduced. Okay, you may say "wow, that works for me" and go ahead with that proposal to your stakeholders.

 

Come on, calm down. You need to resolve many things before taking any action on this.

 

[Things you may know or not]

 

  • Which kind of rule do you want to use for dividing the data?

       How to divide? "Are you kidding? Very simple: 1 ~ 200 million, 201 ~ 400 million, etc." you may think. Let me tell you, there are more choices:

        1. Based on row number

        2. Based on date (year, month, or day)

        3. Based on business data

             3.1 If your data contains user information: user identity, male/female, user location, etc.

             3.2 If your data contains some unique enumerations. For example, in a plane ticket booking system you may divide the data according to the plane type and the airline company.

        4. A mix of all of the above.
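Rules 1 and 3 above can be sketched in a few lines. This is a rough illustration only; the class and method names are my own invention, not from any real product:

```java
// Hypothetical sketch of division rules 1 and 3; all names are illustrative only.
public class ShardRules {

    // Rule 1: range by row number, e.g. rows 1..200,000,000 -> shard 0, and so on.
    public static int byRowNumber(long rowId, long rowsPerShard) {
        return (int) ((rowId - 1) / rowsPerShard);
    }

    // Rule 3: by business data, e.g. hash a user id into a fixed number of shards.
    public static int byUserId(String userId, int shardCount) {
        // Mask the sign bit instead of Math.abs, which fails for Integer.MIN_VALUE.
        return (userId.hashCode() & 0x7fffffff) % shardCount;
    }

    public static void main(String[] args) {
        System.out.println(byRowNumber(150000000L, 200000000L)); // prints 0
        System.out.println(byRowNumber(250000000L, 200000000L)); // prints 1
        System.out.println(byUserId("alice", 5)); // a stable value in 0..4
    }
}
```

Whatever rule you choose, the key point is that it must be deterministic: the same key must land on the same shard every time.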

  • How do you route a request to the specific region according to the data division?

       Now the challenge comes (without help from the database): how can a request be routed correctly to the specific region according to the data division? Imagine: how do you find one phone number in a yellow-pages book? How does an internet router work with a domain name? It means we need somewhere to persist the division logic or mappings, so that after we divide the data we can use the same logic or mappings to find the correct data again and again. Here we call that the DNS.
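A minimal sketch of such a "DNS" follows. All names here are my own invention; in a real product the mapping would be persisted in some central store and cached locally, not held in one in-memory map:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shard "DNS": maps a division key to a region's connection info.
public class ShardDns {
    private final Map<String, String> regionByKey = new HashMap<String, String>();

    public void register(String divisionKey, String jdbcUrl) {
        regionByKey.put(divisionKey, jdbcUrl);
    }

    // The same lookup logic must be used for every read and write, forever.
    public String lookup(String divisionKey) {
        String url = regionByKey.get(divisionKey);
        if (url == null) {
            throw new IllegalStateException("No region registered for key: " + divisionKey);
        }
        return url;
    }

    public static void main(String[] args) {
        ShardDns dns = new ShardDns();
        dns.register("CN", "jdbc:mysql://cn-db:3306/app"); // illustrative URLs
        dns.register("US", "jdbc:mysql://us-db:3306/app");
        System.out.println(dns.lookup("CN")); // prints jdbc:mysql://cn-db:3306/app
    }
}
```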

  • How can we finish aggregation jobs after the data division?

        Total count, min, max across all the data: these features are very useful in some situations, especially when the customer needs them. But after the division, all built-in aggregation features are gone, since no single database holds all the data any more. You may need to work out a solution on your own. An even worse situation is needing cross-division join features; that effort, I think, cannot even be estimated.
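To show what "work out a solution on your own" means, here is a hedged sketch of scatter-gather aggregation: each region computes a partial result over its own data, and the application merges the partials itself. The names are mine, not from any framework:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical scatter-gather: each region answers for its own data,
// and the application merges the partial results.
public class CrossShardAggregate {

    // Partial result computed inside one region (one shard).
    public static class Partial {
        public final long count;
        public final long min;
        public final long max;
        public Partial(long count, long min, long max) {
            this.count = count; this.min = min; this.max = max;
        }
    }

    // count/min/max merge trivially; avg needs sum+count; median does not merge at all.
    public static Partial merge(List<Partial> partials) {
        long count = 0, min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (Partial p : partials) {
            count += p.count;
            min = Math.min(min, p.min);
            max = Math.max(max, p.max);
        }
        return new Partial(count, min, max);
    }

    public static void main(String[] args) {
        Partial total = merge(Arrays.asList(
                new Partial(200, 3, 950),     // region A's partial result
                new Partial(150, 7, 1200)));  // region B's partial result
        System.out.println(total.count + " " + total.min + " " + total.max); // prints 350 3 1200
    }
}
```

Notice the caveat in the comment: only aggregates that decompose cleanly (count, min, max, sum) merge this easily; anything order-sensitive needs far more work.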

  • All other layers except the data layer will be impacted as well

       Before the data division, your data access layer could keep the db connection information static, since it never changed at runtime. After the data division, your data access layer must accommodate db connection information that changes frequently, because the data has been split.

       The cache layer may also need to persist more information, especially the data location information.

       RESTful interfaces are impacted as well: you may need to carry the key data (the data that drives the division) back and forth all the time.

  • You may lose data consistency at some level

        Since the data is split, you may end up needing to update data across different data regions. XA is not good from a performance and scalability perspective, and those two factors are exactly what drove you towards data sharding in the first place. So, unfortunately, full ACID cannot be guaranteed.

  • Data re-division

        Since the data size grows every day, you may end up with some data region growing unexpectedly large. Then you need to re-divide, and you may need to go through all of the issues above again.

  • High maintenance effort

        You may need to build tools to help you manage and monitor those data regions.

 

[Black or white]

 

    You may say, "come on, that is crazy, I will never try it". But please also know its advantages.

  • High scalability

        Since one large data region becomes many smaller regions, the throughput is split up as well. You can provide more capacity from a throughput perspective, so high scalability is reached much more easily.

  • High performance

      Querying against smaller data gives better performance; I think that is very easy math. With a large in-memory cache, a data grid, or a cloud data service, you may get even better performance.

  • High availability

        Because you have many regions, your customers are split across them as well. Less data means more capacity headroom in a single region: OOM, ConnectionReset, running out of the connection limit, connection timeouts, etc. will appear less often compared to the traditional solution.

  • Smaller customer impact when something happens (planned or unplanned)

      Since your customers are split across many regions, even if one region crashes it will not impact the other regions, and you can provide better site-down deployment services. For example, you could deploy the China region at 00:00:00 in the China time zone and the USA region at 00:00:00 in the USA time zone.

  • Recovery will be easier

       Recovering a smaller data set takes less effort and is much quicker.

 

Note: all of the above benefits require your architecture to provide a good design in advance; otherwise you will not reach the goal and may end up in an even worse situation.


[Nothing is impossible except Death or Debt]

 

  • First, consider your business model. Is data consistency critical to your customers?

       If your customers can tolerate a window of data inconsistency (the well-known data latency), you can go with this solution. Flickr, Twitter, etc. adopted this solution.

  • Second, shard as early as possible

      If your system architecture did not consider sharding from the very beginning, changing from non-sharding to sharding will be a big headache. Migrating incrementally can be acceptable, but it will really test your system design ability: a lot of backward compatibility will be involved, which is also very challenging. So try your best to make it happen as early as possible.

  • Third, shard up-front

        Sharding up-front is the better choice, since it reduces complexity, reduces the coupling between many components, and provides enough flexibility. That will be discussed in future topics.

  • Fourth, let services fly

       Try your best to use services to arm every component, especially to hide the location information. Then any tuning or improvement can be spread across every instance of the same component service.

  • Fifth, single-function services rather than one huge monolithic multi-function service

       Splitting multi-functional components into multiple single-function components will help you a lot, not only from a design perspective but also from an organizational perspective.

  • Sixth, re-organize your teams

       Sharding requires you to take more care of product maintenance and monitoring, so you should enlarge the product operations team. Like Facebook: they have a special team working on feature deployments and monitoring.

  • Seventh, keep changing

 

[Summaries]

 

Sharding needs many things changed and planned quite differently from the traditional model. But it does provide many benefits that you cannot decline.

 

1. http://en.wikipedia.org/wiki/Shard_(database_architecture)

2. http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html

3. http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster


[Introduction]

 

After almost one year of using HornetQ in our products, we did get many benefits from this brand-new messaging server. But we also met several issues, and these issues suggest the designers of this product had no very clear sense of how much production environments can vary. Here we will go through them one by one.

 

[Johnnie Walker]

 

  • HornetQ does not provide any time-control mechanism.

         We all know machine time is not controllable, especially when multiple machines must work together. You may meet this kind of situation: you have three machines; the first machine's time is 19:40:33 (HH:MM:SS), the second machine's time is 19:41:02, and the third machine's time is 19:39:22. The second machine hosts the HornetQ server. Now say the third machine sends one message to the HornetQ server and wants that message to expire after 30 minutes. HornetQ has implemented this feature mechanically but not functionally, because if you go through the HornetQ source code you will find it uses System.currentTimeMillis() to track elapsed time. Do you see the problem? Two issues may happen: 1. the message does not survive as long as its expiration setting; 2. the message exists longer than its expiration setting. Why? Because: 1. the time-zone setting may differ across the three machines; 2. the wall-clock readings differ across the three machines at the same instant. The side effects can be: 1. customers complain that messages are disposed of too fast; 2. the message queue fills up very quickly. Maybe you will say that this actually should not be HornetQ's responsibility; perhaps, but I strongly believe HornetQ should at least provide an interface that lets the customer plug in their own time-control mechanism.
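To make the skew concrete, here is a toy calculation (my own illustration, not HornetQ code). Suppose the producer stamps expiration = its own System.currentTimeMillis() + TTL, and the broker compares that stamp against its own clock. With the machines above, the broker (19:41:02) runs 100 seconds ahead of the producer (19:39:22):

```java
// Toy illustration of message expiration under clock skew; not HornetQ code.
public class ExpirySkewDemo {

    // Remaining lifetime as the broker sees it at the moment of sending.
    // skewMillis > 0 means the broker clock runs ahead of the producer clock.
    public static long remainingOnBroker(long ttlMillis, long skewMillis) {
        long producerNow = 0;                       // producer's clock reading (any origin works)
        long expiration = producerNow + ttlMillis;  // stamp carried inside the message
        long brokerNow = producerNow + skewMillis;  // broker's clock at the same physical instant
        return expiration - brokerNow;
    }

    public static void main(String[] args) {
        long ttl = 30 * 60 * 1000L; // the 30-minute TTL from the example
        // Broker 100 s ahead (19:41:02 vs 19:39:22): message dies 100 s early.
        System.out.println(remainingOnBroker(ttl, 100000L));  // prints 1700000
        // Broker 100 s behind: message lives 100 s longer than asked.
        System.out.println(remainingOnBroker(ttl, -100000L)); // prints 1900000
    }
}
```

The same arithmetic produces both failure modes described above, which is exactly why a pluggable time source would help.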

        An even worse situation can happen here. Everyone should be familiar with the JMS specification, so we all know the consumer features. One consumer method is receive(long timeout): wait up to the specified time if no message can be fetched from the message server, then return. It is very useful for the long-lived message consuming pattern. The issue occurs again; please review this piece of code:

long start = -1;
long toWait = timeout;
try
{
   while (true)
   {
      ... ...
      while ((stopped || (m = buffer.poll()) == null) && !closed && toWait > 0)
      {
         if (start == -1)
         {
            start = System.currentTimeMillis();
         }
         ... ...
         wait(toWait);
         ... ...
         long now = System.currentTimeMillis();
         toWait -= now - start;
         start = now;
      }
   }

Do you see the issue? No? Then let's say your machine's wall clock is stepped backwards while waiting, for example by a daylight-saving correction on a PST machine or a manual clock adjustment. Then now - start is negative, toWait grows instead of shrinking, you cannot get a message, and consumer.receive() will never return. This piece of code is not safe: it never checks toWait against the original timeout. So how about we add one line of code like below?

 

long start = -1;
long toWait = timeout;
try
{
   while (true)
   {
      ... ...
      while ((stopped || (m = buffer.poll()) == null) && !closed && toWait > 0)
      {
         if (start == -1)
         {
            start = System.currentTimeMillis();
         }
         ... ...
         wait(toWait);
         ... ...
         long now = System.currentTimeMillis();
         toWait -= now - start;
         start = now;
         if (toWait >= timeout) toWait = 0; // the added guard: bail out if the clock stepped backwards
      }
   }
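Beyond that one-line guard, here is a sketch of what I would prefer (my own suggestion, not HornetQ code): measure the elapsed wait with System.nanoTime(), which is monotonic and not affected by wall-clock adjustments at all:

```java
// Sketch of a timed wait measured with the monotonic clock; my suggestion, not HornetQ code.
public class MonotonicWait {

    // Waits up to timeoutMillis on lock; returns the milliseconds actually waited.
    // System.nanoTime() never jumps when the wall clock is adjusted, so this
    // loop can neither hang forever nor return early because of a clock change.
    public static long timedWait(Object lock, long timeoutMillis) {
        long start = System.nanoTime();
        long deadline = start + timeoutMillis * 1000000L;
        synchronized (lock) {
            long remaining;
            // A real consumer would also check its message buffer here before waiting.
            while ((remaining = deadline - System.nanoTime()) > 0) {
                try {
                    lock.wait(remaining / 1000000L + 1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        long waited = timedWait(new Object(), 50);
        System.out.println(waited >= 50); // prints true: the wait cannot expire early
    }
}
```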

 


  • HornetQ lacks an abstraction of the storage layer (no recovery feature).

        Wait! You may say no to me, because HornetQ has a storage layer and it works very well. We all know HornetQ works on files rather than a db, and right now it only supports shared-storage (shared-file) mode for the Master-Slave setting. But what if the shared storage itself has a problem and needs a restart (HornetQ recommends a SAN for the shared-storage solution)? You will find that once the shared storage is restarted, HornetQ will not work any more, even after the shared storage has come back. That kind of working mechanism forces you to consider more complex alternatives for when HornetQ crashes totally.

        So I tried to go through the HornetQ code, wanting to find an entry point where a recovery feature could be added. But since it leverages NIO (we did not use AIO) and lacks some supporting information, I could not complete that feature.

        Good news: they will provide replication functions that should resolve this issue to some level. Waiting for HornetQ 2.2.6.

 

[Conclusion]

 

HornetQ team, please consider these issues and handle them with first priority rather than providing more features. And I hope these experiences will help you.