[Introduction]

 

After almost a year of using HornetQ in our products, we have gotten many benefits from this brand-new messaging server. But we also ran into several issues, and they suggest that the designers of this product did not have a clear sense of how varied production environments can be. Here we will go through them one by one.

 

[Jonney Walker]

 

  • HornetQ does not provide any time control mechanism.

         We all know that machine time is not under our control, especially when multiple machines have to work together. You may meet this kind of situation: you have three machines; the first machine's time is 19:40:33 (HH:MM:SS), the second's is 19:41:02, and the third's is 19:39:22. The second machine hosts the HornetQ server. Now suppose the third machine says: "I am sending one message to the HornetQ server, and I want my message to expire after 30 minutes." HornetQ implements this feature in the sense that it runs, but not in the sense that it is functionally correct, because if you go through the HornetQ source code you will find that it uses System.currentTimeMillis() to track elapsed time.

         So what is the problem? It can show up in two ways: 1. The message does not survive as long as the expiration setting says. 2. The message lives longer than the expiration setting says. Why? Because you should be aware that: 1. The time zone settings may differ across the three machines (which matters when a clock is set by hand to local time under the wrong zone). 2. The clocks themselves will differ across the three machines at any given moment. In the example above, the server's clock is 100 seconds ahead of the sender's, so if the expiration time is stamped with the sender's clock, the server will drop the message 100 seconds early.

         The side effects of this kind of issue may be: 1. Customers will complain that messages are disposed of too quickly. 2. The message queue will back up very quickly. Maybe you will say that this should not actually be HornetQ's responsibility. Maybe so, but I strongly believe that HornetQ should at least provide an interface that allows customers to plug in their own time control mechanism.
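
         To make the three-machine example concrete, here is a minimal sketch of the kind of pluggable time interface I have in mind. The names TimeSource, OffsetTimeSource, remainingMillis and isExpired are my own inventions for illustration and are not part of HornetQ. The sketch also assumes that the expiration time is stamped with the sender's clock and checked against the server's clock.

```java
import java.util.concurrent.TimeUnit;

final class ClockSkewDemo {
    // Hypothetical hook, NOT a HornetQ interface: the one place time is read from.
    interface TimeSource {
        long currentTimeMillis();
    }

    // Wraps another time source and shifts it by a known offset, for example
    // one measured against a reference clock that all machines agree on.
    static final class OffsetTimeSource implements TimeSource {
        private final TimeSource delegate;
        private final long offsetMillis;

        OffsetTimeSource(TimeSource delegate, long offsetMillis) {
            this.delegate = delegate;
            this.offsetMillis = offsetMillis;
        }

        @Override
        public long currentTimeMillis() {
            return delegate.currentTimeMillis() + offsetMillis;
        }
    }

    // Lifetime left for a message, as seen by whoever owns `clock`.
    static long remainingMillis(long expirationMillis, TimeSource clock) {
        return expirationMillis - clock.currentTimeMillis();
    }

    static boolean isExpired(long expirationMillis, TimeSource clock) {
        return remainingMillis(expirationMillis, clock) <= 0;
    }

    public static void main(String[] args) {
        long ttl = TimeUnit.MINUTES.toMillis(30);
        long senderNow = 1_000_000_000L;             // arbitrary instant on machine 3
        long skew = TimeUnit.SECONDS.toMillis(100);  // 19:41:02 minus 19:39:22

        TimeSource senderClock = () -> senderNow;          // machine 3
        TimeSource serverClock = () -> senderNow + skew;   // machine 2, 100 s ahead

        // Assumption: the sender stamps the absolute expiration time.
        long expiration = senderClock.currentTimeMillis() + ttl;

        // The uncorrected server believes 100 s of the lifetime are already gone.
        System.out.println(remainingMillis(expiration, serverClock)); // 1700000

        // With a plugged-in, corrected time source the full 30 minutes remain.
        TimeSource corrected = new OffsetTimeSource(serverClock, -skew);
        System.out.println(remainingMillis(expiration, corrected));   // 1800000
    }
}
```

         How the offset would be measured (NTP, a shared reference machine, something else) is left open here; the point is only that a single interface would let each customer decide.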

        An even worse situation can happen here. Everyone should be familiar with the JMS specification, and we all know the consumer features. One of them is the timed receive, receive(long timeout). It waits up to the given time for a message from the message server and comes back empty-handed if none arrives, so the caller can retry. It is very useful for a long-lived message consuming pattern. The issue occurs again here. Please review this piece of code:

long start = -1;
long toWait = timeout;
try
{
   while (true)
   {
      ... ...
         while ((stopped || (m = buffer.poll()) == null) && !closed && toWait > 0)
         {
            if (start == -1)
            {
               start = System.currentTimeMillis();
            }
            ... ...
               wait(toWait);
            ... ...
            long now = System.currentTimeMillis();
            toWait -= now - start;
            start = now;
         }
      }

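Do you see the issue? If not, let's say the system clock on your machine is set back while the consumer is waiting, for example when a machine in the PST time zone is adjusted for daylight saving time. Then now - start is negative, toWait grows instead of shrinking, and you will find that you cannot get a message and the consumer's timed receive call does not return when it should. This piece of code is not safe, because it never checks toWait against timeout. So how about adding one line of code, like below?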

 

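long start = -1;
long toWait = timeout;
try
{
   while (true)
   {
      ... ...
         while ((stopped || (m = buffer.poll()) == null) && !closed && toWait > 0)
         {
            if (start == -1)
            {
               start = System.currentTimeMillis();
            }
            ... ...
               wait(toWait);
            ... ...
            long now = System.currentTimeMillis();
            toWait -= now - start;
            start = now;
            // Added guard: if a backward clock jump pushed toWait past the
            // original timeout, stop waiting. A strict ">" is used so that a
            // wakeup within the same millisecond does not end the wait early.
            if (toWait > timeout) toWait = 0;
         }
      }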
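
An alternative to patching the arithmetic is to measure the wait against a monotonic clock. The following is only a sketch under my own assumptions, not HornetQ code: Buffer is a hypothetical stand-in for the consumer's internal buffer, and it uses System.nanoTime(), which is meant for measuring elapsed time and is not tied to the wall clock.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

final class MonotonicReceiveDemo {
    // Hypothetical stand-in for the consumer's internal message buffer.
    static final class Buffer<T> {
        private final Queue<T> queue = new ArrayDeque<>();

        synchronized void offer(T item) {
            queue.add(item);
            notifyAll();
        }

        // Waits up to timeoutMillis for an item and returns null on timeout.
        synchronized T poll(long timeoutMillis) throws InterruptedException {
            // The deadline is computed once, on the monotonic clock.
            final long deadline =
                System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            T item;
            while ((item = queue.poll()) == null) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return null; // timed out
                }
                // wait(0) would mean "wait forever", so wait at least 1 ms.
                wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            }
            return item;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Buffer<String> buffer = new Buffer<>();

        long t0 = System.nanoTime();
        String none = buffer.poll(200); // nothing queued, so this times out
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0);
        System.out.println("timed out: " + (none == null) + " after ~" + elapsedMs + " ms");

        buffer.offer("hello");
        System.out.println(buffer.poll(200)); // returns immediately with the item
    }
}
```

Because the remaining time is always recomputed from a fixed deadline, a wakeup without a message can never make the wait longer than the requested timeout.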

 


  • HornetQ lacks an abstraction of the storage layer (no recovery feature).

        Wait! You may disagree with me, because HornetQ does have a storage layer and it works very well. We all know that HornetQ works on files rather than a database, and that right now it only supports shared storage (shared files) for the master-slave setup. But what if the shared storage runs into a problem and needs a restart (HornetQ recommends a SAN as the shared storage solution)? You may find that once the shared storage has been restarted, HornetQ does not work any more, even after the shared storage has come back. This behavior forces you to consider more complex alternatives for the case where HornetQ crashes completely.

        With that in mind, I went through the HornetQ code looking for an entry point where a recovery feature could be added. But since it relies on NIO (we did not use AIO) and I lacked some supporting information, I could not complete that feature.
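
        For what it is worth, the shape of the abstraction I was looking for is roughly the following. This is a sketch of the idea only, not the unfinished HornetQ change: JournalStore, StoreFactory and RecoveringStore are hypothetical names, and the retry policy (a fixed number of attempts with a fixed delay) is just an assumption.

```java
import java.io.IOException;

final class RecoveringStoreDemo {
    // Hypothetical storage abstraction; not an existing HornetQ interface.
    interface JournalStore {
        void append(byte[] record) throws IOException;
    }

    // Hypothetical factory that (re)opens the underlying storage.
    interface StoreFactory {
        JournalStore open() throws IOException;
    }

    // Decorator that reopens the storage and retries when an append fails.
    static final class RecoveringStore implements JournalStore {
        private final StoreFactory factory;
        private final int maxAttempts;
        private final long retryDelayMillis;
        private JournalStore current; // null until opened, or after a failure

        RecoveringStore(StoreFactory factory, int maxAttempts, long retryDelayMillis) {
            if (maxAttempts < 1) {
                throw new IllegalArgumentException("maxAttempts must be >= 1");
            }
            this.factory = factory;
            this.maxAttempts = maxAttempts;
            this.retryDelayMillis = retryDelayMillis;
        }

        @Override
        public synchronized void append(byte[] record) throws IOException {
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    if (current == null) {
                        current = factory.open();
                    }
                    current.append(record);
                    return;
                } catch (IOException e) {
                    last = e;
                    current = null; // drop the broken handle, reopen next time
                    if (attempt < maxAttempts) {
                        pause();
                    }
                }
            }
            throw last; // every attempt failed
        }

        private void pause() {
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] opens = {0};
        StringBuilder written = new StringBuilder();
        // Simulated shared storage that is unavailable for the first two opens.
        StoreFactory flaky = () -> {
            if (++opens[0] <= 2) {
                throw new IOException("shared storage unavailable");
            }
            return record -> written.append(new String(record));
        };
        JournalStore store = new RecoveringStore(flaky, 5, 10);
        store.append("message-1".getBytes());
        System.out.println(written + " written after " + opens[0] + " open attempts");
    }
}
```

        A real journal would also have to decide what to do with records that were only partially written when the storage went away, which this sketch ignores.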

        The good news is that they will provide replication functions, which will resolve this issue to some degree. We are waiting for HornetQ 2.2.6.

 

[Conclusion]

 

HornetQ team, please consider these issues and handle them as a first priority rather than providing more features. I hope these experiences will help you.