4 Replies Latest reply on Aug 11, 2014 10:39 AM by Clebert Suconic

    Message Accumulation due to slow consumers and a proposed solution

    Howard Gao Master

      == The issue


      In a cluster, when a local queue is empty, its consumers are in a starvation state. If there are messages on remote queues, those messages won't get redistributed as long as the remote queues have local consumers of their own, even if those consumers are slow and messages are accumulating.


      Suppose we have a queue deployed in a 4-node cluster (nodeA, nodeB, nodeC and nodeD) and a producer is sending messages to nodeA at a rate of 40 messages / sec.


      With the messages load-balanced evenly across the cluster, each node gets 10 messages from the producer per second. Suppose the consumers on each node and their rates are as follows:


      NodeA - 2 consumers:

      consumer1A : 1 message / sec

      consumer2A : 1 message / sec


      NodeB - 1 consumer:

      consumer1B : 4 messages / sec


      NodeC - 2 consumers:

      consumer1C : > 5 messages / sec

      consumer2C : > 5 messages / sec


      NodeD - 1 consumer:

      consumer1D : > 10 messages / sec


      When all are up and running, the queues at nodeC and nodeD will often be empty because their consumers are fast.


      The queue at nodeA will accumulate messages at 10 - 2 = 8 messages/sec. The queue at nodeB will accumulate messages at 10 - 4 = 6 messages/sec.


      If the producer stops after 30 seconds, the messages at each node:


      nodeA : 8 * 30 = 240 messages

      nodeB : 6 * 30 = 180 messages

      nodeC and nodeD : 0 messages


      To consume all those messages, nodeA will take 240/2 = 120 seconds and nodeB will take 180/4 = 45 seconds. nodeC and nodeD will be idle during this whole time.
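
The arithmetic above can be checked with a small sketch. It is only a model of the example, not HornetQ code: it assumes the producer's messages are split evenly across the nodes, and it uses the lower bounds given for nodeC and nodeD (10 messages/sec per node) as their rates. The names `backlog_after` and `drain_seconds` are made up for illustration.

```python
PRODUCER_RATE = 40   # messages/sec sent by the producer
SEND_SECONDS = 30    # how long the producer runs

# Total consumer rate per node in messages/sec.
# Assumption: nodeC and nodeD are only given as lower bounds in the post,
# so the bound itself (10 messages/sec per node) is used here.
consumer_rate = {"nodeA": 1 + 1, "nodeB": 4, "nodeC": 5 + 5, "nodeD": 10}

# Assumption: messages are load-balanced evenly across all nodes.
arrival_rate = PRODUCER_RATE / len(consumer_rate)


def backlog_after(node, seconds):
    """Messages left on the node's queue after the producer ran for `seconds`."""
    growth = max(0, arrival_rate - consumer_rate[node])
    return growth * seconds


def drain_seconds(node, backlog):
    """Time the node's own consumers need to work off `backlog` messages."""
    return backlog / consumer_rate[node]


for node in consumer_rate:
    backlog = backlog_after(node, SEND_SECONDS)
    print(node, backlog, drain_seconds(node, backlog))
```

Under these assumptions nodeA and nodeB end up with 240 and 180 messages, while nodeC and nodeD have nothing left to do.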


      == The proposed solution:


      When a node has been idle for a certain time (like 2 sec, configurable), it sends a “STARVATION” notification message. In the above case nodeC and nodeD will send it.
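
A minimal sketch of the idle check, to make the idea concrete. This is not an existing HornetQ API; `StarvationDetector` and `observe` are hypothetical names. It also assumes two things the proposal does not spell out: a node only counts as starving if it has at least one consumer, and the notification is sent once per idle period rather than repeatedly.

```python
import time


class StarvationDetector:
    """Decides when a node should send a STARVATION notification."""

    def __init__(self, idle_threshold=2.0, clock=time.monotonic):
        self.idle_threshold = idle_threshold  # seconds, configurable
        self.clock = clock                    # injectable for testing
        self.idle_since = None                # start of the current idle period
        self.notified = False                 # already notified for this period?

    def observe(self, queue_depth, consumer_count):
        """Call periodically; returns True when a notification should be sent."""
        # Assumption: an empty queue without consumers is not "starving".
        starving = queue_depth == 0 and consumer_count > 0
        if not starving:
            self.idle_since = None
            self.notified = False
            return False
        now = self.clock()
        if self.idle_since is None:
            self.idle_since = now
        if not self.notified and now - self.idle_since >= self.idle_threshold:
            self.notified = True
            return True
        return False
```

In the example, nodeC and nodeD would call `observe(0, 2)` and `observe(0, 1)` on a timer and send the notification the first time it returns True.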


      Nodes in the cluster that receive this notification will trigger a 'message redistribution' as long as they have messages in their queues. In the above case:


      nodeA receives the notification and triggers a 'message redistribution' under these conditions, and so does nodeB.


      == Redistribution Details


      Unlike the existing redistribution (redistribution-delay), this kind of redistribution sends messages only to the nodes that sent the 'STARVATION' notification.


      In the scenario above,


      nodeA gets notifications from nodeC and nodeD. It keeps a list of them {nodeC, nodeD}

      nodeB also gets notifications from nodeC and nodeD and keeps the same list: {nodeC, nodeD}.


      So messages from nodeA and nodeB will be redistributed to nodeC and nodeD.


      Not all messages need to be redistributed as the queue still has consumers (even if they're slow).


      We can decide how many of the queued messages are redistributed by using a fixed ratio (e.g. 50%).

      (The actual amount may be less, because some messages can't be redistributed due to message grouping.)
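
As a rough sketch of how a node holding a backlog could turn its list of starving nodes and the fixed ratio into a plan: the function name `plan_redistribution` is hypothetical, the even split between starving nodes is an assumption (the proposal does not say how the share is divided), and grouped messages are ignored here.

```python
def plan_redistribution(backlog, starving_nodes, ratio=0.5):
    """Return {node: message_count} for the share of `backlog` to move.

    Assumption: the share is split evenly across the starving nodes;
    any remainder goes to the first nodes in sorted order.
    """
    if not starving_nodes or backlog <= 0:
        return {}
    to_move = int(backlog * ratio)  # round down, keep the rest locally
    share, extra = divmod(to_move, len(starving_nodes))
    plan = {}
    for i, node in enumerate(sorted(starving_nodes)):
        plan[node] = share + (1 if i < extra else 0)
    return plan


# nodeA and nodeB from the example, each holding the list {nodeC, nodeD}
print(plan_redistribution(240, {"nodeC", "nodeD"}))
print(plan_redistribution(180, {"nodeC", "nodeD"}))
```

With the numbers from the example and a 50% ratio, nodeA would hand 60 messages each to nodeC and nodeD and keep 120 for its own consumers; nodeB would hand over 45 each and keep 90.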


      With this redistribution, the messages can be evened out among the nodes in a reasonable time period.