
    JMS TimeToLive not functioning as expected

    asadouglass

      I recently began setting a TTL (time to live) on outgoing messages (I was using 20000 milliseconds) and mostly it ran fine. However, I noticed that if there was clock skew between the node on which I was running HornetQ/the publisher and the node on which I was running the subscriber (namely, the subscriber's clock was more than 20 seconds ahead of the publisher's), then messages didn't get delivered. On the other hand, if I synced the clocks, the messages came right through. This only happened when the subscriber's clock was ahead of the publisher/JMS clock. (Not sure if it's important, but I was sending from Linux to Windows.)
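
      For reference, here is a minimal sketch of how I am setting the TTL (the connection/destination setup is simplified and the class/method names are just placeholders, not my actual code):

      import javax.jms.*;

      public class TtlPublisherSketch {

          // Publish one message with a 20 second TTL. The ConnectionFactory and
          // Topic are assumed to have been looked up elsewhere (e.g. via JNDI).
          public static void publish(ConnectionFactory cf, Topic topic, String text) throws JMSException {
              Connection connection = cf.createConnection();
              try {
                  Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                  MessageProducer producer = session.createProducer(topic);

                  // 20 seconds, in milliseconds, applied to every subsequent send
                  producer.setTimeToLive(20000L);

                  TextMessage message = session.createTextMessage(text);
                  producer.send(message);

                  // Per the JMS spec, JMSExpiration now holds an absolute timestamp:
                  // the send time (taken from the publisher's clock) plus the TTL.
                  System.out.println("JMSExpiration = " + message.getJMSExpiration());
              } finally {
                  connection.close();
              }
          }
      }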

       

      My hypothesis - I haven't looked at the code - is that the publisher is somehow setting the "stale time" for the message as an absolute timestamp (even though the TTL value itself is relative). This seems wrong to me. Is it intentional?
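
      To illustrate what I think is happening, here is a hypothetical back-of-the-envelope sketch (made-up numbers; this is my reading of the observed behaviour, not the HornetQ code):

      public class ClockSkewSketch {
          public static void main(String[] args) {
              long publisherNow  = 1300000000000L;            // publisher/HQ clock at send time (epoch ms)
              long timeToLive    = 20000L;                    // 20 second TTL
              long jmsExpiration = publisherNow + timeToLive; // absolute expiration, in publisher time

              long deliveryDelay = 2000L;                     // ~2 seconds of real elapsed time
              long clockSkew     = 120000L;                   // subscriber clock ~2 minutes ahead
              long subscriberNow = publisherNow + deliveryDelay + clockSkew;

              // If the expiry check compares the consumer's local clock against the
              // absolute expiration, the message looks stale even though only ~2
              // seconds have actually elapsed since it was published.
              boolean expired = subscriberNow > jmsExpiration;
              System.out.println("expired = " + expired);     // prints: expired = true
          }
      }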

       

      Example:  

       

       

      Step                                                  | HQ Node Time | Msg Time To Live      | Msg Stale Time | Sub Node Time
      Message is published to HQ Node                       | 10:46:00     | 20 seconds (20000 ms) | 10:46:20       | 10:48:00
      Message available on HQ server (padded for clarity)   | 10:46:01     | 20 sec                | 10:46:20       | 10:48:01
      Message retrieved by Subscriber Node                  | 10:46:02     | 20 sec                | 10:46:20       | 10:48:02

       

      In this example, the message is not received/delivered by the client on the Subscriber Node, since that client sees the time as 10:48:02 at the moment of retrieval and the "Msg Stale Time" has already passed.

       

      It seems to me that getting this absolutely correct, from the moment the "publish" call is made to the exact moment each subscriber's "receive" call is made, would require a complex scheme of interrogating all the clocks in the system and accounting for the deltas. However, if HQ were to interpret TimeToLive as how long the message is allowed to reside on the HQ server (as opposed to in the pre-fetch and post-send buffers local to the sender and receiver nodes), that would be relatively easy to do reliably and would accomplish the desired effect: removing old undelivered messages that are clogging up the server (instead of PAGING or BLOCKING them, or using up tons of memory). There would be some complexity for clustered deployments, but it seems there may already be a mechanism for synchronizing the clocks across multiple servers?
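
      To make the suggestion concrete, here is a rough sketch of the expiry rule I have in mind (this is my proposal, not how HornetQ currently behaves, and the names are made up):

      public class ServerSideExpirySketch {

          static final long TTL_MS = 20000L;

          // The broker would stamp each message with its own arrival time and expire
          // it relative to that, so only the server's clock matters and any
          // publisher/subscriber skew becomes irrelevant.
          static boolean isExpiredOnBroker(long brokerArrivalMillis) {
              return System.currentTimeMillis() - brokerArrivalMillis > TTL_MS;
          }

          public static void main(String[] args) {
              long arrivedRecently = System.currentTimeMillis() - 5000L;  // on the server for 5 s
              long arrivedLongAgo  = System.currentTimeMillis() - 25000L; // on the server for 25 s
              System.out.println(isExpiredOnBroker(arrivedRecently));     // false: still within TTL
              System.out.println(isExpiredOnBroker(arrivedLongAgo));      // true: past the 20 s TTL
          }
      }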

       

      Any thoughts?