3 Replies Latest reply on Sep 10, 2010 11:59 AM by Ronny Schuetz Branched from an earlier discussion.

    Out of Ordering when paging

    Ronny Schuetz Newbie

      Hi Clebert,


      I added some optimizations to the test bridge I posted in my other error report concerning the OOM issue yesterday and encountered some strange message-out-of-sequence conditions over night I'd like to report:


      The test setup is the following: The HornetQ versions I tested with were Branch2_1/r9608 as well as Branch2_1/r9653. A single-threaded sender is sending to a single queue (HornetQ address jms.queue.msgduplicate_divert_inbound) that has two exclusive diverts (jms.queue.msgduplicate_divert_outbound_1/jms.queue.msgduplicate_divert_outbound_2) attached. Two receivers, running in parallel, are consuming messages from the *divert_outbound_* queues (one on each queue). The major differences to the tests before are that the maximum sizes of the *divert_outbound_* queues are a bit higher than before (configuration file is attached) and that the batch size of the receivers is time based (i.e. they receive as many messages as possible for a configured time and then commit) while the sender still uses a count based batch size. And I'm not restarting HornetQ anymore all over the time


      I extended the multi-threaded test client used before in order to reproduce the issue. The client + config files are attached as usual. The command line parameters used were


      jndi_hq_61099_msgduplicate_divert_inbound.properties \
      jndi_hq_61099_msgduplicate_divert_outbound_1.properties,jndi_hq_61099_msgduplicate_divert_outbound_2.properties \
      stopfile 200 25000 0 5000 COUNT 5000 TIME 10000 true


      The issue I see is swapped blocks (excerpt):


      WARN  [2010-09-08 11:26:49,844] R0: Received message # 849,393, expected 847,789.
      WARN  [2010-09-08 11:26:56,758] R0: Received message # 847,789, expected 850,000.
      WARN  [2010-09-08 11:26:56,805] R0: Received message # 850,000, expected 849,393.


      WARN  [2010-09-08 11:35:44,312] R1: Received message # 2,254,054, expected 2,253,814.
      WARN  [2010-09-08 11:35:51,393] R1: Received message # 2,253,814, expected 2,255,000.
      WARN  [2010-09-08 11:35:51,396] R1: Received message # 2,255,000, expected 2,254,054.


      WARN  [2010-09-08 11:52:52,079] R0: Received message # 4,934,300, expected 4,932,568.
      WARN  [2010-09-08 11:52:58,910] R0: Received message # 4,932,568, expected 4,935,000.
      WARN  [2010-09-08 11:52:58,928] R0: Received message # 4,935,000, expected 4,934,300.


      WARN  [2010-09-08 11:59:44,190] R1: Received message # 6,008,469, expected 6,006,710.
      WARN  [2010-09-08 11:59:52,080] R1: Received message # 6,006,710, expected 6,010,000.
      WARN  [2010-09-08 11:59:52,104] R1: Received message # 6,010,000, expected 6,008,469.


      The first out-of-sequence conditions appear on my system usually within 1-2h. There are no errors/exceptions in the HornetQ log nor in the client log.


      It might be relevant that the sequence errors do not appear on both *divert_outbound_* queues at the same messages. One receiver might get some messages out of sequence while the message sequence is fine for the same messages for the other receiver.


      Additionally, it looks like the swapped blocks are somehow caused on the writing side as some of the start/end-IDs of the swapped blocks are multiples of the sending batch size (5000 for the log excerpt above).


      Best regards,

        • 2. Re: Out of Ordering when paging
          Ronny Schuetz Newbie

          Hi Clebert,


          as already mentioned, I still see this issue with Branch2_1/r9661. I had a MultiThreadedLoadTest running tonight with the parameters


          <src properties file> <tgt properties files> stopfile 200 25000 0 2000 COUNT 5000 TIME 10000 true


          The good thing is, the issue doesn't happen that often anymore, I got it once within 15h and only for one consumer:


          WARN  [2010-09-09 23:18:03,266] R0: Received message # 84,052,036, expected 84,050,001.
          WARN  [2010-09-09 23:18:03,338] R0: Received message # 84,050,001, expected 84,055,000.
          WARN  [2010-09-09 23:18:03,755] R0: Received message # 84,055,000, expected 84,052,036.


          However, one point I noticed is the message IDs again at which the blocks are swapped. This is an excerpt of the log file of the client where I detected the issue after the fix again:


          WARN  [2010-09-09 16:10:02,968] B-q1,q2,q3: Received message # 14,527,760, expected 14,525,001.
          WARN  [2010-09-09 16:10:03,432] B-q1,q2,q3: Received message # 14,525,001, expected 14,530,000.
          WARN  [2010-09-09 16:10:03,888] B-q1,q2,q3: Received message # 14,530,000, expected 14,527,760.


          The bold message IDs above are always a multiple of the producer batch size + 1 now. Maybe there is a meaning behind that, but it might be a coincidence as well as I just have these two cases.


          Hope that helps somehow,

          Best regards,


          • 3. Re: Out of Ordering when paging
            Ronny Schuetz Newbie

            Hi Clebert,


            got one more:


            WARN  [2010-09-10 17:15:17,851] B-q4,q5,q6: Received message # 8,867,996, expected 8,865,000.
            WARN  [2010-09-10 17:15:18,920] B-q4,q5,q6: Received message # 8,865,000, expected 8,870,000.

            WARN  [2010-09-10 17:15:19,695] B-q4,q5,q6: Received message # 8,870,000, expected 8,867,996.


            Not producer batch size + 1, but directly at the producer batch boundaries. No idea if this leads to the issue, but this pattern is noticeable.


            Best regards & Have a nice weekend,