13 Replies Latest reply on Jul 8, 2010 10:40 AM by clebert.suconic

    Issues with duplicate message detection

    aengineer

      When we publish a certain number of identical messages to a JMS queue, some of those messages seem to be getting tagged as duplicates even though duplicate detection has not been enabled. This in turn completely skews the message counts. The publisher does not set a JMS property called '_HQ_DUPL_ID' on any message.

       

      For example, we have a test client that will publish the same message 5000 times. After we publish all 5000 messages (the client runs to success), the mbean attribute 'MessageCount' always shows 3816. If we run a jms consumer against the same destination, we get back 5000 messages but the last (5000-3816 = 1184) messages have an additional property called "_HQ_DUPL_ID=". The publisher never set this property on the outbound message.

       

      Based on the documentation, I am concluding that HQ is identifying the last 1184 messages as duplicate messages, and that these duplicate messages do not add to the counts that are exposed via the mbeans. The magic number '3816' seems to be a function of the message size. If I publish very small (but identical) messages, then none of the 5000 messages get tagged as duplicates. If I publish larger messages, then the threshold seems to be much lower than 3816.

       

      So my first question is why are these messages being tagged as duplicates, and how does one prevent it?

       

      This issue is similar to another issue that had been reported earlier, http://community.jboss.org/message/540927.

       

      Thanks
      Aspi Engineer

      Putnam Investments

        • 1. Re: Issues with duplicate message detection
          timfox

          If you post some code that demonstrates the issue, someone can take a look.

          • 2. Re: Issues with duplicate message detection
            aengineer

            I am uploading the java class to reproduce this issue. This class will publish 5000 messages, retrieve the message count via JMX and then consume all 5000 messages.

             

             You need to pass in a single argument: the location of the data file from which to load the message body. The same data file is loaded for every message. The application's behavior does seem to depend on the message size, so please use the attached data file only.

             

            Also attached is the output from the sample code. Lines 5007 and 8828 are significant in the log.

             

            Thanks

            Aspi Engineer

            • 3. Re: Issues with duplicate message detection
              timfox

              Can you post your server config too, and mention what version you are running?

               

              Also, are you using paging, or using bridges?

               

              The only time the duplicate id header gets added is when paging, or bridging messages from one node to another.

               

               In your last post you mentioned that your test consumes all 5000 messages. I am confused here; I thought the issue was that not all messages are received?

              • 4. Re: Issues with duplicate message detection
                timfox

                My guess here is you have set the size of the queue to some upper limit and enabled paging in your config.

                 

                This means only (say) 3000 messages are in memory and the rest are paged. This is why your message count is only 3000 since only 3000 messages are actually in the queue.

                 

                 When messages get paged, the duplicate id header is added; this prevents the same message being depaged more than once in the event of a crash and recovery.

                • 5. Re: Issues with duplicate message detection
                  aengineer

                  We are running 2.1.0.CR1 (auraria, 118)

                   

                  And yes I do have paging enabled:

                   

                  <address-settings>
                        <!--default for catch all-->
                        <address-setting match="#">
                           <dead-letter-address>jms.queue.DLQ</dead-letter-address>
                           <expiry-address>jms.queue.ExpiryQueue</expiry-address>
                           <redelivery-delay>0</redelivery-delay>
                         <!-- 1 Megabyte = 1048576 bytes. -->
                           <max-size-bytes>104857600</max-size-bytes>
                           <page-size-bytes>10485760</page-size-bytes>
                           <message-counter-history-day-limit>10</message-counter-history-day-limit>
                           <address-full-policy>PAGE</address-full-policy>
                        </address-setting>
                     </address-settings>

                   

                   

                  But if this is related to paging, then I have 2 concerns with the way the system behaves.

                  1) The JMS server is reporting the wrong number of messages on the destination. Paging is something that the JMS server does internally to conserve memory. What the end client sees should be the total count.

                  2) The JMS server is adding additional JMS properties which could cause issues for consuming applications.

                   

                  From my use case, the additional JMS property may be something that we can live with, but the message count has got to be accurate. Disabling paging does not seem like an option since that guarantees that at some point the server is going to run out of memory.

                   

                  Thanks

                  Aspi Engineer

                  • 6. Re: Issues with duplicate message detection
                    clebert.suconic

                     Paging pages messages at a stage before they enter the destination. Messages are paged at the address level; paged messages are not yet available on the queue.

                     

                     Instead of blocking the server from receiving messages, we cache them at the address level. As soon as messages are consumed on the address, the paging system will read the files and place the messages on the queue.

                     

                     

                    There's a JIRA requesting adding a counter on the number of messages at the address level: https://jira.jboss.org/browse/HORNETQ-31

                    • 7. Re: Issues with duplicate message detection
                      aengineer

                       Also, why did paging even occur?

                       

                      <max-size-bytes>104857600</max-size-bytes>

                      which is 100 meg.

                       

                       Each message contains the contents of a file that is 8219 bytes in size.

                       ==> 8219 * 5000 ≈ 39 MB.

                       

                       Even after factoring in the messaging overhead, paging should not have occurred.
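                       The back-of-envelope arithmetic above can be written out as a quick check (a minimal sketch; the 8219-byte file size, 5000-message count, and 100 MB limit are the figures from this thread, and the estimate deliberately ignores per-message overhead):

```java
// Rough estimate of total raw payload size versus the configured paging threshold.
public class PagingEstimate {

    // Total raw payload bytes for `count` messages of `fileBytes` each.
    public static long totalPayloadBytes(long fileBytes, long count) {
        return fileBytes * count;
    }

    public static void main(String[] args) {
        long total = totalPayloadBytes(8219, 5000);  // 41,095,000 bytes
        long maxSizeBytes = 104857600L;              // <max-size-bytes> = 100 MiB
        System.out.printf("payload total: %.1f MiB of %d MiB limit%n",
                total / 1048576.0, maxSizeBytes / 1048576);
        // ~39.2 MiB, well under the 100 MiB limit -- so on raw payload size
        // alone, paging should not have kicked in.
    }
}
```

On raw file bytes the 5000 messages fit in the limit more than twice over, which is why the paging looks surprising until the in-memory representation is accounted for.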

                      • 8. Re: Issues with duplicate message detection
                        timfox

                         The file contains 8291 *characters*, and you're using TextMessage, whose body is represented by a String. A character in Java is 16 bits, so that's over 16,500 bytes per message.

                         

                         If you do the maths and take the overhead into account, that's around 100 MiB.

                         

                         Memory is cheap these days. Don't limit yourself to 100 MiB unless you really need to. We have people here running HornetQ with 50 GiB heaps!

                        • 9. Re: Issues with duplicate message detection
                          aengineer

                          That is a steep overhead.

                           

                           Assuming that max-size-bytes is 100 MB, and since paging consistently starts at message 3816, each message must be using (100*1024*1024/3816) = 27,478 bytes in memory.

                           

                          Each message is made up of 8291 characters, which equals (8291*2) = 16,582 bytes.

                           

                          So are we saying that the messaging system adds an overhead of 27,478 - 16,582 = 10,896 bytes per message?

                           

                           Unfortunately, even though memory is cheap, getting management to just buy more memory is a "challenge", to put it nicely. The first question you get asked is 'why?', and then the numbers have to seem reasonable. I chose 100 MB just as a random number. I am running HQ itself with a max heap size of 1.5 GB, so it does not make sense to set max-size-bytes to any value greater than 1.5 GB; maybe 1 GB might be more reasonable. We just don't have the hardware resources to run HQ in production with a heap greater than 4 GB or maybe 6 GB.

                           

                          Lastly, would you have a recommendation/rule-of-thumb on what percentage of the heap should the max-size-bytes be set to?

                           

                          Thanks

                          Aspi

                          • 10. Re: Issues with duplicate message detection
                            clebert.suconic

                             Paging takes into account how much memory each message is actually using.

                             

                             When a message is sent to the server, the destination is not sent with the message, so we need to re-encode the message with the address before storing it.

                             

                             We are currently using self-expanding buffers on the server, and at the point the destination is re-encoded (during the journal.store operation), the server adds another 10K to the message buffer size.

                             

                             So each message is taking 27K of memory, even though the extra buffer space is not used.

                             

                             

                            I've created this JIRA to fix this issue: https://jira.jboss.org/browse/HORNETQ-436

                             

                             That fix would optimize memory usage for this scenario.

                             

                             

                             Thanks for the report.

                            • 11. Re: Issues with duplicate message detection
                              clebert.suconic

                               As a note for other developers: I'm talking about this part of the code

                               

                               

                              ServerSessionImpl::send...

                               

                               

                               

                                     if (address == null)
                                     {
                                        if (message.isDurable())
                                        {
                                           // We need to force a re-encode when the message gets persisted;
                                           // when it gets reloaded it will have no address
                                           message.setAddress(defaultAddress);
                                        }
                                        else
                                        {
                                           // We don't want to force a re-encode when the message gets sent to the consumer
                                           message.setAddressTransient(defaultAddress);
                                        }
                                     }

                               

                               

                               IMO we can optimize that and avoid the re-encoding.

                              • 12. Re: Issues with duplicate message detection
                                timfox

                                 Not sure I'm with you; are you saying the destination name is 10K? That's very long.

                                • 13. Re: Issues with duplicate message detection
                                  clebert.suconic

                                   No... The buffer adds 10K when it expands itself (almost doubling its size). The message will have a buffer of 24K bytes while only 12K are used (write position versus read position). The messages are consuming more memory than necessary.
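                                   The growth pattern being described can be illustrated with a toy self-expanding buffer (a hypothetical sketch, not HornetQ's actual buffer class; it only shows how doubling on expansion can leave roughly half the capacity unused):

```java
// Toy model of a self-expanding buffer that doubles its capacity when full.
public class SelfExpandingBuffer {
    private int capacity;
    private int writePosition;

    public SelfExpandingBuffer(int initialCapacity) {
        this.capacity = initialCapacity;
    }

    // Ensure room for `bytes` more data, doubling capacity as needed.
    public void write(int bytes) {
        while (writePosition + bytes > capacity) {
            capacity *= 2; // doubling: the last expansion can waste up to half
        }
        writePosition += bytes;
    }

    public int capacity()    { return capacity; }
    public int unusedBytes() { return capacity - writePosition; }

    public static void main(String[] args) {
        // ~12K of message data written into a buffer that started at 12K:
        SelfExpandingBuffer buf = new SelfExpandingBuffer(12 * 1024);
        buf.write(12 * 1024);  // fills it exactly
        buf.write(1);          // one more byte forces a doubling to 24K
        System.out.println("capacity: " + buf.capacity()
                + ", unused: " + buf.unusedBytes());
        // capacity: 24576, unused: 12287 -- about half the buffer is wasted,
        // matching the 24K-allocated / 12K-used situation described above.
    }
}
```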