-
1. Re: Wrong routing result
jbertram Aug 3, 2015 3:58 PM (in response to werrenmi)
I haven't seen that behavior before. What version of HornetQ are you using? Also, are you using Wildfly 8.1.Final? I ask because there's really no such thing as JBoss AS 8.1.Final.
Finally, do you have a test-case that reproduces the problem?
-
2. Re: Wrong routing result
werrenmi Aug 3, 2015 4:52 PM (in response to jbertram)
Sorry for the misspelling; yes, we use Wildfly 8.1.Final.
Unfortunately there is no test-case per se, but the problem is reproducible (most of the time) when the client goes down unexpectedly without unsubscribing. After the client reconnects, this behavior may occur. The only fix is to also restart the server. We have set the connection TTL to 1 hour.
The confusing thing is that this client also has a producer for the address where the misrouted messages end up. So at first it looked as though simply the wrong producer was being used.
-
3. Re: Wrong routing result
clebert.suconic Aug 3, 2015 7:38 PM (in response to werrenmi)
TBH, I didn't even understand what the issue is.
With a git grep on "Message after routed" I see that this happens every time a message is routed (it's a trace message, so not an issue whatsoever).
A common issue with consumers, since you're running with OSGi, is to have two consumers (maybe you have two instances of the consumer, or maybe a consumer leak) and to get only half of the messages, in which case just close your consumer properly. I'm not saying this is the issue, since I have no data here, but just giving you a hint on what it could be.
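As a minimal sketch of what closing the consumer properly can look like in an OSGi bundle (the class name, queue name, and lifecycle hook here are illustrative assumptions, not from the thread):

import org.hornetq.api.core.client.ClientConsumer;
import org.hornetq.api.core.client.ClientSession;

// Hypothetical consumer holder; shutdown() would be wired to the bundle's
// stop callback (e.g. a Blueprint destroy-method) so a redeploy never
// leaves a second, leaked consumer on the queue.
public class QueueBConsumer {

    private final ClientSession session;
    private final ClientConsumer consumer;

    public QueueBConsumer(ClientSession session) throws Exception {
        this.session = session;
        this.consumer = session.createConsumer("queue.b"); // assumed queue name
        session.start();
    }

    public void shutdown() throws Exception {
        consumer.close(); // removes the server-side consumer immediately
        session.close();  // instead of waiting for the connection TTL to expire
    }
}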
We can look at it if you provide us some more data.
Just a parenthesis: since you are an OSGi user, we are in the process of designing OSGi support for ActiveMQ Artemis (which is now the upstream for HornetQ; HornetQ being the old / legacy version). Maybe you could be part of the initial effort, at least by helping us understand what users need... ARTEMIS-93
-
4. Re: Wrong routing result
jbertram Aug 3, 2015 8:33 PM (in response to werrenmi)
I'm not sure I really understand the problem, either. Can you clarify the use-case a bit more? What kind of clients do you have (e.g. producers or consumers) and how many of each type? How should they be interacting and what are you actually observing?
Lastly, a connection TTL of 1 hour seems a bit high to me. That means that the subscription for any non-durable subscriber that disconnects without explicitly unsubscribing will still be valid and collecting messages for up to 1 hour after the client has disconnected. This could cause performance problems on the server as messages accumulate in defunct subscriptions without any valid consumers. Can you elaborate on why you set the connection TTL so high?
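For reference, with the core API that TTL is the one configured on the ServerLocator; a minimal sketch, where the host and port are assumptions rather than values from the thread:

import java.util.HashMap;
import java.util.Map;

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.core.client.HornetQClient;
import org.hornetq.api.core.client.ServerLocator;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;

public final class LocatorFactory {

    // Builds a locator with the 1-hour connection TTL discussed above.
    public static ServerLocator createLocator() {
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("host", "messaging.example.com"); // assumed host
        params.put("port", 5445);                    // assumed port
        ServerLocator locator = HornetQClient.createServerLocatorWithoutHA(
                new TransportConfiguration(NettyConnectorFactory.class.getName(), params));
        locator.setConnectionTTL(60 * 60 * 1000L);   // 1 hour, as in the thread
        return locator;
    }
}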
-
5. Re: Wrong routing result
werrenmi Aug 4, 2015 4:44 AM (in response to werrenmi)
Hello everyone
First, I have to say we use the core API, and the HornetQ version is 2.4.1.Final.
Among other things, we have two addresses / queues: address.a (queue.a) and address.b (queue.b). The consumer for queue.a is deployed on one Wildfly 8.1.Final (also the HornetQ server) and the consumer for queue.b on another Wildfly 8.1.Final. There is just one queue (consumer) per address.
The producers for both addresses are deployed once on multiple Apache Karaf 2.3.5 instances (one producer for each address on each Karaf). Each producer is deployed in a separate bundle. The HornetQ client session is registered as an OSGi service (Blueprint singleton) and is therefore shared between both producers. The session and producers have a thread-safe design, so the session is never accessed and no message is ever sent concurrently. The Karaf instances are running on ARMv5 embedded machines.
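A hypothetical reconstruction of the setup just described, with the shared Blueprint-provided session injected into each producer bundle (the class and method names are illustrative, not from our code):

import org.hornetq.api.core.client.ClientMessage;
import org.hornetq.api.core.client.ClientProducer;
import org.hornetq.api.core.client.ClientSession;

// One ClientSession instance is registered as an OSGi service and injected
// into both producer bundles; this is the sharing Justin advises against below.
public class SharedSessionProducer {

    private final ClientSession sharedSession;
    private final ClientProducer producer;

    public SharedSessionProducer(ClientSession sharedSession, String address) throws Exception {
        this.sharedSession = sharedSession;
        this.producer = sharedSession.createProducer(address);
    }

    // synchronized so the shared session is never used concurrently,
    // matching the thread-safe design described above.
    public synchronized void send(byte[] payload) throws Exception {
        ClientMessage message = sharedSession.createMessage(true); // durable
        message.getBodyBuffer().writeBytes(payload);
        producer.send(message);
    }
}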
The issue is: when a Karaf goes down unexpectedly and comes up again, it can then occur in our case that messages sent by the producer registered to address.a are routed and delivered to the consumer for queue.b. Messages sent by the producer on address.b are routed and delivered correctly to the consumer for queue.b.
When we restart the Wildfly that acts as the HornetQ server, everything works again as expected.
We have such a high TTL because these embedded machines (the Karaf clients) run in dedicated customer LANs, so the risk of longer connection losses is relatively high.
I will investigate this further next weekend. Maybe more information will be available after that.
Also important: we do not have any additional configuration like diverts, and all queues are durable.
<subsystem xmlns="urn:jboss:domain:messaging:2.0">
    <hornetq-server>
        <bindings-directory path="${hornetq.bindings.directory}"/>
        <journal-directory path="${hornetq.journal.directory}"/>
        <paging-directory path="${hornetq.paging.directory}"/>
        <large-messages-directory path="${hornetq.large-messages.directory}"/>
        <persistence-enabled>true</persistence-enabled>
        <security-domain>other</security-domain>
        <security-enabled>false</security-enabled>
        <async-connection-execution-enabled>true</async-connection-execution-enabled>
        <journal-type>ASYNCIO</journal-type>
        <journal-file-size>102400</journal-file-size>
        <journal-min-files>2</journal-min-files>
        <persist-id-cache>true</persist-id-cache>
        <message-expiry-scan-period>10000</message-expiry-scan-period>
        <core-queues>
            ...
        </core-queues>
        <connectors>
            <netty-connector name="netty" socket-binding="messaging">
                <param key="use-nio" value="true"/>
            </netty-connector>
            <netty-connector name="netty-ssl" socket-binding="messaging-ssl">
                <param key="use-nio" value="true"/>
                <param key="ssl-enabled" value="true"/>
            </netty-connector>
            <netty-connector name="netty-throughput" socket-binding="messaging-throughput">
                <param key="batch-delay" value="50"/>
            </netty-connector>
            <in-vm-connector name="in-vm" server-id="0"/>
        </connectors>
        <acceptors>
            <netty-acceptor name="netty" socket-binding="messaging">
                <param key="use-nio" value="true"/>
            </netty-acceptor>
            <netty-acceptor name="netty-throughput" socket-binding="messaging-throughput">
                <param key="batch-delay" value="50"/>
                <param key="direct-deliver" value="false"/>
            </netty-acceptor>
            <netty-acceptor name="netty-ssl" socket-binding="messaging-ssl">
                <param key="use-nio" value="true"/>
                <param key="ssl-enabled" value="true"/>
                <param key="key-store-path" value="${jboss.home.dir}/hornetq-keystore.jks"/>
                <param key="key-store-password" value="${VAULT::msg-keystore::password::0}"/>
                <param key="trust-store-path" value="${jboss.home.dir}/hornetq-dev-truststore.jks"/>
                <param key="trust-store-password" value="${VAULT::msg-truststore::password::0}"/>
                <param key="need-client-auth" value="true"/>
            </netty-acceptor>
            <in-vm-acceptor name="in-vm" server-id="0"/>
        </acceptors>
        <security-settings>
            ...
        </security-settings>
        <address-settings>
            <address-setting match="#">
                <dead-letter-address>jms.queue.DLQ</dead-letter-address>
                <expiry-address>jms.queue.ExpiryQueue</expiry-address>
                <redelivery-delay>0</redelivery-delay>
                <max-size-bytes>10485760</max-size-bytes>
                <page-size-bytes>7864320</page-size-bytes>
                <page-max-cache-size>3</page-max-cache-size>
                <address-full-policy>PAGE</address-full-policy>
                <message-counter-history-day-limit>10</message-counter-history-day-limit>
            </address-setting>
        </address-settings>
    </hornetq-server>
</subsystem>
On the client the following additional configurations are defined (a sketch of the corresponding core-API setters follows the list):
- blockOnAcknowledge = true
- blockOnDurableSend = false
- blockOnNonDurableSend = false
- preAck = false
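These four flags correspond to setters on the core API's ServerLocator; a minimal sketch:

import org.hornetq.api.core.client.ServerLocator;

public final class ClientSettings {

    // Applies the four client-side settings listed above.
    public static void apply(ServerLocator locator) {
        locator.setBlockOnAcknowledge(true);
        locator.setBlockOnDurableSend(false);
        locator.setBlockOnNonDurableSend(false);
        locator.setPreAcknowledge(false);
    }
}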
Regards
Michel
-
6. Re: Wrong routing result
jbertram Aug 4, 2015 10:58 AM (in response to werrenmi)
I recommend you change your producers so that they don't share a session at all. I realize you said the session is shared in a thread-safe manner, but my hunch is that sharing the session is still causing this problem.
-
7. Re: Wrong routing result
jbertram Aug 4, 2015 11:00 AM (in response to jbertram)
For what it's worth, sharing a session is never recommended because sessions weren't designed to be shared. Giving your application the ability to share sessions is just extra work and complexity that is simply unnecessary (and could potentially hurt performance significantly). Just share the connection object and create non-shared sessions from that.
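A minimal sketch of that recommendation with the core API: share the ClientSessionFactory (the connection-level object) and give each producer its own session (the class and method names are illustrative):

import org.hornetq.api.core.client.ClientMessage;
import org.hornetq.api.core.client.ClientProducer;
import org.hornetq.api.core.client.ClientSession;
import org.hornetq.api.core.client.ClientSessionFactory;

public class DedicatedSessionProducer {

    private final ClientSession session;
    private final ClientProducer producer;

    public DedicatedSessionProducer(ClientSessionFactory sharedFactory, String address)
            throws Exception {
        this.session = sharedFactory.createSession(); // one non-shared session per producer
        this.producer = session.createProducer(address);
    }

    public void send(byte[] payload) throws Exception {
        ClientMessage message = session.createMessage(true); // durable
        message.getBodyBuffer().writeBytes(payload);
        producer.send(message);
    }

    public void close() throws Exception {
        producer.close();
        session.close();
    }
}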
-
8. Re: Wrong routing result
werrenmi Aug 4, 2015 3:18 PM (in response to jbertram)
Thanks, Justin, for this hint!
I will do it that way. For clarity: is the best approach to obtain a session for each producer, and therefore also for each consumer? I ask because that may result in thousands of sessions in our case.
Regards
Michel
-
9. Re: Wrong routing result
werrenmi Aug 4, 2015 3:27 PM (in response to clebert.suconic)
Clebert, thanks for this information about the OSGi integration. As we have no further issues at the moment with HornetQ in OSGi, I agree with your comment on that issue about the uber jar. When I see other possible improvements, I will let you know.
-
10. Re: Wrong routing result
jbertram Aug 4, 2015 5:54 PM (in response to werrenmi)
Is the best approach to obtain a session for each producer, and therefore also for each consumer?
Yes.
I ask because that may result in thousands of sessions in our case.
I don't see a problem with that at this point.
-
11. Re: Wrong routing result
werrenmi Sep 7, 2015 2:22 AM (in response to jbertram)
Hello Justin
As far as I can tell so far, the refactoring has successfully fixed the entire issue.
Thanks a lot!
Regards
Michel