3 Replies Latest reply on Sep 17, 2014 1:16 PM by jbertram

App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.

dr1985 Sep 16, 2014 12:27 PM

Environment:
HornetQ v230 on Linux x86
Running with 2 clustered HornetQ servers.

Problem:
A developer reported that our production HornetQ servers appeared to be losing messages.

The developer could run a transaction that put a single message on a queue.

Some of the messages he sent were processed normally, while others never showed up.

jconsole showed:

DeliveringCount = 0
Message Count = 0
Messages Added = 148195

After doing some investigation, I noticed that instead of the 10 journals that are normally out there,
we had 25 journals on one of our HornetQ servers...and the number of journals was growing.
The other HornetQ server in the cluster still showed just 10 journals.

Bouncing the Tomcat servers didn't help, so we bounced the HornetQ server with the 25 journals.

When HornetQ restarted, the jconsole Message Count immediately showed there were over 13k messages queued.

Eventually all the messages processed, but the fact that messages were being journaled while the
message counters were not being incremented is obviously a serious concern.

We have a QDepth monitor set up, but up 'til this point, we hadn't thought to monitor the number of
journal files in existence.
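
A journal-file count check like the one described above could be sketched roughly as follows. This is only a sketch: the `hornetq-data-*.hq` file pattern and the expected count of 10 are assumptions based on what we normally see, and the function names are made up for illustration. The directory would be whatever journal-directory is configured.

```python
from pathlib import Path


def count_journal_files(journal_dir, pattern="hornetq-data-*.hq"):
    """Count the regular files in journal_dir that match the (assumed) journal file pattern."""
    return sum(1 for p in Path(journal_dir).glob(pattern) if p.is_file())


def journal_alert(journal_dir, expected=10, pattern="hornetq-data-*.hq"):
    """Return a warning string if there are more journal files than expected, else None."""
    count = count_journal_files(journal_dir, pattern)
    if count > expected:
        return f"WARNING: {count} journal files in {journal_dir} (expected {expected})"
    return None
```

Run from cron next to the QDepth monitor, a non-None result from `journal_alert` would have flagged the growth from 10 to 25 journals.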

I couldn't find anything regarding this problem, but I thought I'd check if this was a known issue?

1. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.

jbertram Sep 16, 2014 3:14 PM (in response to dr1985)

That problem doesn't ring any bells. I take it this is not something you can reproduce reliably.

You might consider moving to a later release.
2. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.

dr1985 Sep 17, 2014 12:42 PM (in response to jbertram)

Thanks for the response. We've got HornetQ 2.4 on our sandbox servers with plans to move forward with it.

As for the problem, more information has come to light which I think might explain why messages were journaled, but the QDepth Counters were never incremented.

Obviously I don't know the internals of HornetQ, so I may be totally off base.

anyway...

This problem actually spanned multiple (5) days!
Again we run with 2 clustered HornetQ servers (HQ1 and HQ2).

1. It looks like our problem started with a network issue last Thursday....and:
the Bridge from HQ1 to HQ2 appears to have reconnected once the network was working again, but...
the Bridge from HQ2 to HQ1 doesn't appear to have reconnected.

HQ1 log messages:
HQ212037: Connection failure has been detected: HQ119014: Did not receive data from....The connection will now be closed. [code=CONNECTION_TIMEDOUT]
HQ222061: Client connection failed, clearing up resources for session 4faff486-32b8-11e4-be70-4525a9df527d
HQ222107: Cleared up resources for session 4faff486-32b8-11e4-be70-4525a9df527d
HQ221027: Bridge ClusterConnectionBridge@729021a5......is connected

HQ2 log messages:
HQ212037: Connection failure has been detected: HQ119011: Did not receive data from server...
HQ222095: Connection failed with failedOver=false: HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119011: ...from HQ1
HQ212037: Connection failure has been detected: ....The connection will now be closed

The HQ2 log doesn't have the "HQ221027....is connected" message in the log.
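
The log comparison above could be automated with something like the sketch below. It only looks for the two message codes quoted from our logs (HQ212037 for the failure, HQ221027 for the bridge reconnect) and checks whether a "connected" line follows the last failure. It is a coarse check: HQ212037 may be logged for connections other than the bridge, and the function name is just illustrative.

```python
FAILURE_CODE = "HQ212037"    # "Connection failure has been detected"
CONNECTED_CODE = "HQ221027"  # "Bridge ... is connected"


def bridge_reconnected(log_lines):
    """Return True if no failure was logged, or if a bridge-connected
    message appears after the most recent connection failure."""
    reconnected = True
    for line in log_lines:
        if FAILURE_CODE in line:
            reconnected = False
        elif CONNECTED_CODE in line:
            reconnected = True
    return reconnected


# Abbreviated lines taken from the HQ1 and HQ2 logs quoted above.
hq1_log = [
    "HQ212037: Connection failure has been detected: HQ119014: Did not receive data from....",
    "HQ222061: Client connection failed, clearing up resources for session ...",
    "HQ222107: Cleared up resources for session ...",
    "HQ221027: Bridge ClusterConnectionBridge@729021a5......is connected",
]
hq2_log = [
    "HQ212037: Connection failure has been detected: HQ119011: Did not receive data from server...",
    "HQ222095: Connection failed with failedOver=false: ...",
    "HQ212037: Connection failure has been detected: ....The connection will now be closed",
]

print("HQ1 bridge reconnected:", bridge_reconnected(hq1_log))
print("HQ2 bridge reconnected:", bridge_reconnected(hq2_log))
```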

2. Now for the app that reported the problem...it had 2 consumers on HQ1, and no consumers running on HQ2.

So, based on test observations from when an app has 2 consumers on HQ1, and no consumers on HQ2...AND when the Bridge is working,
messages going to HQ2 would automatically be forwarded from HQ2 to HQ1 (where the consumers are), and the
Message counters for the queue on HQ2 would never be incremented.

3. Basically, I'm thinking:

- HQ1 told HQ2 that HQ1 had consumers on the queue....(the HQ1 to HQ2 bridge is working)
- HQ2 had no consumers on the queue, so it tries to forward the message to HQ1...but since
   the HQ2 to HQ1 bridge is down, HQ2 just ends up queuing the message on the Bridge-cluster-connection queue.

I realize this is all just conjecture, but I'm trying to come up with a reasonable explanation for what we saw.

...if the problem were to reoccur, I would definitely check the Bridge-cluster-connection queue counters in the jconsole...that would answer this question for me.

4. Here's the cluster connection definition used by both HQ1 and HQ2. Btw, we've had the network drop and reconnect before without any issues.

HQ1 and HQ2 use an identical cluster-connections definition in hornetq-configuration.xml:

   <cluster-connections>
      <cluster-connection name="xxxxx.cluster.connection">
         <address>jms</address>
         <connector-ref>primary-connector</connector-ref>
         <retry-interval>10000</retry-interval>
         <reconnect-attempts>999</reconnect-attempts>
         <use-duplicate-detection>true</use-duplicate-detection>
         <forward-when-no-consumers>false</forward-when-no-consumers>
         <max-hops>1</max-hops>
         <confirmation-window-size>1048576</confirmation-window-size>
         <discovery-group-ref discovery-group-name="DiscoveryGroupxxxxxxxxx"/>
      </cluster-connection>
   </cluster-connections>
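
One thing that may be worth ruling out: with a finite reconnect-attempts, the settings above put an upper bound on how long a bridge keeps retrying. The rough calculation below assumes a constant retry interval, i.e. a multiplier of 1.0 (no multiplier is set in the snippet above, so that is an assumption), and assumes the bridge gives up once the attempts are used up. The function name and its `multiplier` / `max_interval_ms` parameters are only illustrative.

```python
def retry_window_seconds(retry_interval_ms, reconnect_attempts,
                         multiplier=1.0, max_interval_ms=None):
    """Sum the waits between reconnect attempts and return the total in seconds."""
    total_ms = 0.0
    interval = float(retry_interval_ms)
    for _ in range(reconnect_attempts):
        if max_interval_ms is not None:
            interval = min(interval, max_interval_ms)
        total_ms += interval
        interval *= multiplier  # stays constant when multiplier is 1.0
    return total_ms / 1000.0


# Values from the cluster-connection definition above.
window = retry_window_seconds(retry_interval_ms=10000, reconnect_attempts=999)
print(f"retry window: {window:.0f} s (about {window / 3600:.1f} hours)")
```

With 999 attempts at 10000 ms that works out to 9990 seconds, a bit under 3 hours. If the outage between HQ2 and HQ1 lasted longer than that, it could be one reason a bridge never came back, though that is conjecture too.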

...any help is appreciated.
3. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.

jbertram Sep 17, 2014 1:16 PM (in response to dr1985)

That seems like a reasonable explanation. A series of thread dumps and an inspection of the store-and-forward queue (as you suggested) would help confirm.