-
1. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.
jbertram Sep 16, 2014 3:14 PM (in response to dr1985)
That problem doesn't ring any bells. I take it this is not something you can reproduce reliably.
You might consider moving to a later release.
-
2. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.
dr1985 Sep 17, 2014 12:42 PM (in response to jbertram)
Thanks for the response. We've got HornetQ 2.4 on our sandbox servers, with plans to move forward with it.
As for the problem, more information has come to light which I think might explain why messages were journaled, but the QDepth Counters were never incremented.
Obviously I don't know the internals of HornetQ, so I may be totally off base.
Anyway...
This problem actually spanned multiple (5) days!
Again we run with 2 clustered HornetQ servers (HQ1 and HQ2).
1. It looks like our problem started with a network issue last Thursday, and:
- the bridge from HQ1 to HQ2 appears to have reconnected once the network was working again, but
- the bridge from HQ2 to HQ1 does not appear to have reconnected.
HQ1 log messages:
HQ212037: Connection failure has been detected: HQ119014: Did not receive data from....The connection will now be closed. [code=CONNECTION_TIMEDOUT]
HQ222061: Client connection failed, clearing up resources for session 4faff486-32b8-11e4-be70-4525a9df527d
HQ222107: Cleared up resources for session 4faff486-32b8-11e4-be70-4525a9df527d
HQ221027: Bridge ClusterConnectionBridge@729021a5......is connected
HQ2 log messages:
HQ212037: Connection failure has been detected: HQ119011: Did not receive data from server...
HQ222095: Connection failed with failedOver=false: HornetQException[errorType=CONNECTION_TIMEDOUT message=HQ119011: ...from HQ1
HQ212037: Connection failure has been detected: ....The connection will now be closed
The HQ2 log doesn't have the "HQ221027....is connected" message.
2. Now for the app that reported the problem: it had 2 consumers on HQ1 and no consumers on HQ2.
Based on test observations, when an app has 2 consumers on HQ1 and none on HQ2, AND the bridge is working,
messages arriving on HQ2 are automatically forwarded from HQ2 to HQ1 (where the consumers are), and the
message counters for the queue on HQ2 are never incremented.
3. Basically, I'm thinking:
- HQ1 told HQ2 that HQ1 had consumers on the queue (the HQ1-to-HQ2 bridge is working).
- HQ2 had no consumers on the queue, so it tried to forward each message to HQ1...but since
the HQ2-to-HQ1 bridge was down, HQ2 just ended up queuing the messages on the bridge's cluster-connection (store-and-forward) queue.
I realize this is all just conjecture, but I'm trying to come up with a reasonable explanation for what we saw.
...if the problem were to recur, I would definitely check the cluster-connection store-and-forward queue counters in jconsole; that would answer this question for me.
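If it does recur, something like the following JMX sketch could dump those counters instead of eyeballing jconsole. This is just a sketch under a couple of assumptions: the service URL below is a placeholder for the real HQ2 host/port, and the "sf." prefix reflects my understanding that HornetQ names the store-and-forward queue sf.&lt;cluster-name&gt;.&lt;node-id&gt;.

```java
import java.util.Set;

import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SnfQueueCheck {

    // Pattern matching every HornetQ core Queue MBean; the cluster connection's
    // store-and-forward queue shows up here too (named "sf.<cluster-name>.<node-id>",
    // if I have the naming right).
    static ObjectName queuePattern() {
        try {
            return new ObjectName("org.hornetq:module=Core,type=Queue,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // The JMX URL is a placeholder -- substitute HQ2's real host and port.
        String url = args.length > 0 ? args[0]
                : "service:jmx:rmi:///jndi/rmi://hq2-host:3000/jmxrmi";
        JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(url));
        try {
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            Set<ObjectName> queues = mbsc.queryNames(queuePattern(), null);
            for (ObjectName q : queues) {
                String name = q.getKeyProperty("name");
                // Only report the store-and-forward queues.
                if (name != null && name.contains("sf.")) {
                    // MessageCount is an attribute of the QueueControl MBean.
                    Object count = mbsc.getAttribute(q, "MessageCount");
                    System.out.println(name + " MessageCount=" + count);
                }
            }
        } finally {
            jmxc.close();
        }
    }
}
```

A non-zero, growing MessageCount on the sf.* queue for HQ2's side of the cluster connection would line up with the theory above.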
4. Here's the cluster-connection definition (from hornetq-configuration.xml), identical on both HQ1 and HQ2. BTW, we've had the network drop and reconnect before without any issues:
<!--##########################################################-->
<!--#################### Cluster Connections #################-->
<!--##########################################################-->
<cluster-connections>
   <cluster-connection name="xxxxx.cluster.connection">
      <address>jms</address>
      <connector-ref>primary-connector</connector-ref>
      <retry-interval>10000</retry-interval>
      <reconnect-attempts>999</reconnect-attempts>
      <use-duplicate-detection>true</use-duplicate-detection>
      <forward-when-no-consumers>false</forward-when-no-consumers>
      <max-hops>1</max-hops>
      <confirmation-window-size>1048576</confirmation-window-size>
      <discovery-group-ref discovery-group-name="DiscoveryGroupxxxxxxxxx"/>
   </cluster-connection>
</cluster-connections>
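For reference, with forward-when-no-consumers set to false a node only forwards messages toward nodes that advertise consumers; whether already-queued messages get redistributed is governed by redistribution-delay in address-settings (default -1, meaning never redistribute). A minimal fragment would look like the following; the match pattern and value are illustrative, not our actual config:

```xml
<address-settings>
   <!-- applies to all JMS destinations; match pattern is illustrative -->
   <address-setting match="jms.#">
      <!-- redistribute immediately when a queue has no local consumers -->
      <redistribution-delay>0</redistribution-delay>
   </address-setting>
</address-settings>
```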
...any help is appreciated.
-
3. Re: App unable to see new messages on one of the clustered HornetQ servers, even though msgs are being added...journal files increasing.
jbertram Sep 17, 2014 1:16 PM (in response to dr1985)
That seems like a reasonable explanation. A series of thread dumps and an inspection of the store-and-forward queue (as you suggested) would help confirm it.