-
15. Re: Cluster connection on failover
jnelas Aug 3, 2011 10:56 AM (in response to ataylor)
Via JMX I can see that the queue "sf.cluster-connection......" has a MessageCount of 6000+.
Is there a better way to see if they are in the store and forward queue?
Also via JMX I can see that the cluster connection is in state = "Started".
And when I trigger more messages I can see them being processed on both servers.
I'm using version 2.2.5.Final (HQ_2_2_5_FINAL_AS7, 121)
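For anyone who wants to script the JMX check described above, here is a minimal sketch. It assumes the queue MBeans are registered under an ObjectName of the form `org.hornetq:module=Core,type=Queue,...` and expose `MessageCount` and `DeliveringCount` attributes, and that store and forward queue names start with `sf.`; verify these against your own JMX console. The class name `SfQueueCheck` and the helper `readAttributes` are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class SfQueueCheck {

    // Reads the given attributes from every MBean whose name matches the pattern.
    // Result is keyed by the canonical ObjectName of each matching MBean.
    public static Map<String, Map<String, Object>> readAttributes(
            MBeanServerConnection conn, ObjectName pattern, String... attributes)
            throws Exception {
        Map<String, Map<String, Object>> result = new LinkedHashMap<>();
        for (ObjectName name : conn.queryNames(pattern, null)) {
            Map<String, Object> values = new LinkedHashMap<>();
            for (String attribute : attributes) {
                values.put(attribute, conn.getAttribute(name, attribute));
            }
            result.put(name.getCanonicalName(), values);
        }
        return result;
    }

    // Prints the counters of queues that look like store and forward queues.
    static void printSfQueues(MBeanServerConnection conn) throws Exception {
        // Assumed naming scheme for core queue MBeans; adjust if yours differs.
        ObjectName pattern = new ObjectName("org.hornetq:module=Core,type=Queue,*");
        Map<String, Map<String, Object>> queues =
                readAttributes(conn, pattern, "MessageCount", "DeliveringCount");
        for (Map.Entry<String, Map<String, Object>> entry : queues.entrySet()) {
            // Assumption: store and forward queue names start with "sf."
            if (entry.getKey().contains("\"sf.")) {
                System.out.println(entry.getKey() + " -> " + entry.getValue());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No URL given: inspect the MBean server of the current JVM.
            printSfQueues(ManagementFactory.getPlatformMBeanServer());
        } else {
            // args[0] is a JMX service URL for the remote server.
            try (JMXConnector connector =
                    JMXConnectorFactory.connect(new JMXServiceURL(args[0]))) {
                printSfQueues(connector.getMBeanServerConnection());
            }
        }
    }
}
```

Run it once per server, passing that server's JMX service URL, to compare the counters on both nodes. A MessageCount that keeps growing on the sending side while DeliveringCount stays flat would match the symptom described in this thread.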
-
16. Re: Cluster connection on failover
ataylor Aug 3, 2011 10:59 AM (in response to jnelas)
OK, I thought you were only sending 1000 messages. Anyway, if you look at the second server on restart, does the queue have any messages in it?
-
17. Re: Cluster connection on failover
jnelas Aug 3, 2011 11:09 AM (in response to ataylor)
Andy Taylor wrote:
OK, I thought you were only sending 1000 messages. Anyway, if you look at the second server on restart, does the queue have any messages in it?
That was Hugo; he is trying to replicate the problem on a smaller configuration. I'm seeing this on a production server.
On restart the second server has no messages in the sf.cluster-connection..... queue.
-
18. Re: Cluster connection on failover
ataylor Aug 3, 2011 11:11 AM (in response to jnelas)
No, not in the cluster queue. I mean the actual destination queue for the messages.
-
19. Re: Cluster connection on failover
jnelas Aug 3, 2011 11:16 AM (in response to ataylor)
That's also empty.
As is the same queue on the other server.
-
20. Re: Cluster connection on failover
ataylor Aug 3, 2011 11:17 AM (in response to jnelas)
Can you do a netstat to see if there is a physical network connection between the 2 servers?
-
21. Re: Cluster connection on failover
jnelas Aug 3, 2011 11:27 AM (in response to ataylor)
I see two connections from server1:5445 to server2 on different high ports.
I see some more connections on server2:
TCP server2:49194->10.94.94.213:50655 (ESTABLISHED)
TCP server2:49194->10.94.94.213:50681 (ESTABLISHED)
TCP server2:49194->10.94.94.213:49845 (ESTABLISHED)
TCP server2:49194->10.94.94.213:50508 (ESTABLISHED)
TCP server2:5445 (LISTEN)
TCP server2:49197->server1:5445 (ESTABLISHED)
TCP server2:49198->server1:5445 (ESTABLISHED)
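To pick the inter-server connections out of a listing like the one above, a small filter can help. This is only a sketch: it assumes lines in exactly the format shown here (`TCP local:port->remote:port (ESTABLISHED)`), and the class `ClusterLinks` and method `establishedTo` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClusterLinks {

    // Matches lines such as "TCP server2:49197->server1:5445 (ESTABLISHED)".
    private static final Pattern CONNECTION = Pattern.compile(
            "^TCP\\s+(\\S+):(\\d+)->(\\S+):(\\d+)\\s+\\(ESTABLISHED\\)$");

    // Returns the established connections whose remote end is remoteHost:remotePort.
    public static List<String> establishedTo(
            List<String> lines, String remoteHost, int remotePort) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = CONNECTION.matcher(line.trim());
            if (m.matches()
                    && m.group(3).equals(remoteHost)
                    && Integer.parseInt(m.group(4)) == remotePort) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample input copied from the listing in this post.
        List<String> lines = Arrays.asList(
                "TCP server2:49194->10.94.94.213:50655 (ESTABLISHED)",
                "TCP server2:5445 (LISTEN)",
                "TCP server2:49197->server1:5445 (ESTABLISHED)",
                "TCP server2:49198->server1:5445 (ESTABLISHED)");
        for (String hit : establishedTo(lines, "server1", 5445)) {
            System.out.println(hit);
        }
    }
}
```

Note that an ESTABLISHED entry only shows the TCP socket is open; it does not prove the cluster connection is actually forwarding messages.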
-
22. Re: Cluster connection on failover
ataylor Aug 3, 2011 11:29 AM (in response to jnelas)
And the messages are definitely stuck in the cluster queue on server 1?
-
23. Re: Cluster connection on failover
jnelas Aug 3, 2011 11:31 AM (in response to ataylor)
They seem to be, yes.
And this is something we've seen several times; it happens every 2-3 days.
-
24. Re: Cluster connection on failover
ataylor Aug 3, 2011 11:35 AM (in response to jnelas)
Is this something we can easily recreate, do you think? Could you provide a reproducer?
Just out of curiosity, what's the delivery count on the queue?
-
25. Re: Cluster connection on failover
jnelas Aug 3, 2011 11:39 AM (in response to ataylor)
Do you mean the Delivering count?
It's 0 on all the queues except the sf.cluster-connection queue on server1, where it's 3.
By the way, server1 generates all the messages.
-
26. Re: Cluster connection on failover
ataylor Aug 3, 2011 11:42 AM (in response to jnelas)
jnelas wrote:
Do you mean the Delivering count?
It's 0 on all the queues except the sf.cluster-connection queue on server1, where it's 3.
That implies that some messages have been delivered but not acked. Are all your messages durable, and how are you killing the second server?
-
27. Re: Cluster connection on failover
jnelas Aug 3, 2011 11:52 AM (in response to ataylor)
All the messages are durable.
I'm not killing the second server; that was Hugo trying to replicate the problem.
In the production environment, it appears that a temporary loss of connection between the servers causes this.
-
28. Re: Cluster connection on failover
jnelas Aug 3, 2011 11:57 AM (in response to ataylor)
Andy Taylor wrote:
Is this something we can easily recreate, do you think? Could you provide a reproducer?
I don't believe it's easy to recreate; we're only seeing it in production after 2-3 days of normal usage (50k+ msgs delivered on the cluster queue).
-
29. Re: Cluster connection on failover
ataylor Aug 3, 2011 11:57 AM (in response to jnelas)
OK, if you can give me a full rundown of your configuration, topology, clients, etc., I will try to recreate this.
If you think of anything further, let me know.