-
1. Re: Failover & Paging...
clebert.suconic Jan 8, 2009 10:38 PM (in response to clebert.suconic)
The DuplicateIDCache will not work:
- The MessageID lookup could fail in two places:
I - In ServerConsumerImpl::deliverReplicated, where we need to temporarily remove the messages from the Queue and place them on deliveringRefs.
II - In ServerConsumerImpl::acknowledge. Because (I) failed, deliveringRefs will not have the reference to ack.
In (I), we can't really add anything to the duplicateIDCache, as the ID is only being placed on the deliveringReferences.
I have created a Branch with my current changes:
https://svn.jboss.org/repos/messaging/branches/Branch_Failover_Page
And PagingfailoverTest will pass with this hack in QueueImpl:

    // This is a temporary hack, for the temporary branch only
    for (int i = 0; i < 10; i++)
    {
       System.out.println("Retry " + i);
       Thread.sleep(100);
       // Look the reference up again, in case it has arrived in the meantime
       ref = removeReferenceWithID(id, false);
       if (ref != null)
       {
          System.out.println("Finally found it:");
          break;
       }
    }
-
2. Re: Failover & Paging...
clebert.suconic Jan 8, 2009 10:39 PM (in response to clebert.suconic)
There is also an issue with DuplicateIDCache and rollback.
If the user decides to roll back after the failover, we need to make sure the data comes back to the Queue.
-
3. Re: Failover & Paging...
clebert.suconic Jan 9, 2009 1:37 AM (in response to clebert.suconic)
It was faster to implement something now, while everybody in Europe was sleeping, and talk about other options later, so we could keep moving forward.
It was actually relatively simple to force a depage when the reference is not found, which fixed the problems I raised in my previous posts.
All of this is controlled in ServerConsumerImpl::deliverReplicated. I'm now also sending the address used on delivery. If the reference is not found, I force a depage.
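For readers following along, here is a minimal Java sketch of the force-depage idea, not the code on the branch. Everything in it is an assumption made for illustration: the class ForceDepageSketch, the helpers addPage and depageOnce, and the use of plain long IDs in place of real message references. It only shows the control flow: look the reference up, and if it is missing, depage the address sent with the delivery until the reference shows up or there is nothing left to depage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: names and structure are illustrative only.
final class ForceDepageSketch {
    // Pages still on "disk", keyed by address; each page is an array of message IDs.
    private final Map<String, Deque<long[]>> pagesByAddress = new HashMap<>();
    // References currently in memory on the queue.
    private final Set<Long> queue = new LinkedHashSet<>();
    // References removed from the queue for delivery, waiting for an ack.
    private final Deque<Long> deliveringRefs = new ArrayDeque<>();

    void addPage(String address, long... ids) {
        pagesByAddress.computeIfAbsent(address, k -> new ArrayDeque<>()).add(ids);
    }

    // Moves one page of the given address into the queue; false if no page is left.
    private boolean depageOnce(String address) {
        Deque<long[]> pages = pagesByAddress.get(address);
        long[] page = (pages == null) ? null : pages.poll();
        if (page == null) {
            return false;
        }
        for (long id : page) {
            queue.add(id);
        }
        return true;
    }

    // The address travels with the replicated delivery so the backup knows what to depage.
    boolean deliverReplicated(String address, long id) {
        while (!queue.contains(id)) {
            if (!depageOnce(address)) {
                return false; // nothing left to depage, the reference really is missing
            }
        }
        queue.remove(id);
        deliveringRefs.add(id);
        return true;
    }

    boolean acknowledge(long id) {
        return deliveringRefs.remove(id);
    }
}
```

Compared with the sleep-and-retry hack, the lookup here does not wait for the reference to appear; it pulls pages in until the reference is found or the address has no pages left.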
I don't think we would actually have an issue with OMEs (OutOfMemoryErrors) or anything like that. Even if there are ordering differences between the two nodes, I don't think the two page systems would diverge much. We will talk about it when I wake up.
All of this is on the branch I created:
https://svn.jboss.org/repos/messaging/branches/Branch_Failover_Page
-
4. Re: Failover & Paging...
timfox Jan 10, 2009 4:10 AM (in response to clebert.suconic)
What's the latest status on this?
Also, what's the status on large message replication? We haven't discussed the design for that one yet.
-
5. Re: Failover & Paging...
clebert.suconic Jan 10, 2009 12:10 PM (in response to clebert.suconic)
Paging is implemented on the Branch, as described in my last post on this thread: it forces a depage when the message is not found on replicateDelivery.
I'm just waiting for your approval before merging it on the branch.
For LargeMessage, I'm first debugging why the credits are not being replicated between the nodes for LargeMessages, which leaves the consumer busy. Per our last discussion, we may not need to do anything for LargeMessages: just replicate the credits and let it send the chunks.
Once I've found the root cause of the failures, I will decide whether the fix needs any design work.
-
6. Re: Failover & Paging...
clebert.suconic Jan 13, 2009 9:54 PM (in response to clebert.suconic)
Besides sending the AddressName on replicate delivery and forcing a depage until a reference is found, I also had to make sure replicateDelivery wouldn't subtract sizes on the PageControl.
replicateDelivery was calling removeFromQueue, and that would call addSize(-size).
addSize is supposed to be called only on ACK, so I made a few tweaks around the method.
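A rough Java sketch of that accounting rule, under stated assumptions: the class PageSizeSketch and its fields are hypothetical, the size is assumed to be added when the message is routed, and "called only on ACK" is read as "the negative addSize happens only on ACK". It is not the actual change on the branch; it only illustrates why moving a reference to the delivering list must leave the address size alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: names and structure are illustrative only.
final class PageSizeSketch {
    private long addressSize;                                       // what the paging layer tracks
    private final Map<Long, Integer> queue = new HashMap<>();       // id -> message size
    private final Map<Long, Integer> deliveringRefs = new HashMap<>();

    private void addSize(long delta) {
        addressSize += delta;
    }

    // Assumption: the size is accounted for when the message is routed to the queue.
    void route(long id, int size) {
        queue.put(id, size);
        addSize(size);
    }

    // Moves the reference to deliveringRefs WITHOUT touching the size:
    // the message is still held by the server until it is acked.
    boolean replicateDelivery(long id) {
        Integer size = queue.remove(id);
        if (size == null) {
            return false;
        }
        deliveringRefs.put(id, size);
        return true;
    }

    // The only place where the size is subtracted.
    boolean acknowledge(long id) {
        Integer size = deliveringRefs.remove(id);
        if (size == null) {
            return false;
        }
        addSize(-size);
        return true;
    }

    long getAddressSize() {
        return addressSize;
    }
}
```

If replicateDelivery also subtracted the size, the later ack would subtract it a second time and the backup node would under-count the address.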