-
1. Re: Why are tests commented out in LargeMessageFailoverTest
clebert.suconic Jan 21, 2010 3:27 PM (in response to timfox)
Those tests are failing because of split brain.
Sending a single message is an atomic operation. When it fails, the whole operation fails.
With large messages the failure can happen in the middle of the package transfer, and I have seen scenarios where some of the sessions were still on the previous node.
I would need to play with a remote server and do real failures to fix the test. I also believe we have some work to do on split brain in 2.1.
My bad though, I should have written this in a comment ^^^
-
2. Re: Why are tests commented out in LargeMessageFailoverTest
timfox Jan 21, 2010 3:45 PM (in response to clebert.suconic)
What has this got to do with network partitions?
I don't understand your explanation. Perhaps you are using the wrong terminology?
Either way, can you please explain the problem, and the reason for commenting them out, in more detail?
-
3. Re: Why are tests commented out in LargeMessageFailoverTest
timfox Jan 21, 2010 4:01 PM (in response to timfox)
BTW, all the tests *pass* when I de-comment them...
-
4. Re: Why are tests commented out in LargeMessageFailoverTest
clebert.suconic Jan 21, 2010 4:28 PM (in response to timfox)
"BTW, all the tests *pass* when I de-comment them..."
I see them failing eventually.
"What has this got to do with network partitions?
I don't understand your explanation. Perhaps you are using the wrong terminology?"
In one of the cases, when I debugged the test, I had a scenario where the live node was sending a message to one of the consumers while the backup node was already active with other consumers. As a result I got message duplicates.
-
5. Re: Why are tests commented out in LargeMessageFailoverTest
timfox Jan 21, 2010 6:08 PM (in response to clebert.suconic)
This is all far too hand-wavy.
Once a connection has failed over from live to backup it should not be possible for a client to receive messages from the previously live node.
If you think that is the case, please provide evidence.
In any case, a JIRA needs to be opened for this, and the tests need to be fixed.
-
6. Re: Why are tests commented out in LargeMessageFailoverTest
clebert.suconic Jan 21, 2010 6:16 PM (in response to timfox)
I will need to do some debugging on this first.
The problem I remember was the live node sending messages to the consumer while it was still in the loop sending continuation messages, a scenario that wouldn't happen in production.
I did a quick test and I got a failure. I will do some debugging later and provide more evidence.
-
7. Re: Why are tests commented out in LargeMessageFailoverTest
timfox Jan 22, 2010 7:15 PM (in response to clebert.suconic)
Well... I de-commented the tests earlier today.
And, so far, there have been no Hudson failures...
I'm still waiting for your feedback on this.
-
8. Re: Why are tests commented out in LargeMessageFailoverTest
clebert.suconic Jan 22, 2010 9:30 PM (in response to timfox)
I was going to look at this next week.
I can't make it fail any more either.
I just tried on Beta3 (where I commented out the test) and it was failing there.
It failed for me when you opened this thread, but could it be some issue in my environment? (During your refactoring, maybe?)
I can spend more time on this after I finish the AS6 integration.