Feature Request: cross-process rendezvous
ryanhos Aug 26, 2011 2:44 PM
I wanted to vet my feature request here before I wasted someone's time with a JIRA.
I'm still loving Byteman and evangelizing about it to everyone I know who has tough testing problems. I'm even going to demo it to my local JUG. One feature that I've been wanting for a while is cross-process rendezvous. My scenario is this: a servlet filter deployed on multiple jbossweb instances was using a "check then act" pattern to handle initialization of a database record based on the authenticated user (which could not be pre-initialized). The bug appears when two JVMs both clear the check condition and then race to perform the act, which results in one JVM failing. Needless to say, reproducing this by chance is problematic. We have a workable solution, but we cannot prove that it works in an automated manner. If I had cross-process rendezvous, I could easily get both threads past the initial check and then coordinate their progress through the act step, in order to assert that the interleaved calls produce the desired output and complete successfully. Of course, we could set up a unit test with two threads in a single JVM that uses Byteman to do the same coordination, but if the solution relies upon concurrency controls that only work within a single JVM, it will still fail in our 40-JVM cluster. My motivation here is that I'd like to have someone who understands our cluster concurrency issues write the acceptance test, and then be able to assign anyone on the team to the bug, no matter their level of experience.
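To make the race concrete, here is a minimal single-JVM sketch of the interleaving described above. It is only an illustration under assumptions: the in-memory set stands in for the database table with its unique constraint, the user "alice" and the class and method names are hypothetical, and a plain java.util.concurrent.CyclicBarrier plays the role a Byteman rendezvous would play. It also shows exactly the limitation mentioned above, since the barrier only coordinates threads inside one JVM.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CheckThenActRace {

    /**
     * Runs two "filter" threads through check-then-act with a barrier between
     * the check and the act, and returns how many of them failed.
     * tolerateExisting = false models the original bug (a duplicate insert is an error);
     * tolerateExisting = true models one possible fix (treat "already there" as success).
     */
    static int countFailures(boolean tolerateExisting) throws Exception {
        Set<String> records = ConcurrentHashMap.newKeySet(); // stand-in for the DB table
        AtomicInteger failures = new AtomicInteger();
        CyclicBarrier pastCheck = new CyclicBarrier(2);      // stand-in for the rendezvous

        Runnable filter = () -> {
            try {
                boolean missing = !records.contains("alice"); // check
                pastCheck.await(5, TimeUnit.SECONDS);         // hold until both have checked
                if (missing) {
                    boolean inserted = records.add("alice");  // act
                    if (!inserted && !tolerateExisting) {
                        failures.incrementAndGet();           // duplicate insert
                    }
                }
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        };

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> first = pool.submit(filter);
            Future<?> second = pool.submit(filter);
            first.get(10, TimeUnit.SECONDS);
            second.get(10, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
        return failures.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("naive failures:    " + countFailures(false));
        System.out.println("tolerant failures: " + countFailures(true));
    }
}
```

Because the barrier sits between the check and the act, both threads always see the record as missing, so the naive variant loses the race every time instead of only occasionally.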
The solution would need some easy method of communicating rendezvous arrivals and then having the coordinator (JVM that called createRendezvous()) release the threads from the blocking state. UDP multicast immediately comes to mind because it's nearly zero-conf, but the lack of guaranteed delivery may leave some threads hung forever (though a simple main() could be written to send the "end rendezvous" message to release the stuck threads from a failed test). TCP solves the guaranteed delivery problem, but there would have to be an agreed-upon configuration in order to get all processes talking together.
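As a rough sketch of the TCP variant, and not a proposal for how Byteman itself should implement it, the coordinator could accept one connection per expected arrival and then write a release line to each. Everything here is an assumption made for illustration: the class and method names, the one-line "name" / "RELEASE" protocol, and the use of a read timeout so that a participant fails fast rather than hanging forever if the coordinator never releases it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class TcpRendezvousSketch {

    /** Hypothetical coordinator: the process that "created" the rendezvous. */
    static final class Coordinator implements AutoCloseable {
        private final ServerSocket server;
        private final int expected;

        Coordinator(int expected) throws IOException {
            // Port 0 asks the OS for a free port; a real setup would need an agreed-upon one.
            this.server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
            this.expected = expected;
        }

        int port() { return server.getLocalPort(); }

        /** Blocks until all expected participants have arrived, then releases them. */
        List<String> awaitAndRelease() throws IOException {
            List<Socket> arrived = new ArrayList<>();
            List<String> names = new ArrayList<>();
            try {
                while (arrived.size() < expected) {
                    Socket s = server.accept();
                    arrived.add(s);
                    names.add(new BufferedReader(new InputStreamReader(
                            s.getInputStream(), StandardCharsets.UTF_8)).readLine());
                }
                for (Socket s : arrived) {
                    OutputStream out = s.getOutputStream();
                    out.write("RELEASE\n".getBytes(StandardCharsets.UTF_8));
                    out.flush();
                }
            } finally {
                for (Socket s : arrived) s.close();
            }
            return names;
        }

        @Override public void close() throws IOException { server.close(); }
    }

    /** Participant side: announce arrival, then block until released or timed out. */
    static boolean arrive(String host, int port, String name, int timeoutMillis) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.setSoTimeout(timeoutMillis); // a blocked read throws SocketTimeoutException
            OutputStream out = s.getOutputStream();
            out.write((name + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            return "RELEASE".equals(in.readLine());
        }
    }

    /** Local demo: threads stand in for separate JVMs. Returns how many were released. */
    static int demo(int participants) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(participants);
        try (Coordinator c = new Coordinator(participants)) {
            String host = InetAddress.getLoopbackAddress().getHostAddress();
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < participants; i++) {
                String name = "jvm-" + i;
                results.add(pool.submit(() -> arrive(host, c.port(), name, 5000)));
            }
            c.awaitAndRelease();
            int released = 0;
            for (Future<Boolean> f : results) {
                if (f.get(10, TimeUnit.SECONDS)) released++;
            }
            return released;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("released: " + demo(3));
    }
}
```

The read timeout addresses the "hung forever" worry on the participant side, but the configuration problem noted above remains: every process still has to be told the coordinator's host and port somehow.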
So what does the forum think? Is this as useful as I seem to think it is? Or am I attempting to make more out of Byteman than was ever intended?