I noticed that ClusteredSingleSignOnUnitTestCase was failing intermittently in the 3.2 branch build reports. I looked into it, and the cause appears to be that the tc5-cluster-service TreeCache has been switched to REPL_ASYNC. The test client seems to be making requests against the 2nd node in the cluster before that node's tree cache has been updated.
I have a JMeter test that basically replicates the ClusteredSingleSignOnUnitTestCase. When I fire a bunch of threads at a test cluster, with REPL_SYNC I have no problems, but with REPL_ASYNC it fails fairly frequently.
At first I thought the test case scenario where the client logs in to one war and then immediately switches to another war wasn't too "real world". But this issue could also affect logouts, and the following seems fairly "real world":
war1 has a page that acts as a "home page" for a suite of web apps. It is login protected.
war2 is one of the apps. It is also secured. It includes a logout JSP which invalidates the user's session and redirects the request to the root of war1. The expectation is that the SSO valve will also invalidate any war1 session and the user will get a login prompt.
If the user logs out of war2, the browser's redirect to war1 may hit the server before the logout has been replicated across the cache, in which case the user will not be prompted to log in.
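
To make the timing window concrete, here is a minimal sketch of the logout race. It is only a toy model, not the TreeCache or the SSO valve code: the class and method names (SsoReplicationSketch, Cluster, deliverPending, staleAfterLogout) and the session id "sso-1" are all made up for illustration. Async replication is modelled as a queue that has not been drained yet when the redirect arrives, and it assumes the logout lands on node 2 and the redirect on node 1.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class SsoReplicationSketch {

    // Mirrors the two cache modes discussed above; the enum itself is hypothetical.
    enum Mode { REPL_SYNC, REPL_ASYNC }

    // One cluster node holding the SSO session ids it currently knows about.
    static class Node {
        final Set<String> ssoIds = new HashSet<>();

        boolean isLoggedIn(String ssoId) {
            return ssoIds.contains(ssoId);
        }
    }

    static class Cluster {
        final Node node1 = new Node();
        final Node node2 = new Node();
        final Mode mode;
        // Replication messages sent but not yet applied on the peer (async only).
        final Queue<Runnable> pending = new ArrayDeque<>();

        Cluster(Mode mode) {
            this.mode = mode;
        }

        // Sync: the peer is updated before the call returns.
        // Async: the update is queued and applied at some later point.
        private void replicate(Runnable update) {
            if (mode == Mode.REPL_SYNC) {
                update.run();
            } else {
                pending.add(update);
            }
        }

        void login(Node local, Node peer, String ssoId) {
            local.ssoIds.add(ssoId);
            replicate(() -> peer.ssoIds.add(ssoId));
        }

        void logout(Node local, Node peer, String ssoId) {
            local.ssoIds.remove(ssoId);
            replicate(() -> peer.ssoIds.remove(ssoId));
        }

        // Stands in for the async replication messages finally arriving.
        void deliverPending() {
            while (!pending.isEmpty()) {
                pending.poll().run();
            }
        }
    }

    // Returns true if node1 still treats the user as logged in when the
    // post-logout redirect reaches it.
    static boolean staleAfterLogout(Mode mode) {
        Cluster cluster = new Cluster(mode);

        // User logs in via war1 on node1; let that replicate fully.
        cluster.login(cluster.node1, cluster.node2, "sso-1");
        cluster.deliverPending();

        // User logs out of war2; assume this request lands on node2.
        cluster.logout(cluster.node2, cluster.node1, "sso-1");

        // The browser redirect to war1 reaches node1 before any pending
        // replication has been delivered.
        return cluster.node1.isLoggedIn("sso-1");
    }

    public static void main(String[] args) {
        System.out.println("REPL_SYNC  stale login on node1: " + staleAfterLogout(Mode.REPL_SYNC));
        System.out.println("REPL_ASYNC stale login on node1: " + staleAfterLogout(Mode.REPL_ASYNC));
    }
}
```

In this model the sync case reports false and the async case reports true. In a real cluster the async window is usually short, so the failure would only show up intermittently, which fits what the unit test and the JMeter run are showing.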
I'd say switch back to *sync* repl for the unit test.
But IMO *async* repl is good enough in the real world.