This wiki provides best practices for Web Clustering, with HTTP session replication as the central point of discussion.
Session Stickiness and Cache Mode
When configuring HTTP session replication, choosing the right values for session stickiness and for the CacheMode of the JBoss Cache instance that holds the HTTP sessions is key to achieving application responsiveness while guaranteeing data correctness on failover. The options are:
1.- Sticky Sessions Enabled and CacheMode is REPL_ASYNC - This combination keeps the application responsive because the server that handles a session's request does not have to wait for replication to finish before returning control to the client, and it limits the possibility of reading stale data to failover situations. Unless there is a failure, requests for a session always land on the same node, which rules out stale reads under normal conditions. Data correctness is not 100% guaranteed, because changes that have not yet been replicated when a node fails are not available after failover, but this is the best you can get without significant degradation of web application performance. For that reason, we recommend these settings to our customers.
2.- Sticky Sessions Enabled and CacheMode is REPL_SYNC - This combination provides the best guarantee of data correctness. Under normal circumstances, every request linked to a session lands on the same node, so there is no possibility of stale data. When failover occurs, synchronous replication gives a stronger guarantee than asynchronous replication that the data has reached the other nodes, because the application waits for replication to finish before returning control to the client. However, because every replication has to wait for a response from all the nodes to which data is replicated, the application is much slower than with the 1st option, which has a direct impact on the end user's experience.
3.- Sticky Sessions Disabled and CacheMode is REPL_SYNC - Users are often misled by this combination. On top of having the same performance issues as the 2nd option, it does not provide the same level of data correctness and can even lead to non-compliance with the Servlet specification:
Even with synchronous replication, it is possible for the client to read a portion of the response, particularly the headers, before replication completes. This can happen because a web app can trigger Tomcat to emit the response headers before the replication processing occurs. For example, a web app can write via HttpServletResponse.getWriter() or getOutputStream(). If the web app calls flush() or close() on the PrintWriter/OutputStream, the response is considered 'committed', at which point Tomcat sends all response headers to the browser before sending the Writer/OutputStream content. If the client's next request is then routed to another node, it can reach that node before replication completes. This can lead to stale data being read and, on top of that, it is contrary to the Servlet 2.4 specification, which states in section SRV.7.7.2: "Within an application marked as distributable, all requests that are part of a session must be handled by one Java Virtual Machine ("JVM") at a time." The 1st option does adhere to the specification, because an HTTP session is accessed either on the originating node or, in case of failover, on another node, but never on both at the same time.
Besides, you can't assume the browser is single-threaded. It can easily make multiple simultaneous requests, for example with HTML frames or AJAX-based front ends. With REPL_SYNC and non-sticky sessions, these concurrent requests for the same session can land on different instances, which could lead to a REPL_SYNC deadlock.
For these two reasons, and because of the high risk of reading stale data, we always recommend using sticky sessions.
4.- Sticky Sessions Disabled and CacheMode is REPL_ASYNC - This is the worst of all combinations. It suffers from the same issues explained in option 3 regarding non-sticky sessions, plus an even higher chance of retrieving stale data because of the asynchronous nature of the replication, even though it offers a responsive web application to end users.
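To make the trade-offs between the four combinations more concrete, here is a minimal Java sketch of a toy two-node cluster. It is not JBoss Cache or Tomcat code: the class ReplicationSketch and all of its methods are hypothetical names invented for this illustration, routing is reduced to "always node 0" (sticky) or round robin (non-sticky), and asynchronous replication is modelled as a queue that is only applied when drain() is called. The model does not capture the early response commit or the deadlock described under option 3, so it makes non-sticky REPL_SYNC look safer than it really is.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy two-node model of session replication; hypothetical names, not JBoss Cache code. */
final class ReplicationSketch {

    /** One node's local copy of the session store (session id -> counter). */
    private static final class Node {
        final Map<String, Integer> sessions = new HashMap<>();
    }

    private final Node[] nodes = { new Node(), new Node() };
    // Replication messages sent but not yet applied on the peer (REPL_ASYNC only).
    private final Queue<Runnable> pending = new ArrayDeque<>();
    private final boolean sticky;
    private final boolean sync;
    private int cursor = 0; // round-robin position used when stickiness is off

    ReplicationSketch(boolean sticky, boolean sync) {
        this.sticky = sticky;
        this.sync = sync;
    }

    /** Sticky routing always picks node 0; otherwise requests alternate between nodes. */
    private Node route() {
        if (sticky) {
            return nodes[0];
        }
        Node chosen = nodes[cursor];
        cursor = (cursor + 1) % nodes.length;
        return chosen;
    }

    /** Increments the session counter on the routed node and returns what the client sees. */
    int request(String sessionId) {
        Node local = route();
        int value = local.sessions.getOrDefault(sessionId, 0) + 1;
        local.sessions.put(sessionId, value);
        for (Node peer : nodes) {
            if (peer == local) {
                continue;
            }
            Runnable replicate = () -> peer.sessions.put(sessionId, value);
            if (sync) {
                replicate.run();        // REPL_SYNC: applied before the request returns
            } else {
                pending.add(replicate); // REPL_ASYNC: applied later, by drain()
            }
        }
        return value;
    }

    /** Delivers all queued asynchronous replication messages, in order. */
    void drain() {
        while (!pending.isEmpty()) {
            pending.remove().run();
        }
    }

    /** Value node 1 would serve if node 0 failed right now (0 means nothing replicated yet). */
    int readOnFailover(String sessionId) {
        return nodes[1].sessions.getOrDefault(sessionId, 0);
    }

    /** Runs the given number of back-to-back requests and returns the value seen by the last one. */
    static int lastValue(int requests, boolean sticky, boolean sync) {
        ReplicationSketch cluster = new ReplicationSketch(sticky, sync);
        int seen = 0;
        for (int i = 0; i < requests; i++) {
            seen = cluster.request("s1");
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("1. sticky + REPL_ASYNC:     " + lastValue(3, true, false));
        System.out.println("2. sticky + REPL_SYNC:      " + lastValue(3, true, true));
        System.out.println("3. non-sticky + REPL_SYNC:  " + lastValue(3, false, true));
        System.out.println("4. non-sticky + REPL_ASYNC: " + lastValue(3, false, false));
    }
}
```

In this model, three back-to-back requests end with a counter of 3 for options 1 to 3, while option 4 ends with 2, because the second request lands on a node that has not yet received the first update. Calling readOnFailover("s1") after three requests returns 0 for option 1 until drain() is called and 3 for option 2, which mirrors the stale-data window that asynchronous replication leaves open on failover.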