First and foremost: please update to mod_cluster 1.2.6. It's a good, stable version, much better than 1.2.1.
Regarding the sticky sessions question:
What does it mean that "sticky sessions are getting lost"? Do you mean they lose their stickiness, or are they actually getting lost (the client's session data is gone)? If the former, that is O.K. under circumstances I will explain below; if the latter, it might be a serious bug.
There are these settings:
- sticky-session: if it's true, the balancer will try to keep the session on the worker it originated from. If the worker is unresponsive/overloaded/down and sticky-session-force is false, the balancer will route the request to a different worker.
- sticky-session-remove: should the balancer remove the stickiness to the former worker after a failover has occurred (true), or should the balancer return the session to the former node when it becomes available again (false)?
- sticky-session-force: if this is set to true, there is no failover under any circumstances. The session sticks to the worker it originated from, whether or not that worker is available.
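To make the interplay of the three settings concrete, here is a minimal Python sketch of the routing decision as described in the list above. It is only a toy model, not mod_cluster code: the names `StickyConfig` and `route` are hypothetical, "pick the alphabetically first available worker" stands in for real load balancing, and returning `None` as the target is just this sketch's way of saying "the request is not routed to another worker".

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class StickyConfig:
    # Hypothetical container mirroring the three settings described above.
    sticky_session: bool = True
    sticky_session_remove: bool = False
    sticky_session_force: bool = False


def route(session_node: Optional[str], available: Set[str],
          cfg: StickyConfig) -> Tuple[Optional[str], Optional[str]]:
    """Return (worker serving the request, worker the session is stuck to afterwards).

    A target of None means the request is not routed to any worker.
    """
    # Assumption: stand-in for the balancer's real load-based choice.
    fallback = min(available) if available else None

    # No stickiness configured, or a brand-new session: pick any worker.
    if not cfg.sticky_session or session_node is None:
        return fallback, (fallback if cfg.sticky_session else None)

    # The original worker is reachable: stay on it.
    if session_node in available:
        return session_node, session_node

    # The original worker is unavailable and failover is forbidden.
    if cfg.sticky_session_force:
        return None, session_node

    # Failover happens; decide whether the old stickiness survives.
    if cfg.sticky_session_remove:
        return fallback, None          # stickiness to the former worker is dropped
    return fallback, session_node      # session returns to the former worker later
```

For example, `route("node1", {"node2"}, StickyConfig())` fails over to `node2` but keeps the session stuck to `node1`, which matches the "loses stickiness temporarily, keeps its data" case and not the "session is lost" case.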
Now, was there any reason why the balancer decided to address a different worker?
The Apache HTTP Server log at LogLevel debug will tell you more.
Thanks for the quick answer!
I just received another error log file from the provider that operates the Apache server. It says several times a day:
[Wed Oct 16 17:16:28 2013] [error] (70007)The timeout specified has expired: ajp_ilink_receive() can't receive header
[Wed Oct 16 17:16:34 2013] [error] proxy: ajp: disabled connection for (xxxxxx)
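To check how often "several times a day" really is, entries like the two above can be counted per day. The following is a small Python sketch that assumes the error log uses exactly the line format quoted here; the function name `count_ajp_timeouts` is made up for illustration, and the date parsing assumes English day and month names.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as: [Wed Oct 16 17:16:28 2013] [error] <message>
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[error\] (?P<msg>.*)$")
TIMEOUT_MARKER = "ajp_ilink_receive() can't receive header"


def count_ajp_timeouts(lines):
    """Count AJP receive-timeout errors per calendar day (ISO date -> count)."""
    per_day = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or TIMEOUT_MARKER not in match.group("msg"):
            continue  # not an error line, or a different error
        ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        per_day[ts.date().isoformat()] += 1
    return per_day
```

Feeding it the lines of the error log (for instance via `open(path)`) gives a per-day count that can be compared with the times of the known long-running requests.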
I just learned from this discussion, "Worker in error state after a request timeout", that when a request takes longer than nodeTimeout, the node is disabled until the next STATUS message arrives from the node saying that it is OK. This should happen every 30 seconds.
If the node is disabled, all requests for sessions on that node are forwarded directly to another node.
Is this right so far? The other discussion is about mod_cluster 1.0.0, so I'm not sure about that.
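The behaviour described in this post can be written down as a tiny Python model. It only restates the description above (a request longer than nodeTimeout disables the node; the next OK STATUS message re-enables it) and is not taken from the mod_cluster sources; the class `Node` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of a worker as seen by the balancer.
    name: str
    node_timeout: float      # seconds the balancer waits for a response
    enabled: bool = True

    def handle_request(self, duration: float) -> bool:
        """Return True if the request finishes within node_timeout.

        A request that runs longer is treated as a timeout, and the node
        is marked disabled until the next OK STATUS message.
        """
        if duration > self.node_timeout:
            self.enabled = False
            return False
        return True

    def on_status_ok(self) -> None:
        """The node reported that it is OK (every 30 seconds, per the post)."""
        self.enabled = True
```

In this model, a 60-second request against `Node("node1", node_timeout=10.0)` disables the node until `on_status_ok()` is called, while the same request against a node with `node_timeout=120.0` completes normally and leaves the node enabled.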
From time to time we have long-running requests which exceed the node timeout. That's not good, but that's the way it is. What would be the right solution? Using a larger nodeTimeout, or is there another way to tell mod_cluster not to disable the node after such a timeout?
You need to use a large enough node-timeout.