I've run into a similar issue using PicketLink as my IdP. What is probably happening is that you have two instances of your IdP, but the authentication keys are not replicated between them. When a box goes down, the app session is replicated, but when that session tries to verify the user's identity, it can't find the user, because the IdPs are not clustered. We found that we had to either put the IdP on a separate box, or, if it runs on the same box as the EAP server, the IdP instances need to be clustered.
Also, you're seeing the case where your app chooses an IdP (whether on server A or B), and when you shut down the other server, your instance remains. From what I recall of our design, we had httpd in front of all of our EAP servers, so the IdP was chosen at random, which would produce the phenomenon you describe above when the IdPs are not clustered.
Anyway, hope that helps.
Thanks for your reply, but you have written:
authentication keys are not replicated to both instances and when a box goes down the app session is replicated and when the session tries to verify the user's identity it can't find the user, because the IdPs are not clustered.
But I've disabled session replication, due to issues caused by the framework I'm using, which puts non-serializable objects into the session. For this reason I've enabled sticky sessions, and forced them, so I expect that if a session has been started on server 1, it will continue on the same server even if the web-application context is switched from APP A to APP B... but it seems this is not working as expected.
It is possible your node is failing, which would route the client to a different node in the cluster. Try setting sticky-session-force to true and test: you should then get an error when you switch contexts (i.e. if the node fails) instead of being silently routed elsewhere.
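For reference, here is a minimal sketch of what that would look like in the modcluster subsystem of standalone-ha.xml. The attribute names (sticky-session, sticky-session-force) come from the EAP modcluster schema; the namespace version, connector name, and load metric shown here are assumptions you should adjust to your installation:

```xml
<!-- Sketch only: adjust the schema version and connector to match your EAP release. -->
<subsystem xmlns="urn:jboss:domain:modcluster:1.2">
    <mod-cluster-config advertise-socket="modcluster" connector="ajp"
                        sticky-session="true"
                        sticky-session-force="true">
        <dynamic-load-provider>
            <load-metric type="busyness"/>
        </dynamic-load-provider>
    </mod-cluster-config>
</subsystem>
```

With sticky-session-force="true", mod_cluster returns an error to the client rather than failing the request over to another node, which makes a failing node visible instead of quietly breaking stickiness.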
In our production environment we have switched the load balancing system from software (mod_cluster) to hardware.
Now I'm not experiencing this problem anymore, so I guess there was some trouble with the Apache mod_cluster module, which was not actually keeping a specific session sticky.
Adding "worker-timeout=15" and "node-timeout=15" as attributes of the mod-cluster-config tag resolved the issue for me.
It was a worker-timeout problem: the default value (-1) tells mod_cluster not to consider a specific node when an error happens on it, without waiting for the node to recover properly.
So, because I don't have session replication, my request was forwarded to the other server, causing the original session to be dropped.
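A sketch of the mod-cluster-config tag with those two timeout attributes applied, assuming an EAP-style standalone-ha.xml (the namespace version and connector are placeholders to adapt to your setup):

```xml
<!-- Sketch only: worker-timeout and node-timeout are the attributes discussed above;
     other values here are assumptions. -->
<subsystem xmlns="urn:jboss:domain:modcluster:1.2">
    <mod-cluster-config advertise-socket="modcluster" connector="ajp"
                        worker-timeout="15"
                        node-timeout="15"/>
</subsystem>
```

The 15-second timeouts give a briefly erroring node a window to recover before mod_cluster routes its sessions to another worker, which matters when there is no session replication to fall back on.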