Clustered SSO not following the semantics of lo...| JBoss.org Content Archive (Read Only)

15. Re: Clustered SSO not following the semantics of local SSO

cptnkirk Apr 6, 2006 2:31 PM (in response to cptnkirk)

Any news on 2314?

-Jim

16. Re: Clustered SSO not following the semantics of local SSO

brian.stansberry Apr 7, 2006 2:05 AM (in response to cptnkirk)

Hey Jim,

Finally got a chance to play with your wars, and things work as I expect -- but not as you want :(

The issue is that the Principal does not get propagated around the cluster; the username and password do. Two reasons for this:

1) Principal does not extend Serializable, thus you can't count on being able to replicate it.
2) The security layer requires an authentication on each server -- replicating around a Principal that is the result of an authentication on another server won't cut it.
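
As a rough illustration of point 1 (a sketch, not JBoss code): the class name PlainPrincipal is made up, and the assumption is that replication relies on standard Java serialization. A Principal implementation that does not opt in to Serializable is rejected, while a plain username String goes through.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.security.Principal;

// Hypothetical demo class; not part of JBoss or Tomcat.
class PrincipalReplicationSketch {

    /** A Principal as a login module might return it; nothing forces it to be Serializable. */
    static final class PlainPrincipal implements Principal {
        private final String name;
        PlainPrincipal(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    /** Returns true if standard Java serialization accepts the object. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false; // the object (or something it references) is not Serializable
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Principal serializable: " + canSerialize(new PlainPrincipal("jim")));
        System.out.println("Username serializable:  " + canSerialize("jim"));
    }
}
```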

If, when a failover occurs, you request one of your wars w/ a login config, the replicated username/password can be used to transparently authenticate you. Thereafter you have a Principal on that server and all is well.

If you fail over to a war w/o a login config, there is no way to authenticate you on the new server. Hence a 403.
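
The two failover outcomes above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual SSO valve: FailoverSketch, LocalRealm, Outcome and NamedPrincipal are hypothetical names, and the replicated credentials are modeled as a plain map keyed by SSO id.

```java
import java.security.Principal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the failover cases; none of these types are JBoss or Tomcat classes.
class FailoverSketch {

    enum Outcome { AUTHENTICATED, LOGIN_REQUIRED, FORBIDDEN }

    /** Stand-in for the node-local security layer: returns a Principal, or null on failure. */
    interface LocalRealm {
        Principal authenticate(String username, String password);
    }

    static final class NamedPrincipal implements Principal {
        private final String name;
        NamedPrincipal(String name) { this.name = name; }
        @Override public String getName() { return name; }
    }

    /** Decides what a node does with a request that arrives without a local Principal. */
    static Outcome handle(boolean warHasLoginConfig, String ssoId,
                          Map<String, String[]> replicatedCredentials, LocalRealm realm) {
        if (!warHasLoginConfig) {
            return Outcome.FORBIDDEN; // no authenticator to replay the credentials: the 403 case
        }
        String[] creds = (ssoId == null) ? null : replicatedCredentials.get(ssoId);
        if (creds == null) {
            return Outcome.LOGIN_REQUIRED; // nothing replicated for this SSO id
        }
        // Each node authenticates for itself, using the replicated username/password.
        Principal p = realm.authenticate(creds[0], creds[1]);
        return (p != null) ? Outcome.AUTHENTICATED : Outcome.LOGIN_REQUIRED;
    }

    /** Demo inputs: one replicated SSO entry and a realm that knows a single user. */
    static Outcome demo(boolean warHasLoginConfig) {
        Map<String, String[]> replicated = new HashMap<>();
        replicated.put("sso-1", new String[] {"jim", "secret"});
        LocalRealm realm = (user, pass) ->
                "jim".equals(user) && "secret".equals(pass) ? new NamedPrincipal(user) : null;
        return handle(warHasLoginConfig, "sso-1", replicated, realm);
    }

    public static void main(String[] args) {
        System.out.println("war with login config:    " + demo(true));
        System.out.println("war without login config: " + demo(false));
    }
}
```

In this model the war with a login config ends up AUTHENTICATED and the one without ends up FORBIDDEN, even though the same credentials were replicated in both cases.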

If I uncomment the error page element in the hello/hello2 web.xml, and then do a failover to one of those pages, I get redirected to main. I do not, however, have to log in to main -- the sso valve is able to log in for me, since main has a login config.

Perhaps you can create a custom authenticator for hello/hello2.

In 4.0.4.CR2 there is the ability to pretty easily add your own authenticators. See jbossweb-tomcat55.sar's server.xml and META-INF/jboss-service.xml for ideas on how to configure that (there is probably a wiki page too).

Use org.apache.catalina.authenticator.NonLoginAuthenticator as a template to create your own, and replace the authenticate method with this:

    public boolean authenticate(Request request,
                                Response response,
                                LoginConfig config)
        throws IOException {

        // Have we already authenticated someone?
        Principal principal = request.getUserPrincipal();
        String ssoId = (String) request.getNote(Constants.REQ_SSOID_NOTE);
        if (principal != null) {
            if (log.isDebugEnabled())
                log.debug("Already authenticated '" +
                          principal.getName() + "'");
            // Associate the session with any existing SSO session
            if (ssoId != null)
                associate(ssoId, request.getSessionInternal(true));
            return true;
        }

        // Is there an SSO session against which we can try to reauthenticate?
        if (ssoId != null) {
            if (log.isDebugEnabled())
                log.debug("SSO Id " + ssoId + " set; attempting " +
                          "reauthentication");
            // Try to reauthenticate using data cached by SSO.
            if (reauthenticateFromSSO(ssoId, request))
                return true;
        }

        // No principal + no SSO = reject!
        return false;

    }

Note I haven't tried that; just a suggestion :)

17. Re: Clustered SSO not following the semantics of local SSO

brian.stansberry Apr 7, 2006 2:11 AM (in response to cptnkirk)

It's late, the time when I dream up bad hacks.

What if you add a FORM login config block to hello/hello2, but use your loginredirect.jsp as the login page? I would expect if the user hits hello w/o a login, loginredirect.jsp would get called, redirecting to main. However, in the failover case, the war now has a usable login config and since there is an SSO the Tomcat FormAuthenticator will use the SSO credentials and transparently log you in.

Actually, I kinda like that; if it works I think it qualifies as a good hack :)

18. Re: Clustered SSO not following the semantics of local SSO

cptnkirk Apr 8, 2006 12:22 AM (in response to cptnkirk)

Assuming we start at node1 and then move to node2...

This hack works, but only if node1 stays active. In the scenario where sticky sessions are used and node1 crashes, traffic is sent to node2; however, node2 still lacks the authentication context and forces a login. If someone were in a hello workflow, they'd be kicked back out to main. Getting better, but unfortunately not the transparent failover our hello users are expecting.

It feels to me that in order to support true enterprise authentication (along with the already good clustered session support), a cluster-aware authentication service would need to be developed. While a Principal may not be replicable, the underlying credentials in their various forms are. You also know when users authenticate and log off. It seems to me that this service could use these authentication events along with the underlying credential data to synchronously recreate a Principal on each node upon login, and clean up upon logoff. I suppose this creation could be deferred as long as you're willing to store the credential info and original authenticator mapping forever.

Just thinking out loud. What are your thoughts? Also, what is JBoss' view of Clustered SSO/Enterprise SSO? This feature, as I understand the concept, seems broken to me. Regardless of our discussions and any eventual workaround, is there commitment within the organization to fix this problem? Or is this particular situation not in line with current JBoss goals for Clustered/Enterprise SSO?

-Jim

PS: Still need to look into the NonLoginAuthenticator solution.

19. Re: Clustered SSO not following the semantics of local SSO

brian.stansberry Apr 8, 2006 12:22 PM (in response to cptnkirk)

Thanks! You found a bug in the distributed session manager that I'm fixing for 4.0.4.GA. http://jira.jboss.com/jira/browse/JBAS-3085

cptnkirk wrote: "This hack works, but only if node1 stays active. In the scenario where sticky sessions are used and node1 crashes, traffic is sent to node2; however, node2 still lacks the authentication context and forces a login. If someone were in a hello workflow, they'd be kicked back out to main. Getting better, but unfortunately not the transparent failover our hello users are expecting."

Can you change the test wars to remove the "distributable" element from web.xml? After that you should see this behavior:

1) Start on node1. Kill node1 (not a shutdown -- a kill -9 or End Process Tree from Windows TaskManager). Should work as you want.

2) Start on node1. Stop the node1 worker in Apache. Use the browser to fail over to node2. Should work OK. Then do a clean shutdown of node1. Should still work OK. (If the webapp is marked "distributable", won't work OK anymore -- this is the bug I fixed; it will work OK in 4.0.4.GA).

3) Start on node1. Do a clean shutdown of node1. Use browser to fail over to node2. You'll have to log in. I'll explain more about this below.

cptnkirk wrote: "It feels to me that in order to support true enterprise authentication (along with the already good clustered session support), a cluster-aware authentication service would need to be developed. While a Principal may not be replicable, the underlying credentials in their various forms are. You also know when users authenticate and log off. It seems to me that this service could use these authentication events along with the underlying credential data to synchronously recreate a Principal on each node upon login, and clean up upon logoff. I suppose this creation could be deferred as long as you're willing to store the credential info and original authenticator mapping forever."

What you described (with the deferred creation) is basically how ClusteredSSO works. The limitation it has is that it is tightly integrated with the Tomcat authentication layer -- the deferred creation is done by a Tomcat authenticator. It won't work if a webapp doesn't have an authenticator, which is what you were trying to do.

The reason #3 above doesn't work is we're not willing to store the credential info forever. We discard the info 1) if invalidate() is called on any session associated with the SSO (aka logout) or 2) if all sessions associated with the SSO across the cluster are expired. Sessions are expired either due to timeout or undeployment of their associated webapp.

In scenario #3, the only sessions associated with the webapp are on node1 (since you haven't accessed node2 yet -- once you do you have a session on node2). The sessions are all expired during shutdown of node1, so the SSO is invalidated.
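
The two discard rules and the difference between scenarios #2 and #3 can be modeled with a small sketch. It is an illustration under assumptions, not the ClusteredSSO implementation: SsoRegistry and its method names are hypothetical, and the cluster-wide cache is reduced to two in-memory maps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the SSO cleanup rules; not the real ClusteredSSO code.
class SsoCleanupSketch {

    /** Stand-in for the cluster-wide SSO cache: credentials plus the sessions tied to each SSO. */
    static final class SsoRegistry {
        private final Map<String, String[]> credentials = new HashMap<>();
        private final Map<String, Set<String>> sessions = new HashMap<>();

        void login(String ssoId, String username, String password) {
            credentials.put(ssoId, new String[] {username, password});
            sessions.put(ssoId, new HashSet<>());
        }

        /** Called when a session object is created on some node and joins the SSO. */
        void associate(String ssoId, String sessionId) {
            Set<String> s = sessions.get(ssoId);
            if (s != null) s.add(sessionId);
        }

        /** Rule 1: invalidate() on any associated session counts as a logout. */
        void sessionInvalidated(String ssoId) {
            discard(ssoId);
        }

        /** Rule 2: the credentials go away once the last associated session has expired. */
        void sessionExpired(String ssoId, String sessionId) {
            Set<String> s = sessions.get(ssoId);
            if (s == null) return;
            s.remove(sessionId);
            if (s.isEmpty()) discard(ssoId);
        }

        boolean hasCredentials(String ssoId) {
            return credentials.containsKey(ssoId);
        }

        private void discard(String ssoId) {
            credentials.remove(ssoId);
            sessions.remove(ssoId);
        }
    }

    /** Simulates a clean shutdown of node1; optionally node2 is visited first (scenario #2). */
    static boolean credentialsSurviveShutdown(boolean visitedNode2First) {
        SsoRegistry registry = new SsoRegistry();
        registry.login("sso-1", "jim", "secret");
        registry.associate("sso-1", "node1/main");
        registry.associate("sso-1", "node1/hello");
        if (visitedNode2First) {
            registry.associate("sso-1", "node2/main");
        }
        // Clean shutdown undeploys the webapps, which expires every session on node1.
        registry.sessionExpired("sso-1", "node1/main");
        registry.sessionExpired("sso-1", "node1/hello");
        return registry.hasCredentials("sso-1");
    }

    public static void main(String[] args) {
        System.out.println("scenario #2 keeps credentials: " + credentialsSurviveShutdown(true));
        System.out.println("scenario #3 keeps credentials: " + credentialsSurviveShutdown(false));
    }
}
```

In this model the credentials survive only when a node2 session joined the SSO before node1 shut down, which matches the workaround of draining node1 before stopping it.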

I'm thinking about ways to solve #3 w/o leaking memory across the distributed system. Haven't solved it yet; what I have now is as far as it will go for 4.0.4. The workaround to the problem is to use the #2 approach -- stop accepting requests for the server about to be shut down, and then after a while shut down the server. That's a better approach in general anyway.

cptnkirk wrote: "Also, what is JBoss' view of Clustered SSO/Enterprise SSO? This feature, as I understand the concept, seems broken to me. Regardless of our discussions and any eventual workaround, is there commitment within the organization to fix this problem? Or is this particular situation not in line with current JBoss goals for Clustered/Enterprise SSO?"

First, JBoss is definitely interested in making the existing ClusteredSSO feature as good as it can be. I really want to thank you Jim for pushing your use case -- working with it I've now found 2 issues that limit its usefulness.

That said, the current ClusteredSSO is a fairly limited feature. It will never, for example, move away from the tight coupling to the Tomcat authenticators. It is not intended to be JBoss' final answer to enterprise SSO -- there is also work underway on SSO solutions with broader applicability. Best to monitor the security forums to follow events.

20. Re: Clustered SSO not following the semantics of local SSO

cptnkirk Apr 10, 2006 12:26 PM (in response to cptnkirk)

Brian,

Thanks for your support and JBoss' continued commitment to enterprise quality middleware.

brian.stansberry wrote: "The reason #3 above doesn't work is we're not willing to store the credential info forever. We discard the info 1) if invalidate() is called on any session associated with the SSO (aka logout) or 2) if all sessions associated with the SSO across the cluster are expired. Sessions are expired either due to timeout or undeployment of their associated webapp.

In scenario #3, the only sessions associated with the webapp are on node1 (since you haven't accessed node2 yet -- once you do you have a session on node2). The sessions are all expired during shutdown of node1, so the SSO is invalidated."

The thing here is that, due to having a distributed session, I thought I had session information already stored on these nodes as a result of being a member of the cluster. If this is true, I should have valid session information on node 2. There hasn't been an invalidate and the session on node 2 hasn't expired yet. So I have node 1's entire session replicated to node 2 at this point, but auth fails because we aren't willing to store username, password and the string representation of the authenticator in a cluster-wide cache? Wouldn't hitting each node in effect transfer this information cluster-wide anyway?

I understand why the current implementation doesn't work, but I don't see the drawbacks of changing the implementation to make security information available, in whatever way is most convenient, to all interested nodes in push form. Pull just doesn't do it for me, especially when we're expressly trying to build a system tolerant of faults on the nodes you'd be pulling from.

Thanks,
Jim

21. Re: Clustered SSO not following the semantics of local SSO

brian.stansberry Apr 11, 2006 1:47 PM (in response to cptnkirk)

The security information is pushed. The problem is just one of altering the cleanup mechanism. That's a fixable problem, but it's too late to fix for 4.0.4.

You don't have a session on the 2nd node until you access the node. The session contents have been replicated to the second node, but the session object itself is not instantiated until you access it on the node.
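
The distinction between replicated session contents and an instantiated session object can be sketched like this. It is a toy model with hypothetical names (Node, receiveReplication, access), not the JBoss distributed session manager: replication only stores data on the node, and the session object comes into existence on first access.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of lazy session instantiation; not the JBoss session manager.
class LazySessionSketch {

    static final class Node {
        // Raw attribute data pushed from other nodes.
        private final Map<String, Map<String, Object>> replicatedData = new HashMap<>();
        // Session objects that actually exist on this node.
        private final Map<String, Map<String, Object>> liveSessions = new HashMap<>();

        /** Replication delivers the contents only; no session object is created. */
        void receiveReplication(String sessionId, Map<String, Object> attributes) {
            replicatedData.put(sessionId, new HashMap<>(attributes));
        }

        /** The first request for the session on this node builds it from the replicated data. */
        Map<String, Object> access(String sessionId) {
            return liveSessions.computeIfAbsent(sessionId, id -> {
                Map<String, Object> data = replicatedData.get(id);
                return (data == null) ? new HashMap<>() : new HashMap<>(data);
            });
        }

        boolean hasReplicatedData(String sessionId) {
            return replicatedData.containsKey(sessionId);
        }

        boolean hasLiveSession(String sessionId) {
            return liveSessions.containsKey(sessionId);
        }
    }

    public static void main(String[] args) {
        Node node2 = new Node();
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("cart", "3 items");
        node2.receiveReplication("abc123", attrs);

        System.out.println("data on node2 before access:    " + node2.hasReplicatedData("abc123"));
        System.out.println("session on node2 before access: " + node2.hasLiveSession("abc123"));
        node2.access("abc123");
        System.out.println("session on node2 after access:  " + node2.hasLiveSession("abc123"));
    }
}
```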

22. Re: Clustered SSO not following the semantics of local SSO

cptnkirk Apr 11, 2006 3:26 PM (in response to cptnkirk)

Great. So the problem appears to be identified. Is there a jira tracking this issue? Do you know the anticipated release date of the next product cycle that this fix might be included in? Is there CVS or milestone access to this release so that I can test and give feedback in a more timely manner than in this past round?

Thanks,
Jim

23. Re: Clustered SSO not following the semantics of local SSO

brian.stansberry Apr 12, 2006 1:30 AM (in response to cptnkirk)

JIRA:

http://jira.jboss.com/jira/browse/JBAS-3096

Scheduled for 4.0.5.CR1 (or 5.0.0.Beta if it comes out first).

Release schedule:

Neither of the above has a scheduled date right now. To monitor:

http://jira.jboss.com/jira/secure/BrowseProject.jspa?id=10030&subset=-1

Kind of a pain, but that's where you (and I) look.

Access to code once the JIRA is fixed:

http://www.jboss.com/wiki/Wiki.jsp?page=CVSRepository

All alphas, betas, release candidates are available from the download page once released.