3 Replies Latest reply on Mar 22, 2013 2:35 AM by jaikiran

    Intermittent issues during JBoss 7 startup with datasource security domains

    rkav94

      Forgive me if I am not following standard etiquette. This is my first post in this forum.

       

      I am facing a few problems in my application (EAR file) during JBoss (7.1.1.Final) startup. All of them seem to be timing-related and tied to the order in which the startup sequence runs. As a workaround for a startup hang, we have increased the number of MSC startup threads to 20.

       

      Here are the problems. All of them happen only intermittently and only on some machines, which points to thread-safety issues and race conditions between threads:

                   1. We configure everything in standalone-full.xml, yet the following exception is sometimes (not always) logged:

                         

      15:47:05,100 DEBUG [org.jboss.security.auth.login.XMLLoginConfigImpl] (MSC service thread 1-11) Failed to load config as XML: java.io.FileNotFoundException: C:\...\login-config.xml (The system cannot find the file specified)

      at java.io.FileInputStream.open(Native Method) [rt.jar:1.7.0_03]

      at java.io.FileInputStream.<init>(FileInputStream.java:138) [rt.jar:1.7.0_03]

      at java.io.FileInputStream.<init>(FileInputStream.java:97) [rt.jar:1.7.0_03]

      at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90) [rt.jar:1.7.0_03]

      at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188) [rt.jar:1.7.0_03]

      at java.net.URL.openStream(URL.java:1035) [rt.jar:1.7.0_03]

      at org.jboss.security.auth.login.XMLLoginConfigImpl.loadXMLConfig(XMLLoginConfigImpl.java:460) [picketbox-4.0.7.Final.jar:4.0.7.Final]

       

                  2. We have configured two datasources, each with a different security domain, and we have not configured cache-type for either security domain. Even so, on some machines a heap dump taken after JBoss starts shows 4 JBossCachedAuthenticationManager instances (2 per security domain). More interesting, 2 of those authentication managers have domainCache set to null and 2 have domainCache set to an Infinispan ConcurrentMap. So each security domain ends up with two managers, one of which has a non-null domainCache even though no cache-type is configured.

       

                 3. On systems with issue #2 above, there is a large, continuous memory leak (even at low load), as shown below:

       

      2833553 instances of class java.util.concurrent.ConcurrentLinkedQueue$Node

       

                    If I drill down to the queue, I see that it is the accessQueue of the domainCache of one of the JBossCachedAuthenticationManager instances. In the log (with trace enabled for org.jboss.security and org.jboss.as.security) there is an equal number of updateCache() calls. I also see isValid() being called from various threads, with one call every second. Interestingly, updateCache() starts only when a user logs in, and then continues every 3 seconds for a few hours even after the user logs out. When the principal is null, updateCache() is not called:

                                 Begin isValid, principal:null, cache entry: null

      When the principal is non-null, updateCache() is called. Unfortunately, it is called with the logged-in user, even though this is a datasource security domain that has a single DB user.

       

       

      Potential location of the code causing the above problems:

       

      I have partially figured out the race condition that I believe causes all of the above issues. (I have not stepped through the code; this comes from reading it, so I may be wrong.)

       

      In the following startup code, the binding service is set up before any of the security domain services. This means a JNDI lookup can happen before the security domain services are started:

       

      SecuritySubsystemRootResourceDefinition.performBootTime()...

       

      The following code looks like the culprit behind problem 2 described above:

       

      In SecurityDomainJndiInjectable.lookupSecurityDomain()....

       

      SecurityDomainContext sdc = securityManagerMap.get(securityDomain);
      if (sdc == null) {
          sdc = securityManagement.createSecurityDomainContext(securityDomain, new DefaultAuthenticationCacheFactory());
          securityManagerMap.put(securityDomain, sdc);
      }

       

       

      Please note that the above code creates a security domain context with a default cache even if cache-type is not configured. If the security domain service is not yet set up and the datasource layer happens to look up the security domain, a new, duplicate JBossCachedAuthenticationManager is created in addition to the original authentication manager, which starts up as follows:

       

      In SecurityDomainService.start() (this creates the actual JBossCachedAuthenticationManager instance that matches our security domain configuration):

       

      try {
          securityDomainContext = securityManagement.createSecurityDomainContext(name, cacheFactory);
      } catch (Exception e) {
          throw SecurityMessages.MESSAGES.unableToStartException("SecurityDomainService", e);
      }
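
      To make the suspected ordering problem concrete, here is a small self-contained toy model of the two creation paths. It is not the JBoss code: StartupOrderSketch, Manager, lookup() and start() are hypothetical names, a plain HashMap stands in for securityManagerMap, and it is only an assumption that the service registers its context in the same map. It is single-threaded on purpose, so it only shows what each ordering would leave on the heap if my reading of the code is right.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the two creation paths; every name here is hypothetical. */
class StartupOrderSketch {

    /** Stand-in for an authentication manager; domainCache is null when no cache is configured. */
    static final class Manager {
        final String domain;
        final Map<String, Object> domainCache;

        Manager(String domain, Map<String, Object> domainCache) {
            this.domain = domain;
            this.domainCache = domainCache;
        }
    }

    // Stand-in for securityManagerMap.
    private final Map<String, Manager> registry = new HashMap<>();
    // Every manager ever created, i.e. roughly what a heap dump would show
    // as long as some caller still holds a reference to each one.
    private final List<Manager> allCreated = new ArrayList<>();

    /** Lookup path: lazily creates a manager with a default cache if none is registered yet. */
    Manager lookup(String domain) {
        Manager m = registry.get(domain);
        if (m == null) {
            m = new Manager(domain, new HashMap<>());
            allCreated.add(m);
            registry.put(domain, m);
        }
        return m;
    }

    /** Service-start path: always creates its own manager from the configuration (no cache here). */
    Manager start(String domain) {
        Manager m = new Manager(domain, null);
        allCreated.add(m);
        registry.put(domain, m); // assumption: the service registers its context in the same map
        return m;
    }

    List<Manager> created() {
        return allCreated;
    }
}
```

      In this model, calling lookup("ds1") before start("ds1") leaves two managers for the same domain, the first with a non-null domainCache and the second with a null one, which matches the heap dump described in problem 2. Calling start() first leaves only one. If the real lookup can also run concurrently from several threads, the separate get and put steps would be a further source of duplicates, but I have not verified that.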

       

      In addition, the continuous leak in the JBoss Infinispan cache seems to come from the wrong principal being added to it (and because of this the cache entry is never evicted, causing the leak). This security domain is for the datasource: we have a single DB user named 'something', while the logged-in EJB user is 'somethingelse'.

       

      Questions:

       

      1. Who is calling authenticate() every 3 seconds, and why? (This happens only for datasource security domains.)

      2. Sometimes isValid() is called right at the beginning, even before the security domain service is installed (causing the exception shown in problem 1 above). Any idea where this call is coming from?

      3. Is there a config mistake causing this behaviour?

      4. If not, is there a fix available in later versions?