1 2 3 Previous Next 40 Replies Latest reply on Sep 9, 2016 5:59 AM by Hannu Lahtinen

    Wildfly 10 + JBoss OSGi problems with startup

    Hannu Lahtinen Newbie

      Hi,

      I'm migrating a big project to Java 8. We had to update several big pieces of our project:

      JBoss AS 7.2 -> Wildfly 10 Final

      JBoss AS OSGi submodule -> JBoss OSGi 2.5.2 Final

       

      We started experimenting after these updates but after a while realized our startup was eating tons (10Gb+) of memory and timing out at startup(10mins+). After some testing it seemed like Spring ORM was eating up resources. With it and dependant services commented away the startup was faster and ate up less memory, but was still not fast enough. I built a test project with far less bundles and tested with Spring 3 and 4 separately. Spring 3 took over 70s to start and Spring 4 took around 20 seconds to start.

       

      We chose to upgrade:

      Spring 3.2.3 -> Spring 4.2.5(bundled by Apache Servicemix)

      Eclipse Gemini Blueprint 1.0.2 -> 2.0.0.RELEASE

      Hibernate 4.2 -> 5.0.7

       

      Now I have reconstructed our project with the new versions and I'm testing starting up Wildfly again. Sadly, we still have major performance issues. Startup takes minutes(with JBoss AS 7 it took between 40-60s) and still eats a lot of memory as well. After the JBoss OSGi subsystem has started up GC will clear memory without issues, but the startup will crash with out of memory error if I limit the memory to for example 4gb(with -Xmx 4000M).

       

      I have been logging everything with DEBUG level and gone through the logs, but I can't find the root cause of the issue. When the memory limit is reached it seems to always be stuck on addExportedPaths() loop in JBoss MODULES Module.Java class. With VisualVM I can see that memory seems to fill up with Object arrays and FastCopyHashSet and Module$Visited objects. After reviewing Module.java code, the cause might be a dependency loop somewhere. Is this possible or would JBoss MODULES/JBoss OSGi detect it ? Note that the startup is slowed down while starting bundles, not when dependencies are first resolved and bundles installed. Does anyone have any other ideas on what could be causing this?

       

      If you need more information, please ask. I can't provide complete logs or standalone.xml due to company policy, but I can provide snippets based on any information you might need.

       

      -Hannu

        • 1. Re: Wildfly 10 + JBoss OSGi problems with startup
          Hannu Lahtinen Newbie

          So after investigating further, I found that ServiceMix bundled Spring ORM and Spring Context had a cyclical dependency on each other. Spring ORM is dependant on Spring Context and Spring Context is (optionally) dependant on Spring ORM. After removing this the startup time with the capability stack containing only Spring and Gemini Blueprint dropped from around 22s to 5 seconds. This did not fix the entire issue, since it seems other bundles in our stack cause memory to fill up as well.

          • 2. Re: Wildfly 10 + JBoss OSGi problems with startup
            Hannu Lahtinen Newbie

            Hi again,

            These performance problems still persist and it has proven very hard to track down the problem bundles. I have created an example standalone.xml with as little bundles on it as possible to prove it is easily reproducible. The attached standalone.xml takes anywhere from 15 to 70 seconds to start with the Spring ORM bundle. If the Spring ORM bundle is commented out, the startup takes around 5 seconds.


            I'm aware there is a cyclic dependency between Spring ORM and Spring Context that causes this slowness. Problem is cyclic dependencies are not that uncommon in manifests and are hard to track down because JBoss OSGi actually continues to start bundles but it only slows down and eats more memory over time.

             

            I'm also going to create a thread about this to Eclipse Gemini Blueprint forums, because this does not happen when their extender bundle is commented out. Reason why I think this might be an issue with JBoss OSGi/JBoss MODULES is that this was not an issue with JBoss AS 7.2 and Eclipse Gemini Blueprint 1.0.2. After upgrading to Wildfly 10 and JBoss OSGi 2.5.2 this issue has been present all the time with Spring 3 and 4 and Gemini Blueprint 1.0.2 and 2.0.0.

             

            NOTE: Seems it's not possible to attach text files? So I uploaded my standalone.xml to pastebin.com:

            Standalone.xml - Pastebin.com

            • 3. Re: Wildfly 10 + JBoss OSGi problems with startup
              Arcadiy Ivanov Newbie

              Any chance you could attach logs and/or heapdumps on OOMs and or stack traces?

               

              JBoss Modules is not something JBOSGI controls - we simply proxy OSGI dependencies and capabilities through onto JBoss Module infrastructure. It's possible that JBOSGI has a bug that abuses Modules, but without more data it's hard to say.

              • 4. Re: Wildfly 10 + JBoss OSGi problems with startup
                Hannu Lahtinen Newbie

                I'm not sure I can reproduce the OOM with this small stack of bundles that I have used to demo the effects. I have logs with root logger at "ALL" level for you and a heap dump. I'll grab the OOMError stack from our production side, but that means I can't share the logs around it.

                 

                Thank you for taking a look. I understand the issue might not be with JBoss OSGi, but I'm hoping you can point me in the right direction if that's the case.

                 

                One thing I have forgotten to mention is this has been reproducible in three different machines(Win 7 and Win 10) and three different JDK 8 updates(6x, 7x and 91). So the problem is not limited to some exact setup.

                 

                Is there a preferred method to share files(other than image and video) on this forum? I used Dropbox. heap dump, logs and OOMError stack.

                • 5. Re: Wildfly 10 + JBoss OSGi problems with startup
                  Arcadiy Ivanov Newbie

                  So, the good news: on a very preliminary investigation, this does not seem to actually be a bug in JBOSGI or MSC.

                  Bad news: this does seem to be a runaway dependency tree inspection that blooms out of control.

                   

                  Are you using Dynamic-ImportPackage: * by any chance anywhere?
                  Any chance I could take a look at those foo.bar.x.y.z:1.2.3.Final descriptors, metadata and manifests?

                  • 6. Re: Wildfly 10 + JBoss OSGi problems with startup
                    Hannu Lahtinen Newbie

                    We have a general rule to not use "Dynamic-ImportPackage" at all. I have not gone through all our bundles to make sure every bundle obeys to this rule. I can try to check this on Monday.

                     

                    The thing is, the bundle I used to get the OOM is dependent on around 10 of our own bundles and 10 3rd party bundles, so I would probably have to provide you with the whole chain of dependencies under it. We have around 200 bundles(maybe 100 of our own and 100 3rd party) setup as capabilities and 10 WARs in deployment. The slow down and memory hogging happens throughout startup. Some bundles start in milliseconds and do not hog memory. Others take too much time and hog too much memory but they do start, if I give the Java process a lot(16Gb) memory. Some that have the longest dependency chains might not even start at all. They just fill up the 16Gb and throw the OOM after some time.

                     

                    The heap dump and logs are from a really simple configuration without any of our bundles. Logs were created with the one I posted here. While creating the heap dump I added 10 bundles just to try and reproduce the OOM.

                     

                    It is likely that some mistakes in manifests cause this. I managed to find the circular dependency in Spring manifests, but it is not easy when the amount of bundles grows. Even if it's not a bug, would it be possible to detect this is happening and throw some exception during startup? So the users could locate and fix the bundle(s) that have dependency issues.

                    • 7. Re: Wildfly 10 + JBoss OSGi problems with startup
                      Arcadiy Ivanov Newbie

                      Sigh.

                       

                      The thing is, JBOSGI doesn't really keep any resolution and linkage state, so graph cycle detection is not an option as we delegate all resolution and merely proxy the module states in an OSGI-compliant manner. The OOM is precipitated by a single "org.jboss.osgi.framework.internal.ModuleManagerImpl.loadModule(ModuleManagerImpl.java:539)" passing one parameter "foo.bar.x.y.z:1.2.3.Final " to the module loader. If I had your 10-bundle package causing the OOM I would at least be able to tell you exactly what's causing the issue.

                       

                      Looking at MSC code, it's clear that it's not designed to traverse large trees and also doesn't handle cycle detection. I might add that functionality into MSC as the fix there would be more prudent.

                       

                      If you're uncomfortable posting the 10-bundle package here you can email the link to me privately and I can take a look.

                      • 8. Re: Wildfly 10 + JBoss OSGi problems with startup
                        Hannu Lahtinen Newbie

                        I'm sorry, but it's not possible for me to share the bundles. The software we are developing is not open source and I'm not allowed to share them.

                         

                        I found that one of our bundles had DynamicImport-Package: * in manifest.mf. I was able to clean this up, but it had no effect. When I was looking for this I noticed that "DynamicImport-Package: *" is used by at least OpenSplice DDS, Active MQ, Hibernate and Spring Data Commons. These are all bundles we use.

                         

                        I also ran JBoss Tattletale through our bundles and modules folders. It can detect circular dependencies. It detected 0 circular dependencies in our own bundles, but multiple 3rd party bundles had circular dependencies. For example

                         

                        I looked through JBoss MODULES issues section and they have an improvement open about OOME issues (https://issues.jboss.org/projects/MODULES/issues/MODULES-246?filter=allopenissues). Is this related?

                        • 9. Re: Wildfly 10 + JBoss OSGi problems with startup
                          Arcadiy Ivanov Newbie

                          Nope, bug doesn't appear to be related. And we're not talking about a true unbounded loop dependency (otherwise the linkage would never finish at all in any of the cases), but rather something that causes the export paths to grow uncontrollably.

                           

                          Then there are only three options:

                          1) You debug the problem yourself. Specifically, collect data from org.jboss.modules.Module.addExportedPaths(Module.java:1277) to see which modules export paths uncontrollably. Could be not a direct dependency but A exporting B's packages and B exporting packages related to A transitively.
                          2) You create a reproducible case using third party libraries in question but excluding your the proprietary code. This may be as small as creating a scaffold and setting test case dependencies to cause the traversal.
                          3) I could sign an NDA but, naturally, in that case it'll be paid contract work.

                           

                          Cheers!

                          • 10. Re: Wildfly 10 + JBoss OSGi problems with startup
                            Hannu Lahtinen Newbie

                            Since we are a defense security firm, #3 would require some steps to be approved and you'd probably have to fly to Finland. So for now, I'll try #1

                             

                            I spent all day debugging. I couldn't figure out the exact bundle causing the problems though. Some observations:

                            I noticed that every bundle is injected with a dependency to jbosgi.framework? This causes all of jbosgi.framework's dependencies to be looped through for every dependency of every bundle. In our case thousands of times.

                            The size of "visited" FastCopyHashSet(created in link() method of Module.java) grows to millions before it gets too big to open with Eclipse.

                            I repacked few 3rd party bundles to remove circular dependencies and "DIP: *"s. This did not help the OOME bundle to start but maybe shaved of a couple of seconds from bundles that are slow but do start.

                             

                            I wanted to ask what are the best practices for using system modules? When should I add them to system.modules.extra and system.packages.extra and when should I use <capability name="x"/> ? Also is it better to always use system modules when possible, rather than using ~same jars(maybe with manifest) that we add in our bundles folder. Our AS7 setup had a lot of added bundles, that can be found as modules in WFLY10.

                             

                            I'll continue debugging tomorrow.

                            • 11. Re: Wildfly 10 + JBoss OSGi problems with startup
                              Arcadiy Ivanov Newbie

                              I haven't been in Finland for about 2 decades and would welcome the opportunity

                               

                              On a serious note, generally speaking, when you deploy a module that is a proper OSGi bundle, you actually don't need to specify it as part of a capability in a descriptor, you simply deploy it as a regular deployment. Capabilities exist to expose JBoss modules that *are not OSGi bundles* to the OSGi subsystem.

                               

                              >> The size of "visited" FastCopyHashSet(created in link() method of Module.java) grows to millions before it gets too big to open with Eclipse.

                               

                              If you put a breakpoint in org.jboss.modules.Module.addExportedPaths(Module.java:1277) you can see what module exports which of its dependencies, by looking at "this" members.

                              • 12. Re: Wildfly 10 + JBoss OSGi problems with startup
                                Hannu Lahtinen Newbie

                                So today I wrote a class that counts the occurrences of different modules in addExportedPaths() method and writes to a file(file per thread) every 1k occurrences. This seems to give some insight into which dependencies are used most and seem to be looping.

                                 

                                main:

                                javax.sql.api:main has occurred 1729000 times.

                                 

                                Main thread's file only has occurences of javax.sql.api in the file.

                                Controller Boot Thread:

                                This thread had multiple different "loops" of modules. I'm just going to write down the highest number they reached before crashing. This loop appeared first and last in the file:

                                javax.api:main has occurred 950000 times.

                                org.jboss.msc:main has occurred 951000 times.

                                org.jboss.osgi.deployment:main has occurred 951000 times.

                                org.jboss.osgi.metadata:main has occurred 951000 times.

                                org.jboss.osgi.resolver:main has occurred 951000 times.

                                org.jboss.osgi.spi:main has occurred 951000 times.

                                org.jboss.osgi.vfs:main has occurred 951000 times.

                                org.osgi.core:main has occurred 951000 times.

                                javax.api:main has occurred 951000 times.

                                org.jboss.msc:main has occurred 952000 times.

                                 

                                Another loop was:

                                org.jboss.osgi.resolver:main has occurred 847000 times.

                                org.jboss.osgi.repository:main has occurred 848000 times.

                                org.jboss.osgi.resolver:main has occurred 848000 times.

                                org.jboss.osgi.repository:main has occurred 849000 times.

                                 

                                And:

                                javax.interceptor.api:main has occurred 796000 times.

                                javax.annotation.api:main has occurred 797000 times.

                                javax.interceptor.api:main has occurred 797000 times.

                                javax.annotation.api:main has occurred 798000 times.

                                 

                                And:

                                org.jboss.jts:main has occurred 872000 times.

                                javax.orb.api:main has occurred 873000 times.

                                org.jboss.jts:main has occurred 873000 times.

                                javax.orb.api:main has occurred 874000 times.

                                 

                                On top of these

                                org.osgi.core:main, org.jboss.logmanager:main and javax.activation.api:main appeared multiple times in a row in the file like this:

                                javax.activation.api:main has occurred 152000 times.

                                javax.activation.api:main has occurred 153000 times.

                                 

                                It seems like my counter was not thread safe enough because the above mentioned loops often started on top of another like: javax.annoorg.jboss.osg.repository:main has occurred X times. But I still believe it did its job.

                                 

                                Service Thread 1-1:

                                com.vividsolutions.jts:main has occurred 974000 times.

                                javax.api:main has occurred 975000 times.

                                javax.inject.api:main has occurred 975000 times.

                                org.apache.xerces:main has occurred 975000 times.

                                org.jboss.as.controller-client:main has occurred 975000 times.

                                org.jboss.as.osgi:main has occurred 975000 times.

                                org.jboss.dmr:main has occurred 975000 times.

                                org.jboss.logging:main has occurred 975000 times.

                                org.jboss.modules:main has occurred 975000 times.

                                org.jboss.msc:main has occurred 975000 times.

                                org.jboss.osgi.framework:main has occurred 975000 times.

                                org.jboss.osgi.provision:main has occurred 975000 times.

                                org.jboss.osgi.repository:main has occurred 975000 times.

                                org.jboss.osgi.resolver:main has occurred 975000 times.

                                org.slf4j:main has occurred 975000 times.

                                org.slf4j.impl:main has occurred 975000 times.

                                org.omg.api:main has occurred 975000 times.

                                org.hibernate:main has occurred 975000 times.

                                org.apache.log4j:main has occurred 975000 times.

                                com.vividsolutions.jts:main has occurred 975000 times.

                                javax.api:main has occurred 976000 times.

                                 

                                Service thread 1-1 actually only printed these modules until the whole application crashed. 1k was reached only when the problem bundle was being started. So these modules had their "getDependenciesInternal()" fed to addExportedPaths() almost million times.

                                 

                                Service Thread 1-3:

                                This thread had many different loops similar to Controller Boot Thread.

                                 

                                Most common:

                                deployment.com.springsource.org.apache.tools.an:1.8.1 has occurred 218000 times

                                jbosgi.framework:main has occurred 219000 times.

                                javax.xml.bind.api:main has occurred 219000 times

                                org.apache.commons.codec:main has occurred 219000 times

                                org.slf4j.jcl-over-slf4j:main has occurred 219000 times

                                deployment.com.springsource.org.apache.tools.an:1.8.1 has occurred 219000 times

                                jbosgi.framework:main has occurred 220000 times.

                                 

                                Others:

                                jbosgi.framework:main has occurred 17000 times

                                deployment.com.springsource.org.aopalliance:1.0.0 has occurred 18000 times

                                org.slf4j.jcl-over-slf4j:main has occurred 18000 times.

                                org.apache.commons.pool:main has occurred 18000 times.

                                deployment.org.apache.servicemix.bundles.spring-beans:4.2.5.RELEASE_1 has occurred 18000 times.

                                deployment.org.apache.servicemix.bundles.spring-core:4.2.5.RELEASE_1 has occurred 18000 times.

                                jbosgi.framework:main has occurred 18000 times.

                                deployment.com.springsource.org.aopalliance:1.0.0 has occurred 19000 times

                                 

                                and the second longest "loop":

                                 

                                deployment.org.apache.service mix.bundles.spring-expression:4.2.5.RELEASE_1 has occurred 13000 times.

                                deployment.com.springsource.edu.emory.mathcs.backport:3.1.0 has occurred 14000 times.

                                javax.annotation.api:main has occurred 14000 times.

                                jbosgi.framework:main has occurred 14000 times.

                                org.omg.api:main has occurred 14000 times.

                                javax.jms.api:main has occurred 14000 times.

                                javax.persistence.api:main has occurred 14000 times.

                                javax.validation.api:main has occurred 14000 times.

                                deployment.org.apache.servicemix.specs.jaxws-api-2.2:2.2.0 has occurred 14000 times.

                                deployment.com.springsource.org.aopalliance:1.0.0 has occurred 14000 times.

                                org.slf4j.jcl-over-slf4j:main has occurred 14000 times.

                                deployment.self.bundled.org.joda.time:2.3.0 has occurred 14000 times.

                                deployment.org.apache.servicemix.bundles.spring-aop:4.2.5.RELEASE_1 has occurred 14000 times.

                                deployment.org.apache.servicemix.bundles.spring-core:4.2.5.RELEASE_1 has occurred 14000 times.

                                deployment.org.apache.servicemix.bundles.spring-beans:4.2.5.RELEASE_1 has occurred 14000 times.

                                deployment.org.apache.servicemix.bundles.spring-expression:4.2.5.RELEASE_1 has occurred 14000 times.

                                deployment.com.springsource.edu.emory.mathcs.backport:3.1.0 has occurred 15000 times.

                                 

                                Other service threads had nothing written in their files.

                                 

                                None of our own bundles were mentioned in any of the files, just 3rd party dependencies. Hope this gives some insight to how the dependencies are being handled and what's wrong with them. A lot of these "loops" have modules in them. When I looked at the module.xmls for example javax.api has a dependency on javax.sql.api and vice versa. Same is true for org.slf4j and org.slf4j.impl as well as javax.orb.api and org.jboss.jts. Maybe I'm using system modules wrong?


                                EDIT:  Whoops. Accidentally posted while typing out the loops.

                                EDIT2: Overlined threads. Look next post.

                                • 13. Re: Wildfly 10 + JBoss OSGi problems with startup
                                  Hannu Lahtinen Newbie

                                  Okay. My previous post is misleading. After making my monitoring class thread-safe, all the dependencies listed above are concentrated in to one service thread.

                                  I will update this post after analyzing it a bit.

                                   

                                  I don't really know what to say about it. Same bundles have the most hits as I mentioned in the above post but the "loops" are not visible in the same way. addExportedPaths() has ~6M hits starting all the bundles before the worst one(one of the earlier bundles takes up 2-3M of these) and starting the bundle with biggest problems calls addExportedPaths() over 10 million times before running out of memory.

                                  • 14. Re: Wildfly 10 + JBoss OSGi problems with startup
                                    Arcadiy Ivanov Newbie

                                    Because those are not clear direct dependency loops. Something is causing export paths to loop without there being actual apparent dependencies. Try taking all of the OSGI bundles that are not shipped with JBOSGI out of capabilities in the standalone.xml and deploying them as regular deployments into WFLY. Let me know how it goes.

                                    1 2 3 Previous Next