14 Replies Latest reply on Feb 9, 2015 10:03 AM by Stian Lund

    Differences in resource hierarchy for AS7 servers in RHQ 4.13?

    Stian Lund Expert

      Hi,

      Since upgrading to the later versions (4.12, 4.13), I've noticed that some of our AS7 servers have a hierarchy looking like this:

      [Screenshot not preserved: resource tree with a 'Server Configuration' node]

      While others still have the older type of sub-resource list:

      [Screenshot not preserved: resource tree with the older-style sub-resource list]

      Of course, the first example, with 'Server Configuration' is the cleanest one, and I assume this is because the list for AS7 started to get pretty big.

      But why do some AS7 servers still use the old style? They are the exact same version of AS7 (7.1.1.Final "Brontes"), and the same version of the plugin (4.13.1).

       

      It seems to vary between RHQ installations, which leads me to believe something went wrong in the resource upgrade code when upgrading RHQ.

       

      I have tried uninventorying all children under AS7 and running autodiscovery again, but the old style comes back after a while.

       

      Also, several metrics are missing on the servers that use the 'old' style:

      - Session metrics for web subdeployments

      - Collection time for the garbage collector

       

      Any ideas?

        • 1. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
          jay shaughnessy Expert

          I don't know but perhaps try a restart on an Agent responsible for an "old" hierarchy (perhaps in Debug) and see if you find any resource upgrade issues in the logs.  Also, there was, I think, a fairly major change in the handling of subcategories at some point in fairly recent history.  Not sure if that could be playing a part.  Finally, I guess validate the version of the as7 plugins in action on the different agents.  I can't see how they would differ from the version on the server, but I guess you could still verify things are as expected.

          • 2. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
            Stian Lund Expert

            Yeah, it's really strange...

            I have two installations in test where the new style with "Server Configuration" is used, and four installations in test and production where the "old" style is still used.

             

            It looks like the resource upgrade code has not done its job properly.

             

            I will try to Uninventory a resource and import it with debug turned on in the agent and the server.

             

            Also, is there a way to manually start the resource upgrade process again so we could force the upgrade?

             

            Is there someone who knows more about this change in the handling of subcategories, and could you point them here to have a look, please?

             

            Edit: I am getting the following in the agent log with DEBUG:

             

            2015-02-03 10:36:11,367 DEBUG [ResourceDiscoveryComponent.invoker.daemon-2] (rhq.core.pluginapi.util.JavaCommandLine)- Parsing JavaCommandLine[arguments=[/opt/jboss/java/bin/java, -D[Standalone], -server, -XX:+UseCompressedOops, -Xms512m, -Xmx4096m, -XX:MaxPermSize=256m, -Djava.net.preferIPv4Stack=true, -Dorg.jboss.resolver.warning=true, -Dsun.rmi.dgc.client.gcInterval=3600000, -Dsun.rmi.dgc.server.gcInterval=3600000, -Dfile.encoding=UTF-8, -Djboss.modules.system.pkgs=org.jboss.byteman, -Djava.awt.headless=true, -Djboss.server.default.config=standalone.xml, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/app/dump/elaring, -Djboss.bind.address=0.0.0.0, -Djboss.bind.address.management=0.0.0.0, -Djboss.server.base.dir=/opt/jboss/server/elaring, -Dorg.jboss.boot.log.file=/opt/jboss/server/elaring/log/boot.log, -Dlogging.configuration=file:/opt/jboss/server/elaring/configuration/logging.properties, -jar, /opt/jboss/jboss-modules.jar, -mp, /opt/jboss/modules, -jaxpmodule, javax.xml.jaxp-provider, org.jboss.as.standalone, -Djboss.home.dir=/opt/jboss, -Djboss.server.base.dir=/opt/jboss/server/elaring], includeSystemPropertiesFromClassArguments=true, shortClassOptionFormat=[WHITESPACE, EQUALS_SIGN], longClassOptionFormat=[WHITESPACE, EQUALS_SIGN]]...
            2015-02-03 10:36:11,565 DEBUG [ResourceDiscoveryComponent.invoker.daemon-2] (rhq.modules.plugins.jbossas7.BaseProcessDiscovery)- Management user properties file not found at [].
            2015-02-03 10:36:11,576 DEBUG [ResourceDiscoveryComponent.invoker.daemon-2] (rhq.modules.plugins.jbossas7.BaseProcessDiscovery)- Defaulting to supportsPatching = false
            java.lang.IllegalArgumentException: Malformed version string []
                at org.rhq.core.domain.util.OSGiVersion.<init>(OSGiVersion.java:86)
                at org.rhq.modules.plugins.jbossas7.BaseProcessDiscovery.supportsPatching(BaseProcessDiscovery.java:823)
                at org.rhq.modules.plugins.jbossas7.BaseProcessDiscovery.buildResourceDetails(BaseProcessDiscovery.java:236)
                at org.rhq.modules.plugins.jbossas7.BaseProcessDiscovery.discoverResources(BaseProcessDiscovery.java:157)
                at org.rhq.modules.plugins.jbossas7.StandaloneASDiscovery.discoverResources(StandaloneASDiscovery.java:106)
                at org.rhq.core.pluginapi.inventory.ResourceContext.getNativeProcess(ResourceContext.java:365)
                at org.rhq.modules.plugins.jbossas7.BaseProcessDiscovery.upgrade(BaseProcessDiscovery.java:595)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.rhq.core.pc.util.DiscoveryComponentProxyFactory$ComponentInvocationThread.call(DiscoveryComponentProxyFactory.java:305)
                at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
                at java.util.concurrent.FutureTask.run(FutureTask.java:138)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)
            2015-02-03 10:36:11,577 DEBUG [ResourceDiscoveryComponent.invoker.daemon-2] (rhq.modules.plugins.jbossas7.BaseProcessDiscovery)- Discovered new JBossAS7 Standalone Server Resource (key=[hostConfig: /opt/jboss/server/elaring/configuration/standalone.xml], name=[AS (0.0.0.0:9443)], version=[null]).
            
            
            • 3. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
              Lukas Krejci Apprentice

              It seems to me that the problem lies elsewhere than on the agent side in this case.

               

              That debug message is expected - older versions of EAP/AS didn't have the expected version attributes and from that fact we can assume that they also don't support patching, which is what that debug message tries to tell rather verbosely.

               

              But the issue here I think is that even on the server side, the resource type hierarchy looks stale in your second picture. This would indicate to me that the upgrade to the new version of the plugin failed on the server side already - otherwise the metadata would have been updated and the hierarchy would have changed even before the plugin reached the agents.

               

              This might be as simple as a stale browser cache but can also mean that the upgrade to the new version of the plugin really failed on that server. Can you share the RHQ server logs from the time when you uploaded the new version of the plugin to it?

              • 4. Re: Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                Stian Lund Expert

                Thanks for the feedback Lukas,

                I am attaching logs from the RHQ Server upgrade on 2/2, and also from the 30/1 upgrade in test, which upgraded the AS7 resources successfully.

                 

                You will notice AS7 upgrade code starting about 11:22 in the log.

                 

                This is a pre-production server with about 70 JBoss AS7 servers.

                 

                Also, here's a thing. In the working RHQ installation logs, I notice the following:

                 

                11:23:03,898 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (pool-6-thread-1) Updating resource type [JBossAS7:Webservices(id=0)]...
                
                

                 

                This resource does not seem to be present in the logs for the pre-production RHQ server during upgrade. Also, the working logs look like they contain a lot more resources for AS7.
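One way to check this systematically, as a rough sketch: pull the type names out of the "Updating resource type [...]" lines in both logs and diff the two sets. The function names are hypothetical and the sample lines are copied from log excerpts quoted in this thread; only the log line format itself is taken from the actual output.

```python
import re

# Matches the type name in lines such as:
#   ... Updating resource type [JBossAS7:Webservices(id=0)]...
UPDATE_RE = re.compile(r"Updating resource type \[JBossAS7:(.+?)\(id=\d+\)\]")


def updated_types(log_lines):
    """Collect the AS7 type names from 'Updating resource type' log lines."""
    names = set()
    for line in log_lines:
        match = UPDATE_RE.search(line)
        if match:
            names.add(match.group(1))
    return names


def missing_types(working_log, failing_log):
    """Types updated in the working log but never mentioned in the failing one."""
    return sorted(updated_types(working_log) - updated_types(failing_log))


# Sample lines copied from this thread; the real logs contain many more.
working = [
    "11:23:03,898 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (pool-6-thread-1) Updating resource type [JBossAS7:Webservices(id=0)]...",
]
failing = [
    "15:17:53,909 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Naming(id=0)]...",
    "15:17:54,603 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Security(id=0)]...",
]

print(missing_types(working, failing))  # ['Webservices']
```

In practice the two lists would be read from the full server logs, for example with `open(path).readlines()`.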

                • 5. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                  Stian Lund Expert

                  Well, messing up the config while playing around with the DB yesterday to try to fix this at least forced me to do a full reinstall from scratch, and now the AS7 servers in this installation come up correctly.

                   

                   

                  I still have another test environment with the old-type AS7 that cannot be updated, probably because the transaction timeouts are set too low, as well as two production installations where I am obviously not going to install from scratch, so any tips would be most appreciated.

                  • 6. Re: Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                    Stian Lund Expert

                    It appears the following is where the server hangs for a very long time (indefinitely?) against the Oracle server.

                     

                    15:17:37,348 INFO  [org.rhq.enterprise.server.core.plugin.ProductPluginDeployer] (http-/0.0.0.0:7080-12) Newer version of [JBossAS7] plugin found (version 4.13.5) - older version (4.13.4) will be ignored.
                    15:17:37,367 INFO  [org.rhq.enterprise.server.core.plugin.ProductPluginDeployer] (http-/0.0.0.0:7080-12) Deploying [1] new or updated agent plugins: [JBossAS7]
                    15:17:37,447 INFO  [org.rhq.enterprise.server.resource.group.definition.GroupDefinitionManagerBean] (http-/0.0.0.0:7080-12) Updating dynaGroup based on [Platforms] CannedGroupExpression [id=Platforms, name=Groups by platform, expr=[resource.type.category = PLATFORM, groupby resource.type.name]]
                    15:17:37,451 INFO  [org.rhq.enterprise.server.resource.group.definition.GroupDefinitionManagerBean] (http-/0.0.0.0:7080-12) Updating dynaGroup based on [Platforms] CannedGroupExpression [id=Downed resources, name=All resources currently down, expr=[resource.availability = DOWN]]
                    15:17:39,100 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:JBossAS7 Host Controller(id=0)]...
                    15:17:40,149 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:JBossAS7 Standalone Server(id=0)]...
                    15:17:41,884 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Wildfly/JBoss EAP Patch Handler(id=0)]...
                    15:17:41,963 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Osgi(id=0)]...
                    15:17:42,690 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:ServerGroup(id=0)]...
                    15:17:43,327 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Resource Adapters(id=0)]...
                    15:17:43,924 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:SocketBindingGroup(id=0)]...
                    15:17:45,519 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Transactions Subsystem (Standalone)(id=0)]...
                    15:17:46,186 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Managed Server(id=0)]...
                    15:17:46,842 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Datasources (Standalone)(id=0)]...
                    15:17:47,441 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:JCA(id=0)]...
                    15:17:48,035 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:JBossWeb(id=0)]...
                    15:17:48,946 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Messaging(id=0)]...
                    15:17:49,488 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:General JCA connectors(id=0)]...
                    15:17:49,961 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:EJB3(id=0)]...
                    15:17:50,652 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:DomainDeployment(id=0)]...
                    15:17:51,247 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Threads(id=0)]...
                    15:17:51,753 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:DeploymentScanner(id=0)]...
                    15:17:52,492 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Deployment(id=0)]...
                    15:17:53,482 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Host(id=0)]...
                    15:17:53,909 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Naming(id=0)]...
                    15:17:54,603 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Updating resource type [JBossAS7:Security(id=0)]...
                    15:17:54,610 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Removing type [JBossAS7:Security(id=10209)] from parent type [JBossAS7:Managed Server(id=10167)]...
                    15:17:54,678 INFO  [org.rhq.enterprise.server.resource.metadata.ResourceMetadataManagerBean] (http-/0.0.0.0:7080-12) Removing type [JBossAS7:Security(id=10209)] from parent type [JBossAS7:Profile(id=10174)]...
                    
                    

                     

                    The time is now 15:23.

                     

                    I have asked the Oracle admin to set the distributed transaction timeout to 12 hours; we will see if this is enough.

                     

                    As you can see, the only way I've found to trigger the Resource type update is changing version of AS7 plugin. Hopefully this does not hit me in a bad way in the future...

                    • 7. Re: Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                      Lukas Krejci Apprentice

                      Interestingly, that is the exact same spot as in the non-working logs you attached yesterday. I wonder if that could be a clue to this issue. I still think that this is a somewhat non-standard situation, because your JBossAS7:Security resource type seems to be (partially?) stuck in the state it was in RHQ 4.4.0, yet you're upgrading from 4.13 to 4.13.1 as far as I gathered from the logs. I'm interested to see how it pans out after the 12-hour period.

                      • 8. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                        jay shaughnessy Expert

                        Stian,


                        Certainly an exception like this during the metadata update of the plugin is a serious issue that could leave your resource type in a bad state.  Let us know if you were able to avoid the 60 second timeout with the custom timeout you mention above.

                         

                        Looking in the code, we basically try to perform the update of each type in its own Tx.  What you are seeing suggests that we may need to look at breaking up the update, specifically the removal of obsolete child types, into their own nested Txs.  In your situation it seems the "Security" type is failing to be removed from its former parent types.
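To illustrate the idea of giving each removal its own transaction, here is a minimal sketch. It is not RHQ code: it uses Python's `sqlite3` with a hypothetical `type_parents` table, a hypothetical `remove_child_types` helper, and a simulated failure, purely to show that when each removal commits separately, a failure in one rolls back only that one. In the server itself this would presumably be done through the EJB transaction handling instead.

```python
import sqlite3


def remove_child_types(conn, links, fail_on=None):
    """Delete each (parent, child) link in its own transaction.

    `fail_on` simulates a removal that times out. Returns the links
    that could not be removed.
    """
    failed = []
    for parent, child in links:
        try:
            with conn:  # commits on success, rolls back on exception
                conn.execute(
                    "DELETE FROM type_parents WHERE parent = ? AND child = ?",
                    (parent, child),
                )
                if (parent, child) == fail_on:
                    raise RuntimeError("simulated timeout")
        except RuntimeError:
            failed.append((parent, child))
    return failed


# Hypothetical schema; the parent/child names are taken from the log above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type_parents (parent TEXT, child TEXT)")
links = [("Managed Server", "Security"), ("Profile", "Security")]
with conn:
    conn.executemany("INSERT INTO type_parents VALUES (?, ?)", links)

failed = remove_child_types(conn, links, fail_on=("Profile", "Security"))
remaining = conn.execute("SELECT parent, child FROM type_parents").fetchall()
print(failed)     # [('Profile', 'Security')]
print(remaining)  # [('Profile', 'Security')]
```

The first removal stays committed even though the second one fails, which is the behavior a single enclosing transaction cannot give.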

                         

                        It's a little interesting that you would hit an issue as we've successfully upgraded some large installations. But each installation has its own nuances.

                        • 9. Re: Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                          Stian Lund Expert

                          Well, it didn't take that long. It looks like it hit some local timeout of 30 minutes, since there is no ORA- exception?

                           

                          15:47:55,442 ERROR [org.rhq.enterprise.server.core.plugin.ProductPluginDeployer] (http-/0.0.0.0:7080-12) Failed to register RHQ plugin file [file:/opt/rhq/rhq-server-4.13.1/modules/org/rhq/server-startup/main/deployments/rhq.ear/rhq-downloads/rhq-plugins/rhq-jboss-as-7-plugin-4.13.5.jar]: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back
                              at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleEndTransactionException(CMTTxInterceptor.java:138) [jboss-as-ejb3-7.4.0.Final-redhat-4.jar:7.4.0.Final-redhat-4]
                              at org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:118) [jboss-as-ejb3-7.4.0.Final-redhat-4.jar:7.4.0.Final-redhat-4]
                          ...
                          Caused by: javax.transaction.RollbackException: JBAS014585: Transaction 'TransactionImple < ac, BasicAction: 0:ffff0a330849:39d62bff:54d379a7:b0b8 status: ActionStatus.ABORTED >' was already rolled back
                              at org.jboss.as.ejb3.tx.CMTTxInterceptor.endTransaction(CMTTxInterceptor.java:99) [jboss-as-ejb3-7.4.0.Final-redhat-4.jar:7.4.0.Final-redhat-4]
                              ... 216 more
                          
                          15:47:55,451 INFO  [org.rhq.enterprise.server.core.plugin.ProductPluginDeployer] (http-/0.0.0.0:7080-12) Plugin metadata updates are complete for [1] plugins: [JBossAS7]
                          
                          

                           

                          Before that, there are several warnings about a timer execution that has been running for a long time, followed by the transaction reaper timeout:

                           

                          15:35:44,392 WARN  [org.jboss.as.ejb3] (EJB default - 4) JBAS014143: A previous execution of timer [rhq.rhq-server.StartupBean 6a648ee0-6f17-442e-8a28-2b3ba5b0ba9b] is still in progress, skipping this overlapping scheduled execution at: Thu Feb 05 15:35:44 CET 2015
                          15:40:44,392 WARN  [org.jboss.as.ejb3] (EJB default - 10) JBAS014143: A previous execution of timer [rhq.rhq-server.StartupBean 6a648ee0-6f17-442e-8a28-2b3ba5b0ba9b] is still in progress, skipping this overlapping scheduled execution at: Thu Feb 05 15:40:44 CET 2015
                          15:45:44,393 WARN  [org.jboss.as.ejb3] (EJB default - 6) JBAS014143: A previous execution of timer [rhq.rhq-server.StartupBean 6a648ee0-6f17-442e-8a28-2b3ba5b0ba9b] is still in progress, skipping this overlapping scheduled execution at: Thu Feb 05 15:45:44 CET 2015
                          15:47:54,604 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff0a330849:39d62bff:54d379a7:b0b8 in state  RUN
                          15:47:54,611 WARN  [org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorTrackingImpl] (Transaction Reaper Worker 1) HHH000451: Transaction afterCompletion called by a background thread; delaying afterCompletion processing until the original thread can handle it. [status=4]
                          
                          

                           

                          Then at 15:47 it stops, 30 minutes after the last AS7 plugin log message about "Removing type [JBossAS7:Security".
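The gap can be checked against the timestamps in the log excerpts quoted in this thread. A quick sketch (the variable names are just for illustration):

```python
from datetime import datetime

FMT = "%H:%M:%S,%f"  # log timestamps use a comma before the milliseconds

update_started = datetime.strptime("15:17:54,603", FMT)  # "Updating resource type [JBossAS7:Security(id=0)]"
last_removal = datetime.strptime("15:17:54,678", FMT)    # last "Removing type [JBossAS7:Security..." line
reaper_fired = datetime.strptime("15:47:54,604", FMT)    # ARJUNA012117 TransactionReaper warning

since_removal = reaper_fired - last_removal
since_update = reaper_fired - update_started

print(since_removal)  # 0:29:59.926000
print(since_update)   # 0:30:00.001000
```

The reaper fired just under 30 minutes after the last "Removing type" line, and almost exactly 30 minutes after the "Updating resource type [JBossAS7:Security(id=0)]" line. That would fit a 30-minute transaction timeout if the transaction started around that point, but the start point is an assumption; the logs do not show it directly.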

                           

                          It very much looks like a complete purge of DB and reinstall is the only way to get this to upgrade, unless you guys have any ideas on how to clean up stuff?

                           

                          Would it, for instance, be possible to delete the "Security" type from the DB manually? If so, how could it be done without breaking a lot of constraints?

                           

                          Also, if possible, I would really appreciate a pointer on how to trigger this upgrade manually... I am up to version 4.13.5 of the AS7 plugin now.

                           

                          Stian

                          • 10. Re: Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                            jay shaughnessy Expert

                            Well it seems it's more of a deadlock situation, so it will hang for as long as it can wait for a timeout.  So, somehow something has to change to get this past the problem point.  I think ultimately this may require a code change like what I mentioned above, to use nested Txs to handle each of the removals.

                             

                            Perhaps one thing you could try is to ignore the Security type in advance of doing the upgrade.  By doing that you will remove all the Security resources and they will not be re-imported.  Maybe that will alter the behavior just enough to make a difference.  Administration->Configuration->Ignored Resource Types and then navigate to the Security type.

                            • 11. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                              Stian Lund Expert

                              Thanks for the tips - I have tried it and it actually proceeded to remove the Security type now... but stops again on the type "JBossAS7:ModCluster Standalone Service" so I will try the same.

                               

                              The strange thing is, I notice that in the RHQ reinstalled from scratch these types still exist for AS7 Standalone:

                               

                               

                               

                              So there's still a Security type and a ModCluster type in the plugin? What happens if these are then re-enabled?

                               

                              Stian

                              • 12. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                                Stian Lund Expert

                                Oh, and this is interesting: the Oracle DBA sent me a snapshot of contention in the database during the upgrade:

                                 

                                • 13. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                                  jay shaughnessy Expert

                                  Hmmm, well that is unfortunate that it then blocked on a subsequent type.  This really looks like it will need a code change in the Tx handling of meta-data update during upgrade.  I'm not sure what it is about your DB that brings this issue to the surface, but it doesn't really matter, there is clearly an issue.

                                  • 14. Re: Differences in resource hierarchy for AS7 servers in RHQ 4.13?
                                    Stian Lund Expert

                                    Thanks for understanding, Jay.

                                    I'd create a BZ but I think for me it would be too technical to explain it in transaction-handling terms...

                                     

                                    And to give status: I got past the ModCluster blockage - but then it stopped on "JBossAS7:Security" again, even though I'd disabled everything named Security in both AS7 Standalone Server and Host Controller. So it was blocking on a resource type that should not even be there...

                                     

                                    I have decided to start from a clean DB in production - it will mean we have no monitoring for some hours until we can inventory all resources and set correct passwords and so on. Wish I'd had some scripts to do it but it's quick enough I guess. Thank *** for dynagroups.

                                     

                                    At least starting from scratch will ensure further updates go without a hitch, eh.

                                     

                                    By the way, it's not that it doesn't "work", it's just that these inconsistencies are annoying, and I really want to be able to have metrics on sessions and gc/minute that I get in the updated plugin version.

                                     

                                    And generally it seems a lot could be gained by improving the Tx handling. I had similar problems when purging old configuration history, and had to set a small purge batch size to make it finish. It attempted to purge too many dependent resources in the DB at once and the tx timed out. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1174747
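For what it's worth, the small-batch approach can be sketched like this. This is not the RHQ purge job: the `config_history` table, its columns, and the `purge_in_batches` helper are hypothetical, and `sqlite3` stands in for the real database. The point is only that each batch runs in its own short transaction, so no single transaction has to cover the whole purge.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size):
    """Delete rows older than `cutoff` in small batches, one transaction per batch.

    Returns the number of batches that deleted at least one row.
    """
    batches = 0
    while True:
        with conn:  # one short transaction per batch
            cur = conn.execute(
                "DELETE FROM config_history WHERE id IN ("
                "  SELECT id FROM config_history WHERE created < ?"
                "  ORDER BY id LIMIT ?)",
                (cutoff, batch_size),
            )
        if cur.rowcount == 0:
            return batches
        batches += 1


# Hypothetical table with ten rows, created at times 0..9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config_history (id INTEGER PRIMARY KEY, created INTEGER)")
with conn:
    conn.executemany(
        "INSERT INTO config_history (created) VALUES (?)",
        [(t,) for t in range(10)],
    )

batches = purge_in_batches(conn, cutoff=7, batch_size=3)
remaining = conn.execute("SELECT COUNT(*) FROM config_history").fetchone()[0]
print(batches, remaining)  # 3 3
```

A smaller batch size means more round trips but a shorter-lived transaction each time, which is the trade-off that made the purge finish here.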

                                     

                                    Stian