29 Replies. Latest reply on Mar 25, 2013 1:15 PM by mazz
      • 15. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
        pathduck

        Ok, this is definitely good. I imported the resources I had set to Ignored in the Discovery Queue. They were things like those mentioned before: Cobbler, Cron, Grub, etc.

         

        Then I did an Uninventory on all of them, waited a bit, and they did not show up in the Discovery Queue, as expected, since the plugins are disabled, so that was good. I restarted the server to be sure. Now CPU usage looks much better, so it's very hopeful. I will let it run over the weekend to have some data on actual CPU usage. But load is down from ~2.5 to 0.04.

         

        We have another RHQ installation of 4.6 so if you guys want me to test other things there, make dumps etc to maybe nail the ultimate cause of this then let me know. There must be a bug somewhere in the discovery implementation - since we have the exact same plugins disabled in production 4.5.1 and we have resources in our Disc. Queue there as well that are just Ignored.

         

        Have a good weekend and thanks for the help so far

         

        -Stian

        • 16. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
          pathduck

          Update: I have had RHQ 4.6 running over the weekend, cpu usage is down to the same as in 4.5.1 before.

           

          I have done an Unignore, Import and Uninventory on the resources that were set in the Ignored list, except on one server, to be able to do a bit more analyzing if possible. It would be great if someone else was able to reproduce though.

           

          Possible steps to reproduction (4.6):

          - Uninventory some OS services like Cron, Cobbler, GRUB, OpenSSH, Samba, Postfix and wait until they show up in discovery queue.

          - Disable the Agent Plugins for these services.

          - Do an Ignore on the discovered resources.

          - Do an Update Plugins on the RHQ Agent(s).

          - Possibly do a restart of the RHQ server (not sure if needed).

           

          You should see cpu increase, network traffic increase as well as constant requests to DiscoveryServerService.getResourcesAsList() on the agent for the resources that are in the Ignored list.
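One rough way to confirm the symptom described above is to count how often the method name appears in captured log output. This is only a sketch: it assumes you have debug log lines in which each request shows up as a line containing `DiscoveryServerService.getResourcesAsList`. The log format, the function name `count_discovery_calls` and the sample lines are all made up for illustration and are not taken from RHQ.

```python
def count_discovery_calls(log_lines,
                          needle="DiscoveryServerService.getResourcesAsList"):
    """Count log lines that mention the discovery call.

    `log_lines` is any iterable of strings, e.g. an open log file.
    """
    return sum(1 for line in log_lines if needle in line)


# Hypothetical sample lines, only to show how the helper is used.
sample = [
    "DEBUG invoking DiscoveryServerService.getResourcesAsList",
    "DEBUG something unrelated",
    "DEBUG invoking DiscoveryServerService.getResourcesAsList",
]
calls = count_discovery_calls(sample)
```

Running the same count over equal time windows before and after removing the ignored resources should show whether the request rate actually dropped.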

          • 17. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
            mazz

            Stian, In your first step you say " Uninventory some OS services like Cron, Cobbler, GRUB, OpenSSH, Samba, Postfix and wait until they show up in discovery queue"

             

            Can I assume you DID have these committed into inventory at one point in time before? I think it might be an important piece if the resources were at one time committed but later uninventoried then ignored.

            • 18. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
              pathduck

              John Mazzitelli wrote:

               

              Can I assume you DID have these committed into inventory at one point in time before? I think it might be an important piece if the resources were at one time committed but later uninventoried then ignored.

               

              That is right, they were in the inventory, and were uninventoried, then they were ignored. The agent plugins for the resource types were also disabled but not totally sure about the order there - I think the plugins were disabled after the resources were put in the Ignore list.

               

              So Ignore, then Disable plugins then Reload Plugins I guess.

               

              Not sure if they *have* to be OS services, possibly they might be any type of resource.

              • 19. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                mazz

                I replicated this and I see some odd behavior stepping through the code using your replication steps. Things don't have to be committed into inventory first - you can just start a clean server and agent, then, ignore a resource (can be anything) and then disable that plugin that defined the type of resource you just ignored. Once you do that, this odd behavior happens.

                • 20. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                  pathduck

                  That's good to hear John, "glad" you managed to replicate it on your side

                   

                  Are you running on RHEL as well, or would this be a more general problem unrelated to the underlying OS?

                   

                  You are seeing cpu usage increase as well? I guess it depends on how powerful the machine is, the number of agents, and the network bandwidth - these are all factors that will affect how much cpu usage one will see.

                  • 21. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                    mazz

                    I'm on Fedora 15, but it doesn't matter. It's OS-independent. I didn't see my CPU peg to 100%, but I can see the agent spinning, doing things over and over that it shouldn't be doing. So I could definitely see, depending on the machine and the number of resources in this odd state, where the CPU usage could spike. And it's definitely continually going to the server, so you will see additional network usage as well. Overall, we managed to blow things up just perfectly.

                     

                    The good news is you found the workaround - just make sure you get those IGNORED resources back out of inventory. The problem, I think, stems from the combination of ignoring resources and then later disabling their plugins.

                    • 22. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                      pathduck

                      Thanks John,

                      I'll go through our environments and get those resources out of the way so it doesn't hit us when eventually upgrading to 4.6 there.

                       

                      But I reckon I will wait a bit before moving this release up the stack, I installed it to test if some other things were fixed, hopefully they will be in the next release

                       

                      So how do we go on with this case, should I create a Bugzilla case with the recreation steps etc, so it can be looked at further?

                       

                      Stian

                      • 23. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                        mazz

                        Stian, yes, please create a bugzilla on this with your recreation steps, etc. I'll be looking at this starting today - in fact, it's kind of serendipitous, because I was in this code fixing something very similar last week but I haven't finished. But definitely create a BZ and I'll link the two. I think if I fix that one I just linked, it will fix yours - killing two birds with one stone.

                        • 24. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                          pathduck

                          Ok,

                          I created the bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=923210

                           

                          The other case sounds good to get a change done for - I've been thinking it would be simple to be able to have a "Remove" button on the Discovery Queue, since in the case of disabling agent plugins they would still show up in the Ignored list. It is a lot of work to first Import a resource, then do Uninventory of it, to make sure it's gone for good - especially if you have a lot of servers.

                           

                          Thanks a lot for the help so far, John. Hope this can help get a fix for this bug into a possible upcoming 4.6.1.

                           

                          all the best,

                          Stian

                           

                           

                          • 25. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                            jayshaughnessy

                            I agree. And honestly, isn't "Remove" what people really want when using "Ignore"? Does anyone ever really want to unignore something? I doubt it. I think they never want to see it again. And most likely, anything of that type.

                            • 26. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                              mazz

                              As you know, the problem with "removing" the resource entirely is that, once it's gone from the DB and the agent, the agent will simply rediscover it later and report it again. Hence, the purpose of "Ignore".

                               

                              Jay - you should implement a DELETE inventory status to indicate a removed resource. Oh wait....
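The rediscovery point above can be illustrated with a toy model. This is not RHQ code: `merge_discovery`, the dict-based inventory and the status strings are hypothetical stand-ins, assuming only that the server keeps a record for an ignored resource and keeps none for a removed one.

```python
def merge_discovery(server_inventory, discovered_keys):
    """Return the keys that would be reported as newly discovered.

    `server_inventory` maps a resource key to a status string. A key that
    is already known (even as "IGNORED") is not reported again; an unknown
    key is added as "NEW" and reported.
    """
    newly_reported = []
    for key in discovered_keys:
        if key not in server_inventory:
            server_inventory[key] = "NEW"
            newly_reported.append(key)
    return newly_reported


inventory = {"cron": "IGNORED"}
first = merge_discovery(inventory, ["cron"])   # record still exists

del inventory["cron"]                          # "Remove" drops the record
second = merge_discovery(inventory, ["cron"])  # next scan finds it again
```

In this model the ignored record acts as a marker that suppresses the report, while removing the record leaves nothing to stop the next discovery scan from reporting the resource again.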

                              • 27. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                                mazz

                                I have completed the work on this issue.

                                 

                                In fact, the fix for https://bugzilla.redhat.com/show_bug.cgi?id=923210 has been incorporated into the fix for https://bugzilla.redhat.com/show_bug.cgi?id=535289

                                 

                                So, what this means is that a new feature is being proposed for the master branch. You can now ignore resource types: any resources of that type that currently exist in inventory will be removed, and no new resources of that type will get into inventory. On the agent side, each time the agent sends up an inventory report (which merges/syncs inventory between server and agent), the agent will be told of the latest set of ignored types. The agent will then no longer discover or manage resources of those ignored types.

                                 

                                How does this help the issue in this thread? The agent will also examine the inventory status of resources that it syncs with the server - if a resource is in the IGNORED state, the agent removes it from agent-side inventory entirely. Once this happens, the CPU issues go away because the agent won't do anything related to those ignored resources.
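The sync behavior described above could be sketched roughly as follows. This is an illustrative model only, not the code in the branch: `AgentInventory`, `Resource`, `sync` and `should_discover` are hypothetical names, and the status values are a simplified assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class InventoryStatus(Enum):
    # Simplified, assumed set of statuses for this sketch.
    NEW = "NEW"
    COMMITTED = "COMMITTED"
    IGNORED = "IGNORED"


@dataclass
class Resource:
    resource_id: int
    type_name: str
    status: InventoryStatus


@dataclass
class AgentInventory:
    resources: dict = field(default_factory=dict)
    ignored_types: set = field(default_factory=set)

    def sync(self, server_resources, server_ignored_types):
        """Merge the server's view into agent-side inventory."""
        # The server tells the agent the latest set of ignored types.
        self.ignored_types = set(server_ignored_types)
        kept = {}
        for res in server_resources:
            # Drop anything the server has marked IGNORED.
            if res.status is InventoryStatus.IGNORED:
                continue
            # Drop anything whose whole type is ignored.
            if res.type_name in self.ignored_types:
                continue
            kept[res.resource_id] = res
        self.resources = kept

    def should_discover(self, type_name):
        """Discovery skips resource types that are ignored."""
        return type_name not in self.ignored_types


agent = AgentInventory()
agent.sync(
    [
        Resource(1, "Cron", InventoryStatus.IGNORED),
        Resource(2, "Postfix", InventoryStatus.COMMITTED),
        Resource(3, "Samba", InventoryStatus.COMMITTED),
    ],
    server_ignored_types={"Samba"},
)
```

After the sync in this sketch only the Postfix resource is left on the agent side, so nothing remains for the agent to keep asking the server about.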

                                 

                                The code lives in the RHQ fedorahosted repo under the branch named "bug/rhq-1"

                                 

                                I'll do some more testing on this and allow folks to chime in with questions/concerns prior to merging to master.

                                • 28. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                                  pathduck

                                  Sounds very good John, thanks for the update

                                   

                                  I guess this means once some more testing is done, this fix can make it into an eventual 4.6.1 (or other) release, whenever that comes?

                                   

                                  Also a small question; Maybe I'm being a bit daft, but the cpu usage was on the server - it was the server constantly sending those discovery messages to the agents. I guess this will also make sure the server won't do anything related to those resources? I guess since all of that type are completely ignored it won't, just wondering a little about the specifics.

                                   

                                  Will there be a button "Ignore Resource Type" in the UI for this?

                                  • 29. Re: RHQ 4.6 high CPU usage on Linux (RHEL)
                                    mazz

                                    Right, the CPU cycling was on the server - the issue was that the agent was continually bombarding the server with requests for inventory when it shouldn't have been. It appears that resources with an ignored status were causing the agent to go into a spin, which in turn caused it to send all of those messages to the server every couple of seconds.

                                     

                                    If all goes well, this should be in the next release of RHQ. I'm gonna try to get this into master branch soon. I'm writing up some tests now that have to go in first.

                                     

                                    The Ignore Resource Type feature will be a new item in the left side menu of the Administration page. It will be Administration > Ignored Resource Types. It will look just like the "Alert Templates" page (with the three tables/trees of resource types) - only when you click the "pencil" edit icon, it will flip the setting from Enabled to Ignored and back again.
