4 Replies Latest reply on Aug 27, 2015 10:50 AM by kucerarichard

    how to multi-step job with failures,  event-driven?


      Hi !


      First of all,  thank you for RHQ.


      I would like to run a multi-step job (refreshing a Drupal staging app from production).


      In the past I would have done this with RunDeck job features (ssh is no longer allowed; the agent-based option is ControlTier, but that's a dead project). It's also somewhat awkward to code the job for RunDeck: when a job targets specific nodes, there is no "overall" view above the targeted nodes.


      A reasonable thing to do would probably be Jenkins.


      However, before dragging Jenkins into this, I thought I would try an event-driven approach.


      That is, at every step where a failure is possible, that piece writes its status to a log file which is monitored by RHQ.


      Then alerting is set up to respond to success/failure statuses, and there could even be many values for status, so it becomes almost rules-like.
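
      To make the idea concrete, here is a minimal sketch of one way each step could write its status to a monitored log file. This is an illustration only: the log path, the line format, and the names record_status and run_step are assumptions of mine, not anything RHQ prescribes, and the format would need to match whatever the event log source on the RHQ side is configured to parse.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Sequence

# Hypothetical log path; use whichever file the RHQ event source is watching.
STATUS_LOG = Path("refresh-status.log")


def record_status(step: str, status: str, log_path: Path = STATUS_LOG) -> str:
    """Append one status line for a step and return the line that was written."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Assumed convention: anything other than SUCCESS is logged as ERROR.
    severity = "INFO" if status == "SUCCESS" else "ERROR"
    line = f"{stamp} {severity} step={step} status={status}"
    with log_path.open("a", encoding="utf-8") as log:
        log.write(line + "\n")
    return line


def run_step(step: str, command: Sequence[str], log_path: Path = STATUS_LOG) -> bool:
    """Run one command and log SUCCESS or FAILURE based on its exit code."""
    try:
        ok = subprocess.run(list(command), check=False).returncode == 0
    except OSError:
        # The command could not be started at all (e.g. it does not exist).
        ok = False
    record_status(step, "SUCCESS" if ok else "FAILURE", log_path)
    return ok
```

      Because record_status takes an arbitrary status string, more values than SUCCESS/FAILURE could be written, which is what would make the alert side rules-like.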


      The alerting calls one of the 3 kinds of script responses that an RHQ server can take -- resource script,  operation,  or CLI script.


      This way the whole thing is no longer a monolithic batch job: I can trigger a management response at any point in the job simply by writing to an event log somewhere, even manually. Or I can set the whole process off by simply moving a directory "www_dir" to "www_dir-REFRESH", since that could be drift monitored.


      There can be "multiple" subscribers to an event that happened within a batch job, which seems unheard of in RunDeck.


      What's my question? 


      What question should I have?  Could this get out of hand or does it seem OK?   Is there something I am missing with System Mgmt and JON/RHQ and multi-step jobs?


      Could this be the "event driven enterprise"?




        • 1. Re: how to multi-step job with failures,  event-driven?

          I think everything you propose is true. You can use Log events and/or Drift detection in conjunction with Alerting to basically execute some sort of business process. I'm not 100% sure it won't get out of hand, because nowhere (in RHQ) is the overall process defined or viewable. And it may be difficult to audit the execution path in a way that you find useful. You can certainly give it a try, though. It could work for you if the process is not too complex. An alternative that comes to mind is maybe using some sort of business process mechanism, or something rules based, like Overlord/Drools.

          • 2. Re: how to multi-step job with failures,  event-driven?

            Thanks I'll give it a try! 


            Large parts of web infrastructure are currently unmanaged, like a manual Rube Goldberg machine. For example, I just finished a set of bundles to match a handful of system mgmt use cases for a multi-instance multi-core Solr infrastructure (these bundles all turned out to be fileTemplateBundles, despite those being considered simplistic). At least with RHQ I'll have an automated Rube Goldberg machine.


            RHQ should add some system mgmt,  and I'll talk to the Mule ESB guys about definition/overview...we've already got that.    

            • 3. Re: how to multi-step job with failures,  event-driven?

              Currently I am comparing RHQ to Salt Orchestration for this. I think Salt has a fairly steep learning curve, and if I don't use it every day it seems rather opaque, even though I've had the week of training, which is perhaps why SaltStack is introducing a graphical web console in the enterprise edition.


              Leaning towards RHQ...

              • 4. Re: how to multi-step job with failures,  event-driven?

                Just following up on myself...


                In RHQ/JON, multi-step orchestration across platforms/resources (ScriptServers) can simply be chained through the alert system. There is no need to set up event monitoring for this purpose (that approach could still solve some other problems, though, such as events/completions that are awaited for a very long time).


                You can,  on success of a step,  alert through email that the next step is going to begin and also execute that next step on a related resource.


                You can also alert on failure of a step rather simply.
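
                For anyone trying to picture the control flow, below is a rough model of that chaining, written as a standalone sketch rather than as RHQ configuration. It does not use any RHQ API: Step, run_chain and the notify callback are hypothetical names standing in for the alert definitions, the operation run on the related resource, and the email notification.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str                    # hypothetical step name
    resource: str                # hypothetical resource the step runs on
    action: Callable[[], bool]   # returns True on success, False on failure


def run_chain(steps: List[Step], notify: Callable[[str], None]) -> Optional[str]:
    """Run steps in order; return the name of the failed step, or None."""
    for index, step in enumerate(steps):
        try:
            ok = step.action()
        except Exception:
            # Treat an unexpected error in a step as a failure of that step.
            ok = False
        if not ok:
            # Failure: notify and stop, so later steps never start.
            notify(f"FAILED: {step.name} on {step.resource}")
            return step.name
        if index + 1 < len(steps):
            # Success: announce the next step before it is executed.
            nxt = steps[index + 1]
            notify(f"OK: {step.name} on {step.resource}; "
                   f"starting {nxt.name} on {nxt.resource}")
        else:
            notify(f"OK: {step.name} on {step.resource}; chain complete")
    return None
```

                In RHQ itself each link would be an alert definition on one resource rather than a loop, but the success/failure branching is the same.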


                Compare this with Salt Orchestration: I think I would have to set up email alerting and code it somehow (it's not built in). It's also not clear whether, if a step just failed, I would have to go to the command line and query the job history to find out where it failed and what was going on. I'd have to review Salt for that answer, but it seems likely not to be implemented. In the end you still just have a pile of code, versus a model of the system in RHQ.


                Regards RHQ community,