14 Replies Latest reply on May 1, 2013 3:53 PM by William DeCoste

    Showcase on Openshift keeps failing

    Brian Leathem Master

      Hello all,

       

      The OpenShift showcase keeps failing.  We have a monitor set up (uptimerobot.com) that notifies us whenever it goes down, and it goes down often.  Here is the outage report for the past 2 months:

       

       02/26/2013 06:21:12    Down    No Response From The Website.    
       02/19/2013 07:23:56    Up    Successful response received.    
       02/19/2013 05:49:20    Down    No Response From The Website.    
       02/16/2013 11:03:01    Up    Successful response received.    
       02/15/2013 08:59:45    Down    No Response From The Website.    
       02/14/2013 06:43:09    Up    Successful response received.    
       02/14/2013 01:52:51    Down    No Response From The Website.    
       02/13/2013 20:40:07    Down    No Response From The Website.    
       02/13/2013 06:04:43    Up    Successful response received.    
       02/13/2013 05:01:33    Down    No Response From The Website.    
       01/29/2013 13:07:04    Up    Successful response received.    
       01/29/2013 06:37:40    Down    No Response From The Website.    
       01/17/2013 09:10:34    Up    Successful response received.    
       01/17/2013 09:10:33    Up    Successful response received.    
       01/17/2013 08:58:14    Down    No Response From The Website.    
       01/16/2013 00:19:11    Up    Successful response received.    
       01/15/2013 07:26:23    Down    No Response From The Website.    
       01/10/2013 09:28:52    Up    Successful response received.    
       01/10/2013 08:19:59    Down    No Response From The Website.    
       01/01/2013 12:17:34    Up    Successful response received.    
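
      To put numbers on a report like this, the Down/Up pairs can be turned into downtime durations. The following is only a rough sketch: it assumes the report is pasted as plain text in the layout shown (date, time, status, message, newest first), and the function names are made up for illustration. Only a few rows are included to keep it short.

```python
from datetime import datetime

# A few rows copied from the report above (newest first).
REPORT = """\
02/19/2013 07:23:56    Up    Successful response received.
02/19/2013 05:49:20    Down    No Response From The Website.
02/16/2013 11:03:01    Up    Successful response received.
02/15/2013 08:59:45    Down    No Response From The Website.
"""


def parse_events(report):
    """Return (timestamp, status) pairs sorted oldest first."""
    events = []
    for line in report.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%m/%d/%Y %H:%M:%S")
        events.append((stamp, parts[2]))
    return sorted(events, key=lambda event: event[0])


def outages(events):
    """Pair the first 'Down' of each outage with the next 'Up'.

    Repeated 'Down' or 'Up' rows in a row are ignored.
    """
    result = []
    down_since = None
    for stamp, status in events:
        if status == "Down" and down_since is None:
            down_since = stamp
        elif status == "Up" and down_since is not None:
            result.append((down_since, stamp - down_since))
            down_since = None
    return result


for started, duration in outages(parse_events(REPORT)):
    print(started, "down for", duration)
```

      An outage that has no later "Up" row (such as the most recent one in the report) is simply left out of the result, since its duration is not yet known.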
      

       

      The RichFaces dev and QE teams are notified when it fails, and whoever reacts first restarts it.  Until now, we haven't done anything to remedy the underlying problem - I'd like to change that.

       

      For starters we need to track the events themselves.  We need to know:

       

      1. when it failed
      2. why it failed (server.log? openshift infrastructure problem?)
      3. what we did to get it going again (rhc command? irc/e-mail discussion with the openshift team?)
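
      Whatever tool we end up with, each entry would need roughly the same fields. As a sketch of what one record could hold (the class and field names here are hypothetical, not tied to any particular tracker):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class OutageRecord:
    """One outage, covering the three questions in the list above."""
    failed_at: datetime                      # 1. when it failed
    cause: str = "unknown"                   # 2. why it failed
    remedy: str = "not recorded"             # 3. what got it going again
    restored_at: Optional[datetime] = None   # None while still down

    def downtime(self):
        """Return the outage length, or None if it has not been resolved."""
        if self.restored_at is None:
            return None
        return self.restored_at - self.failed_at

    def summary(self):
        """One-line text that could be pasted into a tracker comment."""
        length = self.downtime()
        state = f"down for {length}" if length is not None else "still down"
        return (f"{self.failed_at:%Y-%m-%d %H:%M} {state}; "
                f"cause: {self.cause}; remedy: {self.remedy}")


# Timestamps are taken from the report; cause and remedy are left as
# placeholders because they were not recorded at the time.
record = OutageRecord(
    failed_at=datetime(2013, 2, 19, 5, 49, 20),
    restored_at=datetime(2013, 2, 19, 7, 23, 56),
)
print(record.summary())
```

      The defaults make the gaps visible: an entry with cause "unknown" or remedy "not recorded" is exactly the kind of event we currently lose.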

       

      Then we can review the information as time allows, and either implement a fix or report a systemic problem to the OpenShift team.

       

      My question is: where should we track these outages?

      • A wiki is too unstructured. 
      • Should we use JIRA? 
        • Do we do this as a single issue with a comment for each event? 
        • One issue for each event?
      • Any other SaaS tools people recommend for tracking production issues?

       

      Brian