12 Replies Latest reply on Apr 9, 2013 2:56 PM by frankgrimes97

    EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"

    frankgrimes97

      Hi All,

       

      We are running an embedded HornetQ server in our application. (version 2.2.19.Final-build2)

      We hit a case where occasionally our users try to launch our application twice and so one instance has grabbed the live lock and the second hangs indefinitely.

       

      Would there be a way to ensure that HornetQServerImpl/FileLockNodeManager only wait a configurable amount of time before giving up on starting?

       

      Thanks,

       

      Frank Grimes

        • 1. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
          clebert.suconic

          That means something else is holding the lock file: another server running, or maybe a ghost process (you think it's dead and it's not!).

          • 2. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
            frankgrimes97

            Yes, our app has been double-started and the embedded HornetQ server is one of the first things we try to start.

             

            How can we detect this ourselves and error out the second process?

            Hanging indefinitely really isn't desirable, is it?

             

            Would it not make sense to have both an EmbeddedHornetQ.start() and an EmbeddedHornetQ.start(maxWaitMilliseconds) API?

             

            FWIW, I noticed that the latest code in HornetQServerImpl.java has the following:

             

              @Override

               public boolean waitForActivation(long timeout, TimeUnit unit) throws InterruptedException

               {

                  return activationLatch.await(timeout, unit);

               }

             

            Could something like that be used?

             

            Thanks,

             

            Frank Grimes

            • 3. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
              jbertram

              Hanging indefinitely really isn't desirable, is it?

              As I understand it, the live server will wait until it can get a lock because there might be a back-up which is holding the lock and when the back-up goes down then the live server can start up again.

               

              I'm not sure if EmbeddedHornetQ.start(maxWaitMilliseconds) is something we necessarily want.  It should be fairly simple for you to implement this functionality yourself by using an Executor and a CountDownLatch, eh?
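A minimal sketch of the Executor plus CountDownLatch approach suggested above. Everything here is hypothetical and not part of HornetQ: `TimedStart`, `BlockingStart` and `startWithTimeout` are illustrative names, and `BlockingStart` merely stands in for a call such as `EmbeddedHornetQ.start()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of HornetQ. */
final class TimedStart {

    /** Stand-in for any call that may block, e.g. EmbeddedHornetQ.start(). */
    interface BlockingStart {
        void start() throws Exception;
    }

    /**
     * Runs the start call on a daemon thread and waits up to the timeout.
     * Returns true only if the call returned normally within that time.
     */
    static boolean startWithTimeout(BlockingStart server, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-start");
            t.setDaemon(true); // this thread alone will not keep the JVM alive
            return t;
        });
        executor.execute(() -> {
            try {
                server.start();
                started.countDown(); // reached only on a normal return
            } catch (Exception e) {
                // A failed start leaves the latch closed, so the caller
                // observes it as a timeout.
            }
        });
        executor.shutdown(); // accept no new tasks; the submitted one keeps running
        return started.await(timeout, unit);
    }
}
```

A caller would pass something like `() -> hornetQServer.start()` and treat a `false` result as "could not start in time". Note that this only stops the caller from waiting; it does nothing to unblock the start call itself.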

              • 4. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                frankgrimes97

                Ok, so I guess I have to spin up my own Thread to wait/watch the server...

                 

                I'm looking at the code, and calling EmbeddedHornetQ.stop() doesn't seem to interrupt the live lock waiting.

                If my timeout is reached, what do I need to call to trigger FileLockNodeManager.interrupt()?

                 

                Thanks,

                 

                Frank Grimes

                • 5. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                  frankgrimes97

                  I tried testing this by calling EmbeddedHornetQ.stop() after my timeout expires but it is also blocking indefinitely.

                   

                  It looks like both start and stop synchronize on HornetQServerImpl.this:

                   

                     public synchronized void start() throws Exception { ...}

                   

                     protected void stop(boolean failoverOnServerShutdown, boolean criticalIOError) throws Exception {

                        synchronized (this) {

                        ...

                     }

                   

                   

                  Here is my sample code:

                   

                          final EmbeddedHornetQ hornetQServer = new EmbeddedHornetQ(); // final so the anonymous Runnable can reference it

                          ...

                          ExecutorService service = Executors.newSingleThreadExecutor();

                          try {

                              Runnable r = new Runnable() {

                                  @Override

                                  public void run() {

                                      try {

                                          hornetQServer.start();

                                      } catch (final Exception e) {

                                          throw new RuntimeException(e);

                                      }

                                  }

                              };

                   

                              Future<?> f = service.submit(r);

                   

                              f.get(30, TimeUnit.SECONDS);

                          } catch (final Exception e) {

                              hornetQServer.stop(); // BLOCKS

                              throw e;

                          }

                   


                  • 6. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                    jbertram

                    Have you tried invoking stop() like you invoke start() (i.e. with a timeout)?  When your main() exits (assuming that's what will happen) I believe all the threads will be terminated.

                    • 7. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                      frankgrimes97

                      I might as well not even call stop() then, if it'll just hang and the JVM will terminate its thread when it exits.

                       

                      What I mean is that I don't see any difference between:

                       

                      1) HornetQServerImpl.start() acquires the synchronized lock on HornetQServerImpl.this

                      2) FileLockNodeManager tries to grab the lock file (looping/retrying indefinitely until interrupted flag is set)

                      3) HornetQServerImpl.stop() blocks on the synchronized lock on HornetQServerImpl.this

                      4) Main thread exits and JVM terminates

                       

                      and

                       

                      1) HornetQServerImpl.start() acquires the synchronized lock on HornetQServerImpl.this

                      2) FileLockNodeManager tries to grab the lock file (looping/retrying indefinitely until interrupted flag is set)

                      3) Main thread exits and JVM terminates

                       

                      Assuming, of course, that this would be safe and that no state/resources need to be cleaned up by a call to stop() at this point.
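The first sequence above can be reproduced with a small stand-in class. This is only a sketch of the monitor contention described in steps 1 to 3, not HornetQ code: `MonitorDemo` and its members are hypothetical, and a latch that is never released simulates the indefinite wait for the lock file.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Stand-in for the server; this is not HornetQ code. */
final class MonitorDemo {

    // Never counted down: simulates a lock file that never becomes available.
    private final CountDownLatch lockFileAcquired = new CountDownLatch(1);
    private final CountDownLatch insideStart = new CountDownLatch(1);

    /** Holds the monitor on this object while waiting (steps 1 and 2). */
    synchronized void start() throws InterruptedException {
        insideStart.countDown();
        lockFileAcquired.await(); // parks the thread without releasing the monitor
    }

    /** Needs the same monitor, so it cannot enter until start() returns (step 3). */
    synchronized void stop() {
    }

    /** Returns the state of a thread that calls stop() while start() is stuck. */
    static Thread.State stateOfStopCaller() throws InterruptedException {
        MonitorDemo server = new MonitorDemo();
        Thread starter = new Thread(() -> {
            try {
                server.start();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "starter");
        starter.setDaemon(true);
        starter.start();
        server.insideStart.await(); // start() now owns the monitor

        Thread stopper = new Thread(server::stop, "stopper");
        stopper.setDaemon(true);
        stopper.start();

        // Give the stopper a bounded amount of time to reach the monitor.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
        while (stopper.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        return stopper.getState();
    }
}
```

The stopper thread should be reported as `BLOCKED`, which matches the observation that stop() cannot make progress while start() is still waiting. Both threads are daemons here so the demo itself does not keep the JVM alive.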

                      • 8. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                        frankgrimes97

                        Actually, because HornetQ has already spun up some threads, just exiting my main thread isn't sufficient to stop the process.

                        I will need to explicitly invoke System.exit()... is that safe to do?

                        • 9. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                          jbertram

                          I think so.  From a HornetQ perspective very little has actually happened so there shouldn't be much (if anything) to clean up.

                          • 10. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                            frankgrimes97

                            Might the start() not have completed after my timeout but before I call stop()/System.exit()?

                             

                            Safety notwithstanding, being forced to spin up my own threads to implement start/stop timeouts and calling System.exit() is very ugly.

                            An embedded component (EmbeddedHornetQ) really should not force the embedding application to jump through hoops, nor should it force the embedding application to choose between hanging indefinitely and dying entirely.

                            I could maybe get it to work in my case, but only because HornetQ is a required component for my application's startup.

                            If it were optional, I wouldn't be able to call System.exit() and would have no recourse.

                             

                            I really think it would make sense to allow for a configurable timeout on those kinds of blocking lifecycle operations.

                            Would you be willing to consider adding such a feature? Shall I create a JIRA?

                             

                            Thanks,

                             

                            Frank Grimes

                            • 11. Re: EmbeddedHornetQ.start() hangs on "Waiting to obtain live lock"
                              jbertram

                              Might the start() not have completed after my timeout but before I call stop()/System.exit()?

                              If you call stop() or System.exit() immediately after the timeout I think it's extremely unlikely that start() would complete.  You're talking nanoseconds here.

                               

                               

                              Safety notwithstanding, being forced to spin up my own threads to implement start/stop timeouts and calling System.exit() is very ugly.

                              I understand where you're coming from, but the design assumption here is that the process holding the lock on the journal is legitimate.  Your environment is compromised, so to speak, because you have a zombie process or something holding the lock.

                               

                               

                              I really think it would make sense to allow for a configurable timeout on those kinds of blocking lifecycle operations.

                              Would you be willing to consider adding such a feature? Shall I create a JIRA?

                              Feel free to add a JIRA.  However, if I get time to implement this I would probably simply add a flag to the configuration (e.g. FailIfUnableToLock) which would result in RuntimeException if HornetQ were unable to acquire the journal lock.  If the caller wanted, they could sleep/retry as desired.
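Until something like that flag exists, an application can apply the same fail-fast idea to a lock file of its own before it starts the server. The sketch below is an assumption-laden illustration, not a HornetQ API: `SingleInstanceGuard`, `tryAcquire` and `acquireWithRetry` are hypothetical names, and the file being locked is one the application owns, not HornetQ's journal lock. It relies only on the standard `FileChannel.tryLock()` call, which returns without blocking.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical application-level guard; not a HornetQ API. */
final class SingleInstanceGuard {

    /**
     * Makes one non-blocking attempt to lock the given file.
     * Returns the lock, or null if it is already held elsewhere.
     */
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            lock = channel.tryLock(); // null when another process holds the lock
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this JVM; treat it as unavailable.
        } finally {
            if (lock == null) {
                channel.close();
            }
        }
        return lock;
    }

    /** Bounded sleep/retry loop; fails with an exception instead of hanging. */
    static FileLock acquireWithRetry(Path lockFile, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            FileLock lock = tryAcquire(lockFile);
            if (lock != null) {
                return lock;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        throw new IllegalStateException(
                "Could not lock " + lockFile + "; is another instance running?");
    }
}
```

The returned lock has to stay open for the lifetime of the process; releasing it (`lock.release()` followed by `lock.channel().close()`) lets the next instance in. This is meant for detecting a second process. Java file locks are held on behalf of the whole JVM, so it is not a reliable way to coordinate threads inside one process.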
