1 2 3 4 Previous Next 57 Replies Latest reply on Jul 22, 2010 2:55 AM by azserve.luca

    IllegalStateException when loading journal on startup

    dfisher

      Using JBoss 5.1.0, Java 1.6, Hornetq 2.1.0 Beta3

       

      I'm seeing an intermittent error where JBoss cannot start the Hornetq server when there are durables messages in the journal:

      Deployment "JMSServerManager" is in error due to the following reason(s): java
      .lang.IllegalStateException: Cannot find queue messages 2147533706

      The logs show the following stacktrace:

      ERROR [org.jboss.kernel.plugins.dependency.AbstractKernelController] (main) Error installing to Start: name=JMSServerManager state=Create
      java.lang.IllegalStateException: Cannot find queue messages 2147533706
              at org.hornetq.core.persistence.impl.journal.JournalStorageManager.loadM
      essageJournal(JournalStorageManager.java:874)
              at org.hornetq.core.server.impl.HornetQServerImpl.loadJournals(HornetQSe
      rverImpl.java:1206)
              at org.hornetq.core.server.impl.HornetQServerImpl.initialisePart2(Hornet
      QServerImpl.java:1056)
              at org.hornetq.core.server.impl.HornetQServerImpl.start(HornetQServerImp
      l.java:310)
              at org.hornetq.jms.server.impl.JMSServerManagerImpl.start(JMSServerManag
      erImpl.java:233)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

      Is there a way to recover from this?

      I've had to manually delete the journal to get a clean start.

        • 1. Re: IllegalStateException when loading journal on startup
          timfox

          Are you sure you don't have more than one server instances pointing at the same data?

          • 2. Re: IllegalStateException when loading journal on startup
            dfisher

            There is only one java process running on the host.

            We do use a separate partition for data, but I don't think that is related.

             

            I've had a hard time reliably replicating this problem.

            I think we've seen it once or twice a week over two months of testing.

            The stacktrace seems to indicate that the journal loaded delivery count data but found no corresponding messages.

             

            Perhaps this is expected behavior? with:

            <persist-delivery-count-before-delivery>false</persist-delivery-count-before-delivery>
            • 3. Re: IllegalStateException when loading journal on startup
              clebert.suconic

              Did you have any errors on the server before you shut the instance?

               

               

              If you see it happening again.. it would be wise to keep the data directory. Maybe I could evaluate what happened by looking at the data.

              • 4. Re: IllegalStateException when loading journal on startup
                clebert.suconic

                Did you delete any queues before shutdown?

                 

                Or did you delete the bindings journal while restarting it?

                 

                The message is saying it can't find the queue during startup:

                 

                               if (queueMessages == null)
                               {
                                  throw new IllegalStateException("Cannot find queue messages " + encoding.queueID);
                               }

                 

                 

                 

                Maybe you could try replicating the procedure you did right before it failed.

                • 5. Re: IllegalStateException when loading journal on startup
                  dfisher

                  Did you delete any queues before shutdown?

                  No.

                  Or did you delete the bindings journal while restarting it?

                  No.

                  The message is saying it can't find the queue during startup:

                   

                                 if  (queueMessages == null)
                                 {
                                    throw new  IllegalStateException("Cannot find queue messages " + encoding.queueID);
                                 }

                  Specifically, doesn't the source indicate that it cannot find the queue in order to update the delivery count for a message?

                  Would this be indicative of a message having been ack'd, removed from the journal, and then shutdown before the delivery count could be updated?

                  Maybe you could try replicating the procedure you did right before it failed.

                  I'll continue trying to replicate this problem in a simplfied setup.

                  I'll be sure to watch logs before and after the event.

                   

                  Regardless of what's causing this problem with the journal, is it reasonable to ignore this type of error on startup?

                  I certainly wouldn't mind discarding delivery count updates if it meant I could continue processing the journal and get the server started.

                  • 6. Re: IllegalStateException when loading journal on startup
                    clebert.suconic

                    I will make the whole switch to this (at the end of the post)...

                     

                    You could try it out. However there should be other records also missing the queue that would be critical missing.

                     

                    You could try trunk now if you want. It's almost CR1 at this point. (we should retag it this week.. planned for tomorrow ATM)

                     

                    https://jira.jboss.org/browse/HORNETQ-394

                     

                     

                    
                    
                    
                    case UPDATE_DELIVERY_COUNT:
                                {
                                   long messageID = record.id;
                    
                                   DeliveryCountUpdateEncoding encoding = new DeliveryCountUpdateEncoding();
                    
                                   encoding.decode(buff);
                    
                                   Map<Long, AddMessageRecord> queueMessages = queueMap.get(encoding.queueID);
                    
                                   if (queueMessages == null)
                                   {
                                      log.warn("Cannot find queue "  + encoding.queueID + " to update delivery count");
                                   }
                                   else
                                   {
                                      AddMessageRecord rec = queueMessages.get(messageID);
                       
                                      if (rec == null)
                                      {
                                         throw new IllegalStateException("Cannot find message " + messageID);
                                      }
                       
                                      rec.deliveryCount = encoding.count;
                                   }
                    
                                   break;
                                }
                    
                    
                    
                    
                    
                    
                    • 7. Re: IllegalStateException when loading journal on startup
                      clebert.suconic

                      fyi: I have made another change on the method after I posted. Look at trunk if you want.

                      • 8. Re: IllegalStateException when loading journal on startup
                        timfox

                        Can you explain what the problem is here first, before jumping in with the solutions?

                        • 9. Re: IllegalStateException when loading journal on startup
                          clebert.suconic

                          We don't know what's the problem is yet... we're still investigating.

                           

                          So far the only thing I'm doing is making a Critical error into a Warning. As we could just throw warnings on logs for UPDATE_DELIVERY_COUNT.

                          • 10. Re: IllegalStateException when loading journal on startup
                            timfox

                            I don't get it.

                             

                            How can you change the error to a warning if you don't know what the problem is? How is this appropriate. Sounds like you'rejust supressing the issue and masking the true problem.

                            • 11. Re: IllegalStateException when loading journal on startup
                              clebert.suconic

                              We must find ACKs and ADD. We should interrupt the loading if that happened.

                               

                              However on Update-Count... we can't ignore the error.. but we could throw a warning on logs and keep loading the rest of the information.

                               

                              If the issue is just on update-count.. I don't see a reason to interrupt starting the server.

                              • 12. Re: IllegalStateException when loading journal on startup
                                dfisher

                                I was able to find a backup of a corrupted journal.

                                Testing with latest trunk produced the following error for a journal created with 2.1.0Beta3:

                                Deployment "JMSServerManager" is in error due to the following reason(s): HornetQException[errorCode=6 message=Journal data belong to a different version]

                                I then applied your patch to Beta3 and confirmed that you are correct.

                                There are other critical records missing.

                                 

                                I then changed the other cases that throw IllegalStateException to log warnings instead and got the server to load the journal.

                                What we saw is a lot of logs indicating that delivery count could not be updated with some cases where the ACKNOWLEDGE_REF could not be found either.

                                WARN  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] (main) Cannot find message 118 to update delivery count
                                WARN  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] (main) Cannot find message 119 to update delivery count
                                WARN  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] (main) Cannot find message 120 to update delivery count
                                WARN  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] (main) Cannot find message (ACKNOWLEDGE_REF) 120
                                WARN  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] (main) Cannot find message 120 to update delivery count
                                WARN  [org.hornetq.core.persistence.impl.journal.JournalStorageManager] (main) Cannot find message 107 to update delivery count

                                 

                                My hope was that there was viable records in the journal that could be recovered, this turned out not to be the case.

                                However, once hornetq was started the journal was usable.

                                 

                                While I think the journal load on startup should be as robust as possible, I'll hold off on submitting a feature request until I can produce a journal that has recoverable records in it.

                                 

                                We'll continue trying to reproduce this....

                                • 13. Re: IllegalStateException when loading journal on startup
                                  clebert.suconic

                                  do you have confidential data on the journal?

                                   

                                  Would you be ok on sharing it with me at least? I just wanted to take a peek on what could possibly cause it.

                                   

                                  you could use my private email if you need to share it privately only. csuconic at redhat dot com

                                  • 14. Re: IllegalStateException when loading journal on startup
                                    clebert.suconic

                                    What kind of load do you have?

                                     

                                    what is the average size of your messages?

                                     

                                    What partition type are you using where the journal is located?

                                    1 2 3 4 Previous Next