28 Replies Latest reply on Feb 1, 2013 8:00 PM by takeshi10

    Messages not being delivered or a queue in a weird state

    takeshi10

      So... something weird happened on our production server today. All of a sudden, HornetQ stopped delivering messages on one of the queues (the others were fine). Unfortunately, it was our most important queue. Restarting the consumers, pausing and resuming the queue, and restarting the server all had no effect.

      I also noticed that the journal directory was very large (1.5 GB), and while restarting HornetQ I got some IllegalStateExceptions saying that the state of the journal was wrong:

       

      17:38:10,083  WARN PageCursorProviderImpl:76 - Couldn't complete cleanup on paging

      java.lang.IllegalStateException: Journal must be loaded first

       

      Also, the whole journal would be loaded into memory (the heap would grow past 4 GB).

       

      So I stopped HornetQ, cleaned up the journal directory, and everything started working fine (except that I probably lost those messages, but I cannot tell). I can probably provide the HornetQ data files, but it will not be easy (they take 300 MB gzipped).

       

      Starting HornetQ with those files leads me to the same state: no messages get delivered to that queue, and inspecting the queue through JMX (using jconsole) shows something very weird: it says I have 130k+ messages, but no method can look at them (or move / delete / expire them), and I see no way to at least remove them from the heap.
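For anyone who wants to script the same check instead of clicking through jconsole, here is a minimal sketch that reads a queue's message count over plain JMX. It uses only the JDK's `javax.management` API. The MBean name pattern `org.hornetq:module=JMS,type=Queue,name="..."` and the `MessageCount` attribute are assumptions based on HornetQ's management documentation, and the service URL and queue name are placeholders, so confirm all of them in jconsole before relying on this.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class QueueProbe {

    // Assumed naming pattern for HornetQ JMS queue MBeans; confirm it in jconsole.
    public static ObjectName queueObjectName(String queueName) throws Exception {
        return new ObjectName("org.hornetq:module=JMS,type=Queue,name=" + ObjectName.quote(queueName));
    }

    // Reads any numeric MBean attribute and returns it as a long.
    public static long readLong(MBeanServerConnection conn, ObjectName name, String attribute) throws Exception {
        return ((Number) conn.getAttribute(name, attribute)).longValue();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL and queue name; pass the real values as arguments.
        String url = args.length > 0 ? args[0] : "service:jmx:rmi:///jndi/rmi://localhost:3000/jmxrmi";
        String queue = args.length > 1 ? args[1] : "exampleQueue";

        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // "MessageCount" is the assumed attribute name for the queue depth.
            long count = readLong(conn, queueObjectName(queue), "MessageCount");
            System.out.println(queue + " MessageCount = " + count);
        } finally {
            connector.close();
        }
    }
}
```

Polling this value while consumers are connected shows whether the count is frozen (nothing delivered) or merely draining slowly.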

       

      Can anybody help me or give me any directions? Any help at all will be much appreciated - we're getting to the point where we will have to switch to another JMS provider if that happens again.

       

      Oh, and I'm using HornetQ 2.2.21.Final, but the same thing happens if I use the same data files with HornetQ 2.3.0.Beta1 (except that new messages do not get stuck but are delivered normally).

       

      Many many thanks

        • 1. Re: Messages not being delivered or a queue in a weird state
          clebert.suconic

          Start by looking at the print-data and print-pages output... those files may be easier to share.

           

          I'm wondering what caused that.

          • 2. Re: Messages not being delivered or a queue in a weird state
            takeshi10

            Where can I find those? Any pointers on how to analyze them?

             

            thanks a lot

            • 3. Re: Messages not being delivered or a queue in a weird state
              takeshi10

              So, it may or may not be related, but something weird showed up in the logs:

               

              15:29:38,814 ERROR QueueImpl:66 - Failed to deliver

              java.lang.RuntimeException

                        at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.next(PageSubscriptionImpl.java:1241)

                        at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.hasNext(PageSubscriptionImpl.java:1392)

                        at org.hornetq.core.server.impl.QueueImpl.deliver(QueueImpl.java:1830)

                        at org.hornetq.core.server.impl.QueueImpl.access$1000(QueueImpl.java:77)

                        at org.hornetq.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2481)

                        at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

                        at java.lang.Thread.run(Thread.java:722)

              Caused by: java.lang.NullPointerException

                        at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.getMessage(PageCursorProviderImpl.java:119)

                        at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.getReference(PageSubscriptionImpl.java:347)

                        at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.access$1300(PageSubscriptionImpl.java:67)

                        at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.moveNext(PageSubscriptionImpl.java:1269)

                        at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.next(PageSubscriptionImpl.java:1236)

                        ... 8 more

               

               

              Does it ring a bell?

               

              Also, I've tried PrintPages and PrintData, but I could not make sense of the output.

               

              Thanks in advance

              • 4. Re: Messages not being delivered or a queue in a weird state
                clebert.suconic

                There were a few fixes beyond 2.2.21...

                 

                In particular...

                 

                https://issues.jboss.org/browse/JBPAPP-10338 paging may lose messages if acks are done out of order during restart

                 

                 

                 

                It seems it would be a good idea to move to the latest available version, even if you have to get it from git (in case you don't have an EAP subscription).

                 

                 

                 

                It seems that the page file was deleted, maybe while the journal was still showing some acks in a previous page? I would need your data to know what's going on (maybe you could send the print-data and print-pages outputs to me). Take a look at the properties before you send them, to make sure you're not sending anything you shouldn't, although I never look at any of that data.

                • 5. Re: Messages not being delivered or a queue in a weird state
                  clebert.suconic

                  My email is fairly easy to find, for sending the print-data and print-pages output.

                   

                   

                  And you can even write to me in Portuguese in the email if you like. From what I see, you're Brazilian as well.

                  • 6. Re: Messages not being delivered or a queue in a weird state
                    takeshi10

                    Hi Clebert, thanks for all the attention. I think the previous NPE was a result of my moving files around, so I guess we can safely ignore it (although an NPE is, IMO, always a bug).

                    I'm back to square one: messages are stuck in the queue and are never delivered (I also cannot move/expire them using JMX, so something is definitely wrong). It happened again Monday on a non-critical queue, so it was not a big deal.

                     

                    I also get some instances of

                     

                    12:35:50,790  WARN PageCursorProviderImpl:76 - Couldn't complete cleanup on paging

                    java.lang.IllegalStateException: Journal must be loaded first

                              at org.hornetq.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:973)

                              at org.hornetq.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:961)

                              at org.hornetq.core.persistence.impl.journal.JournalStorageManager.deletePageComplete(JournalStorageManager.java:727)

                              at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.onDeletePage(PageSubscriptionImpl.java:786)

                              at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.onDeletePage(PageCursorProviderImpl.java:523)

                              at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.cleanup(PageCursorProviderImpl.java:492)

                              at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl$1.run(PageCursorProviderImpl.java:310)

                              at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

                              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                              at java.lang.Thread.run(Thread.java:662)

                     

                    while it is starting up, and it looks like a race condition (but I don't think it has anything to do with my problem anyway).
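To make the suspected race concrete: the trace shows a paging cleanup task, run on an executor thread, calling into the journal and being rejected because the journal has not finished loading. The sketch below is not HornetQ code. `ToyJournal` and its methods are hypothetical stand-ins that only reproduce the shape of the problem (a guarded append called before the load) and one common way to order the two steps, using a `CountDownLatch`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class JournalRaceSketch {

    // Hypothetical stand-in for a journal that rejects writes until it is loaded.
    static class ToyJournal {
        private volatile boolean loaded;
        private final CountDownLatch loadedLatch = new CountDownLatch(1);

        void load() {
            loaded = true;
            loadedLatch.countDown();
        }

        void appendDeleteRecord(long id) {
            if (!loaded) {
                throw new IllegalStateException("Journal must be loaded first");
            }
        }

        // Blocks the caller until load() has run, or until the timeout expires.
        boolean awaitLoaded(long timeout, TimeUnit unit) throws InterruptedException {
            return loadedLatch.await(timeout, unit);
        }
    }

    // Unordered case: the cleanup touches the journal before load() and is rejected.
    public static boolean cleanupBeforeLoadFails() {
        ToyJournal journal = new ToyJournal();
        try {
            journal.appendDeleteRecord(1L);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Ordered case: the cleanup task waits for the load before touching the journal.
    public static boolean cleanupAfterAwaitSucceeds() throws Exception {
        final ToyJournal journal = new ToyJournal();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> cleanup = executor.submit(() -> {
                if (!journal.awaitLoaded(5, TimeUnit.SECONDS)) {
                    return false;
                }
                journal.appendDeleteRecord(1L);
                return true;
            });
            journal.load(); // Startup completes, possibly while the task is still waiting.
            return cleanup.get(10, TimeUnit.SECONDS);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cleanup before load fails: " + cleanupBeforeLoadFails());
        System.out.println("cleanup after await succeeds: " + cleanupAfterAwaitSucceeds());
    }
}
```

Whether the real warning comes from this kind of ordering problem is only a guess from the stack trace; the sketch just shows why a background task scheduled during startup can hit such a guard.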

                     

                    I'm checking out from git to try it out, and I will send the print-data and print-pages output if it doesn't work.

                    Thank you so much

                     

                    Oh, and yes, I am Brazilian, and I think we went to the same college (IME-USP), but a couple of years apart.

                    • 7. Re: Messages not being delivered or a queue in a weird state
                      clebert.suconic

                      You should probably upgrade to the latest version I mentioned ASAP. There were some bug fixes on paging.

                       

                       

                      It's hard for me to pinpoint the cause without looking at your data. But the upgrade will definitely fix things.

                       

                       

                      It would also be great to look at the full logs... there was probably an error before that.

                       

                       

                       

                      And regarding the NPE... the position was supposed to be in the cache. I could of course handle the NPE and throw another exception, but there would be an error regardless if you move files.

                      • 8. Re: Messages not being delivered or a queue in a weird state
                        clebert.suconic

                        @Marcelo: Why don't you come online on IRC and we can talk about this.

                         

                        You should definitely move to the newer version. But I'm interested in learning what happened, to make sure it won't happen again. You are probably hitting some use case that wasn't planned for. (Are you using filters on paging, for instance? If you are, you should move even sooner.)

                        • 9. Re: Messages not being delivered or a queue in a weird state
                          takeshi10

                          OK, so I'm back. I could join IRC, but I don't want to bother you so much.

                          At any rate, I did some experiments and here are the results (still using 2.2.21.Final):

                          -The queue getting stuck certainly has something to do with paging, because if I clear that, everything almost works again, except that messages that are journaled are not consumed by the message listeners.

                          -The journal files are loaded into memory but are never collected (I suspect this may cause OOMs down the line, but I am unsure). Like I said above, the messages are never consumed and never leave the heap, even after a full GC.

                           

                          Also, I've tried the latest version from the 2.2.EAP branch - namely 2.2.24.EAP.snapshot - and the "queue stuck" issue is definitely gone, but the other issue (a queue with X messages in its metadata and using a lot of heap) is still there, and it may be related to the exception below:

                           

                          16:43:24,439  WARN PageSubscriptionImpl:76 - Error while deleting page-complete-record

                          java.lang.IllegalStateException: Journal must be loaded first

                                    at org.hornetq.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:973)

                                    at org.hornetq.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:961)

                                    at org.hornetq.core.persistence.impl.journal.JournalStorageManager.deletePageComplete(JournalStorageManager.java:727)

                                    at org.hornetq.core.paging.cursor.impl.PageSubscriptionImpl.onDeletePage(PageSubscriptionImpl.java:830)

                                    at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.onDeletePage(PageCursorProviderImpl.java:582)

                                    at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl.cleanup(PageCursorProviderImpl.java:551)

                                    at org.hornetq.core.paging.cursor.impl.PageCursorProviderImpl$1.run(PageCursorProviderImpl.java:340)

                                    at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100)

                                    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

                                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

                                    at java.lang.Thread.run(Thread.java:662)

                           

                          Can I (from a licence point of view) use this version even though it's marked EAP?

                           

                          Also, do you need more information? Should I still send you the print-data and print-pages output (the files are pretty huge, though)?

                           

                          Thanks a lot

                          • 10. Re: Messages not being delivered or a queue in a weird state
                            takeshi10

                            And I forgot to mention, but I don't use any fancy features and my use case should be pretty trivial:

                             

                            I have a small cluster (2 nodes at the moment) with 6 queues (including the DLQ) and 1 topic (just for statistics broadcasting), and all of them are clustered.

                            Each queue (except the DLQ) has a number of consumers (64 currently). The producers usually put an average of 300 messages per second (and rarely over 600) on all the queues, with almost identical bodies of at most 500 bytes each, sent transacted (so it's all or nothing when putting messages on the queues). The consumers either process the message or put another message on the queue.

                            I don't use any filters or selectors, and messages are rarely paged (or at least the queue size doesn't go up so much that it has to be paged).

                            Oh, and HornetQ is running embedded in my application, so message consumption (is this even a word?) is local, using in-vm connection factories, and the netty connectors are used only to bridge messages between the cluster nodes. (I also had some questions about how such routing is done, but I'll look at the code first.)

                            I think that's all. If you need any more info, I'll gladly provide it.
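As a rough sanity check on the figures in this thread, the sketch below does the back-of-envelope arithmetic: body throughput at the stated rate, how long it takes to build a 130k-message backlog, and how many body bytes such a backlog holds. The inputs come from the posts above (300 messages per second across all queues, at most 500 bytes per body, "130k+" stuck messages rounded down to 130,000). Headers, per-message overhead and journal bookkeeping are ignored, so treat the results as lower bounds, not measurements.

```java
public class BacklogEstimate {

    // Bytes of message body produced per second at a given rate.
    public static long bodyBytesPerSecond(long messagesPerSecond, long bytesPerMessage) {
        return messagesPerSecond * bytesPerMessage;
    }

    // Seconds of production needed to accumulate a backlog of the given size.
    public static double secondsToAccumulate(long backlogMessages, long messagesPerSecond) {
        return (double) backlogMessages / messagesPerSecond;
    }

    // Total body bytes held by a backlog, ignoring headers and journal overhead.
    public static long backlogBodyBytes(long backlogMessages, long bytesPerMessage) {
        return backlogMessages * bytesPerMessage;
    }

    public static void main(String[] args) {
        long rate = 300;        // average messages per second over all queues, from the post
        long bodySize = 500;    // upper bound on body size in bytes, from the post
        long backlog = 130_000; // "130k+" messages reported through JMX, rounded down

        System.out.println("body bytes per second: " + bodyBytesPerSecond(rate, bodySize));
        System.out.println("seconds to build backlog: " + secondsToAccumulate(backlog, rate));
        System.out.println("backlog body bytes: " + backlogBodyBytes(backlog, bodySize));
    }
}
```

With these inputs the backlog's bodies add up to about 65 MB, well below the 1.5 GB journal directory and the 4 GB heap mentioned earlier, so message bodies alone do not account for the footprint. Since the 300 messages per second are spread over all queues, the time to build the backlog on a single queue would be longer than the roughly seven minutes computed here.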

                            • 11. Re: Messages not being delivered or a queue in a weird state
                              clebert.suconic

                              > Can i use (from a licence point of view) use this version even though its marked EAP?

                               

                               

                              Of course... the source code is under the Apache License, with 1 class under LGPL, so you're free to use it.

                               

                              EAP adds support and build distribution (including patches... etc).

                               

                               

                              So EAP is a value-add where you get some structure from Red Hat along with the software. If you get the software directly, you're still free to use it.

                               

                               

                               

                               

                               

                              .... Ahhh, that stuck queue was a bug that was fixed indeed. I remember it now. I'm still not sure what caused the "Journal must be loaded first" error... maybe some of your embedded code is wrong?

                              • 12. Re: Messages not being delivered or a queue in a weird state
                                clebert.suconic

                                What application server are you using BTW?

                                 

                                 

                                If you are using Embedded, you should probably use the Branch_2_2_AS7. If you are using AS7, definitely get the AS7 branch. (The only differences are in class loading, nothing much beyond that.)

                                • 13. Re: Messages not being delivered or a queue in a weird state
                                  clebert.suconic

                                  >> Ok so im back. I could join IRC but i dont want to bother you so much.

                                   

                                   

                                  It would be my pleasure... really!

                                   

                                   

                                  I like to help users in trouble so we get more users using HornetQ, and also, mainly, for the sake of avoiding future errors...

                                  • 14. Re: Messages not being delivered or a queue in a weird state
                                    takeshi10

                                    I'm using Tomcat 7, and when I say embedded, it really is just invoking EmbeddedJMS's start method, sniffing it to get the queue connection factory, and registering a TopologyListener. I don't think it has anything to do with it, but I will try the standalone server to see what I get. I'm mimicking the startup script, but maybe my configuration is just funky. I will clean it up and attach it too.

                                    I'm deploying the latest version now on our testing server and everything seems to be working great.
