25 Replies Latest reply on Sep 13, 2010 6:46 PM by clebert.suconic
      • 15. Re: OOM errors when using chained diverts
        clebert.suconic

        Thanks again Ronny,

         

         

        I have had your test running for over a week here, and I couldn't get any failures.

         

        However, I can speed up testing on this now. If I set journalFileSize=100K, all the failures happen within 5 to 10 minutes.

         

        I have already run your test with journalSize=100k for 4 hours, which created 1 million files and stressed compacting and moveNextFiles (where the issues were).

         

         

        I believe you will get a clean run now with trunk.

        • 16. Re: OOM errors when using chained diverts
          ronnys

          Hi Clebert,

           

          thanks a lot, I'll give the new version a try.

           

          Best regards,

          Ronny

          • 17. Re: OOM errors when using chained diverts
            clebert.suconic

            You should try 2.1.2.

             

            As I told you earlier, I could accelerate the tests by setting file-size = 100k, which helped me finish the tests in time for the release. I made a few more minor tweaks after my last message as part of the release testing.

             

            I don't think you will have issues, but please let us know otherwise.

            • 18. Re: OOM errors when using chained diverts
              ronnys

              Hi Clebert,

              You should try 2.1.2

               

              As I told you earlier, I could accelerate the tests by setting file-size = 100k, which helped me finish the tests in time for the release. I made a few more minor tweaks after my last message as part of the release testing.

               

              I don't think you will have issues, but please let us know otherwise.

               

              I tested with 2.1.2 and 100k journaling files; ~6.4 million files have been created. The issue did not reappear. Great job! All I got were the journaling errors I reported in my other thread (http://community.jboss.org/message/558274#558274).

               

              Best regards,

              Ronny

              • 19. Re: OOM errors when using chained diverts
                ronnys

                Hi Clebert,

                 

                Sorry, I found a new OOM condition on Branch_2_1/r9608.

                 

                There have been no HornetQ setup changes since the last test. This test again uses the already known divert configuration to distribute messages to 2 destinations.

                 

                I ran a test with 2 separate programs: a message generator (attached) and a message bridge (stripped down to the receiver part and attached) that forwards messages to one of our backend stores. 2 bridges are running, each receiving messages from one of the target queues of the divert. The properties files required by the programs are again the same as for the other tests.

                 

                After ~48h, HornetQ failed with OOM errors (Eclipse MAT report attached). It looks like transaction references are not being cleaned up. This test created many more transactions (~5.7M) than the earlier tests, as each message was committed separately. This might be the reason why it didn't appear earlier.

                 

                Could you please have a look?

                 

                Thanks & Best regards,
                Ronny

                • 20. Re: OOM errors when using chained diverts
                  clebert.suconic

                  Small transactions and paging are not a good combination.

                   

                  We can't guarantee ACID (ATM) by using paging alone, so we keep a live object for each live TX on paging. That way we can check at depage time whether the commit was accepted or not.

                   

                  We keep messages in memory until commit is called. At that point we save the data to paging and add a PageTX to the journal. Everything will be accepted if the commit succeeds, and everything will be ignored otherwise (ACID control).
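
                  To make that flow concrete, here is a minimal Java sketch of the idea. It is a toy model under stated assumptions and not the HornetQ code: PagingModel and all of its methods are hypothetical names, and the journalWriteOk flag only simulates whether the PageTX record reached the journal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names exist in HornetQ.
class PagingModel {
    // A paged message remembers which transaction wrote it.
    private static final class PagedMessage {
        final long txId;
        final String body;

        PagedMessage(long txId, String body) {
            this.txId = txId;
            this.body = body;
        }
    }

    // Messages held in memory until commit is called.
    private final Map<Long, List<String>> inMemory = new HashMap<>();
    // Stand-in for the page files.
    private final List<PagedMessage> pageFile = new ArrayList<>();
    // Stand-in for the PageTX records in the journal: one live entry per committed TX.
    private final Set<Long> committedTx = new HashSet<>();

    void send(long txId, String body) {
        inMemory.computeIfAbsent(txId, k -> new ArrayList<>()).add(body);
    }

    // journalWriteOk simulates whether the PageTX record made it to the journal.
    void commit(long txId, boolean journalWriteOk) {
        List<String> messages = inMemory.remove(txId);
        if (messages == null) {
            return; // nothing was sent on this TX
        }
        for (String body : messages) {
            pageFile.add(new PagedMessage(txId, body));
        }
        if (journalWriteOk) {
            committedTx.add(txId);
        }
    }

    // At depage time, accept only messages whose TX has a PageTX record.
    List<String> depage() {
        List<String> accepted = new ArrayList<>();
        for (PagedMessage m : pageFile) {
            if (committedTx.contains(m.txId)) {
                accepted.add(m.body);
            }
        }
        pageFile.clear();
        return accepted;
    }

    // Number of per-TX objects currently kept alive.
    int liveTxObjects() {
        return committedTx.size();
    }
}
```

                  In this model, 1000 messages sent in one TX leave a single PageTX entry behind, while 1000 single-message TXs leave 1000 of them, which is why small transactions get expensive once paging kicks in.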

                   

                  I have added a JIRA for 2.2 where I will make a number of improvements to paging, including better ACID control: https://jira.jboss.org/browse/HORNETQ-499

                   

                  That is a subtask of:

                   

                  https://jira.jboss.org/browse/HORNETQ-498

                   

                  ATM you could avoid this situation either by adding more memory for the active PageTransactions, or by not using transactions on the producer side and using duplicate detection instead.
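
                  For the second option, the idea is that the producer sends without a transaction, tags each message with a unique ID, and resends with the same ID if it is unsure whether a send went through; the server then drops the repeats. A rough, self-contained Java sketch of the server-side check follows. DuplicateIdCache is a hypothetical class written for illustration, not the HornetQ implementation, and the bounded capacity is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not HornetQ code.
class DuplicateIdCache {
    private final Map<String, Boolean> seen;

    DuplicateIdCache(final int capacity) {
        // Insertion-ordered map that evicts the oldest ID once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the message should be accepted, false if its ID was already seen.
    boolean accept(String duplicateId) {
        return seen.put(duplicateId, Boolean.TRUE) == null;
    }
}
```

                  Note that the cache is bounded in this sketch, so an ID that has been evicted would be accepted again; the capacity has to cover the window in which a producer might resend.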

                   

                  Anyway, I don't think there's a leak ATM. I can take a look if you tell me you are consuming all the messages, but I looked into this possible leak recently and didn't find one.

                  • 21. Re: OOM errors when using chained diverts
                    clebert.suconic
                    Small transactions and paging are not a good combination.

                     

                    I meant to say infinite paging, i.e. paging and never consuming any data.

                    • 22. Re: OOM errors when using chained diverts
                      ronnys

                      Hi Clebert,

                       

                      Thanks for your reply. The bridge program previously attached consumes all messages from the divert target queues (2 bridge programs running, one for each divert target queue). I just checked the logs of this test regarding the backlog: shortly before the OOM error was thrown, there were only about 7000 messages in each divert target queue; the rest had already been consumed.

                       

                      Speaking of diverts: could this be caused by the diverts? If you remember, I'm using an inbound queue that has 2 exclusive diverts to 2 outbound queues. Messages are never consumed from the inbound queue, because this queue is always empty. Could this be the reason for the open transactions?

                       

                      Btw, I can send you the heap dump file as well if you need it. Do you have a drop box somewhere where I could upload it (~931 MB)?

                       

                      Best regards,

                      Ronny

                      • 23. Re: OOM errors when using chained diverts
                        clebert.suconic

                        Are you consuming from both addresses after the divert? The pageTransaction elements will only go away once you have consumed the messages from both places.
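
                        Roughly, you can picture it as a counter per page transaction, as in the Java sketch below. PageTxTracker and its methods are hypothetical names made up for illustration; the real bookkeeping is more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not HornetQ code.
class PageTxTracker {
    // Outstanding deliveries per page transaction, across all target queues.
    private final Map<Long, Integer> pendingByTx = new HashMap<>();

    // A TX with N messages diverted to Q queues has N * Q deliveries to wait for.
    void committed(long txId, int messages, int targetQueues) {
        pendingByTx.put(txId, messages * targetQueues);
    }

    // Called once for every message consumed from any target queue.
    void consumed(long txId) {
        Integer left = pendingByTx.get(txId);
        if (left == null) {
            return; // unknown or already released TX
        }
        if (left <= 1) {
            pendingByTx.remove(txId); // last delivery: release the per-TX object
        } else {
            pendingByTx.put(txId, left - 1);
        }
    }

    int liveTransactions() {
        return pendingByTx.size();
    }
}
```

                        With one message diverted to 2 queues, the entry in this sketch stays alive after the first consumer is done and is only dropped after the second one is.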

                        • 24. Re: OOM errors when using chained diverts
                          ronnys

                          Are you consuming from both addresses after the divert?

                           

                          Yes, I do.

                          • 25. Re: OOM errors when using chained diverts
                            clebert.suconic

                            This was a leak that I introduced in my last commit on paging (I basically forgot to remove it from the list :-) ).

                            I didn't create a JIRA, as it never regressed in any release.

                             

                             

                            I'm still working on the ordering issue now.
