16 Replies Latest reply on Dec 3, 2002 12:24 PM by cstone

    Dying of JBossMQ server under load

    yamax

      Hi all,

      We have an application which generates 300 messages/second. These messages get published onto a Topic.
      These 300 messages represent 300 separate producer streams, all of which get published onto a topic. The consumers make distinctions based on selectors if necessary.
      As long as there are no consumers (subscribers to the topic), the server hums along merrily.

      When I attach a consumer (1 subscription) with a selector that selects only one stream (so 1 message/second), let it receive data for about 15 minutes, and then disconnect the subscriber, all hell breaks loose: CPU consumption becomes erratic and keeps increasing until the server no longer responds and ultimately dies.

      I have started to look into the code under server topic. It would be greatly appreciated if someone could throw out some hints (to narrow down the suspects) as to where I should be looking. I am trying to fix the problem.

      We are using JBoss 3.0.0 (we tried 3.0.1 and 3.0.2, but they brought additional problems with EJB jar deployments and dependencies) on a Win2K server, on a dual-CPU 1.2GHz machine.

      All help is appreciated,

      Regards,

      nitin

        • 1. Re: Dying of JBossMQ server under load
          skendorski

          Can't help, but I think I'm seeing the same thing. I have multiple publishers and subscribers using selectors and it works great for about 24,000 messages (using JBoss 3.0.2); then performance grinds to a halt, I get JVM out-of-memory errors and the server dies. Changing the High and Max memory marks around just shifts the problem (mostly so it shows up earlier). Things got a little better with 3.0.3 (65,000 messages), but it's still unusable in my application. I'm running on Windows 2000, 512MB of RAM, non-persistent messages, Java 1.4. Help!

          • 2. Re: Dying of JBossMQ server under load
            erok

            Hi,

            I have no final solution, but I have experienced a similar scenario, and made a workaround for our application:

            Our application currently runs on:
            JBoss 3.0.2
            Redhat linux 7.2
            Sun jdk 1.4.0
            (But I have tried with JBoss 3.0.0 and 3.0.1 as well with no luck)

            Our EJB application had ~30 non-EJB clients communicating via JMS, using both topics and queues. (Since I couldn't find a solution to this problem with topics, I moved all the JMS traffic to queues, which I have found to work better, although this increases the workload of the EJB, which now has to select where to send messages.)

            We had a number of OOM exceptions before finding the cause: JMS messages were being cached to disk. After a 24-hour period we could have 400000 messages cached to disk, making it a necessity to restart JBoss every 12 hours to keep performance fairly good. My search in this forum led me to believe that it was somehow caused by my use of topics with selectors, so I removed all selectors from the clients and implemented a check in the client code instead. It did NOT solve my problem; messages were still being cached to disk!

            In my continued search I noticed that when a client didn't disconnect its TopicConnection correctly from the JBossMQ server, for instance after a network or machine failure, the server NEVER released that connection until it was restarted. JBossMQ therefore thought the "dead" client was still connected as a subscriber and cached messages for it to consume. After moving all my usage of topics to queues, I have seen the same caching of messages, BUT the connection is eventually released, and the caching stops when it is released!!
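
            To make the "dead client is never released" behaviour concrete, here is a minimal sketch of one common remedy: a lease table that forgets subscribers that have been silent for too long. This is not JBossMQ code and not a fix I applied; the class SubscriberLeases, its methods and the 30-second timeout are all hypothetical, and it assumes modern Java rather than the JDK 1.4 we run on.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy registry that forgets subscribers whose last heartbeat is too old (hypothetical, not JBossMQ code). */
final class SubscriberLeases {
    private final long timeoutMillis;
    // subscriber id -> time of the last heartbeat, in milliseconds
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    SubscriberLeases(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a subscriber connects, acknowledges a message, or pings. */
    void heartbeat(String subscriberId, long nowMillis) {
        lastSeen.put(subscriberId, nowMillis);
    }

    /** Removes every subscriber that has been silent for longer than the timeout and returns their ids. */
    List<String> reap(long nowMillis) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                evicted.add(entry.getKey());
                it.remove(); // the server would also free the messages cached for this subscriber
            }
        }
        return evicted;
    }

    boolean isConnected(String subscriberId) {
        return lastSeen.containsKey(subscriberId);
    }
}
```

            A server using something like this would call reap(...) periodically and stop caching for the evicted ids, so a client lost to a network failure would stop accumulating messages after the timeout instead of until the next restart.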

            We have not needed to restart JBoss for two weeks now!!

            Hope this helps,
            /Erik Lindqvist

                  • 6. Re: Dying of JBossMQ server under load
                    erok

                    OOPS! My first post didn't show right away...so I reposted it...3 times...sorry guys!

                    • 7. Re: Dying of JBossMQ server under load
                      skendorski

                      I don't see that NOT using selectors or topics is viable for my application. The question is: why don't they work? And why is all this caching going on if I'm using non-persistent messages and no one is disconnecting? I tried using "rollinglogged" as the persistence manager, but it doesn't work either.

                      • 8. Re: Dying of JBossMQ server under load
                        yamax

                        Hi Erik,

                        Thank you for your comments.

                        It is clear that the runaway and eventual death happen because of "bad" cache behaviour; what causes it is not so clear. One of my current theories is the same as what you mentioned: the disconnection "contract" is not properly honoured by the JMSServer. It looks like the JMSTopic implementation realizes the Topic functionality as a layer on top of the BasicQueue mechanism, and when a disconnect is not handled cleanly it assumes that the remote client is still present and continues to cache, eventually causing the cache to overflow.
                        We thought about moving to queues, but since the availability of a consumer is not under our control (consumers for the data may or may not be present), we cannot do that.
                        I also tried to see whether it is caused by dispatching multiple producers onto a single topic. What seems to happen is that, since the selector is evaluated at the end, the presence of any client is sufficient to make the cache start caching all the messages!
                        I tried creating multiple topics (30 topics) and spreading the streams over them. That resulted in slightly better behaviour (it ran for 3 hours instead of 15 minutes), but it only shifts the time of death.
                        I wonder if there is any way to disable the cache completely and bring the QOS down to "discard if not sure", i.e. compromise data delivery in favour of existence :)
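
                        To put numbers on the "selector is evaluated at the end" theory, here is a toy model of the backlog held for one subscriber that is not draining it (for example, one the server still believes is connected). It is only a sketch of the theory, not of the actual JBossMQ code path; the class SelectorBacklogModel and its parameters are made up for illustration, and messages are represented only by their stream number.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntPredicate;

/** Toy model of one subscriber's backlog; a message is represented only by its stream number. */
final class SelectorBacklogModel {
    /**
     * Publishes one message per stream per second and returns how many messages
     * are held for a subscriber that consumes nothing during that time.
     */
    static int backlog(int streams, int seconds, IntPredicate selector, boolean filterOnEnqueue) {
        Deque<Integer> held = new ArrayDeque<>();
        for (int t = 0; t < seconds; t++) {
            for (int stream = 0; stream < streams; stream++) {
                // With late evaluation every message is kept until delivery time.
                if (!filterOnEnqueue || selector.test(stream)) {
                    held.add(stream);
                }
            }
        }
        return held.size();
    }

    public static void main(String[] args) {
        IntPredicate onlyStream7 = stream -> stream == 7;
        int fifteenMinutes = 15 * 60;
        System.out.println("selector applied late:  " + backlog(300, fifteenMinutes, onlyStream7, false));
        System.out.println("selector applied early: " + backlog(300, fifteenMinutes, onlyStream7, true));
    }
}
```

                        With our 300 streams and 15 minutes, the late-evaluation case holds 270000 messages and the early-evaluation case holds 900, which would explain why a single one-stream subscriber is enough to swamp the cache if the theory is right.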

                        Still searching ...

                        regards,

                        nitin

                        • 9. Re: Dying of JBossMQ server under load
                          cstone

                          Wow, I was glad to see this forum thread because other people are experiencing BOTH of the show-stopper problems that my company is experiencing with JBossMQ. However, to shed some light on the situation I would like to point out that I think there are actually 2 different problems here:

                          Problem 1) When publishing msgs to topics that have subscribers active (or durable but inactive, i.e. unconnected), JBossMQ blows up quite ungracefully with JVM memory exceptions after tens of thousands of msgs, requiring a restart of JBoss :-(

                          Problem 2) When a network error disconnects a topic subscriber, JBossMQ doesn't detect this and thinks the client is still connected. This creates a terrible situation where subsequent attempts to reconnect the subscriber are denied since the clientid is "still connected"; this requires a restart of JBoss :-(

                          Please note that there are a few differences in my usage pattern and observations compared to some of the previous respondents. In particular:

                          a) I get the memory blow-ups (Problem 1) even if I do not have an ungracefully disconnected client (Problem 2). Also, I get Problem 2 even if I haven't swamped the system to trigger Problem 1.
                          b) My application uses durable subscriptions and durable message publishing.

                          Additional info:
                          - The subscribers use msg selectors.
                          - My app uses 5 topics, each with only 1 or 2 durable subscribers (though I bet this can be reproduced with 1 topic and 1 durable subscriber).
                          - The topic sessions are created with AUTO_ACKNOWLEDGE.
                          - The problems occur whether I use transactional JMS pub/sub sessions or non-transactional sessions.
                          - The memory problem still exists even though the messages are being properly persisted to disk (using the org.jboss.mq.pm.file.PersistenceManager). So why the heck does memory have to be squandered when there is a good disk back-up? Apparently a bad memory cache algorithm is being used here.
                          - The memory bloating continues even after all topics have been drained of msgs by all durable subscribers. The persistent directories are empty, no more msgs exist, but memory remains bloated, and future msgs keep bloating it steadily until the inevitable JVM out-of-memory exception.
                          - I am using JBoss 3.0.3 (problem also encountered on 3.0.0) in a non-EJB scenario (i.e. no MDB's; my application runs in a standalone JVM and uses JBossMQ as a pure JMS provider).
                          - I have tried adjusting the HighMemoryMark and MaxMemoryMark values to both be less than the JVM's memory allocation; this does not solve the memory problem.

                          I hope that this info helps in debugging the issues. Both of these are serious problems that currently are forcing my company to consider using a $$commercial$$ JMS provider :-( !!!


                          • 10. Re: Dying of JBossMQ server under load

                            We have seen similar behaviour on JBoss 3.0.0. We use a small MBean as a queue manifold: Queue -> Topic with durable subscribers -> several Queues. The server gets slower and slower and eventually dies. Does anyone know a solution to this problem?

                            • 11. Re: Dying of JBossMQ server under load
                              temafm

                              Hi,
                              I read somewhere that performance can be increased if a file is used as the file system on which messages are persisted. Doing this may affect reliability.
                              Has anyone tried this, or is anyone willing to try?

                              Here is how to create a file system in a file under Linux:

                              1) Create the file
                              # dd if=/dev/zero of=jbossmq-file.ex2 bs=1024 count=131072
                              'count' is the size of the file in KB (here 131072 KB = 128 MB, since bs=1024).

                              2) Create the filesystem
                              # mke2fs jbossmq-file.ex2
                              Answer yes to the complaint about not being a block device.

                              3) Mount the new file. You will need loopback support, either as a module or built in the kernel.
                              # mount jbossmq-file.ex2 /mnt -o loop=/dev/loop0

                              • 12. Re: Dying of JBossMQ server under load
                                skendorski

                                I am currently using the file PersistenceManager, and it does not solve the problem. Whatever the caching scheme used, the cache manager is the problem, not the method of persistence. The method of persistence will only affect system performance.

                                • 13. Re: Dying of JBossMQ server under load
                                  cholliday

                                  I have been seeing this problem for a while and wondered if I was the only one. I tried 3.0 then went to 3.0.2 then went backwards to 2.4.9. I had the same problem with all of them. I figured I was going to see a fix or figure it out by the time I went into production. But here I am going into production tonight. I am just trying to limit my exposure until a fix comes or until I go another direction.

                                  • 14. Re: Dying of JBossMQ server under load
                                    yamax

                                    Hi all,

                                    After much consternation (and 2 weeks of digging around), I found a few potential issues, hacked at them and created a workaround.

                                    1. There are a few clean-up issues when a subscriber disconnects (unsubscribes) its connection. If there are any unacknowledged messages, the cleanup is deferred and no one seems to take it up. (Authors, please correct me if I am wrong.)
                                    I created a hack for it by proactively (and aggressively ;) ) cleaning up the messages related to a particular client on unsubscribing.
                                    2. I distributed the load onto 1 topic per stream (see my first post in this thread for the problem description), so no selectors.
                                    3. I scaled down the QOS to drop old unconsumed messages but continue to "survive". The JMS server no longer spills over to disk; if it is overwhelmed by the input data, it starts dropping the oldest messages (this is trivially accomplished by commenting out the saveToStorage in MessageCache :)).
                                    4. I modified a synchronization lock in org.jboss.mq.server.ClientConsumer.java (version 1.11.2.1), method removeRemovedSubscription, line 337 (take line numbers with a pinch of salt because of loggers that I may have inserted), to synchronize on removedSubscriptions instead of subscriptions (seemed more reasonable to me ;).
                                    5. I volume tested this fix at 100 messages/second with 100 listener clients for 12 hours; the memory footprint of the JMS provider stayed below 40MB and CPU utilization at 7% (on a dual 1.2GHz Windows box).
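
                                    For anyone who wants the idea behind points 1 and 3 without reading the server source, here is a minimal standalone sketch: per-subscriber buffers that drop the oldest message when full and that are freed completely on unsubscribe. It is a toy model of the policy, not my actual patch and not what commenting out saveToStorage literally does; the class DropOldestTopic, its methods and the capacity parameter are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy per-subscriber buffers that drop the oldest message when full (hypothetical, not JBossMQ code). */
final class DropOldestTopic {
    private final int capacity;
    private final Map<String, Deque<String>> pending = new HashMap<>();
    private long dropped;

    DropOldestTopic(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    synchronized void subscribe(String subscriberId) {
        pending.putIfAbsent(subscriberId, new ArrayDeque<>());
    }

    /** Frees everything held for the subscriber, delivered or not, and returns how many messages were purged. */
    synchronized int unsubscribe(String subscriberId) {
        Deque<String> queue = pending.remove(subscriberId);
        return queue == null ? 0 : queue.size();
    }

    /** Queues the message for every subscriber, evicting the oldest entry of any buffer that is full. */
    synchronized void publish(String message) {
        for (Deque<String> queue : pending.values()) {
            if (queue.size() == capacity) {
                queue.removeFirst(); // sacrifice the oldest message instead of spilling to disk
                dropped++;
            }
            queue.addLast(message);
        }
    }

    /** Returns the oldest pending message for the subscriber, or null if there is none. */
    synchronized String receive(String subscriberId) {
        Deque<String> queue = pending.get(subscriberId);
        return queue == null ? null : queue.pollFirst();
    }

    synchronized long droppedCount() {
        return dropped;
    }
}
```

                                    The trade-off is the one described above: memory per subscriber is bounded by the capacity, at the price of losing the oldest data for slow or vanished consumers.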

                                    This was an acceptable workaround for me, and may not work for everyone.

                                    There is another thread (http://www.jboss.org/modules/bb/index.html?module=bb&op=viewtopic&t=forums/) that seems to address a similar issue, and someone seems to have fixed it. I hope that is true; I cannot test that fix for my problem right now because 3.0.2 introduced some new problems for me. Maybe later.

                                    Regards,

                                    nitin
