6 Replies Latest reply on Sep 5, 2008 7:46 AM by Adrian Brock

    Pooling stress tests failing

    Adrian Brock Master
        • 1. Re: Pooling stress tests failing
          Adrian Brock Master

          NOTE: You can run these stress tests for longer and with more threads, etc.
          by starting JBoss with the relevant system properties:

          See EJBTestCase:

           protected int getThreadCount()
           {
              int result = Integer.getInteger("jbosstest.threadcount", JBossTestServices.DEFAULT_THREADCOUNT).intValue();
              log.debug("jbosstest.threadcount=" + result);
              return result;
           }

           protected int getIterationCount()
           {
              int result = Integer.getInteger("jbosstest.iterationcount", JBossTestServices.DEFAULT_ITERATIONCOUNT).intValue();
              log.debug("jbosstest.iterationcount=" + result);
              return result;
           }

           protected int getBeanCount()
           {
              int result = Integer.getInteger("jbosstest.beancount", JBossTestServices.DEFAULT_BEANCOUNT).intValue();
              log.debug("jbosstest.beancount=" + result);
              return result;
           }
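
For anyone unfamiliar with Integer.getInteger: it reads a JVM system property and falls back to the supplied default when the property is unset or not a number. A minimal standalone sketch of that lookup follows. The class name StressProps and the fallback value of 10 are made up for illustration; the real defaults are the JBossTestServices constants shown above.

```java
public class StressProps {
    // Hypothetical fallback for illustration only; the real value is
    // JBossTestServices.DEFAULT_THREADCOUNT, which is not shown in this thread.
    static final int ASSUMED_DEFAULT_THREADCOUNT = 10;

    // Same lookup pattern as EJBTestCase.getThreadCount(): system property first, default otherwise.
    static int threadCount() {
        return Integer.getInteger("jbosstest.threadcount", ASSUMED_DEFAULT_THREADCOUNT);
    }

    public static void main(String[] args) {
        System.out.println("jbosstest.threadcount=" + threadCount());
        // Setting the property programmatically has the same effect as -D on the command line.
        System.setProperty("jbosstest.threadcount", "50");
        System.out.println("jbosstest.threadcount=" + threadCount());
    }
}
```

Passing -Djbosstest.threadcount=50 to the JVM that runs the test has the same effect as the System.setProperty call in main.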
          


          • 2. Re: Pooling stress tests failing
            Jesper Pedersen Master

            The issue should be transient as the testcases are passing on my laptop.

            It looks like #881 had problems with load on the machine as BasicTimerUnitTestCase also had a timeout problem.

            IMHO - we need a dedicated machine (non-VM) to run the testsuite - otherwise the *StressTestCase suites will fail randomly.

            I'll create a new issue if the problem persists, or re-open JBAS-5095 if some of the parameters should be changed.

            • 3. Re: Pooling stress tests failing
              Adrian Brock Master

               

              "jesper.pedersen" wrote:
              The issue should be transient as the testcases are passing on my laptop.


              If you think that is really the issue then increase the timeouts in the tests
              so they are not so brittle (same goes for the timer test).

              I haven't seen this test fail before, so you need to validate that you haven't introduced
              a point of contention with your fix. I doubt this new transient failure is a coincidence.

              e.g. Run this test (and maybe some of the other stress tests)
              with more load for a longer time with both the old and new code
              and see how long it takes.
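
The comparison suggested above could be sketched roughly as follows: run the same operation from many threads for a fixed number of iterations and record the elapsed wall-clock time, once against the old code and once against the new. This is only an illustration, not the actual stress test. LoadTimer and timeUnderLoad are hypothetical names, and the Runnable stands in for whatever pool operation is being exercised.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoadTimer {
    /** Runs task `iterations` times on each of `threads` threads and returns elapsed wall-clock nanoseconds. */
    static long timeUnderLoad(int threads, int iterations, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);       // releases all workers at once
        CountDownLatch done = new CountDownLatch(threads);  // counts workers that have finished
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();
                    for (int i = 0; i < iterations; i++) {
                        task.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            long begin = System.nanoTime();
            start.countDown();
            done.await();
            return System.nanoTime() - begin;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Object lock = new Object();
        // Placeholder workload: a synchronized no-op, standing in for the code under test.
        long nanos = timeUnderLoad(20, 100_000, () -> { synchronized (lock) { } });
        System.out.println("elapsed ms=" + nanos / 1_000_000);
    }
}
```

Run it with the same thread and iteration counts against both versions and compare the elapsed times; a large gap at high thread counts would point at contention.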

              • 4. Re: Pooling stress tests failing
                Jesper Pedersen Master

                The TxConnectionManagerStressTestCase is new -- committed together with the fix for JBAS-5095.

                I have used both the BaseConnectionManagerStressTestCase (Non-TX) and TxConnectionManagerStressTestCase during the development - of course with a full run of org.jboss.test.jca.test.* before committing.

                #881 was the first run on Hudson with TxConnectionManagerStressTestCase - if #882 also fails I'll increase the timeout value.

                The TxConnectionManager based stress testcases show a small increase in CPU time due to the synchronization on wasFreed() on my machine compared to the old implementation, but it's a very small increase.

                • 5. Re: Pooling stress tests failing
                  Adrian Brock Master

                   

                  "jesper.pedersen" wrote:
                  The TxConnectionManagerStressTestCase is new -- committed together with the fix for JBAS-5095.

                  I have used both the BaseConnectionManagerStressTestCase (Non-TX) and TxConnectionManagerStressTestCase during the development - of course with a full run of org.jboss.test.jca.test.* before committing.

                  #881 was the first run on Hudson with TxConnectionManagerStressTestCase - if #882 also fails I'll increase the timeout value.


                  One failure is enough, increase it now.
                  How is somebody supposed to know whether they broke something
                  if there are spurious failures?


                  The TxConnectionManager based stress testcases show a small increase in CPU time due to the synchronization on wasFreed() on my machine compared to the old implementation, but it's a very small increase.


                  Points of contention are caused by waits not cpu utilization (unless the cpu is maxed out).
                  I'm surprised you can even measure the cpu utilization of a single synchronization
                  unless you're deliberately stressing it and it is doing a lot of "spin locking".

                  • 6. Re: Pooling stress tests failing
                    Adrian Brock Master

                     

                    "adrian@jboss.org" wrote:

                    Points of contention are caused by waits


                    So you should look at the real clock times.
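
To illustrate the difference between the two measurements, here is a small sketch that records both wall-clock time and per-thread CPU time around a task. A thread that spends its time waiting (simulated here with Thread.sleep) shows a large wall-clock time and almost no CPU time, which is why CPU figures alone can hide contention. WaitVsCpu and measure are hypothetical names; the JDK calls are the standard ThreadMXBean API, which may report -1 or throw UnsupportedOperationException on JVMs without CPU time support.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class WaitVsCpu {
    /** Returns {wallNanos, cpuNanos} for running task on the current thread. */
    static long[] measure(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // getCurrentThreadCpuTime() returns -1 if CPU time measurement is disabled.
        long cpuBefore = mx.getCurrentThreadCpuTime();
        long wallBefore = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallBefore;
        long cpu = mx.getCurrentThreadCpuTime() - cpuBefore;
        return new long[] { wall, cpu };
    }

    public static void main(String[] args) {
        // The sleep stands in for a thread blocked waiting on a lock or a pool.
        long[] r = measure(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("wall ms=" + r[0] / 1_000_000 + ", cpu ms=" + r[1] / 1_000_000);
    }
}
```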