1 2 3 Previous Next 30 Replies Latest reply on Feb 5, 2008 10:39 AM by adhi Go to original post
      • 15. Re: TimeoutFactory flagged as point of contention
        clebert.suconic

        Adrian, what other information do you want?

        We already have this whole thread, and we kind of already acknowledge there is a problem. So, isn't that enough to be a bug?

        Phillip could attach the whole thread dump to the case, but he can't do that because the JIRA is closed. It's what comes first.. the chicken or the egg?

        • 16. Re: TimeoutFactory flagged as point of contention

          Just reopen the bug report.

          I know for a fact, that anybody who creates a bug report can reopen it.

          Failing that, I'll reopen when I know what the real problem is,
          i.e. what needs fixing.

          I'm only interested in seeing the real thread dump so we can
          see it isn't some other problem, to what we are *guessing* at on
          this thread.

          • 17. Re: TimeoutFactory flagged as point of contention

            I tried removing the continued lookup of the singleton.
            That made no discernable difference.

            So tomorrow, I am going to experiment with using a different "lock"
            for the main thread's wait/notify and the maintenance of the queue.

            The main thread looks overly contentious, given that once it has the
            work it can submit it from outside the synchronized block.

            I'll also try removing object pooling.

            If none of that works, then we'll probably need to design a more
            concurrent priority queue implementation.

            • 18. Re: TimeoutFactory flagged as point of contention

              Some academic papers, if somebody fancies a read or wants
              to play with implementing them in java. :-)

              http://www.google.co.uk/search?hl=en&q=concurrent+priority+queue&btnG=Google+Search

              • 19. Re: TimeoutFactory flagged as point of contention
                clebert.suconic

                ok... just reopened it. I thought you wanted to keep the case closed.

                I've just attached Phillip's thread dump. (I asked him to send me the file).

                208 threads waiting a lock on this point

                "SocketServerInvokerThread-10.64.36.1-58" id=151 idx=0x120 tid=16201 prio=5 alive, in native, blocked
                 -- Blocked trying to get lock: org/jboss/util/timeout/TimeoutFactory@0x2000000015c53290[fat lock]
                 at jrockit/vm/Threads.waitForSignal()V(Native Method)
                 at jrockit/vm/Locks.fatLockBlockOrSpin(JLjrockit/vm/ObjectMonitor;II)V(Unknown Source)[optimized]
                 at jrockit/vm/Locks.lockFat(Ljava/lang/Object;JLjrockit/vm/ObjectMonitor;Z)Ljava/lang/Object;(Unknown Source)[optimized]
                 at jrockit/vm/Locks.monitorEnterSecondStage(Ljava/lang/Object;I)Ljava/lang/Object;(Unknown Source)[optimized]
                 at jrockit/vm/Locks.monitorEnter(Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)[optimized]
                 at org/jboss/util/timeout/TimeoutFactory.dropTimeout(Lorg/jboss/util/timeout/TimeoutFactory$TimeoutImpl;)Z(TimeoutFactory.java:549)
                 at org/jboss/util/timeout/TimeoutFactory.access$200(Lorg/jboss/util/timeout/TimeoutFactory;Lorg/jboss/util/timeout/TimeoutFactory$TimeoutImpl;)Z(TimeoutFactory.java:46)
                 at org/jboss/util/timeout/TimeoutFactory$TimeoutImpl.cancel()Z(TimeoutFactory.java:696)
                 at org/jboss/tm/TransactionImpl.cancelTimeout()V(TransactionImpl.java:1374)
                 at org/jboss/tm/TransactionImpl.completeTransaction()V(TransactionImpl.java:1194)[inlined]
                 at org/jboss/tm/TransactionImpl.commit()V(TransactionImpl.java:315)[optimized]
                 at org/jboss/ejb/plugins/TxInterceptorCMT.endTransaction(Lorg/jboss/invocation/Invocation;Ljavax/transaction/Transaction;Ljavax/transaction/Transaction;I)V(TxInterceptorCMT.java:501)[optimized]
                 at org/jboss/ejb/plugins/TxInterceptorCMT.runWithTransactions(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object;(TxInterceptorCMT.java:361)[optimized]
                 at org/jboss/ejb/plugins/TxInterceptorCMT.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object;(TxInterceptorCMT.java:181)[optimized]
                 at org/jboss/ejb/plugins/LogInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object;(LogInterceptor.java:165)[optimized]
                 at org/jboss/ejb/plugins/CleanShutdownInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object;(CleanShutdownInterceptor.java:278)
                 at org/jboss/ejb/plugins/ProxyFactoryFinderInterceptor.invoke(Lorg/jboss/invocation/Invocation;)Ljava/lang/Object;(ProxyFactoryFinderInterceptor.java:136)[optimized]
                 at jrockit/reflect/CompiledMethodInvoker.invoke0(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)
                 at jrockit/reflect/CompiledMethodInvoker.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Unknown Source)[optimized]
                 at java/lang/reflect/Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;J)Ljava/lang/Object;(Unknown Source)[optimized]
                 at org/jboss/mx/interceptor/ReflectedDispatcher.invoke(Lorg/jboss/mx/server/Invocation;)Ljava/lang/Object;(ReflectedDispatcher.java:155)
                 at org/jboss/mx/server/AbstractMBeanInvoker.invoke(Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object;(AbstractMBeanInvoker.java:189)[optimized]
                 at org/jboss/mx/server/MBeanServerImpl.invoke(Ljavax/management/ObjectName;Ljava/lang/String;[Ljava/lang/Object;[Ljava/lang/String;)Ljava/lang/Object;(MBeanServerImpl.java:659)
                 at org/jboss/invocation/unified/server/UnifiedInvokerHA.invoke(Lorg/jboss/remoting/InvocationRequest;)Ljava/lang/Object;(UnifiedInvokerHA.java:146)
                 at org/jboss/remoting/ServerInvoker.invoke(Lorg/jboss/remoting/InvocationRequest;)Ljava/lang/Object;(ServerInvoker.java:828)
                 at org/jboss/remoting/ServerInvoker.invoke(Ljava/lang/Object;)Ljava/lang/Object;(ServerInvoker.java:681)
                 at org/jboss/remoting/transport/socket/ServerThread.processInvocation()V(ServerThread.java:358)
                 at org/jboss/remoting/transport/socket/ServerThread.dorun()V(ServerThread.java:412)
                 at org/jboss/remoting/transport/socket/ServerThread.run()V(ServerThread.java:239)
                 at jrockit/vm/RNI.c2java(JJJJ)V(Native Method)
                




                and just 1 at this point

                "TimeoutFactory-0" id=58 idx=0x74 tid=16107 prio=5 alive, in native, waiting, daemon
                 -- Waiting for notification on: org/jboss/util/timeout/TimeoutFactory@0x2000000015c53290[fat lock]
                 at jrockit/vm/Threads.waitForSignalWithTimeout(J)V(Native Method)
                 at java/lang/Object.wait(J)V(Native Method)
                 at org/jboss/util/timeout/TimeoutFactory.doWork()V(TimeoutFactory.java:603)
                 ^-- Lock released while waiting: org/jboss/util/timeout/TimeoutFactory@0x2000000015c53290[fat lock]
                 at org/jboss/util/timeout/TimeoutFactory.access$000(Lorg/jboss/util/timeout/TimeoutFactory;)V(TimeoutFactory.java:46)
                 at org/jboss/util/timeout/TimeoutFactory$1.run()V(TimeoutFactory.java:264)
                 at jrockit/vm/RNI.c2java(JJJJ)V(Native Method)
                 -- end of trace
                


                No wonder why...


                
                 // thread.run will call this method...
                 private void doWork()
                 {
                 while (!cancelled)
                 {
                 TimeoutImpl work = null;
                 // Look for work
                 synchronized (this)
                 {
                 if (size == 0 && !cancelled)
                 {
                 try
                 {
                 wait();
                 }
                 catch (InterruptedException ex)
                 {
                 }
                 }
                
                
                



                • 20. Re: TimeoutFactory flagged as point of contention
                  clebert.suconic

                  It's kind of funny...


                  wait is releasing the first lock.

                  but it's also creating another lock on the wait itself:

                  Look at Waiting for notification on .... [fat lock]

                  "TimeoutFactory-0" id=58 idx=0x74 tid=16107 prio=5 alive, in native, waiting, daemon
                   -- Waiting for notification on: org/jboss/util/timeout/TimeoutFactory@0x2000000015c53290[fat lock]
                   at jrockit/vm/Threads.waitForSignalWithTimeout(J)V(Native Method)
                   at java/lang/Object.wait(J)V(Native Method)
                   at org/jboss/util/timeout/TimeoutFactory.doWork()V(TimeoutFactory.java:603)
                   ^-- Lock released while waiting: org/jboss/util/timeout/TimeoutFactory@0x2000000015c53290[fat lock]
                   at org/jboss/util/timeout/TimeoutFactory.access$000(Lorg/jboss/util/timeout/TimeoutFactory;)V(TimeoutFactory.java:46)
                   at org/jboss/util/timeout/TimeoutFactory$1.run()V(TimeoutFactory.java:264)
                   at jrockit/vm/RNI.c2java(JJJJ)V(Native Method)
                   -- end of trace
                  


                  • 21. Re: TimeoutFactory flagged as point of contention
                    clebert.suconic


                    Maybe the thing locks / unlocks so frequently (spinning between waits and notifies) that we couldn't capture the lock holder.

                    The stack trace, and high response times when going higher in our performance tests is the information we have now.

                    • 22. Re: TimeoutFactory flagged as point of contention

                      Over the weekend, I refactored the TimeoutFactory into
                      the core part and a TimeoutPriorityQueue plugin with one
                      implementation that is just the old code.

                      I've stress tested this such that we can use it with the 4.0.4.GA
                      release if necessary.

                      The only material change over the current behaviour is that I moved the
                      submission to the thread pool for time outs outside the synchronized
                      block.

                      I can now try alternative priority queue impls.
                      This is done by specifying the system property:
                      -Dorg.jboss.util.timeout.TimoutPriorityQueue=some.class.name

                      • 23. Re: TimeoutFactory flagged as point of contention

                        From the testing that I have done, the issue appears to be the
                        algorithm doesn't scale the way it is intended.

                        It is "log(n)" in terms of adding and removing, but that is just for a
                        single thread. Once the background thread is contending with the
                        schedulers it ceases to be so.

                        It certainly doesn't scale with the number of scheduling threads in a nice way.

                        I am also seeing a wide variation in the length of time a test takes
                        to run. Which suggests to me that there is a "fair queuing" issue
                        from the use of the simple synchronization primitives.

                        P.S. This alogirthm is very similar to the DelayQueue from JSR166
                        (backport-concurrent-util).
                        The only main difference I can see is that it uses two different
                        synchronization primitives.
                        1) A re-entrant lock to make the queue thread safe
                        2) A condition to wake up the waiting takers.
                        The re-entrant lock does not use fair queuing though?

                        • 24. Re: TimeoutFactory flagged as point of contention

                          I am also unsure whether what I am really measuring is just the
                          scalability of the JDK's lock/notify primitives rather than the efficiency
                          of the algorithm.

                          • 25. Re: TimeoutFactory flagged as point of contention
                            clebert.suconic

                            Adrian,

                            Phillip Thurmond will take another run with current CVS version to see how it would behave.
                            (Unless you think it doesn't worth trying yet).

                            We will make these tests in about three days. (unless we need to change priorities)

                            We will post results here.

                            • 26. Re: TimeoutFactory flagged as point of contention

                              You can try it, but I haven't changed much.
                              1) Moved the thread pool submission out of the synchronization block
                              2) Removed the object pooling

                              I've tried three different implementations today.
                              All were slower or made little difference compared with the current version.

                              * Using a "scheduler" queue such that the submitting threads do
                              virutally no work
                              * Using a modified version of the public domain JSR166 version of DelayQueue to optimize removal (using fair queued re-entrant locks with this was even slower :-)
                              * A modification of the JSR166 DelayQueue to use normal synchronization
                              rather than a Reentrant lock.

                              Looks like I am going to have to look some of the academic
                              implementations that do more fine grained locking
                              or the some of the skip-list stuff.

                              • 27. Re: TimeoutFactory flagged as point of contention
                                clebert.suconic

                                Just brain storming...

                                Isn't possible to partition the queue somehow?
                                Using hashcodes?

                                • 28. Re: TimeoutFactory flagged as point of contention

                                  Probably. But the issue is synchronization between the adders/removers
                                  and the background thread that needs to wait/notify when a timeout
                                  is required.

                                  I tried today to implement this:
                                  http://www.cs.rochester.edu/~scott/papers/1996_IPL_heaps.pdf
                                  but it doesn't scale as well as the current implementation.
                                  That is without the "bit-reversedsounter" to hash the adds/removes
                                  across different parts of the tree.

                                  But I couldn't get this part to work. The alogrithm as it stands is broken.
                                  I've done a few fixes to get it working as a priority queue, but I
                                  haven't been able to get it to work with the "random" remove/cancellation
                                  of the timeouts.

                                  Next on the list is this :-)
                                  http://citeseer.ist.psu.edu/sundell03fast.html
                                  Which is a more recent alogorithm, but also doesn't include a random
                                  remove/cancel operation. :-(

                                  • 29. Re: TimeoutFactory flagged as point of contention

                                    I've implemented your Hash based priority queue.
                                    To use it, start JBoss with:
                                    -Dorg.jboss.util.timeout.TimeoutPriorityQueue=org.jboss.util.timeout.HashedTimeoutPriorityQueueImpl

                                    This divides the queue into 40 subqueues with an individual node at the top
                                    for the next to be scheduled.

                                    It doesn't make any significant difference in my tests.
                                    But like I said before, I am not sure whether I am just measuring the
                                    locking/scheduling capabilities of the OS/JDK.