13 Replies Latest reply on Dec 9, 2009 11:25 AM by clebert.suconic

    Stress test: MultiThreadRandomReattachTest

    timfox

      I commented out this test for now, as it still seems to be failing.

      Clebert- can you give a status update on this?

        • 1. Re: Stress test: MultiThreadRandomReattachTest
          clebert.suconic

          This error is proving hard to catch.

          And at least one weird thing I have seen:


          We eventually see those "session you didn't close" warnings in those tests. The sessions involved are marked as inClose. I added extra logging that records the thread trying to close them, and it seems that a finalize call is trying to close the session.


          What is weird to me is why the reference wasn't taken into account in the garbage collection count before finalize ran.

          I'll confirm that..


          And this is hard to chase. Sometimes it happens quickly; sometimes it takes more than 500 iterations to reproduce.

          • 2. Re: Stress test: MultiThreadRandomReattachTest
            clebert.suconic

            BTW: There are no logs whatsoever on stress-tests. Can we at least enable Warning/Error on them?


            • 3. Re: Stress test: MultiThreadRandomReattachTest
              clebert.suconic

              @Andy: Could you please add some logging to the stress tests (as we discussed today)?

              I think we should have WARNING/ERROR being sent into the console output, and INFO/WARNING/ERROR being sent to the log file.

              IMO having INFO on the console would be too verbose. But if everybody just wants INFO/WARNING/ERROR on the console .. fine with me.

              I couldn't find any log file on the stress test BTW.
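A minimal sketch of the console/file split proposed above, assuming plain java.util.logging rather than whatever logging setup the stress tests actually use. The class name StressTestLogging, the logger name and the log file path are hypothetical. Note that java.util.logging has no ERROR level; SEVERE plays that role, so a WARNING threshold lets both through.

```java
import java.io.IOException;
import java.util.logging.ConsoleHandler;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper, not part of HornetQ.
final class StressTestLogging
{
   /** WARNING and above go to the console; INFO and above go to logFile. */
   static Logger configure(final String loggerName, final String logFile) throws IOException
   {
      Logger logger = Logger.getLogger(loggerName);

      // Do not also print through the root logger's handlers
      logger.setUseParentHandlers(false);

      // The logger itself must let INFO through, the handlers filter further
      logger.setLevel(Level.INFO);

      Handler console = new ConsoleHandler();
      console.setLevel(Level.WARNING);
      logger.addHandler(console);

      // Second argument: append to an existing file instead of truncating it
      Handler file = new FileHandler(logFile, true);
      file.setLevel(Level.INFO);
      file.setFormatter(new SimpleFormatter());
      logger.addHandler(file);

      return logger;
   }
}
```

With this split, an INFO message reaches only the file, while WARNING and SEVERE messages reach both the file and the console.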

              • 4. Re: Stress test: MultiThreadRandomReattachTest
                clebert.suconic

                At least I could create a test that reliably reproduces the issue (in fewer than 100 iterations):

                org.hornetq.tests.integration.cluster.reattach.OrderReattachTest



                This is basically RandomReattachTest::testB, with the difference that OrderReattachTest forces several failures while the messages are being consumed.

                This is an issue (apparently) with messageListeners during failover.
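For illustration only, here is a rough sketch of the kind of ordering check a message listener could perform in such a test. It is not the code from OrderReattachTest: OrderChecker is a hypothetical name, and it assumes the producer stamps every message with a counter starting at 0 that the listener can read back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HornetQ.
final class OrderChecker
{
   private long expected = 0;

   private final List<String> errors = new ArrayList<String>();

   /** Call from the message listener with the counter the producer stored in the message. */
   synchronized void onSequence(final long received)
   {
      if (received != expected)
      {
         errors.add("expected " + expected + " but received " + received);
      }

      // Resynchronise, so a single gap or duplicate is not reported again on every later message
      expected = received + 1;
   }

   /** Returns a snapshot of the ordering violations seen so far. */
   synchronized List<String> getErrors()
   {
      return new ArrayList<String>(errors);
   }
}
```

The test would assert that getErrors() is empty once all messages have been consumed, including those delivered after each forced failure.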

                • 5. Re: Stress test: MultiThreadRandomReattachTest
                  clebert.suconic

                  I've done the opposite in a test: always failing during message.send. I never had any issues with ordering (500 iterations).

                  • 6. Re: Stress test: MultiThreadRandomReattachTest
                    clebert.suconic

                    BTW: The "didn't close session" message is a different issue.

                    I could already get the out-of-order delivery even before I modified Reattach::testB into OrderReattachTest.

                    Every time I saw that message, the session already had inClose = true, and I could confirm by adding logs that the session was being closed in a finalize block.

                    I have no clue why the VM is calling finalize() for something still referenced. That's so weird. As a matter of fact, if I added a reference to this, such as System.out.println(this), after the last command in the finalize, the "didn't close" message never happened. Maybe that's some HotSpot optimization releasing the instance earlier? (Just a guess)

                    I will make some tweaks to DelegatingSession to make sure finalize won't call close again after the user has called close.
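A rough sketch of the kind of guard described above, not the actual DelegatingSession change. GuardedSession and closeIfLeftOpen are hypothetical names, and a counter stands in for the real underlying session. For what it's worth, the Java Language Specification (section 12.6.1) does allow optimizations that make an object unreachable earlier than a naive reading of the code suggests, which would be consistent with the guess above, though that is not confirmed for this case.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper, not part of HornetQ.
final class GuardedSession
{
   // Set as soon as anyone asks for a close, even if the underlying close later fails
   private final AtomicBoolean closeRequested = new AtomicBoolean(false);

   // Stand-in for the real session: counts how often the underlying close ran
   private final AtomicInteger underlyingCloses = new AtomicInteger();

   /** Explicit close by the user. Safe to call more than once. */
   void close()
   {
      doClose();
   }

   /**
    * What a finalizer (or other cleanup hook) would call.
    * Returns true only if the session really had been left open, so the caller knows when to warn.
    */
   boolean closeIfLeftOpen()
   {
      return doClose();
   }

   int getUnderlyingCloses()
   {
      return underlyingCloses.get();
   }

   private boolean doClose()
   {
      // compareAndSet lets exactly one caller through, whichever thread gets there first
      if (closeRequested.compareAndSet(false, true))
      {
         underlyingCloses.incrementAndGet();
         return true;
      }
      return false;
   }
}
```

Setting the flag before touching the underlying session means a failure in the middle of close still leaves the cleanup hook with nothing to do, so no spurious "left open" warning is logged.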

                    • 7. Re: Stress test: MultiThreadRandomReattachTest
                      timfox

                      BTW your test fails with just one session, making it much easier to debug.

                      • 8. Re: Stress test: MultiThreadRandomReattachTest
                        timfox

                        Also, I can't get this to fail when using the Netty transport. Only when using the invm transport.

                        • 9. Re: Stress test: MultiThreadRandomReattachTest
                          timfox

                          I'm going to take over this task, otherwise there's too much risk it won't be fixed before the end of this week.

                          • 10. Re: Stress test: MultiThreadRandomReattachTest
                            clebert.suconic

                             

                            "timfox" wrote:
                            Also, I can't get this to fail when using the Netty transport. Only when using the invm transport.


                            It looks like the client keeps receiving messages even after the connection has "failed".

                            That would make those tests invalid. Would we need to make sure the connection is closed after it has failed?

                            • 11. Re: Stress test: MultiThreadRandomReattachTest
                              clebert.suconic

                              I've changed RemotingConnection::fail as shown below, so that it calls transportConnection.close() before calling the listeners.


                              I'm already at 250 iterations and no failures.




                              public void fail(final HornetQException me)
                              {
                                 synchronized (failLock)
                                 {
                                    if (destroyed)
                                    {
                                       return;
                                    }

                                    destroyed = true;
                                 }

                                 RemotingConnectionImpl.log.warn("Connection failure has been detected: " + me.getMessage() +
                                                                 " [code=" +
                                                                 me.getCode() +
                                                                 "]");

                                 // We close the underlying transport connection
                                 transportConnection.close();

                                 // Then call the listeners
                                 callFailureListeners(me);

                                 callClosingListeners();

                                 for (Channel channel : channels.values())
                                 {
                                    channel.returnBlocking();
                                 }
                              }
                              


                              • 12. Re: Stress test: MultiThreadRandomReattachTest
                                clebert.suconic

                                Just to update the thread with what I said on IRC. It failed at 450.

                                • 13. Re: Stress test: MultiThreadRandomReattachTest
                                  clebert.suconic

                                  @Tim: I was going to make a simple change to DelegatingSession, as shown at the end of this post.

                                  I won't commit that now in case it affects the investigation.

                                  I don't think it is related. The times I have seen the message "I'm closing a core ClientSession you left open." were when a failure happened during closeSession, leaving inClose=true and closed=false, while a finalize block was also trying to close the session. (The weird GC behaviour that I mentioned before.)




                                  Index: src/main/org/hornetq/core/client/impl/DelegatingSession.java
                                  ===================================================================
                                  --- src/main/org/hornetq/core/client/impl/DelegatingSession.java (revision 8641)
                                  +++ src/main/org/hornetq/core/client/impl/DelegatingSession.java (working copy)
                                  @@ -50,6 +50,8 @@
                                   private final ClientSessionInternal session;
                                  
                                   private final Exception creationStack;
                                  +
                                  + private volatile boolean closed = false;
                                  
                                   private static Set<DelegatingSession> sessions = new ConcurrentHashSet<DelegatingSession>();
                                  
                                  @@ -68,7 +70,7 @@
                                   @Override
                                   protected void finalize() throws Throwable
                                   {
                                  - if (!session.isClosed())
                                  + if (!closed)
                                   {
                                   DelegatingSession.log.warn("I'm closing a core ClientSession you left open. Please make sure you close all ClientSessions explicitly " + "before letting them go out of scope! " +
                                   System.identityHashCode(this));
                                  @@ -134,6 +136,8 @@
                                   {
                                   DelegatingSession.sessions.remove(this);
                                   }
                                  +
                                  + closed = true;
                                  
                                   session.close();
                                   }