2 Replies Latest reply on Aug 13, 2009 11:17 AM by timfox

    Ping issues under stress

    jmesnil

      I managed to get ping issues and clients stuck doing a soak test on our messaging Labs.

      Setup:
      1 server (messaging-01) with 1 JMS topic
      10 producers + 17 non-durable subscribers (splitted on 7 machines)

      server config:

       Netty:
       <param key="jbm.remoting.netty.tcpnodelay" value="false" type="Boolean"/>
       <param key="jbm.remoting.netty.tcpsendbuffersize" value="1048576" type="Integer"/>
       <param key="jbm.remoting.netty.tcpreceivebuffersize" value="1048576" type="Integer"/>
      
       <security-enabled>false</security-enabled>
       <persistence-enabled>false</persistence-enabled>
      



      JMS config:

      <connection-factory name="ConnectionFactory">
      <connector-ref connector-name="netty-connector"/>



      <retry-interval>1000</retry-interval>
      <retry-interval-multiplier>1.0</retry-interval-multiplier>
      <reconnect-attempts>-1</reconnect-attempts>
      <failover-on-server-shutdown>true</failover-on-server-shutdown>
      <call-timeout>5000</call-timeout>
      <block-on-non-persistent-send>false</block-on-non-persistent-send>
      <block-on-persistent-send>true</block-on-persistent-send>
      <pre-acknowledge>true</pre-acknowledge>
      </connection-factory>







      soak.properties:
      duration-in-minutes=120
      num-warmup-messages=100
      message-size=1024
      durable=false
      transacted=false
      batch-size=1000
      drain-queue=false
      destination-lookup=soakTopic
      connection-factory-lookup=/ConnectionFactory
      throttle-rate=-1
      dups-ok-acknowledge=false
      disable-message-id=true
      disable-message-timestamp=true
      


      The clients (both producers and consumers) are connected to reconnect to the server
      when they are notified by a JMSExceptionListener. Their connection factories will indefinitely try
      to reconnect to the server.

      1. I start the server. No activity:

       [java] **** Server Dump ****
       [java] date: Thu Aug 13 08:40:56 EDT 2009
       [java] heap memory: used=60.09 MB, max=1.92 GB
       [java] non-heap memory: used=60.09 MB, max=1.92 GB
       [java] # of thread: 16
       [java] # of conns: 0
       [java] JMS topics:
       [java] soakTopic: 0 subscription (0 non-durable) receiving 0 message
       [java] ********************
      


      2. I then start 10 producers and 17 non-durable subscribers
      At the beginning, all producers are producing and all subscribers are subscribing

      3. But after a few seconds, the subscribers starts to warn about missing ping servers and failures to failover

       [java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn
       [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@128ae45a[local= /172.16.8.12:41708, remote=/172.16.8.10:5445] [code=3]
       [java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn
       [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@128ae45a[local= /172.16.8.12:41706, remote=/172.16.8.10:5445] [code=3]
       [java] Aug 13, 2009 8:51:36 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
       [java] INFO: received 10000 messages in 11.60s (total: 117s)
       [java] Aug 13, 2009 8:51:53 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
       [java] INFO: received 10000 messages in 16.80s (total: 133s)
       [java] Aug 13, 2009 8:52:09 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
       [java] INFO: received 10000 messages in 16.07s (total: 150s)
       [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn
       [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@5f7779e3[local= /172.16.8.12:41735, remote=/172.16.8.10:5445] [code=3]
       [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn
       [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@1ce3570c[local= /172.16.8.12:41737, remote=/172.16.8.10:5445] [code=3]
       [java] Aug 13, 2009 8:52:20 AM org.jboss.messaging.core.logging.Logger error
       [java] SEVERE: Failed to handle failover
       [java] MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 32]
       [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312)
       [java] at org.jboss.messaging.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:751)
       [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.reattachSessions(ConnectionManagerImpl.java:794)
       [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.failoverOrReconnect(ConnectionManagerImpl.java:647)
       [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.handleConnectionFailure(ConnectionManagerImpl.java:518)
       [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.access$600(ConnectionManagerImpl.java:81)
       [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$DelegatingFailureListener.connectionFailed(ConnectionManagerImpl.java:1218)
       [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393)
       [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244)
       [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$FailedConnectionAction$1.run(ConnectionManagerImpl.java:1174)
       [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       [java] at java.lang.Thread.run(Thread.java:619)
       [java] javax.jms.JMSException: Connection is destroyed
       [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:240)
       [java] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.doCleanUp(ClientConsumerImpl.java:745)
       [java] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.close(ClientConsumerImpl.java:315)
       [java] at org.jboss.messaging.jms.client.JBossMessageConsumer.close(JBossMessageConsumer.java:153)
       [java] at org.jboss.messaging.jms.client.JBossSession.close(JBossSession.java:275)
       [java] at org.jboss.messaging.jms.client.JBossConnection.close(JBossConnection.java:259)
       [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.disconnect(SoakReconnectableReceiver.java:186)
       [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$000(SoakReconnectableReceiver.java:41)
       [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:99)
       [java] at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550)
       [java] at java.lang.Thread.run(Thread.java:619)
       [java] Caused by: MessagingException[errorCode=2 message=Connection is destroyed]
       [java] ... 11 more
       [java] javax.jms.JMSException: Timed out waiting for response when sending packet 30
       [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312)
       [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.createSession(ConnectionManagerImpl.java:326)
       [java] at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:977)
       [java] at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSession(ClientSessionFactoryImpl.java:721)
       [java] at org.jboss.messaging.jms.client.JBossConnection.createSessionInternal(JBossConnection.java:463)
       [java] at org.jboss.messaging.jms.client.JBossConnection.createSession(JBossConnection.java:160)
       [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.connect(SoakReconnectableReceiver.java:213)
       [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$100(SoakReconnectableReceiver.java:41)
       [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:100)
       [java] at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550)
       [java] at java.lang.Thread.run(Thread.java:619)
       [java] Caused by: MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 30]
       [java] ... 11 more
      


      4. on the server, the number of remoting connection (and the memory) explodes:

       [java] **** Server Dump ****
       [java] date: Thu Aug 13 08:41:57 EDT 2009
       [java] heap memory: used=1.46 GB, max=1.88 GB
       [java] non-heap memory: used=1.46 GB, max=1.88 GB
       [java] # of thread: 62
       [java] # of conns: 112
       [java] JMS topics:
       [java] soakTopic: 17 subscription (17 non-durable) receiving 8337499 message
       [java] ********************
      


      5. The exception continues on the subscribers
      6. Things are messy on the server-side with more subscriptions than there are clients:
       [java] **** Server Dump ****
       [java] date: Thu Aug 13 08:43:57 EDT 2009
       [java] heap memory: used=1.70 GB, max=1.88 GB
       [java] non-heap memory: used=1.70 GB, max=1.88 GB
       [java] # of thread: 45
       [java] # of conns: 260
       [java] JMS topics:
       [java] soakTopic: 23 subscription (23 non-durable) receiving 10414915 message
       [java] ********************
      


      6. the server start to clean up remoting connections and related resources
       [java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl] Did not receive initial ping from /172.16.8.15:38164, connection will be closed
       [java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl] Did not receive initial ping from /172.16.8.12:49942, connection will be closed
      


      7. I also have some netty exception on the server:
       [java] [New I/O server worker #1-2] 08:44:31,281 WARNING [org.jboss.netty.channel.socket.nio.NioWorker] Unexpected exception in the selector loop.
       [java] org.jboss.netty.channel.ChannelException: Failed to register a socket to the selector.
       [java] at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:754)
       [java] at org.jboss.netty.channel.socket.nio.NioWorker.processRegisterTaskQueue(NioWorker.java:260)
       [java] at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:199)
       [java] at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:53)
       [java] at org.jboss.messaging.integration.transports.netty.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:184)
       [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       [java] at java.lang.Thread.run(Thread.java:619)
       [java] Caused by: java.nio.channels.ClosedChannelException
       [java] at java.nio.channels.spi.AbstractSelectableChannel.configureBlocking(AbstractSelectableChannel.java:252)
       [java] at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:739)
       [java] ... 7 more
      


      8. Failed connections now are unable to clean up related resource on the server
       [java] [Thread-3 (group:JBM-scheduled-threads-2030193044)] 08:47:55,595 SEVERE [org.jboss.messaging.core.management.impl.ManagementServiceImpl] Failed to call listener
       [java] java.lang.IllegalStateException: Cannot find queue info for queue b685ebae-d3b7-4a4d-81f1-862366829b1f8b8e8cc7-8806-11de-9b94-0015178e6d7c
       [java] at org.jboss.messaging.core.postoffice.impl.PostOfficeImpl.onNotification(PostOfficeImpl.java:387)
       [java] at org.jboss.messaging.core.management.impl.ManagementServiceImpl.sendNotification(ManagementServiceImpl.java:647)
       [java] at org.jboss.messaging.core.server.impl.ServerConsumerImpl.close(ServerConsumerImpl.java:272)
       [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:339)
       [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.doHandleClose(ServerSessionImpl.java:2217)
       [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.handleClose(ServerSessionImpl.java:882)
       [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.connectionFailed(ServerSessionImpl.java:1177)
       [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393)
       [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244)
       [java] at org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl$FailedConnectionAction.run(RemotingServiceImpl.java:480)
       [java] at org.jboss.messaging.core.remoting.impl.Pinger.run(Pinger.java:116)
       [java] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
       [java] at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
       [java] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
       [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
       [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
       [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
       [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       [java] at java.lang.Thread.run(Thread.java:619)
      


      9. During that time, the producers started to see missing pings from the server:

      10. When things settle down, all clients are stuck
       [java] **** Server Dump ****
       [java] date: Thu Aug 13 09:22:10 EDT 2009
       [java] heap memory: used=1.70 GB, max=1.87 GB
       [java] non-heap memory: used=1.70 GB, max=1.87 GB
       [java] # of thread: 30
       [java] # of conns: 16
       [java] JMS topics:
       [java] soakTopic: 2 subscription (2 non-durable) receiving 0 message
       [java] ********************
      


      I did the same run with <block-non-persistent-send> set to true and I did not get any error.
      To clear things up, no servers were killed/restarted during these runs.