Ping issues under stress
jmesnil Aug 13, 2009 9:27 AMI managed to get ping issues and clients stuck doing a soak test on our messaging Labs.
Setup:
1 server (messaging-01) with 1 JMS topic
10 producers + 17 non-durable subscribers (splitted on 7 machines)
server config:
Netty: <param key="jbm.remoting.netty.tcpnodelay" value="false" type="Boolean"/> <param key="jbm.remoting.netty.tcpsendbuffersize" value="1048576" type="Integer"/> <param key="jbm.remoting.netty.tcpreceivebuffersize" value="1048576" type="Integer"/> <security-enabled>false</security-enabled> <persistence-enabled>false</persistence-enabled>
JMS config:
<connection-factory name="ConnectionFactory">
<connector-ref connector-name="netty-connector"/>
<retry-interval>1000</retry-interval>
<retry-interval-multiplier>1.0</retry-interval-multiplier>
<reconnect-attempts>-1</reconnect-attempts>
<failover-on-server-shutdown>true</failover-on-server-shutdown>
<call-timeout>5000</call-timeout>
<block-on-non-persistent-send>false</block-on-non-persistent-send>
<block-on-persistent-send>true</block-on-persistent-send>
<pre-acknowledge>true</pre-acknowledge>
</connection-factory>
soak.properties:
duration-in-minutes=120 num-warmup-messages=100 message-size=1024 durable=false transacted=false batch-size=1000 drain-queue=false destination-lookup=soakTopic connection-factory-lookup=/ConnectionFactory throttle-rate=-1 dups-ok-acknowledge=false disable-message-id=true disable-message-timestamp=true
The clients (both producers and consumers) are connected to reconnect to the server
when they are notified by a JMSExceptionListener. Their connection factories will indefinitely try
to reconnect to the server.
1. I start the server. No activity:
[java] **** Server Dump **** [java] date: Thu Aug 13 08:40:56 EDT 2009 [java] heap memory: used=60.09 MB, max=1.92 GB [java] non-heap memory: used=60.09 MB, max=1.92 GB [java] # of thread: 16 [java] # of conns: 0 [java] JMS topics: [java] soakTopic: 0 subscription (0 non-durable) receiving 0 message [java] ********************
2. I then start 10 producers and 17 non-durable subscribers
At the beginning, all producers are producing and all subscribers are subscribing
3. But after a few seconds, the subscribers starts to warn about missing ping servers and failures to failover
[java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@128ae45a[local= /172.16.8.12:41708, remote=/172.16.8.10:5445] [code=3] [java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@128ae45a[local= /172.16.8.12:41706, remote=/172.16.8.10:5445] [code=3] [java] Aug 13, 2009 8:51:36 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage [java] INFO: received 10000 messages in 11.60s (total: 117s) [java] Aug 13, 2009 8:51:53 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage [java] INFO: received 10000 messages in 16.80s (total: 133s) [java] Aug 13, 2009 8:52:09 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage [java] INFO: received 10000 messages in 16.07s (total: 150s) [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@5f7779e3[local= /172.16.8.12:41735, remote=/172.16.8.10:5445] [code=3] [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection@1ce3570c[local= /172.16.8.12:41737, remote=/172.16.8.10:5445] [code=3] [java] Aug 13, 2009 8:52:20 AM org.jboss.messaging.core.logging.Logger error [java] SEVERE: Failed to handle failover [java] MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 32] [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312) [java] at org.jboss.messaging.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:751) [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.reattachSessions(ConnectionManagerImpl.java:794) [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.failoverOrReconnect(ConnectionManagerImpl.java:647) [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.handleConnectionFailure(ConnectionManagerImpl.java:518) [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.access$600(ConnectionManagerImpl.java:81) [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$DelegatingFailureListener.connectionFailed(ConnectionManagerImpl.java:1218) [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393) [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244) [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$FailedConnectionAction$1.run(ConnectionManagerImpl.java:1174) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [java] at java.lang.Thread.run(Thread.java:619) [java] javax.jms.JMSException: Connection is destroyed [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:240) [java] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.doCleanUp(ClientConsumerImpl.java:745) [java] at org.jboss.messaging.core.client.impl.ClientConsumerImpl.close(ClientConsumerImpl.java:315) [java] at org.jboss.messaging.jms.client.JBossMessageConsumer.close(JBossMessageConsumer.java:153) [java] at org.jboss.messaging.jms.client.JBossSession.close(JBossSession.java:275) [java] at org.jboss.messaging.jms.client.JBossConnection.close(JBossConnection.java:259) [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.disconnect(SoakReconnectableReceiver.java:186) [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$000(SoakReconnectableReceiver.java:41) [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:99) [java] at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550) [java] at java.lang.Thread.run(Thread.java:619) [java] Caused by: MessagingException[errorCode=2 message=Connection is destroyed] [java] ... 11 more [java] javax.jms.JMSException: Timed out waiting for response when sending packet 30 [java] at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312) [java] at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.createSession(ConnectionManagerImpl.java:326) [java] at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:977) [java] at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSession(ClientSessionFactoryImpl.java:721) [java] at org.jboss.messaging.jms.client.JBossConnection.createSessionInternal(JBossConnection.java:463) [java] at org.jboss.messaging.jms.client.JBossConnection.createSession(JBossConnection.java:160) [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.connect(SoakReconnectableReceiver.java:213) [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$100(SoakReconnectableReceiver.java:41) [java] at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:100) [java] at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550) [java] at java.lang.Thread.run(Thread.java:619) [java] Caused by: MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 30] [java] ... 11 more
4. on the server, the number of remoting connection (and the memory) explodes:
[java] **** Server Dump **** [java] date: Thu Aug 13 08:41:57 EDT 2009 [java] heap memory: used=1.46 GB, max=1.88 GB [java] non-heap memory: used=1.46 GB, max=1.88 GB [java] # of thread: 62 [java] # of conns: 112 [java] JMS topics: [java] soakTopic: 17 subscription (17 non-durable) receiving 8337499 message [java] ********************
5. The exception continues on the subscribers
6. Things are messy on the server-side with more subscriptions than there are clients:
[java] **** Server Dump **** [java] date: Thu Aug 13 08:43:57 EDT 2009 [java] heap memory: used=1.70 GB, max=1.88 GB [java] non-heap memory: used=1.70 GB, max=1.88 GB [java] # of thread: 45 [java] # of conns: 260 [java] JMS topics: [java] soakTopic: 23 subscription (23 non-durable) receiving 10414915 message [java] ********************
6. the server start to clean up remoting connections and related resources
[java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl] Did not receive initial ping from /172.16.8.15:38164, connection will be closed [java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl] Did not receive initial ping from /172.16.8.12:49942, connection will be closed
7. I also have some netty exception on the server:
[java] [New I/O server worker #1-2] 08:44:31,281 WARNING [org.jboss.netty.channel.socket.nio.NioWorker] Unexpected exception in the selector loop. [java] org.jboss.netty.channel.ChannelException: Failed to register a socket to the selector. [java] at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:754) [java] at org.jboss.netty.channel.socket.nio.NioWorker.processRegisterTaskQueue(NioWorker.java:260) [java] at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:199) [java] at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:53) [java] at org.jboss.messaging.integration.transports.netty.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:184) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [java] at java.lang.Thread.run(Thread.java:619) [java] Caused by: java.nio.channels.ClosedChannelException [java] at java.nio.channels.spi.AbstractSelectableChannel.configureBlocking(AbstractSelectableChannel.java:252) [java] at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:739) [java] ... 7 more
8. Failed connections now are unable to clean up related resource on the server
[java] [Thread-3 (group:JBM-scheduled-threads-2030193044)] 08:47:55,595 SEVERE [org.jboss.messaging.core.management.impl.ManagementServiceImpl] Failed to call listener [java] java.lang.IllegalStateException: Cannot find queue info for queue b685ebae-d3b7-4a4d-81f1-862366829b1f8b8e8cc7-8806-11de-9b94-0015178e6d7c [java] at org.jboss.messaging.core.postoffice.impl.PostOfficeImpl.onNotification(PostOfficeImpl.java:387) [java] at org.jboss.messaging.core.management.impl.ManagementServiceImpl.sendNotification(ManagementServiceImpl.java:647) [java] at org.jboss.messaging.core.server.impl.ServerConsumerImpl.close(ServerConsumerImpl.java:272) [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:339) [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.doHandleClose(ServerSessionImpl.java:2217) [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.handleClose(ServerSessionImpl.java:882) [java] at org.jboss.messaging.core.server.impl.ServerSessionImpl.connectionFailed(ServerSessionImpl.java:1177) [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393) [java] at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244) [java] at org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl$FailedConnectionAction.run(RemotingServiceImpl.java:480) [java] at org.jboss.messaging.core.remoting.impl.Pinger.run(Pinger.java:116) [java] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) [java] at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) [java] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181) [java] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [java] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [java] at java.lang.Thread.run(Thread.java:619)
9. During that time, the producers started to see missing pings from the server:
10. When things settle down, all clients are stuck
[java] **** Server Dump **** [java] date: Thu Aug 13 09:22:10 EDT 2009 [java] heap memory: used=1.70 GB, max=1.87 GB [java] non-heap memory: used=1.70 GB, max=1.87 GB [java] # of thread: 30 [java] # of conns: 16 [java] JMS topics: [java] soakTopic: 2 subscription (2 non-durable) receiving 0 message [java] ********************
I did the same run with <block-non-persistent-send> set to true and I did not get any error.
To clear things up, no servers were killed/restarted during these runs.