1. Re: Potential Netty problem with synchronously closing connections
timfox Apr 30, 2009 2:53 AM (in response to timfox) As a note, if I add a small sleep (100 ms) between each connection creation and close, the test runs ad infinitum with no problems.
The test is org.jboss.test.messaging.jms.ConnectionTest::testConnectionListenerBug
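The original test isn't reproduced here, but the pattern it exercises — rapid connect/close churn, with the sleep workaround — can be sketched with plain java.net and no Netty involved (class name and iteration count are made up for illustration; the real test loops far more times):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectChurn {
    public static void main(String[] args) throws Exception {
        // Throwaway server that accepts connections and discards them at once.
        final ServerSocket server = new ServerSocket(0);
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    server.accept().close();
                }
            } catch (IOException ignored) {
                // Server socket closed; leave the loop.
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();

        InetAddress localhost = InetAddress.getLoopbackAddress();
        int iterations = 20; // kept small here; the real test loops thousands of times
        for (int i = 0; i < iterations; i++) {
            Socket s = new Socket(localhost, server.getLocalPort());
            s.close();
            Thread.sleep(100); // the workaround: pause between create/close cycles
        }
        server.close();
        System.out.println("completed " + iterations + " cycles");
    }
}
```

Without the sleep, closed sockets pile up faster than the server's event loop can release their file descriptors, which is the behavior under discussion in this thread.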
2. Re: Potential Netty problem with synchronously closing connections
trustin May 11, 2009 4:09 AM (in response to timfox) I ran the mentioned test with an increased loop count (20000) and could not reproduce the problem. I'm running it Integer.MAX_VALUE times now. How long did it take for you to run out of handles? What OS are you on?
3. Re: Potential Netty problem with synchronously closing connections
trustin May 11, 2009 4:14 AM (in response to timfox) Interestingly, the test hangs at around 59000 with the following message:
May 11, 2009 5:12:18 PM org.jboss.messaging.core.logging.Logger warn WARNING: Connection failure has been detected Did not receive ping on connection. It is likely a client has exited or crashed without closing its connection, or the network between the server and client has failed. The connection will now be closed.:3
Not sure whether this is a Netty problem or not, but I still can't get a 'too many open files' error.
4. Re: Potential Netty problem with synchronously closing connections
trustin May 11, 2009 4:54 AM (in response to timfox) I was finally able to reproduce the issue using the following simple test code:
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.ChannelFactory;
import org.jboss.netty.channel.ChannelHandlerContext;
import org.jboss.netty.channel.ChannelPipelineCoverage;
import org.jboss.netty.channel.ChannelStateEvent;
import org.jboss.netty.channel.ExceptionEvent;
import org.jboss.netty.channel.MessageEvent;
import org.jboss.netty.channel.SimpleChannelUpstreamHandler;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class Test {

    public static void main(String[] args) throws Throwable {
        ChannelFactory sf = new NioServerSocketChannelFactory(
                Executors.newCachedThreadPool(),
                Executors.newCachedThreadPool(), 1);

        ServerBootstrap sb = new ServerBootstrap(sf);
        sb.getPipeline().addLast("handler", new ServerHandler());
        sb.setOption("backlog", 1024);
        sb.bind(new InetSocketAddress(8080));

        InetAddress dstAddr = InetAddress.getByName("localhost");
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < 1048576; i++) {
            Socket s = new Socket(dstAddr, 8080);
            s.close();
            if ((i + 1) % 1000 == 0) {
                long endTime = System.currentTimeMillis();
                System.err.println(i + 1 + ": TOOK " + (endTime - startTime) + " MS");
                startTime = endTime;
            }
        }

        sb.releaseExternalResources();
    }

    @ChannelPipelineCoverage("all")
    static class ServerHandler extends SimpleChannelUpstreamHandler {

        private static final int THRESHOLD = 4;

        private final AtomicInteger cnt = new AtomicInteger();
        private final AtomicInteger lateCnt = new AtomicInteger();

        @Override
        public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent e) throws Exception {
            if (cnt.incrementAndGet() > THRESHOLD) {
                lateCnt.incrementAndGet();
            }
        }

        @Override
        public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
            // Wait until the client closes the connection.
        }

        @Override
        public void channelClosed(ChannelHandlerContext ctx, ChannelStateEvent e) throws Exception {
            if (cnt.decrementAndGet() == THRESHOLD) {
                System.err.println("LATE CLOSURES: " + lateCnt.getAndSet(0));
            }
        }

        @Override
        public void exceptionCaught(ChannelHandlerContext ctx, ExceptionEvent e) {
            System.err.println(e);
            e.getCause().printStackTrace();
            e.getChannel().close();
        }
    }
}
Sometimes, 'LATE CLOSURES' goes up to around 2000, which can lead to a 'too many open files' error for a normal user. I had set the ulimit of my local account to 10240, so I was not hitting the error.
Let me try to optimize Netty so that it doesn't go up too much, but there's some inevitable indeterminism because it's non-blocking I/O.
5. Re: Potential Netty problem with synchronously closing connections
trustin May 11, 2009 4:57 AM (in response to timfox) "trustin" wrote:
Let me try to optimize Netty so that it doesn't go up too much, but there's inevitable indeterminism because it's non-blocking I/O.
... which means the ulimit of the ordinary user account in the production environment must be increased anyway. Bumping it up is simple: http://gleamynode.net/articles/1557/
6. Re: Potential Netty problem with synchronously closing connections
trustin May 12, 2009 2:07 AM (in response to timfox) I have made a proof-of-concept modification to Netty that suspends ServerSocket.accept() when the number of connections reaches a certain threshold, and according to Tim it makes ConnectionTest.testManyConnections() pass.
I'm currently designing a more generic interface that allows you to:
* suspend / resume ServerSocket.accept()
* close the accepted connection immediately
* close the accepted connection immediately with a good-bye message (e.g. 'server busy', 'go away')
The point of this modification is to prune malicious connections as early as possible, so that the server does not run out of file descriptors, and to temporarily suspend ServerSocket.accept() if necessary. Therefore, time-consuming tasks cannot be executed in this context.
Please let me know if you like it or not. If not, let me know what features are missing.
Thanks in advance!
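The interface itself is still being designed, so as a rough illustration only — plain java.net with made-up names and threshold, not the eventual Netty API — the "close immediately with a good-bye" idea might look like this:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PruningAcceptor {
    static final int MAX_CONNECTIONS = 2; // hypothetical threshold

    public static void main(String[] args) throws Exception {
        final ServerSocket server = new ServerSocket(0);
        final List<Socket> accepted = new ArrayList<>();
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket s = server.accept();
                    if (accepted.size() >= MAX_CONNECTIONS) {
                        // Over the threshold: send a good-bye and close right away,
                        // returning the file descriptor instead of holding it open.
                        s.getOutputStream().write("server busy\n".getBytes(StandardCharsets.US_ASCII));
                        s.close();
                    } else {
                        s.getOutputStream().write("welcome\n".getBytes(StandardCharsets.US_ASCII));
                        accepted.add(s);
                    }
                }
            } catch (IOException ignored) {
                // Server socket closed; leave the loop.
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();

        // Four clients: the first two are served, the rest are pruned early.
        for (int i = 0; i < 4; i++) {
            try (Socket c = new Socket("127.0.0.1", server.getLocalPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(c.getInputStream(), StandardCharsets.US_ASCII))) {
                System.out.println("client " + i + ": " + in.readLine());
            }
        }
        server.close();
    }
}
```

Run standalone, the first two clients print 'welcome' and the later ones 'server busy': the excess connections are rejected at the earliest possible point instead of consuming descriptors.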
7. Re: Potential Netty problem with synchronously closing connections
trustin May 12, 2009 2:43 AM (in response to timfox)Related Netty forum thread: http://n2.nabble.com/Pruning-the-accepted-connections-at-the-earliest-stage-possible.-td2867545.html
8. Re: Potential Netty problem with synchronously closing connections
trustin May 15, 2009 4:17 AM (in response to timfox) I've just modified Netty so that Channel.close() in the channelConnected handler method can close the connection immediately without causing the 'too many open files' error, and I started to wonder whether you really want to suspend accepting incoming connections instead of closing them quickly.
Even if you suspend the accept operation, incoming connections are still accepted by the operating system and held in the backlog of the server socket. If the backlog is full, further connection attempts from clients will fail. When you resume the accept operation, the oldest connections will be served while the newest ones will already have failed. I'm not sure this is the expected behavior. I would rather close the connections quickly, so that a client can retry later and all connections are served eventually instead of being timed out due to a full backlog. WDYT?
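The kernel-level behavior described above — the OS completes the TCP handshake and parks the connection in the backlog even when the application never calls accept() — is easy to observe with plain java.net (class name is made up for illustration; this is a sketch, not Netty code):

```java
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogDemo {
    public static void main(String[] args) throws Exception {
        // Listening socket with a backlog of 1; the application never calls accept().
        ServerSocket server = new ServerSocket(0, 1);

        // The TCP handshake is completed by the kernel, so this connect succeeds
        // even though no accept() has run: the connection just sits in the backlog.
        Socket c = new Socket("127.0.0.1", server.getLocalPort());
        System.out.println("connected without accept(): " + c.isConnected());

        c.close();
        server.close();
    }
}
```

This is why suspending accept() only defers the problem: connections keep arriving and queueing until the backlog fills, at which point new clients fail with no good-bye at all, whereas accepting and closing quickly lets every client get a definite answer.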