1 Reply Latest reply on Jan 20, 2017 1:49 PM by robmv

    Wildfly 10.1 CLOSE_WAIT/can't identify protocol

    bobzer

      Hi.

      After upgrading from Wildfly 10.0 to Wildfly 10.1, I have a file descriptor leak in the production environment. The configuration is the same on both 10.0 and 10.1.

       

      Version 10.0 had a file descriptor leak too. I was able to reproduce it by physically terminating the connection on the client side: after that, all descriptors taken by that client stayed occupied forever. I fixed that leak by adding tcp-keep-alive="true" to the http-listener ([WFLY-3536] Wildfly 8.1.0 Final keeps established connections forever - JBoss Issue Tracker).
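
      For readers unfamiliar with the mechanism: the tcp-keep-alive attribute presumably ends up enabling the standard SO_KEEPALIVE option on the accepted sockets (that mapping is an assumption, not something confirmed in this thread). A minimal Python sketch of what that option looks like at the socket level, with make_keepalive_socket being a made-up helper name:

```python
import socket


def make_keepalive_socket():
    """Create a TCP socket with SO_KEEPALIVE enabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to send keep-alive probes on an idle connection, so a
    # peer that vanished without sending FIN/RST is eventually detected.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
# getsockopt returns a non-zero value when the option is enabled.
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

      Note that the option only turns probing on; how long a connection must be idle before the first probe is sent is controlled by the operating system (on Linux, net.ipv4.tcp_keepalive_time, which defaults to 7200 seconds).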

       

      Then I ran into a memory leak in SSL ([WFLY-6380] Memory leak and freeze (io.undertow.protocols.ssl.SslConduit) - JBoss Issue Tracker) and decided to upgrade to version 10.1.

       

      Version 10.1 has a file descriptor leak with different symptoms, and it leaks faster than 10.0. Running lsof shows many records in CLOSE_WAIT, like these (real domain changed to myhost.com):

      java    29821 root 4089u  IPv4          462742734      0t0       TCP myhost.com:https->37-221-202-146.obit.ru:6821 (CLOSE_WAIT)

      java    29821 root 3236u  IPv4          462731802      0t0       TCP myhost.com:http->baiduspider-180-76-15-148.crawl.baidu.com:14464 (CLOSE_WAIT)

      Note that both protocols, http and https, are affected.

       

      Another bunch of records looks like this:

      java    29821 root 3217u  sock                0,6      0t0 462731371 can't identify protocol
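
      To track how the two kinds of records grow over time, the lsof output can be tallied with a small script. This is only a sketch: classify_lsof_line and summarize are made-up names, and the parsing assumes lines shaped like the samples above (TCP state in parentheses at the end of the line).

```python
import re
from collections import Counter

# Matches a trailing TCP state such as "(CLOSE_WAIT)" at the end of a line.
_STATE_RE = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def classify_lsof_line(line):
    """Return a label for one lsof line: a TCP state, 'unidentified', or 'other'."""
    if "can't identify protocol" in line:
        return "unidentified"
    match = _STATE_RE.search(line)
    if match and " TCP " in line:
        return match.group(1)
    return "other"


def summarize(lines):
    """Count labels over all non-empty lines."""
    return Counter(classify_lsof_line(l) for l in lines if l.strip())


# The three sample records quoted in this post.
sample = [
    "java    29821 root 4089u  IPv4          462742734      0t0       "
    "TCP myhost.com:https->37-221-202-146.obit.ru:6821 (CLOSE_WAIT)",
    "java    29821 root 3236u  IPv4          462731802      0t0       "
    "TCP myhost.com:http->baiduspider-180-76-15-148.crawl.baidu.com:14464 (CLOSE_WAIT)",
    "java    29821 root 3217u  sock                0,6      0t0 462731371 "
    "can't identify protocol",
]
counts = summarize(sample)
```

      Feeding it the lines of lsof -p <pid> at regular intervals would show whether the CLOSE_WAIT and unidentified counts climb together or independently.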

       

      The file descriptor count grows to 2500-4500, and after that the server stops responding to incoming connections, although internal jobs keep working perfectly.

       

      I can't reproduce this leak in the test environment. I ran load tests and terminated connections, but the descriptor count did not grow above 1700 and I saw no records with "CLOSE_WAIT" or "can't identify protocol". I then stopped 10.1 and ran 10.0 on the production server, and there I don't see "CLOSE_WAIT" or "can't identify protocol" records at all.

       

      The only difference in the http settings compared with a clean installation is the addition of tcp-keep-alive="true" read-timeout="60000" write-timeout="60000" to the http-listener and https-listener. These settings were migrated from 10.0, where they were added to fix WFLY-3536.

       

      What can I do to prevent the file descriptor leak on Wildfly 10.1?

        • 1. Re: Wildfly 10.1 CLOSE_WAIT/can't identify protocol
          robmv

          I am having the same problem and am unable to reproduce it in a test environment. The only way I can reproduce something like the original problem results in lsof reporting "protocol: TCP" instead of "can't identify protocol". netstat shows no connections corresponding to those lsof entries. The connections that remain forever as "protocol: TCP" are ones made by the Oracle javaws client (Windows) when downloading updated application jars, and they remain even after all client processes exit.

           

          Updating the Wildfly Undertow modules to 1.4.8 (the latest available) solves the case I found in testing. I will deploy it to production after more tests to determine whether it also fixes the "can't identify protocol" connections.