2 Replies Latest reply on Feb 14, 2017 4:22 AM by davsclaus

    How to increase the performance for SFTP file polling consumer

    ravishankarhassain

      I am trying to create a file polling route which will poll the source SFTP server and transfer the file to the destination SFTP server.

      The file count is huge: the source can generate 270-300 files per second, although each file is only a few KB in size.

       

       

      I am planning to run 3 instances (say A, B, C) of the same route in active mode, where each instance can handle and transfer 100 files per poll.

      Once a file has been picked up for processing by one instance, say A, it should not be picked up by the other two active instances, B and C.

       

       

      I have created the route below, which transfers files at a rate of 5-6 files per second.

       

        from("sftp://user:password@source.sftp.server.com/input"
             + "?readLock=changed"              // only pick up files that have stopped changing
             + "&readLockMinAge=1m"             // file must be at least 1 minute old
             + "&readLockTimeout=70000"         // ms to wait for the read lock
             + "&readLockCheckInterval=5000"    // ms between change checks
             + "&delay=1000"                    // ms between polls
             + "&preMove=processing"            // move the file here before processing
             + "&maxMessagesPerPoll=100"
             + "&move=../archive"               // move the file here after processing
             + "&localWorkDirectory=temp"
             + "&stepwise=false"
             + "&include=.*(txt)$")
        .threads(30, 35)
             .log(LoggingLevel.INFO, "downloading files from Source SFTP Server")
             .to("sftp://user:password@destination.sftp.server.com/output")
        .end();

       

      Is there any other configuration I need to apply to reach a processing rate of 100 files per poll, and to share the files between the instances in an intelligent and efficient manner?

       

       

      Any help, suggestions, or pointers are much appreciated.

        • 1. Re: How to increase the performance for SFTP file polling consumer
          davsclaus

          The current implementation of readLock=changed is a bit slow because it checks each file for changes individually, one by one. A faster approach would be a holistic pass over all the files, so that files which are no longer changing are detected sooner.

           

          A caveat is that if you must poll the files in a specific order, then the first file must be ready before the second, and so on, so this faster approach does not work for everybody.

           

          Such functionality is currently not in Apache Camel. As a Fuse customer, you can use the Red Hat Customer Portal to log an enhancement request to get it onto the product roadmap.

          • 2. Re: How to increase the performance for SFTP file polling consumer
            davsclaus

            An alternative is to build your own custom read lock implementation that works faster using the approach I discussed above.
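
To make the "holistic pass" idea concrete, here is a minimal sketch in plain Java of the change-detection part only. It is not Camel code: `BatchChangeTracker`, `FileStat`, and `stableFiles` are hypothetical names invented for illustration. The assumption is that you obtain one full directory listing per poll (file name, size, last-modified time) and compare it with the listing from the previous poll, instead of waiting on each file separately. To use it as a read lock you would still need to write an adapter that implements Camel's `GenericFileExclusiveReadLockStrategy` and register it through the `exclusiveReadLockStrategy` endpoint option; that adapter is left out here because the method signatures differ between Camel versions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Apache Camel. */
public class BatchChangeTracker {

    /** Size and modification time of one remote file, as seen in a directory listing. */
    public static final class FileStat {
        final long size;
        final long lastModified;

        public FileStat(long size, long lastModified) {
            this.size = size;
            this.lastModified = lastModified;
        }

        boolean sameAs(FileStat other) {
            return other != null && size == other.size && lastModified == other.lastModified;
        }
    }

    // Listing seen on the previous pass, keyed by file name.
    private Map<String, FileStat> previous = new HashMap<>();

    /**
     * Compares one full directory listing with the previous one and returns
     * the names whose size and timestamp did not change in between.
     * Files seen for the first time are never reported as stable.
     */
    public synchronized List<String> stableFiles(Map<String, FileStat> listing) {
        List<String> stable = new ArrayList<>();
        for (Map.Entry<String, FileStat> entry : listing.entrySet()) {
            if (entry.getValue().sameAs(previous.get(entry.getKey()))) {
                stable.add(entry.getKey());
            }
        }
        // Remember this listing for the next pass.
        previous = new HashMap<>(listing);
        return stable;
    }
}
```

With this shape, every file that is unchanged between two consecutive listings becomes eligible in the same pass, so the cost is one listing per interval rather than one wait per file. As noted in the earlier reply, this does not preserve any required processing order: a later file can be reported as stable while an earlier one is still being written.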