11 Replies Latest reply on Apr 16, 2009 1:25 AM by Lars Heinemann

    Binary FilePoller Only Delivering Part of File

    Terrance Crow Newbie

      Good afternoon!

       

      FUSE ESB seems to be only moving part of a binary file from the in directory to the out directory. For example, it's only forwarding the first 4-5Mb of a 170Mb file to the out directory.

       

      I've installed ESB 3.3.1.16 in standalone mode (i.e., not a war) on Centos Linux 5.2 (32-bit). It's in the default location. I've created an ID called servicemix as the ID in which to run FUSE ESB, and that ID owns the entire FUSE ESB /opt/iona directory tree.

       

      Using FUSE ESB's file-binding example as a model, I set up a binary polling component with these directories under /opt/iona/fuse-esb (fuse-esb being a symbolic link to fuse-esb-3.3.1.16):

       

      interstell

      interstell/test001-archive

      interstell/test001-in

      interstell/test001-out

       

      When I place a small file in the in directory, FUSE ESB happily deposits a copy in the archive and moves the original to the test001-out directory. It does the same thing for a 2Mb file. When I try to copy a 170Mb file, though, it only copies the first 4-5Mb of the file before it just stops.

       

      The program copying the data doesn't crash or show any error; it keeps copying. The target directory for the copy is test001-in. I've tried using scp from another Linux box (and a Mac OS X box) and tried cp from the same Linux box that's running FUSE ESB. The results are the same.

       

      I've tried varying the Period parameter, and that helps; but if the file doesn't copy within the period (or before the period kicks off a poll), the file gets cut off.

       

      It seems to me that the problem's with the lockManager, but I'm not clear on how to make sure the default lockManager is being invoked. I found and tried to implement a variation of the suggestion in this URL, which describes how to override the default lockManager:

       

      http://www.mail-archive.com/users@servicemix.apache.org/msg06682.html

       

      I didn't make my own lockManager, but I wanted to specify org.apache.servicemix.common.locks as the lockManager class; however, I get an error saying that I'm specifying an interface, not a class, so I clearly don't have the right information.

       

      Rather than continue to flail, and in the hopes that I can't possibly be the only person to have encountered this issue despite my inability to find any similar posts after more time searching than I care to admit, I decided to post this question.

       

      Has anyone seen this before? Am I missing something obvious?

       

      My goal is to implement a simple file moving application as our first installation of FUSE ESB. I thought that would be a simple introduction to the configuration and management process. At least, after reading Open Source ESBs in Action, I thought this would be an easy implementation! It's turning out to be otherwise.

       

      I appreciate any feedback anyone can provide.

       

      Thank you!

        • 1. Re: Binary FilePoller Only Delivering Part of File
          Torsten Mielke Apprentice

          I am not seeing this problem here when testing with FUSE ESB 3.4.0.1 (using servicemix-file version 2008.01.0.2-fuse) on Windows. With the following configuration, which is inspired by the Binary File Marshaler example, large files get processed correctly.

           

            <file:poller service="id772:file-poller"
              endpoint="endpoint"
              targetService="id772:file-sender"
              targetEndpoint="endpoint"
              autoCreateDirectory="true"
              archive="file:///D:/Temp/TestCases/Forum/ID772/fileDir/archive"
              file="file:///D:/Temp/TestCases/Forum/ID772/fileDir/from">
                
              <property name="marshaler">
                <bean class="org.apache.servicemix.components.util.BinaryFileMarshaler"></bean>
              </property>
            </file:poller>              
                        
                        
            <file:sender service="id772:file-sender"
              endpoint="endpoint"
              directory="file:///D:/Temp/TestCases/Forum/ID772/fileDir/to"
              autoCreateDirectory="true">
              
              <property name="marshaler">
                <bean class="org.apache.servicemix.components.util.BinaryFileMarshaler"></bean>
              </property>
            </file:sender>
          </beans>
          

           

          I tried up to 450MB of file sizes and did not encounter any problems.

          • 2. Re: Binary FilePoller Only Delivering Part of File
            Torsten Mielke Apprentice

            Attaching my little testcase here. Feel free to run it in your env.

            • 3. Re: Binary FilePoller Only Delivering Part of File
              Terrance Crow Newbie

              Thank you for taking the time to reply!

               

              It looked like the major difference between your configuration and mine was the version of FUSE ESB. I upgraded to the 3.4.0.1, added my configuration to servicemix.xml, ran ant setup from the examples/file-binding directory to get the needed jar, and tried my tests again.

               

              Interestingly enough, this time the whole file made it to the archive directory, but only 44Mb of it made it to the out directory.

               

              Here's the snippet from servicemix.xml:

               

               

               

              When I reduced the period value to 1000, which is the default, the out directory had 166Mb of the file (archive still had the entire file). On a second run, the out directory had 55Mb, so it looks like the polling period's hitting the file differently each time.

               

              Multiple tests show the out directory getting varying amounts of the file and the archive directory getting the entire file.

               

              The console's not showing any errors.

               

              I guess I could specify the real output directory as the "archive", but a) that wouldn't give me a real archive and b) it doesn't solve the problem. I can't imagine this is expected behavior, especially given your experience.

               

              The only other difference between your configuration and mine was how you specified the directories -- you used the full path and prefixed it with file://. I tried that and reran my tests.

               

              It worked once. The second time, it cut the out directory version of the file off at 150Mb out of 170Mb. I checked the size multiple times over a period of 30 seconds to make sure it wasn't just spooling. The fourth time I ran it, the entire file made it.

               

              I'm not sure how to interpret what I'm seeing.

               

              Might you have other suggestions on what I can try or where else I can look for diagnostic information?

               

              Thanks again for taking the time to reply!

              • 4. Re: Binary FilePoller Only Delivering Part of File
                Torsten Mielke Apprentice

                That behavior is rather strange indeed. I will try your configuration snippet next but in the meantime can you please run my demo as it is (perhaps change the directory names of the file poller and sender) using FUSE ESB 3.4.0.1-fuse? It should work correctly just as it did for me.

                Another difference between our environments is the OS. I might need to try on Linux as well.

                • 5. Re: Binary FilePoller Only Delivering Part of File
                  Torsten Mielke Apprentice

                  I also tested using your servicemix.xml configuration on Windows and Linux (SuSE 10.3 using the ext3 filesystem), and the results are positive. It works every time for me.

                   

                  Are you sure you are not running out of disk space or so on the file system?

                   

                  I attach my servicemix.xml that I used for testing.

                  • 6. Re: Binary FilePoller Only Delivering Part of File
                    Terrance Crow Newbie

                    Thank you very much for responding so quickly and for building a demo for me to test.

                     

                    I've loaded it into my test environment, and I think I have it installed correctly. At least, it seems to run!

                     

                    2/3 of the time, the 170Mb file makes it intact to the out directory. All of the time, the archive file is intact. The demo you assembled even assigns a random filename to prevent overwrites; I like that feature.

                     

                    I checked my available disk space, and the drive has over 3Gb free. Since the demo and my original configuration work sometimes, I don't think it's disk space related.

                     

                    My CentOS 5.2 installation is running under VMware Workstation 5.5.7, which is running under XP Pro SP3 with all current patches. I didn't think this was relevant because I can honestly say it's never been a problem before with anything I've done. However, I think I need to rule it out, especially since everything you've tried is working.

                     

                    I'm going to try my original configuration on Windows XP (no virtualization). I'll then try it under another Linux installation on a different virtualization platform.

                     

                    I'll report back with my findings.

                     

                    Again, thank you very much for taking the time to respond and to craft a demo service assembly/service unit.

                    • 7. Re: Binary FilePoller Only Delivering Part of File
                      Terrance Crow Newbie

                      I was able to get FUSE ESB 3.4.0.1 running under Windows XP Pro SP3. I tried the bare bones configuration that I posted, and it worked most of the time. The one strange behavior was that if the file already existed in the archive location, FUSE ESB would constantly delete and recreate the file in the out box. That looks like a configuration issue to me. As soon as I deleted the archive version, FUSE ESB could finish copying the in file to the out directory and created the archive directory copy, too.

                       

                      I was also able to get your demo working under Windows after adapting the file locations, and it seemed to work just fine! The only issue was that if the poll cycle forced FUSE ESB to try copying the source file while XCOPY was still running (to place the large 170Mb file in the in directory), the console would scroll several "file in use" warning messages. FUSE ESB wouldn't copy the inbox file until I stopped/restarted the ESB server. Again, to me that seems like a configuration issue on my part.

                       

                      I wasn't able to get another Linux box up under another virtual server. I was trying to get Oracle VM working, and I've not worked with it or any other Xen-based solution before, so I couldn't move quickly. I'm still going to try that.

                       

                      However, I think I've seen enough to be reasonably sure that for the first time, I'm seeing a situation where Linux CentOS 5.2 (32-bit) and VMware Workstation 5.5.7/XP SP3 aren't playing well together. There might be another explanation, and I hope there is, but I'm not seeing it.

                       

                      Again, thank you very much for your help! Not only does it look like the problem's solved, but I learned quite a bit about troubleshooting FUSE ESB.

                       

                      Thanks!

                      • 8. Re: Binary FilePoller Only Delivering Part of File
                        Ulhas Bhole Novice

                        terrancecrow wrote:

                        I was also able to get your demo working under Windows after adapting the file locations, and it seemed to work just fine! The only issue was that if the poll cycle forced FUSE ESB to try copying the source file while XCOPY was still running (to place the large 170Mb file in the in directory), the console would scroll several "file in use" warning messages. FUSE ESB wouldn't copy the inbox file until I stopped/restarted the ESB server. Again, to me that seems like a configuration issue on my part.

                         

                        Just a note on the copy operation. In general, a copy is not an atomic operation, so you should normally use the move/mv command to put the file in the poll directory.

                         

                        /Ulhas
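The move-instead-of-copy advice above can be sketched in Java. This is only a minimal illustration, assuming Java 7 or later and assuming the staging directory and the poll directory live on the same file system (otherwise an atomic move is generally not possible). The class name `StagedDelivery`, the method `deliver`, and the `hold` directory are hypothetical names chosen for this example; they are not part of FUSE ESB.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StagedDelivery {

    /**
     * Copies source into stagingDir first, then moves the finished copy into
     * inboxDir. Assumption: stagingDir and inboxDir are on the same file system,
     * so the move is a rename and a poller never sees a half-written file.
     */
    public static Path deliver(Path source, Path stagingDir, Path inboxDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(inboxDir);

        // Slow, non-atomic step: happens outside the polled directory.
        Path staged = stagingDir.resolve(source.getFileName());
        Files.copy(source, staged, StandardCopyOption.REPLACE_EXISTING);

        // Fast step: ATOMIC_MOVE fails with an exception instead of silently
        // falling back to copy-and-delete when an atomic move is not possible.
        Path target = inboxDir.resolve(source.getFileName());
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Demo in a throwaway temp directory; directory names are illustrative.
        Path base = Files.createTempDirectory("staged-delivery");
        Path source = base.resolve("payload.bin");
        Files.write(source, new byte[1024 * 1024]);

        Path delivered = deliver(source, base.resolve("hold"), base.resolve("test001-in"));
        System.out.println(delivered + " size=" + Files.size(delivered));
    }
}
```

If the target name already exists in the poll directory, the behavior of an atomic move is implementation specific, so unique file names are the safer choice.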

                        • 9. Re: Binary FilePoller Only Delivering Part of File
                          Terrance Crow Newbie

                          I'm glad you said something about not using copy.

                           

                          I just finished installing FUSE ESB on Oracle Linux 5.3 64-bit on a Sun Ultra 20 with 3Gb of RAM. No virtualization.

                           

                          I installed the demo that tmielke provided and tried my test cases. The tests up to 170Mb worked fine, but it occurred to me that the copy times were really fast compared to the Linux I was running under VMware. So, I tried a 3.4Gb ISO.

                           

                          I got the exact same symptoms I had in my original post!

                           

                          Based on your post, I tried moving the ISO into the inbox directory instead of copying it. When I moved it, the ISO worked just fine.

                           

                          So, it appears I can't copy data into the input directory.

                           

                          That means I need to stage the files on the file system first. For instance, if I have programs that output text files and I want to use FUSE-ESB to route them, I need to spool them first to a hold directory, then move them to the FUSE ESB input directory.

                           

                          Am I thinking correctly?

                           

                          Thank you very much for your feedback! I think this whole exchange has really helped me move forward with FUSE ESB.

                          • 10. Re: Binary FilePoller Only Delivering Part of File
                            Ulhas Bhole Novice
                            terrancecrow wrote:

                            I'm glad you said something about not using copy.

                             

                            I was bitten by this atomicity problem multiple times in the past, so it is kind of etched in my brain.

                            That means I need to stage the files on the file system first. For instance, if I have programs that output text files and I want to use FUSE-ESB to route them, I need to spool them first to a hold directory, then move them to the FUSE ESB input directory.

                             

                            Am I thinking correctly?

                             

                            Yes, that would be the better option. Even though it is a two-stage process (copying the file somewhere and then moving it), it makes sure the file poller won't pick up a half-written file.

                            If you are using a filter on the file poller, you can use the same folder: store the file with an extension that the filter excludes, and then rename it to the actual file name once it is complete.

                             

                            Ulhas Bhole
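The same-folder variant described above could look roughly like the following sketch. It assumes the poller can be given a `java.io.FileFilter` and that the writing program is under your control; the `.part` suffix, the class name `PartialFileFilter`, and the helper `writeThenRename` are hypothetical choices for this example, not names taken from ServiceMix.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PartialFileFilter implements FileFilter {

    /** Hypothetical suffix marking a file that is still being written. */
    public static final String PARTIAL_SUFFIX = ".part";

    /** Accepts regular files only, and only when they do not carry the partial suffix. */
    @Override
    public boolean accept(File file) {
        return file.isFile() && !file.getName().endsWith(PARTIAL_SUFFIX);
    }

    /**
     * Writer side: copy into the poll directory under the temporary name, then
     * rename within the same directory once the copy is complete.
     */
    public static Path writeThenRename(Path source, Path inboxDir) throws IOException {
        Files.createDirectories(inboxDir);
        String name = source.getFileName().toString();
        Path partial = inboxDir.resolve(name + PARTIAL_SUFFIX);
        Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
        // A rename inside one directory stays on one file system, so it can be atomic.
        return Files.move(partial, inboxDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("partial-filter");
        Path source = base.resolve("image.iso");
        Files.write(source, new byte[2048]);

        Path inbox = base.resolve("test001-in");
        Path finished = writeThenRename(source, inbox);

        // Only files the filter accepts would be visible to a poller using it.
        File[] visible = inbox.toFile().listFiles(new PartialFileFilter());
        System.out.println("finished=" + finished + " visible=" + (visible == null ? 0 : visible.length));
    }
}
```

Compared with a separate hold directory, this keeps everything in one place, at the cost of requiring every writer to follow the naming convention.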

                            • 11. Re: Binary FilePoller Only Delivering Part of File
                              Lars Heinemann Novice

                              As far as I can remember, this problem has already been addressed in ServiceMix. It was fixed in servicemix-file on 2008-12-08. The file poller should normally try to find out whether the file is still being copied or has already been fully copied over (via FileUtil.isFileFullyAvailable(...)).

                               

                              protected void pollFile(final File aFile) {
                                  if (logger.isDebugEnabled()) {
                                      logger.debug("Scheduling file " + aFile + " for processing");
                                  }
                                  if (!FileUtil.isFileFullyAvailable(aFile)) {
                                      if (logger.isDebugEnabled()) {
                                          logger.debug("The file " + aFile + " is still being copied. Skipping...");
                                      }
                                      // skip the file because it is not yet fully copied over
                                      return;
                                  }
                                  ...
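For readers on a version without that fix, a "has the file stopped changing?" check can be approximated as in the sketch below. This is not the ServiceMix implementation of `FileUtil.isFileFullyAvailable`; the class `FileReadiness`, the method `looksFullyWritten`, and the settle interval are hypothetical. It is a heuristic only: a copy that stalls for longer than the settle interval would still be misjudged as complete, so staging and moving the file remains the more reliable approach.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FileReadiness {

    /**
     * Heuristic: samples the file's size and modification time twice,
     * settleMillis apart, and reports true only if neither value changed.
     * Returns false for anything that is not an existing regular file.
     */
    public static boolean looksFullyWritten(File file, long settleMillis) throws InterruptedException {
        if (!file.isFile()) {
            return false;
        }
        long sizeBefore = file.length();
        long modifiedBefore = file.lastModified();

        Thread.sleep(settleMillis);

        return file.isFile()
                && file.length() == sizeBefore
                && file.lastModified() == modifiedBefore;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Demo with a file that nobody is writing to any more.
        File sample = File.createTempFile("readiness", ".bin");
        Files.write(sample.toPath(), new byte[1024]);

        System.out.println("stable file ready? " + looksFullyWritten(sample, 200L));
    }
}
```

A poller built on this idea would skip any file for which the check returns false and try again on the next polling cycle, which is the same shape as the early return in the excerpt above.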