3 Replies Latest reply on May 13, 2009 2:36 PM by jochen.reinhardt

    How to deal with huge uploaded file in JBoss

      Hi folks,

      I'm writing a medium-scale application for one of our customers, using JBoss 5.02, JSF, Tomahawk and EJB3 with a MySQL database. I need to do a file import - one big file - through the web interface (JSF file upload) and handle the file contents in EJB stateless session beans. The problem is that the file is so large that I get OutOfMemoryErrors when I process it.

      My approach is to handle the import asynchronously - as it may take several hours. So I use JMS to send a message for asynchronous processing.
      First, I tried to generate a JMS message with all the data inside. But that's not feasible for the live system. I also did not like that the file content was saved to the database as part of JMS transaction handling. I also assume that the file content was kept in memory several times: by JSF, by JMS and maybe by the EJB stateless session bean.

      My second approach was to store the uploaded file in a temporary file and just pass the file path in the message. When receiving the message, I just need to load the file from disk again in the JMS MessageListener. But this fails: my test file is > 100 MB, and reading it into a byte array caused an OutOfMemoryError. I'm also not sure whether this already violates the EJB spec. Is accessing files allowed from within JMS message listeners and/or EJB session beans? Or is that some restriction related to clustered setups?
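
A minimal sketch of the temp-file approach without the byte array, assuming a Java 7+ runtime for java.nio.file and a line-oriented (e.g. CSV) file. The class and method names (SpooledImport, spoolToTempFile, processLines) are hypothetical and not part of any JBoss, JSF or JMS API; only the file path would travel in the message.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; names are illustrative only.
final class SpooledImport {

    // Web tier: copy the upload stream to a temp file.
    // Files.copy works through a small internal buffer, so the whole
    // upload is never held in memory at once.
    static Path spoolToTempFile(InputStream upload) throws IOException {
        Path tmp = Files.createTempFile("import-", ".csv");
        Files.copy(upload, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Listener side: read the file line by line instead of loading
    // it into a byte array. Returns the number of lines seen.
    static long processLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Handle one record here (parse, validate, persist).
                count++;
            }
        }
        return count;
    }
}
```

Memory use then depends on the buffer size and the longest line, not on the file size. The UTF-8 charset is an assumption; use whatever encoding the import file actually has.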

      My preferred way would be to open an InputStream to the file and pass that in to a Stateless Session Bean local method call. That should be possible - as the input stream is passed by reference for local calls. But I'm not sure if that is supported in JBoss.

      Please advise!

      Jochen



        • 1. Re: How to deal with huge uploaded file in JBoss
          peterj

          Is there a way to transfer the file in chunks? For example, transfer 1 MB, handle that 1 MB in the EJB, and then transfer the next 1 MB. This would avoid having the entire huge file in memory at one time.
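
The chunking idea could be sketched roughly like this. ChunkedReader and ChunkHandler are hypothetical names, and the handler stands in for whatever call would pass each chunk on to the EJB; this is an illustration under those assumptions, not a tested JBoss recipe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper; names are illustrative only.
final class ChunkedReader {

    // Callback that receives one chunk at a time.
    interface ChunkHandler {
        void handle(byte[] chunk) throws IOException;
    }

    // Reads the stream in pieces of at most chunkSize bytes and hands
    // each piece to the handler. Returns the number of chunks delivered.
    static int forEachChunk(InputStream in, int chunkSize, ChunkHandler handler)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] buffer = new byte[chunkSize];
        int chunks = 0;
        int filled;
        while ((filled = fill(in, buffer)) > 0) {
            // Copy so the handler may keep the chunk after the buffer is reused.
            handler.handle(Arrays.copyOf(buffer, filled));
            chunks++;
        }
        return chunks;
    }

    // InputStream.read may return fewer bytes than requested, so keep
    // reading until the buffer is full or the stream ends.
    private static int fill(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

With a chunk size of 1024 * 1024 this holds about 1 MB at a time. Note that chunk boundaries will not line up with record boundaries, so a record split across two chunks has to be stitched together by the handler.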

          The only other thing I can think of is to ftp the file to the server and have the server open the file there.

          By the way, most likely the file is in memory twice - once in its serialized state, and once in its object state. Any other "copies" are probably references to the object state. But the serialized state could use twice the memory room of the object state.

          • 2. Re: How to deal with huge uploaded file in JBoss

            We had to implement a solution for that. I don't know if it will fit your needs, since we don't have to handle files as big as yours. However, I have tested with files bigger than 100 MB and it worked with -Xmx set to a lower value.

            We use Struts. The Struts form uses a field of type FormFile.

            From this field type we can get access to an input stream instead of storing the file in a byte array. You can then handle the file without having to hold it in a variable. We ourselves write the file into a BLOB database field using the BLOB stream feature.
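
In plain JDBC terms, writing a stream into a BLOB column might look like the sketch below. It uses the standard PreparedStatement.setBinaryStream method (the overload with a long length is JDBC 4.0). The table and column names and the BlobStreamWriter class are made up for illustration, and whether the driver really streams the data or buffers it internally depends on the driver and its settings.

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper; class, table and column names are illustrative only.
final class BlobStreamWriter {

    private static final String INSERT_SQL =
        "INSERT INTO uploaded_file (name, content) VALUES (?, ?)";

    // Binds the input stream directly to the BLOB parameter, so the
    // application code never builds a byte array of the whole file.
    // Returns the number of rows inserted.
    static int store(Connection con, String name, InputStream data, long length)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setString(1, name);
            ps.setBinaryStream(2, data, length);
            return ps.executeUpdate();
        }
    }
}
```

Reading back works the same way in reverse with ResultSet.getBinaryStream.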

            I am not sure this is clear enough. I hope it helps.

            An Phong Do

            www.solabs.com

            • 3. Re: How to deal with huge uploaded file in JBoss

              Hi and thanks for your responses,

              First of all, it works ;-) Just by saving the file in the
              I found several other problems in the app:

              - I was using the wrong version of Tomahawk
              - Tomahawk seems to ignore the setting for file size thresholds. This resulted in a FileUploadHtmlComponent caching 100 MB in memory. Have to submit a bug report for this...
              - I had to use a barely documented attribute in the file upload tag (found in Tomahawk's source code).

              I don't like the idea of splitting the file into chunks. That's just because I am a bit lazy and I want to keep things as clean and as simple as possible. And chunking does not solve my general problem, which is using a JMS message to process the file asynchronously. I would not want to send a message for each chunk. That would not be fast enough. And I would end up having JMS dump the chunks to the database.

              Next time I start a new application I will consider using Struts. But for this project, I'm already in too deep with MyFaces. And I've never heard of BlobStreams. Is that part of some JDBC spec? It seems to make good sense, though, as I won't be able to hold that much data in one row of a result set.

              For now, I'll keep things as they are - I like using the simple FileInputStreams, consuming only minimum memory. Buffering is done by some sick CSV-library. I thought I had to die as I took a look into the sources... but as long as it works...

              Any hint about standards violation?
              Accessing a file from disk in a MessageListener?
              Passing a FileInputStream to EJB3 stateless session bean method call?

              I guess I just don't care - as long as it works.

              Thanks again for your hints - I had not suspected tomahawk to be the major problem in that case. So after all, this post seems to be somewhat off-topic here...

              So long,
              Jochen