9 Replies Latest reply on Sep 3, 2010 1:38 AM by doychin

    JBoss Remoting and compression

    doychin

      I've spent some time digging into compression and I managed to make it work with JBoss 4.2.3 and Remoting 2.2.2SP8.

      The problem I found was with an error previously described in another post.
      http://www.jboss.com/index.html?module=bb&op=viewtopic&t=134557

      The problem comes from some extra bytes put in the output stream from compression and not read from decompression.

      These extra bytes are still there after unmarshalling completes. The next time the unmarshaller reads the version from the input stream, it reads these extra bytes instead and reports an incorrect-version error.

      In the most recent versions I see that a BufferedInputStream is used to work around this problem, but I'm not sure this will always work.

      There is always a small chance that the serialized compressed data, including these extra bytes, is just a few bytes longer than the buffer in the buffered input stream. In that case the extra bytes are still in the socket input stream after unmarshalling completes, and they lead to the same version error again. Using BufferedInputStream reduces the chance of this error, but the chance still exists.

      To make sure it never happens, I write the size of the compressed data before the data itself in the output stream. During unmarshalling I read that size and then read exactly that many bytes from the input stream.
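
      A minimal sketch of the length-prefix idea described above. The class and method names are hypothetical and are not part of JBoss Remoting; the payload stands for the already compressed bytes. DataInputStream does no buffering of its own, so the reader consumes exactly one frame and leaves the next byte on the stream untouched.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration only, not a JBoss Remoting class.
final class LengthPrefixedFrames {

    // Writes a 4-byte big-endian length followed by the payload bytes.
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeInt(payload.length);
        dos.write(payload);
        dos.flush(); // flush but do not close: the caller owns the stream
    }

    // Reads exactly one frame and leaves the stream positioned at the next byte.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int length = dis.readInt();
        if (length < 0) {
            throw new IOException("Negative frame length: " + length);
        }
        byte[] payload = new byte[length];
        dis.readFully(payload); // blocks until all bytes arrive or throws EOFException
        return payload;
    }
}
```

      Because the reader knows the exact size up front, nothing belonging to the compressed block can be left behind for the next version read.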

      Another problem I found with the current solution is that it leaks a lot of memory when it is used to transfer large amounts of data in a short period of time. The reason is that the GZip streams depend on native code routines in the JVM, and to release the resources used by these routines the streams rely on the garbage collector to call their finalize methods.

      Another way to release those resources is to call close() on the GZip output stream, but this also closes the underlying stream (in this case the buffered stream, which in turn closes the socket output stream).

      To avoid that, I used a ByteArrayOutputStream on the marshalling side and a ByteArrayInputStream on the unmarshalling side. When marshalling, the compressed data is written to a byte array, and that byte array is then written to the socket output stream. When unmarshalling, the data is read from the socket input stream into a byte array, which is then passed to the GZip input stream.

      Now that the socket streams are separated from the GZip streams, I can call close() on the GZip streams to release the memory used during compression and decompression.
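
      The byte-array approach above could look roughly like this. It is a sketch with hypothetical names, not the code attached to the JIRA issue. It uses try-with-resources, which is newer than the Java versions discussed in this thread; on older JVMs a try/finally block that calls close() does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
final class InMemoryGzip {

    // Compresses into a byte array, so closing the GZIP stream never touches a socket.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(buffer)) {
            gzos.write(data);
        } // close() writes the trailer and releases the deflater's native resources
        return buffer.toByteArray();
    }

    // Decompresses from a byte array that was read from the wire beforehand.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream gzis =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzis.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
        } // close() releases the inflater's native resources
        return result.toByteArray();
    }
}
```

      The array returned by compress() is what would be written to the socket output stream, and decompress() would receive the bytes read back from the socket input stream.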

      I hope these comments will help other people to avoid the same problems I faced already.

      Doychin Bondzhev

        • 1. Re: JBoss Remoting and compression
          ron_sigal

          Hi Doychin,

          Thank you for looking into this problem.

          "doychin" wrote:

          The problem comes from some extra bytes put in the output stream from compression and not read from decompression.


          Comparing the version of CompressingMarshaller on the 2.2 branch (from which release 2.2.2.SP8 was derived) with the version on the 2.x branch (from which 2.4/2.5 releases are derived) I see that there's a problem which I fixed only on the 2.x branch. In particular, CompressingMarshaller.write() ends

           gzos.finish();
           oos.flush();
          


          on the 2.2 branch and it ends

           oos.flush();
           bos.flush();
           gzos.flush();
           gzos.finish();
          


          on the 2.x branch. I suspect that calling gzos.finish() before oos.flush() is what leaves extra bytes unwritten. Could you try running with the 2.x version and see if that fixes the problem?
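
          To make the ordering concrete, here is a small standalone sketch. It is not the CompressingMarshaller source: the class name is hypothetical, the stream layering (object stream over buffered stream over GZIP stream) is assumed from the variable names quoted above, and a ByteArrayOutputStream stands in for the socket stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration of the flush-then-finish order; layering is assumed.
final class FlushOrder {

    static byte[] writeCompressed(Serializable value) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket
        GZIPOutputStream gzos = new GZIPOutputStream(sink);
        BufferedOutputStream bos = new BufferedOutputStream(gzos);
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(value);
        oos.flush();   // push serialized bytes down into the GZIP stream first
        bos.flush();   // explicit calls mirror the order quoted from the 2.x branch
        gzos.flush();
        gzos.finish(); // only now write the last deflate block and the gzip trailer
        return sink.toByteArray();
    }

    static Object readCompressed(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
            return ois.readObject();
        }
    }
}
```

          The point of the order is that everything the object stream has buffered reaches the deflater before finish() ends the compressed data.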

          "doychin" wrote:

          Another problem I found with the current solution is that it leaks a lot of memory when it is used to transfer large amounts of data in a short period of time. The reason is that the GZip streams depend on native code routines in the JVM, and to release the resources used by these routines the streams rely on the garbage collector to call their finalize methods.


          I did not know that. I see that David Lloyd made a suggestion on thread "Compression marshalling" (http://www.jboss.com/index.html?module=bb&op=viewtopic&t=134557&postdays=0&postorder=asc&start=10). Perhaps his suggestion is directed at this issue.



          • 2. Re: JBoss Remoting and compression
            ron_sigal

            I created JIRA issue JBREM-1077 "Fix problem in CompressingMarshaller" (https://jira.jboss.org/jira/browse/JBREM-1077) for this problem.

            • 3. Re: JBoss Remoting and compression
              doychin

              To make compression work with EJB 2.x invocations, I created my own compression marshaller and unmarshaller, which extend the original classes.

              The default constructor of each class now calls the one-argument constructor that takes a Marshaller/UnMarshaller, passing new InvocationMarshaller()/InvocationUnMarshaller().

              I also wrote new versions of the read/write methods, which I use to test different ways of calling the GZip streams.

              From my experiments I can tell that switching

              oos.flush(); and gzos.finish();

              does not help.

              I'm still getting exceptions for incorrect version in the stream. But I also found a solution which is in the code below.

              To work around the other problem, the OutOfMemory exception, I created new GZip input/output stream classes that extend the original Java classes.

              In my output stream I overrode the finish method:

              public void finish() throws IOException
              {
                 super.finish();
                 def.end(); // This releases all resources used by zlib
              }


              In the input stream I added a new method:

              public void end() throws IOException
              {
                 // Make the gzip input stream read the extra trailer written by finish() in the
                 // output stream. This also removes the need for a buffered stream as in the 2.x branch.
                 while (available() > 0) {
                    read();
                 }
                 inf.end();
              }
              


              and I call this new method at the end of the read method.

              This way I can use the original Java GZip classes without having to rely on external libraries.

              - inf and def are protected fields in Java 1.5 and 1.6
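
              Put together, the two subclasses described above might look like the following sketch. The class names are hypothetical and this is not the source attached to JBREM-1077; the round-trip helper uses byte arrays in place of the socket streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical name: a GZIP output stream whose finish() also frees zlib resources.
class ReleasingGZIPOutputStream extends GZIPOutputStream {
    ReleasingGZIPOutputStream(OutputStream out) throws IOException {
        super(out);
    }

    @Override
    public void finish() throws IOException {
        super.finish(); // writes the remaining compressed data and the gzip trailer
        def.end();      // frees the deflater's native memory without closing 'out'
    }
}

// Hypothetical name: a GZIP input stream that can drain the trailer and free zlib resources.
class ReleasingGZIPInputStream extends GZIPInputStream {
    ReleasingGZIPInputStream(InputStream in) throws IOException {
        super(in);
    }

    public void end() throws IOException {
        while (available() > 0) { // keep reading until end of the compressed data,
            read();               // which makes the stream consume the gzip trailer
        }
        inf.end();                // frees the inflater's native memory
    }
}

// Round trip through both classes, with a byte array standing in for the wire.
final class ReleasingGzipDemo {
    static byte[] roundTrip(byte[] data) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        ReleasingGZIPOutputStream gzos = new ReleasingGZIPOutputStream(wire);
        gzos.write(data);
        gzos.finish(); // the underlying stream stays open

        ReleasingGZIPInputStream gzis =
            new ReleasingGZIPInputStream(new ByteArrayInputStream(wire.toByteArray()));
        byte[] result = new byte[data.length];
        new DataInputStream(gzis).readFully(result);
        gzis.end();    // drain the trailer, then release the inflater
        return result;
    }
}
```

              Neither stream is closed here, so the underlying streams remain usable after finish() and end() have released the native resources.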

              If you want I can provide you with full source code so you can use it to create the necessary updates in 2.x and 2.2 branches.

              Doychin

              • 4. Re: JBoss Remoting and compression
                ron_sigal

                Hi Doychin,

                "doychin" wrote:

                If you want I can provide you with full source code so you can use it to create the necessary updates in 2.x and 2.2 branches.


                That would be great. Could you attach the source files to the JIRA issue at https://jira.jboss.org/jira/browse/JBREM-1077?

                Thanks,
                Ron

                • 5. Re: JBoss Remoting and compression
                  doychin

                  You can find the source code of compression invoker marshaller/unmarshaller as attachments in the JIRA report.

                  Doychin

                  • 6. Re: JBoss Remoting and compression
                    ron_sigal

                    Thanks!

                    • 7. Re: JBoss Remoting and compression
                      ron_sigal

                      I've applied Doychin's fix to branch 2.2 (for release 2.2.3.SP1) and branch 2.x (for release 2.5.2).

                      Previews of jboss-remoting.jar from each of these branches are attached to JBREM-1077 if anyone wants to test the new versions.

                      I've tested the changes with a sample EJB3 that copies strings to the server and back. It is also attached to JBREM-1077.

                      • 8. Re: JBoss Remoting and compression
                        ron_sigal

                        There's another chapter to the story.

                        I noticed recently that NewCompressingMarshallerTestCase, which runs in about half a minute on my Windows laptop, takes about 13 minutes on one of the Red Hat Linux test machines. It turns out that the GZipOutputStream constructor takes an order of magnitude longer on Linux than on Windows. I've changed CompressingMarshaller so that it reuses a single GZipOutputStream but replaces the Deflater with each call to write(). CompressingUnMarshaller has the symmetric changes with respect to Inflater. Now NewCompressingMarshallerTestCase takes about 18 seconds on my Fedora laptop.

                        I've attached to JBREM-1077 "Fix problem in CompressingMarshaller" a copy of jboss-remoting.jar from the 2.x branch with the changes.

                        • 9. Re: JBoss Remoting and compression
                          doychin

                          Looks like compression is not good for RMI, especially when you have a fast network. Even the smallest call to the server takes too much time to complete.

                          The application felt slow. After removing compression it is now like a rocket ;-)