6 Replies Latest reply on Mar 3, 2009 10:27 PM by clebert.suconic

    ByteBuffer.allocateDirect ridiculously slow

    clebert.suconic

      I just found out that ByteBuffer.allocateDirect is ridiculously slow, which is why reusing buffers in the Journal made such a difference to our performance.

       final int TIMES = 50;

       final long numberOfIterations = 1000000;

       // Times ByteBuffer.allocateDirect: one new 1 KB direct buffer per iteration.
       public void testJustAllocateBufferDirect() throws Exception
       {
          long start = System.currentTimeMillis();

          for (int c = 0; c < TIMES; c++)
          {
             for (long i = 0; i < numberOfIterations; i++)
             {
                // Restart the clock after the first 10,000 iterations (warm-up)
                if (i == 10000)
                {
                   start = System.currentTimeMillis();
                }
                MessagingBuffer bufferSend = ChannelBuffers.wrappedBuffer(ByteBuffer.allocateDirect(1024));

                bufferSend.writeBytes(new byte[1024]);
             }

             long spentTime = System.currentTimeMillis() - start;

             System.out.println("Time BuffersDirect = " + spentTime);
          }
       }

       // Times our own JNI allocation: each buffer is created and then explicitly destroyed.
       public void testOurBuffers() throws Exception
       {
          long start = System.currentTimeMillis();

          for (int c = 0; c < TIMES; c++)
          {
             for (long i = 0; i < numberOfIterations; i++)
             {
                // Restart the clock after the first 10,000 iterations (warm-up)
                if (i == 10000)
                {
                   start = System.currentTimeMillis();
                }
                ChannelBuffer bufferSend = ChannelBuffers.wrappedBuffer(AsynchronousFileImpl.newNativeBuffer(1024));

                bufferSend.writeBytes(new byte[1024]);

                AsynchronousFileImpl.destroyBuffer(bufferSend.toByteBuffer());
             }

             long spentTime = System.currentTimeMillis() - start;

             System.out.println("Time JBM JNI Buffers = " + spentTime);
          }
       }
      



      In the test above, ByteBuffer.allocateDirect needed 100 seconds to complete 1 million buffers.

      Our code, which allocates ByteBuffers directly through JNI, needed 2 seconds to complete 1 million buffers.


      That *is* ridiculous.


      The new Buffers were slicing the direct ByteBuffer, which changed its capacity and broke buffer reuse. That is what brought this issue to my attention.
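
To illustrate how slicing can defeat reuse, here is a minimal sketch. It assumes a reuse check that only accepts buffers whose capacity matches the expected size; that check, the 24-byte header and the class and method names are hypothetical, not the actual Journal code. Only standard java.nio calls are used.

```java
import java.nio.ByteBuffer;

final class SliceCapacityDemo {
    // Hypothetical reuse check: accept only direct buffers of the exact expected capacity.
    static boolean reusable(ByteBuffer buffer, int expectedCapacity) {
        return buffer.isDirect() && buffer.capacity() == expectedCapacity;
    }

    // Returns a slice that starts after headerSize bytes; the original buffer is left untouched.
    static ByteBuffer sliceAfterHeader(ByteBuffer original, int headerSize) {
        ByteBuffer view = original.duplicate();
        view.position(headerSize);
        // slice() creates a buffer whose capacity is the remaining bytes, not the original capacity
        return view.slice();
    }

    public static void main(String[] args) {
        ByteBuffer original = ByteBuffer.allocateDirect(1024);
        ByteBuffer sliced = sliceAfterHeader(original, 24);

        System.out.println("original capacity = " + original.capacity());
        System.out.println("sliced capacity   = " + sliced.capacity());
        System.out.println("original reusable = " + reusable(original, 1024));
        System.out.println("sliced reusable   = " + reusable(sliced, 1024));
    }
}
```

The slice still shares the same native memory, but its smaller capacity means a size-based reuse check would reject it and force a fresh allocation.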

      I have made a few changes to the Journal so that we allocate buffers directly, and I got very good numbers with some simple tests (even though I was not targeting any optimizations, just chasing this bug with ByteBuffer).

      We can talk about this tomorrow at the meeting.

        • 1. Re: ByteBuffer.allocateDirect ridiculously slow
          timfox

          It's a well-known fact that direct byte buffer allocation is much slower than non-direct allocation.

          If you think about it, when you allocate a non-direct byte buffer it basically just needs to dereference a pointer to some memory on the Java heap, which is very quick. For a direct buffer, though, it has to malloc real memory from the OS and do a bunch of other housekeeping.

          So yes, if you're using direct byte buffers it pays to re-use them.

          You should speak to Trustin about this. I believe Netty uses non direct buffers for exactly this reason.
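
A minimal sketch of the reuse idea, assuming fixed-size buffers and a simple free list. The class name DirectBufferPool and its methods are hypothetical and are not taken from JBM or Netty; only standard java.nio and java.util calls are used.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical fixed-size pool: pays the allocateDirect cost once per buffer, then recycles it.
final class DirectBufferPool {
    private final int bufferSize;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private int allocations;

    DirectBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Hands out a pooled buffer if one is available, otherwise allocates a new direct buffer.
    synchronized ByteBuffer acquire() {
        ByteBuffer buffer = free.pollFirst();
        if (buffer == null) {
            allocations++;
            return ByteBuffer.allocateDirect(bufferSize);
        }
        // Reset position and limit so the caller sees an empty buffer (contents are not zeroed)
        buffer.clear();
        return buffer;
    }

    // Takes a buffer back; anything that is not a direct buffer of the pool's size is dropped.
    synchronized void release(ByteBuffer buffer) {
        if (buffer.isDirect() && buffer.capacity() == bufferSize) {
            free.addFirst(buffer);
        }
    }

    // Number of times allocateDirect was actually called.
    synchronized int allocations() {
        return allocations;
    }
}
```

With a pool like this, a loop that calls acquire(), writes 1024 bytes and then calls release() would allocate a single direct buffer no matter how many iterations it runs.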

          • 2. Re: ByteBuffer.allocateDirect ridiculously slow
            timfox

            From the java.nio.ByteBuffer javadoc (my emphasis added):


            A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious. It is therefore recommended that direct buffers be allocated primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations. In general it is best to allocate direct buffers only when they yield a measureable gain in program performance.


            • 3. Re: ByteBuffer.allocateDirect ridiculously slow
              clebert.suconic

               

              "timfox" wrote:
              It's a well-known fact that direct byte buffer allocation is much slower than non-direct allocation.

              If you think about it, when you allocate a non-direct byte buffer it basically just needs to dereference a pointer to some memory on the Java heap, which is very quick. For a direct buffer, though, it has to malloc real memory from the OS and do a bunch of other housekeeping.

              So yes, if you're using direct byte buffers it pays to re-use them.

              You should speak to Trustin about this. I believe Netty uses non direct buffers for exactly this reason.



              I need a direct buffer for AIO, because of memory alignment and the JNI invocations.

              But there shouldn't be a reason for it to be that slow. If I called malloc/posix_memalign (which is the same as malloc, but aligned) and created the direct buffer myself from the memory address with a JNI method, it would be much faster (100x, as my test showed).

              • 4. Re: ByteBuffer.allocateDirect ridiculously slow
                clebert.suconic

                Jason told me in a private email that there is a Sleep(100ms) and a System.gc call for every direct buffer you create. Wow!

                • 5. Re: ByteBuffer.allocateDirect ridiculously slow
                  clebert.suconic

                   

                  It's a well-known fact that direct byte buffer allocation is much slower


                  I knew that already. But I didn't expect it to be 100X slower.

                  I have changed the code to allocate and free buffers directly, and performance won't be a problem for these buffers any more.


                  • 6. Re: ByteBuffer.allocateDirect ridiculously slow
                    clebert.suconic

                    Since this could be affecting the testsuite, I have changed ServiceTestBase and other tests to use NIO until I can speed up buffer creation (which would take half a day to a day of work).

                    ByteBuffer.createNative will probably have some effect on GC, which could cause a few timeouts in the testsuite.