
    Message Chunking Performance problems

    ataylor

      I've put together some figures comparing the performance of the current trunk and the message chunking work that I've been doing. The figures were gathered using the perf example, both sender and listener, with the default settings (apart from message count), and all runs were on the same machine, my laptop. The figures can be found here: http://wiki.jboss.org/wiki/_Files/JBM2HandlingLgeMessages/messageChunkingComp.ods.

      Basically there are 3 sets of data: the trunk, the message branch with an initial buffer size of 1k and a max buffer size of 64k, and the message branch with an initial buffer size of 1k and a max buffer size of 1k. NB: using different initial and max buffer sizes means we need auto-growing buffers (sketched below).
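
      For illustration, an auto-growing buffer along these lines (a sketch with a made-up GrowableBuffer class, not the actual branch code) starts at the initial size and doubles its backing array on demand, up to the max size:

      public class GrowableBuffer
      {
         private byte[] data;
         private int position;
         private final int maxSize;

         public GrowableBuffer(final int initialSize, final int maxSize)
         {
            this.data = new byte[initialSize];
            this.maxSize = maxSize;
         }

         public void writeBytes(final byte[] bytes)
         {
            ensureCapacity(position + bytes.length);
            System.arraycopy(bytes, 0, data, position, bytes.length);
            position += bytes.length;
         }

         private void ensureCapacity(final int required)
         {
            if (required <= data.length)
            {
               return;
            }
            if (required > maxSize)
            {
               throw new IllegalArgumentException("Needed " + required + " bytes but max is " + maxSize);
            }
            // Doubling keeps the number of grows low, but each grow still allocates
            // a new array and copies the existing bytes into it.
            int newSize = Math.min(Math.max(data.length * 2, required), maxSize);
            data = java.util.Arrays.copyOf(data, newSize);
         }
      }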

      The results show that sending lots of small buffers, say 1k, has a massive impact on performance, but if we increase the max buffer size to 64k there is little or no difference.

      These results are limited, so if anyone has some decent hardware it would be good to see some more figures.

      I was thinking we could make the initial and max buffer sizes configurable and default them to 1k and 64k respectively.

      I also tested the speed of basic writing and reading from the buffer: for 100,000,000 bytes, writing/reading took 7857/1048 milliseconds on the branch and 8918/319 on the trunk.
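
      For anyone wanting to reproduce that kind of timing, here's a minimal sketch using plain java.nio.ByteBuffer (not the actual MessagingBuffer implementations used for the figures above), just to show the shape of the timing loop:

      import java.nio.ByteBuffer;

      public class BufferTimingTest
      {
         public static void main(final String[] args)
         {
            // Needs enough heap (e.g. -Xmx256m) for a 100MB backing array.
            final int TOTAL_BYTES = 100000000;
            ByteBuffer buffer = ByteBuffer.allocate(TOTAL_BYTES);

            // Time sequential single-byte writes.
            long start = System.currentTimeMillis();
            for (int i = 0; i < TOTAL_BYTES; i++)
            {
               buffer.put((byte) 1);
            }
            System.out.println("write: " + (System.currentTimeMillis() - start) + " ms");

            // Rewind and time sequential single-byte reads.
            buffer.flip();
            start = System.currentTimeMillis();
            for (int i = 0; i < TOTAL_BYTES; i++)
            {
               buffer.get();
            }
            System.out.println("read: " + (System.currentTimeMillis() - start) + " ms");
         }
      }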

        • 1. Re: Message Chunking Performance problems
          ataylor

          FYI, I did the 2 comparisons for both Mina and Netty.

          • 2. Re: Message Chunking Performance problems
            ataylor

             I've just tried running with a 1k initial buffer and 1k max buffer size with remoting-tcp-nodelay set to false, and the performance is good, in fact the same as trunk.

             So, this being the case, I think we should make the max buffer size configurable, and then we just need to decide on an out-of-the-box config.
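
             For reference, remoting-tcp-nodelay presumably maps onto the standard TCP_NODELAY socket option. The sketch below uses plain java.net.Socket (host and port are placeholders), not the JBM remoting layer; with the option false, Nagle's algorithm batches small writes into fewer, larger packets, which would explain why the 1k case catches up with trunk:

             import java.io.IOException;
             import java.net.Socket;

             // Sketch only: plain java.net.Socket, not the JBM remoting layer.
             // tcpNoDelay == false leaves Nagle's algorithm enabled, so many small 1k
             // writes get coalesced into fewer, larger TCP segments on the wire.
             public static Socket openConnection(final String host, final int port, final boolean tcpNoDelay) throws IOException
             {
                Socket socket = new Socket(host, port);
                socket.setTcpNoDelay(tcpNoDelay);
                return socket;
             }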

            • 3. Re: Message Chunking Performance problems
              clebert.suconic

              Why do we still need PacketImpl.INITIAL_BUFFER_SIZE?

               We can always calculate how many bytes we need to send any of our packets. On ClientMessages, we just get the getEncodeSize(), and add a few bytes... Most of the other Packets have constant sizes.

              Looking at RemotingConnectionImpl:

               private void doWrite(final Packet packet)
               {
                  if (destroyed)
                  {
                     throw new IllegalStateException("Cannot write packet to connection, it is destroyed");
                  }

                  MessagingBuffer buffer = transportConnection.createBuffer(PacketImpl.INITIAL_BUFFER_SIZE);

                  packet.encode(buffer);

                  transportConnection.write(buffer);
               }
              
              


               We could add getEncodeSize() to the Packet interface... or a size parameter on doWrite... or anything similar, as long as we don't allocate buffers just to later dispose of them. At the rates we are achieving with JBossMessaging, the buffer alloc times really matter (in the tests I have made so far).
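
               Something along these lines, assuming getEncodeSize() (or similar) were added to the Packet interface; just a sketch, not tested:

               // Sketch of doWrite if Packet exposed its encoded size up front.
               private void doWrite(final Packet packet)
               {
                  if (destroyed)
                  {
                     throw new IllegalStateException("Cannot write packet to connection, it is destroyed");
                  }

                  // Allocate exactly what this packet needs instead of a fixed
                  // INITIAL_BUFFER_SIZE, so the buffer never has to grow.
                  MessagingBuffer buffer = transportConnection.createBuffer(packet.getEncodeSize());

                  packet.encode(buffer);

                  transportConnection.write(buffer);
               }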



              • 4. Re: Message Chunking Performance problems
                clebert.suconic

                 

                 as long as we don't allocate buffers just to later dispose of them


                I mean... as long as we don't need to allocate buffers to increase the size of a buffer.



                 I've just tried running with a 1k initial buffer and 1k max buffer size with remoting-tcp-nodelay set to false, and the performance is good, in fact the same as trunk.



                 For instance, a 1K message (as used by perfSender) will require a buffer bigger than 1K: the buffer needs 1K to carry the message body plus the extra bytes required by the encoding.

                • 5. Re: Message Chunking Performance problems
                  ataylor

                   

                   We can always calculate how many bytes we need to send any of our packets. On ClientMessages, we just get the getEncodeSize(), and add a few bytes

                   You're right to a certain degree; however, different transport implementations may write some objects to their internal buffers slightly differently, e.g. strings, UTF, etc. But yes, we could make an educated guess as to the size of buffer we need.
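
                   To make the strings/UTF point concrete, here's a small standalone example using standard JDK streams (not the Mina/Netty buffer APIs) showing that the same string can take a different number of bytes depending on how it's written:

                   import java.io.ByteArrayOutputStream;
                   import java.io.DataOutputStream;
                   import java.io.IOException;

                   public class StringSizeExample
                   {
                      public static void main(final String[] args) throws IOException
                      {
                         // Non-ASCII characters take more than one byte in UTF-8.
                         String s = "queue.\u00e9\u00e8";

                         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                         DataOutputStream out = new DataOutputStream(bytes);
                         out.writeUTF(s); // 2-byte length prefix + modified UTF-8 body
                         out.flush();

                         System.out.println("chars:          " + s.length());                     // 8
                         System.out.println("UTF-8 bytes:    " + s.getBytes("UTF-8").length);     // 10
                         System.out.println("writeUTF bytes: " + bytes.size());                   // 12
                      }
                   }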

                  • 6. Re: Message Chunking Performance problems
                    ataylor

                     

                     We can always calculate how many bytes we need to send any of our packets. On ClientMessages, we just get the getEncodeSize(), and add a few bytes... Most of the other Packets have constant sizes.


                     I've added this to the message chunking branch; we now only create buffers at the size we need. The max buffer size is configurable in the jbm-config file via the 'remoting-max-buffer-size' attribute. I've run the perf examples with the max buffer size set to both 1k and 64k, with tcp-nodelay enabled and disabled, and the results were on par with trunk on my laptop.

                     I wouldn't mind some real figures if anyone with a couple of machines and a switch wouldn't mind running some tests with different settings.

                    • 7. Re: Message Chunking Performance problems
                      timfox

                       Have you tried tuning the default buffer size so that it can fit in a single IP packet?

                      You could use wireshark to look at the packets (this is very straightforward).
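
                       For what it's worth, the back-of-the-envelope version of that, assuming a standard 1500-byte Ethernet MTU and no IP/TCP options:

                       // Rough arithmetic only; real header sizes vary with IP/TCP options.
                       final int MTU = 1500;        // typical Ethernet MTU
                       final int IP_HEADER = 20;    // IPv4 header, no options
                       final int TCP_HEADER = 20;   // TCP header, no options
                       final int MAX_PAYLOAD = MTU - IP_HEADER - TCP_HEADER; // 1460 bytes of payload per packet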

                      • 8. Re: Message Chunking Performance problems
                        timfox

                        Mind you, I've just realised you're using TCP loopback, so all these discussions about MTU are moot.

                        http://en.wikipedia.org/wiki/Jumbo_Frame

                        • 9. Re: Message Chunking Performance problems
                          ataylor

                           You're right, I think loopback uses 32k-sized packets. I don't have any hardware, i.e. a server and a switch, to test this any further.
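
                           If anyone wants to check the actual figure rather than guessing, the loopback MTU can be read directly; a sketch with java.net.NetworkInterface, assuming the interface is named "lo" as on Linux:

                           import java.net.NetworkInterface;
                           import java.net.SocketException;

                           public class LoopbackMtu
                           {
                              public static void main(final String[] args) throws SocketException
                              {
                                 // On Linux the loopback interface is usually "lo"; its MTU is
                                 // typically far larger than Ethernet's 1500 bytes.
                                 NetworkInterface lo = NetworkInterface.getByName("lo");
                                 if (lo != null)
                                 {
                                    System.out.println("loopback MTU: " + lo.getMTU());
                                 }
                              }
                           }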