Strings Experiments...
clebert.suconic Feb 19, 2009 7:53 PM
I have done some experiments with Strings & Encoding today, and this thread is a summary of what I found. We can talk about this at our daily IRC meeting tomorrow.
I'm reusing a buffer (ChannelBufferWrapper, the same buffer used by our Netty channel) and repeating each write about 1,000,000 times.
I have compared a few different ways of serializing strings: putUTF, putSimpleString, putNewSimpleString, putString, putStringNewWay.
putNewSimpleString: I'm instantiating a new SimpleString on every write, because always reusing the same instance wasn't fair to the other methods.
putSimpleString: Always reusing the same SimpleString.
putStringNewWay: This gets the bytes from the String the same way SimpleString does. I'm doing that just to measure a possible optimization.
And these are the results I got:
putUTF = 2928 milliseconds
putNewSimpleString = 1497 milliseconds
putSimpleString = 98 milliseconds
putString = 2999 milliseconds
putStringNewWay = 1468 milliseconds
We are able to persist 1 million Strings in about 3 seconds with UTF. We could probably optimize that to around 50% of the current time.
We could certainly optimize putString, using the same idea Tim used to extract the bytes in SimpleString.
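As a quick sanity check on the "around 50%" estimate, here is a small sketch that only redoes the arithmetic on the numbers reported above. The class and method names are made up for illustration, and the per-write figure is approximate because the test restarts its clock after the first 10,000 iterations.

```java
final class ResultRatios {
    // Timings in milliseconds, copied from the results reported above.
    static final long PUT_UTF = 2928;
    static final long PUT_STRING = 2999;
    static final long PUT_STRING_NEW_WAY = 1468;

    // Nominal number of writes per run (the timed part is slightly smaller).
    static final long ITERATIONS = 1_000_000;

    // Fraction of the baseline time that the candidate needs.
    static double ratio(long candidateMillis, long baselineMillis) {
        return (double) candidateMillis / baselineMillis;
    }

    // Approximate average cost of one write, in microseconds.
    static double microsPerWrite(long totalMillis) {
        return totalMillis * 1000.0 / ITERATIONS;
    }

    public static void main(String[] args) {
        System.out.printf("putStringNewWay / putString = %.3f%n",
                ratio(PUT_STRING_NEW_WAY, PUT_STRING));
        System.out.printf("putStringNewWay / putUTF    = %.3f%n",
                ratio(PUT_STRING_NEW_WAY, PUT_UTF));
        System.out.printf("putUTF cost per write       = %.3f us%n",
                microsPerWrite(PUT_UTF));
    }
}
```

Both ratios come out close to one half, which matches the estimate.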
This is the test I'm using:
import junit.framework.TestCase;

import org.jboss.messaging.integration.transports.netty.ChannelBufferWrapper;
import org.jboss.messaging.util.SimpleString;

public class UTF8Test extends TestCase
{
   private final String str = "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5" +
                              "abcdef&^*&!^ghijkl\uB5E2\uCAC7\uB2BB\uB7DD\uB7C7\uB3A3\uBCE4\uB5A5";

   private final SimpleString simpleStr = new SimpleString(str);

   int TIMES = 5;

   // Attributes ----------------------------------------------------

   // Static --------------------------------------------------------

   // Constructors --------------------------------------------------

   // Public --------------------------------------------------------

   long numberOfIteractions = 1000000;

   public void testUTFOnBufferWrapper() throws Exception
   {
      ChannelBufferWrapper buffer = new ChannelBufferWrapper(10 * 1024);
      long start = System.currentTimeMillis();
      for (int c = 0; c < TIMES; c++)
      {
         for (long i = 0; i < numberOfIteractions; i++)
         {
            if (i == 10000)
            {
               start = System.currentTimeMillis();
            }
            buffer.rewind();
            buffer.putUTF(str);
         }
         long spentTime = System.currentTimeMillis() - start;
         System.out.println("spentTime UTF = " + spentTime);
      }
   }

   public void testPutNewSimpleString() throws Exception
   {
      ChannelBufferWrapper buffer = new ChannelBufferWrapper(10 * 1024);
      for (int c = 0; c < TIMES; c++)
      {
         long start = System.currentTimeMillis();
         for (int i = 0; i < numberOfIteractions; i++)
         {
            if (i == 10000)
            {
               start = System.currentTimeMillis();
            }
            buffer.rewind();
            buffer.putSimpleString(new SimpleString(str + i));
         }
         long spentTime = System.currentTimeMillis() - start;
         System.out.println("spentTime PutNewSimpleString = " + spentTime);
      }
   }

   public void testPutSimpleString() throws Exception
   {
      ChannelBufferWrapper buffer = new ChannelBufferWrapper(10 * 1024);
      for (int c = 0; c < TIMES; c++)
      {
         long start = System.currentTimeMillis();
         for (int i = 0; i < numberOfIteractions; i++)
         {
            if (i == 10000)
            {
               start = System.currentTimeMillis();
            }
            buffer.rewind();
            buffer.putSimpleString(simpleStr);
         }
         long spentTime = System.currentTimeMillis() - start;
         System.out.println("spentTime PutSimpleString = " + spentTime);
      }
   }

   public void testPutString() throws Exception
   {
      ChannelBufferWrapper buffer = new ChannelBufferWrapper(10 * 1024);
      for (int c = 0; c < TIMES; c++)
      {
         long start = System.currentTimeMillis();
         for (int i = 0; i < numberOfIteractions; i++)
         {
            if (i == 10000)
            {
               start = System.currentTimeMillis();
            }
            buffer.rewind();
            buffer.putString(str + i);
         }
         long spentTime = System.currentTimeMillis() - start;
         System.out.println("spentTime putString = " + spentTime);
      }
   }

   public void testPutStringNewWay() throws Exception
   {
      ChannelBufferWrapper buffer = new ChannelBufferWrapper(10 * 1024);
      for (int c = 0; c < TIMES; c++)
      {
         long start = System.currentTimeMillis();
         for (int i = 0; i < numberOfIteractions; i++)
         {
            if (i == 10000)
            {
               start = System.currentTimeMillis();
            }
            buffer.rewind();
            buffer.putStringNewWay(str + i);
         }
         long spentTime = System.currentTimeMillis() - start;
         System.out.println("spentTime putStringNewWay = " + spentTime);
      }
   }
}
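Every test method above repeats the same timing pattern: run the write in a loop and restart the clock at iteration 10,000 so the warm-up is not counted. A minimal sketch of that pattern as a reusable helper might look like the following. WarmupTimer and timeAfterWarmup are hypothetical names, and a plain java.nio.ByteBuffer stands in for ChannelBufferWrapper so the example runs on its own; it is not part of the actual test.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class WarmupTimer {
    // Runs `action` `iterations` times and returns the elapsed nanoseconds,
    // counting only the iterations from index `warmup` onward.
    static long timeAfterWarmup(int iterations, int warmup, Runnable action) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (i == warmup) {
                start = System.nanoTime(); // discard the time spent warming up
            }
            action.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in for the reused ChannelBufferWrapper (assumption for this sketch).
        ByteBuffer buffer = ByteBuffer.allocate(10 * 1024);
        byte[] payload = "abcdef".getBytes(StandardCharsets.UTF_8);

        long nanos = timeAfterWarmup(1_000_000, 10_000, () -> {
            buffer.clear();                // reuse the same buffer on every write
            buffer.putInt(payload.length); // length prefix
            buffer.put(payload);           // payload bytes
        });
        System.out.println("spentTime = " + (nanos / 1_000_000) + " milliseconds");
    }
}
```

System.nanoTime is used here instead of System.currentTimeMillis only because it is better suited to measuring elapsed time; the structure of the loop is otherwise the same as in the test.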
And this is the proposal for putString:
(One funny thing: if I moved this loop into a separate method, the total time increased by about 0.5 seconds for the 1 million messages.)
public void putStringNewWay(final String nullableString)
{
   flip();
   int len = nullableString.length();
   byte[] data = new byte[len << 1];
   int j = 0;
   for (int i = 0; i < len; i++)
   {
      char c = nullableString.charAt(i);
      byte low = (byte)(c & 0xFF); // low byte
      data[j++] = low;
      byte high = (byte)(c >> 8 & 0xFF); // high byte
      data[j++] = high;
   }
   buffer.writeInt(data.length);
   buffer.writeBytes(data);
   buffer.readerIndex(buffer.writerIndex());
}
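To make the byte layout of the proposal easier to check, here is a standalone sketch of the same per-char encoding (low byte first, then high byte) together with a matching decode step. TwoByteStringCodec, encode and decode are hypothetical names used only for this illustration; the decode side is my assumption of what the reader would need and is not taken from the existing code.

```java
final class TwoByteStringCodec {
    // Same layout as the loop in putStringNewWay: two bytes per char,
    // low byte first, then high byte.
    static byte[] encode(String s) {
        int len = s.length();
        byte[] data = new byte[len << 1];
        int j = 0;
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            data[j++] = (byte) (c & 0xFF);        // low byte
            data[j++] = (byte) ((c >> 8) & 0xFF); // high byte
        }
        return data;
    }

    // Hypothetical counterpart: rebuilds each char from its two bytes.
    static String decode(byte[] data) {
        char[] chars = new char[data.length >> 1];
        int j = 0;
        for (int i = 0; i < chars.length; i++) {
            int low = data[j++] & 0xFF;
            int high = data[j++] & 0xFF;
            chars[i] = (char) ((high << 8) | low);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "abc\uB5E2";
        byte[] bytes = encode(original);
        System.out.println("bytes written: " + bytes.length);
        System.out.println("round trip ok: " + decode(bytes).equals(original));
    }
}
```

The encoding always writes exactly two bytes per char regardless of content, which is what lets it skip the per-char branching that a UTF-style encoder has to do.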
We can talk about these numbers at the IRC meeting tomorrow.