I did some performance testing with data relevant to our use case (note: useLockStriping is disabled in my cache config, and numOwners for distributed mode is 2). In my testing, I first load a base data set into the cache. All participating nodes take part in this initialization with an even load, and each node uses its own key set so that there are no collisions (for example, for a total data set of 1M entries across 10 nodes, each node initializes 100k entries with its own keys: node1's keys start with "node1_...", node2's keys start with "node2_...", and so on). After initialization, all nodes run a read test at the same time, and then an update test at the same time. During the update test, each node updates a subset of the data it initialized (in the example above, if each node updates 1% of the data it initialized, that is 1k entries per node and 10k entries updated in total across the 10 nodes). My test results raised the following questions.
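To make the workload above concrete, here is a minimal sketch of the key partitioning and the update-count arithmetic. It is plain Java with no cache calls; the class name WorkloadPlan, the method names, and the numeric key suffix after "nodeN_" are illustrative assumptions, not the actual test code.

```java
import java.util.ArrayList;
import java.util.List;

class WorkloadPlan {
    // Entries each node initializes when the base data set is split evenly.
    public static int entriesPerNode(int totalEntries, int nodeCount) {
        return totalEntries / nodeCount;
    }

    // Keys owned by one node. The "node<N>_" prefix comes from the description
    // above; the numeric suffix is an assumption made for this sketch.
    public static List<String> keysForNode(int nodeIndex, int count) {
        List<String> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            keys.add("node" + nodeIndex + "_" + i);
        }
        return keys;
    }

    // Entries a node updates, given the percentage of its own data it touches.
    public static int updatesPerNode(int entriesPerNode, int updatePercent) {
        return entriesPerNode * updatePercent / 100;
    }

    public static void main(String[] args) {
        int totalEntries = 1_000_000;
        int nodeCount = 10;
        int perNode = entriesPerNode(totalEntries, nodeCount);
        int updates = updatesPerNode(perNode, 1);
        System.out.println("entries per node: " + perNode);
        System.out.println("updates per node: " + updates);
        System.out.println("total updates: " + (updates * nodeCount));
    }
}
```

Because every key carries its node's prefix, two nodes never write the same key, so any slowdown in the update test should not come from key-level contention between nodes.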
1. When the base data set increased from 100k to 1M entries (the stored data are our custom objects), updating the same total amount of data took significantly longer. For example, when updating a total of 10k entries, with each of 10 nodes updating 1k at the same time, it took around 70 milliseconds on average with 100k base entries but nearly 600 milliseconds with 1M base entries. This difference makes me concerned about scalability. Could you please suggest which factors contribute to it?
2. I tested with the same base data set and the same total amount of updated data, but with different numbers of nodes. I expected 10 nodes to take less time than 5 nodes, since each node gets a smaller update load. However, when the stored data in the cache are Float values, 10 nodes show only slightly better results; and when the stored data are our custom objects, the 10-node results are actually worse. Any suggestions on what is causing this?
3. With the same base data set and the same amount of updated data on 10 nodes, Distributed mode with L1 performed similarly to Replicated mode, whereas I expected Distributed with L1 to outperform it. Is it that 10 nodes are not enough to show the difference, or is there another reason?
4. I tested TCP vs. UDP communication; for the same scenario their performance is very close. While I am surprised by how well TCP performed, I am wondering why UDP did not do better.
I would appreciate your responses.