Yes, this is possible. What you actually did is sharding: you distributed the data set over 2 separate clusters. The bottlenecks here would be the CPU, which is shared by both processes on the same host, and the bandwidth. If you have a gigabit network, the 2 caches have to share the ~125 MB/sec max (modification) throughput.
For your test, this isn't an issue, but if you increase the write modifications, it may become one. In that case, you could always route each cluster over a separate network, e.g. to a different switch.
I tried your test and measured a high cost in marshalling your rather complex Object. On top of that, marshalling a String in a platform-independent way (as Infinispan does) is pretty expensive as well, so you might get significant improvements using a custom Externalizer. Changing from String to byte should help as well.
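To illustrate the idea of hand-written marshalling, here is a minimal sketch. Note that it uses the JDK's own java.io.Externalizable as a stand-in, not Infinispan's Externalizer interface, so it only shows the principle. The Account class, its fields and the ExternalizerSketch helper are hypothetical (the real test object is more complex), and storing the name as a UTF-8 byte[] is just one assumed reading of "changing from String to byte".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical value type, used only for illustration.
class Account implements Externalizable {
    private long id;
    private byte[] name; // assumption: raw UTF-8 bytes kept instead of a String

    public Account() {} // public no-arg constructor is required by Externalizable

    Account(long id, String name) {
        this.id = id;
        this.name = name.getBytes(StandardCharsets.UTF_8);
    }

    long id() { return id; }

    String name() { return new String(name, StandardCharsets.UTF_8); }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Write exactly the fields we need, in a fixed order.
        out.writeLong(id);
        out.writeInt(name.length);
        out.write(name);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Read the fields back in the same order they were written.
        id = in.readLong();
        name = new byte[in.readInt()];
        in.readFully(name);
    }
}

// Hypothetical helper that serializes and deserializes through the JDK streams.
class ExternalizerSketch {
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    static Account roundTrip(Account a) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(toBytes(a)))) {
            return (Account) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account(42L, "liron"));
        System.out.println(copy.id() + " " + copy.name());
    }
}
```

Whether this pays off for your object is something you would have to measure; the point is only that the class decides which bytes go on the wire instead of leaving it to generic marshalling.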
Thank you for your replies.
Currently we are pretty much pleased with the 16 X 2 TX / ms rate.
I have one more question though.
Is it possible to make a setup, similar to the parallel two clusters that I've shown before,
in which only machine A will hold a configuration that will automatically connect the node on host B? (without configuration files on node B)
I hope my question is clear enough.
A quick update on this: first of all, I packaged the test case and placed it on GitHub: git://github.com/belaban/InfinispanLoadTest. Hope you don't mind, Liron!
I benchmarked async REPL and DIST mode some more, and here are my results:
Async REPL:
- 1 passive, 1 active: 70 TXs / ms
- 2 passives, 1 active: 59 TXs / ms
Async DIST:
- 1 passive, 1 active: 23 TXs / ms
- 2 passives, 1 active: 2.7 TXs / ms
I think the main reason why DIST is not as fast as REPL is that we do *not* use a replication queue in DIST mode. However, compared to the 70 TXs / ms for async REPL, the 23 TXs / ms for async DIST should remain the same with increasing cluster size (REPL will decrease).
Along the way, I think I found a bug in DIST, related to L1 cache management, so the 23 TXs / ms are without an L1 cache (which you shouldn't need in a 2 node case anyway!)...
The 2.7 TXs / ms number is probably affected by that bug, too, so I'll test some more after we've fixed the bug.
I'll keep you posted,
No, that's not currently possible.
OK, here's a final update from me on my findings with this test.
Referring to my previous post, I mentioned we only got 2.7 TXs /ms with 2 passives and 1 active node.
The reason is ... the test is pathetic ! :-)
*1* Transaction does the following:
- 2 GETs. As the key is not yet there, those GETs are synchronous RPCs to the owner(s) - 2 of the 2 nodes as numOwners=3
- 1 PUT. Asynchronous, no cost
- 4 GETs. The first GET might be a synchronous RPC if the key is neither stored locally nor found in the L1 cache. The remaining 3 GETs are all served from the L1 cache, so they're local
- 1 REMOVE. Async, no cost
- 4 GETs. *All* 4 GETs are synchronous RPCs to 2 nodes, as the key was removed
So this means we have between 6 and 7 synchronous GETs *per transaction* ! A synchronous RPC takes 0.17ms on my box, so *1* Transaction should take roughly 1 ms.
This means we could only do ca. 1000 Transactions / sec. Since I'm using 10 threads, the max I could possibly get is 10'000 TXs / sec.
This is the theoretical maximum number of Transactions (by 10 threads) per second, but in reality I'm getting 2'700 Transactions / sec.
The difference can be attributed to a few things:
- The performance of synchronous GETs was measured on JGroups only. This test (UPerf) doesn't do anything with the keys/values. In contrast, Infinispan stores the keys/values, needs to provide synchronization around access, and - in general - has to do way more work than the JGroups test which simply discards the values (no memory needed either)
- We're running 3 nodes on the same box, so all 3 nodes are competing for CPU and memory
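The upper-bound estimate above can be written down as a small calculation. This is only a sketch of the arithmetic, using the numbers from this post (6 to 7 synchronous GETs per transaction, 0.17 ms per synchronous RPC, 10 threads); the class and method names are made up for illustration, and it assumes the GETs within one transaction run strictly one after the other.

```java
// Hypothetical helper: back-of-the-envelope bound, not a benchmark.
class ThroughputEstimate {

    /**
     * Upper bound on transactions per second when every transaction blocks
     * on syncGetsPerTx sequential synchronous RPCs of rpcMillis each, and
     * the threads do not slow each other down.
     */
    static double maxTxPerSecond(int syncGetsPerTx, double rpcMillis, int threads) {
        double txMillis = syncGetsPerTx * rpcMillis; // time one TX spends waiting
        return threads * (1000.0 / txMillis);
    }

    public static void main(String[] args) {
        for (int gets = 6; gets <= 7; gets++) {
            System.out.printf("%d sync GETs per TX: at most %.0f TXs / sec%n",
                    gets, maxTxPerSecond(gets, 0.17, 10));
        }
    }
}
```

With these inputs the bound comes out at roughly 8'400 to 9'800 TXs / sec, which is where the rounded 10'000 figure comes from.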
Liron: it's good you sent us this pathetic use case, so we could investigate (and we found one memory leak), so this helped us overall!
However, in a real environment, you'd wrap your modifications into a JTA transaction or use a batch. This would certainly improve performance.