Thanks Cyrille for the logs. A similar optimization to the one I made in https://github.com/galderz/hibernate-core/commit/fc38e943e7cec4173db3d443ae41c792724c2ad9 is needed but this time for the timestamp cache update. I'm working on it right now and will push it asap.
Btw Cyrille, I hope to get some time in the next couple of days to apply these enhancements to all put (and other modifying) operations so that you don't encounter these issues further. I'll post back when this is done.
Cyrille, I've pushed a more global solution for all put/remove operations, to make them work with the clustered cache loader much better. The actual commit is https://github.com/hibernate/hibernate-core/commit/66f555e52afa6bac6fb0cdc9d08eb8f4d89fd2bf and you can pick the latest code from https://github.com/hibernate/hibernate-core/commits/3.6
Thanks a lot for all that stuff.
There are no more bugs, neither at startup nor during tests.
Yes, there is a "but" :-(
I will need more complete tests to confirm all this, but I encounter TimeoutExceptions during a STRESS test (on my PC I run 2 nodes, with 32 threads each and 1000 requests per thread). It seems that network communication inside Infinispan becomes the bottleneck, while JBossCache manages to pass the same test.
Btw, I'm attaching the stdout logs of the 2 nodes.
I will try to do more tests and comparisons to give you more facts.
Cyrille Charron added log files.
My comments do not seem to have been included with the files I just attached, so I'm retyping them here:
I confirm my stress problems with Infinispan, while JBossCache manages to pass the stress test (see the last 4 *-stdout.txt files for traces of these tests).
I tried to use the same configuration for Infinispan and JBossCache (see the attached files infinispan-configs.xml and jpacache-jbc2-configs.xml):
LockAcquisitionTimeout at 30 s
SyncReplTimeout at 40 s
maxNodes at 100000
ConcurrencyLevel at 100000
useLockStriping at false
For the JGroups config, I use the default one in Infinispan, and I have attached the one used with JBossCache (see jpacache-jgroups-stacks.xml).
I currently have almost no knowledge of JGroups, but I resolved my last TimeoutExceptions during stress tests by using exactly the same JGroups options as those used in the JBossCache test, i.e. the TCP stack.
So I won't bother you any more with this issue :-)
Thanks for everything.
Glad you got around your issues with the different JGroups stack. At first glance, the main difference between the default stack and the TCP stack is the thread pool size at the transport level. The default has 8 threads and the TCP one has 25; maybe those 32 threads you're executing were holding each other up accessing that thread pool. This kind of issue can be spotted by taking thread dumps on all nodes in the cluster when TimeoutExceptions are reported. If the sender is holding on to something, you might see it there. These dumps might also show whether the receiver side is blocking on something...
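For reference, here is a minimal sketch of one way to capture such a thread dump from inside the JVM, using the standard java.lang.management API. The class name ThreadDumper is only illustrative and is not part of Hibernate, Infinispan or JGroups; running the JDK's jstack tool against each node's process id is an equally valid option.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative helper (hypothetical name), not part of any library in this thread.
class ThreadDumper {

    /** Returns a textual dump of all live threads in this JVM, with full stack traces. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Lock details are requested only if this JVM supports reporting them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            // Show what the thread is blocked or waiting on, and who owns it, if known.
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling dumpAllThreads() on every node at the moment a TimeoutException is logged gives dumps that can be compared side by side, for example to see whether many application threads are all waiting on the same lock or pool.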
Cyrille, would it be possible to share your stress test code and full configuration with us, so that we can use it to catch potential errors in the future? We'd really appreciate this.
Yes, I now have approval from my management :-)
Here is the source code of the 2 projects: FamilyModel, which contains the database entities, and InfiniSpanTest, which contains the bench tests.
InfiniSpanTest currently tests only the Entity and Collection caches; I have not yet migrated my Query tests into it.
To run a test under Eclipse, start the database via startHsqldb.bat, then run the 2 test applications InfiniSpanRun-1 and InfiniSpanRun-2.
To run a test from a DOS command prompt, build the zip file (mvn clean assembly:assembly -DskipTests=true), then extract it into 3 different directories (to get distinct log files). In the first directory, start the database (via startDatabase.bat); in the second, start the first node (via runNode0.bat); in the third, start the second node (via runNode1.bat).
I have added system properties (which appear in the bat files of the assembly, but not yet in the Eclipse Application Run launch files) that you can modify on the command line, such as "-Dtest.nbThread=32 -Dtest.nbRequest=500 -Dtest.nbLoopRead=5 -Dtest.nbPersonPerFamily=3", to adapt the stress test.
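As a rough illustration of how such properties can drive a test, here is a minimal sketch in plain Java. It is not the actual InfiniSpanTest code: the class name StressSettings, the run method and the default values (taken from the example command line above) are assumptions made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder for the -Dtest.* properties; not the real test harness.
class StressSettings {
    final int nbThread;
    final int nbRequest;
    final int nbLoopRead;
    final int nbPersonPerFamily;

    StressSettings() {
        // Integer.getInteger reads a system property and falls back to the
        // given default when the property is missing or not a number.
        nbThread = Integer.getInteger("test.nbThread", 32);
        nbRequest = Integer.getInteger("test.nbRequest", 500);
        nbLoopRead = Integer.getInteger("test.nbLoopRead", 5);
        nbPersonPerFamily = Integer.getInteger("test.nbPersonPerFamily", 3);
    }

    /**
     * Starts nbThread workers, each executing the given request nbRequest times,
     * and returns the total number of completed requests.
     */
    long run(Runnable request) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nbThread);
        AtomicLong completed = new AtomicLong();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < nbThread; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < nbRequest; i++) {
                        request.run();
                        completed.incrementAndGet();
                    }
                }));
            }
            // Wait for every worker; get() rethrows any failure from a worker.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return completed.get();
    }
}
```

In a real test the Runnable would perform the entity reads and writes; here it is left to the caller so that the sketch stays independent of any cache or database.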
On my computer (Windows XP 32-bit, with 2 GB RAM), the test fails with the UDP JGroups stack. The cause is apparently not the thread pool size (I tried 50 and 25 with nearly the same performance), but the Windows XP swap file. I observed that with the parameters "-Dtest.nbThread=32 -Dtest.nbRequest=500 -Dtest.nbLoopRead=5 -Dtest.nbPersonPerFamily=3" I consume only 2 GB with the TCP JGroups stack, while I consume 3 GB with the UDP JGroups stack. I get nearly 30% better performance with the TCP stack than with the UDP stack when I use "-Dtest.nbRequest=300" (the stress test does not fail).
In the Maven dependencies, I was forced, for a few days, to set explicit openwebbeans versions manually because the openejb SNAPSHOT version was not up to date. Once the openejb snapshot is OK, all openwebbeans dependencies and exclusions in the pom file can be deleted.
I'm also attaching the generated zip file, usable out of the box :-)
Sorry, I cannot attach the zip file. It's too big (30 MB).
I've sent you a private message.
Cyrille, I've just realised that this is a private discussion that other people cannot read. Please make sure you open discussions in the Infinispan space next time: http://community.jboss.org/en/infinispan?view=discussions
Is there a way for me to make this discussion public?
I saw, for example, the command "Convertir le fil de discussion en document", i.e. convert the discussion into a document. Would that be the solution?
Don't worry about it. I've emailed the JBoss.org guys to figure out how to move this discussion to the Infinispan space.