Could you try 1.1.3? It should bring some improvements.
Just tried with a fresh build of 1.1.3, but unfortunately the results are very similar. I've cooked up a patch on top of 1.1.0.Final (attached) that seems to improve things by caching host and context entries on the heap. I am seeing up to 10x improvement in certain cases. Could you please take a look and see whether the approach makes sense and whether it would be worth porting to 1.1.3? (The patch does not apply to 1.1.3 directly because of the significant rework in mod_proxy_cluster.c.)
I have created a JIRA and I will integrate the patch in the next version. Many thanks.
I recently upgraded to 1.2.0.Final with the patch included, and found that lookup performance is still not where I expected: with 200 instances I still see over 90 ms of latency per HTTP transaction on average.
Looking at mod_proxy_cluster.c in 1.2.0.Final, I noticed that worker lookup now touches "node" structures within a tight loop, and those are still read from shared memory. I took the liberty of mirroring the caching approach already applied to "context" and "host", which brought the latency down to about 10 ms in that test. More importantly for scalability, it became much less dependent on the table size.
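For anyone following along, the idea can be sketched as follows: snapshot the shared-memory node table into ordinary heap memory once per request, so the tight worker-selection loop only touches local memory instead of crossing into the shm segment on every iteration. This is a minimal, self-contained illustration with hypothetical names (node_entry, read_node_table); the real nodeinfo_t and table handling in mod_proxy_cluster are considerably more involved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node record; the real structure in mod_proxy_cluster
 * carries many more fields and lives in shared memory. */
typedef struct {
    int  id;
    char host[64];
} node_entry;

/* Simulated shared-memory table. In the module, each read of an entry
 * goes to the shm segment, which is what makes per-request lookups
 * expensive when done in a tight loop over many nodes. */
static node_entry shm_nodes[200];
static int shm_node_count = 0;

/* Copy the shared table into a heap-allocated snapshot once per
 * request; subsequent lookups in the worker-selection loop then hit
 * only local memory. Caller frees the returned table. */
static node_entry *read_node_table(int *count)
{
    node_entry *copy = malloc(sizeof(node_entry) * shm_node_count);
    if (copy != NULL)
        memcpy(copy, shm_nodes, sizeof(node_entry) * shm_node_count);
    *count = shm_node_count;
    return copy;
}
```

With the snapshot in hand, the per-transaction cost of scanning the node table stops depending on shared-memory access latency, which matches the observation that latency became much less sensitive to the number of instances.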
I was wondering if you could consider adding the extra caching (proxy_node_table/read_node_table) upstream in mod_cluster for future releases?
Adding extra caching for the node looks like a good idea. Create a JIRA and submit a patch ;-)
The initial JIRA was MODCLUSTER-252.