Dear Domonkos, this is indeed serious. Do I get it right that there were only 2 contexts served and 1 worker node connected?
We had this issue: MODCLUSTER-372 Number of registered contexts negatively affects mod_cluster performance
As is clear from the title, the performance regression was linked to the number of registered contexts.
Please, share your Apache HTTP Server configuration files (mod_cluster, httpd conf and anything you find relevant). I'm especially interested in the SSL settings with regard to the mod_cluster configuration. Furthermore, we would need to know the exact version of your Apache HTTP Server, mod_cluster modules and operating system.
We will try to simulate the environment and reproduce the issue.
Thank you for answering so fast!
There was only one context registered and one worker node connected.
My mod_cluster conf looks like this:
The IP address 10.30.3.3 is managed by Corosync, and its only purpose is to make mod_cluster advertise on the right network interface. Users come in on a different interface (10.30.2.3, managed by Corosync too), using SSL. The two sites enabled are default Apache sites (default-ssl and default). "Default SSL" is bound to 10.30.2.3, uses the snakeoil certificate, nothing has been changed in it. "Default" is bound to 10.30.3.3 and has EnableMCPMReceive in it so it can communicate with the worker node.
Apache version is: 2.2.22-1ubuntu1.4
OS: Ubuntu Server 12.04.4 x64
mod_cluster version is 1.2.6 Final x64 from here:
mod_cluster 1.2.6.Final bin Downloads - JBoss Community
file says this about one of the downloaded binaries:
/usr/lib/apache2/modules/mod_advertise.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x70a7a6492bb71a91daf7ab2d435740e334088575, not stripped
One more interesting thing is that we naturally did some testing earlier, and used JMeter to load-test our test environment, which uses the same configuration. JMeter was slow, so we turned off SSL and that made everything faster (possibly a problem with JMeter's SSL implementation, at least that's what we thought): 300 users were logged in simultaneously and used the system without any major hiccups. To still check and test everything, I used 'ab' to benchmark Apache's throughput (using SSL here!): 20 000 requests on 15 threads, all directed to our system's login page (so they went through mod_cluster). No problems were detected, the throughput was fine and the load on the server wasn't major.
I just started to wonder, and thought these 3 things should be mentioned here:
1. I am using the default apache installation on Ubuntu, which is MPM-worker. I have found some threads saying that it is fine, but I still have a really small doubt in my heart whether mod_cluster is thread safe or not.
2. There are 3 apache instances running on this very same VM, 2 of them using mod_cluster with different configs (so there is a /etc/apache2-test and there is a /etc/apache2-prod directory, containing different configs)
3. I have this in my apache2.conf:
Was the test index.html a proxied file or a local httpd one?
If it was proxied, then the problem is probably in the back-end.
It was a local file, in Apache's /var/www folder. Also, the URL /mod_cluster-manager took 30 seconds to load. I am really suspicious about the MaxClients setting in apache2.conf, because the same kind of slowness appeared later yesterday (when using ProxyPass) and I was able to solve it by increasing the setting from 150 to 400. I checked Apache's /server-status URL and saw that there were no idle workers left, which was causing the slowdown.
Ok it isn't a mod_cluster problem then.
You probably have an issue in the JBoss nodes: probably one application is getting slow and blocks the whole system.
Use netstat -na to check how many connections are open between httpd and JBoss. That number is probably increasing when you see the slowdown. Also check the load on the JBoss nodes.
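The netstat check above can be sketched in Python. This is only an illustration: it assumes Linux-style `netstat -na` output and that the JBoss AJP connector listens on port 8009 (the usual default, but an assumption here). The function names are hypothetical.

```python
import subprocess

def count_backend_connections(netstat_text, backend_port=8009):
    """Count ESTABLISHED TCP connections whose remote end is backend_port.

    backend_port=8009 is an assumed AJP port; adjust it to your connector.
    """
    suffix = ":%d" % backend_port
    count = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        # Linux columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            foreign, state = fields[4], fields[5]
            if foreign.endswith(suffix) and state == "ESTABLISHED":
                count += 1
    return count

def read_netstat():
    """Run 'netstat -na' on the httpd host and return its text output."""
    result = subprocess.run(["netstat", "-na"], capture_output=True, text=True)
    return result.stdout
```

Running `count_backend_connections(read_netstat())` every few seconds during a slowdown would show whether the number of connections to the back-end keeps growing.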
I'm sorry, but I don't fully understand your point: if there is a problem with the JBoss node being slow then why would serving a file from /var/www (that has nothing to do with JBoss or mod_cluster) be slow too? I'm almost sure that it was an Apache problem, maybe a mod_cluster problem (but again since I found the MaxClients option I am having a strong feeling that it was the cause of our problem).
The problem is that I can't switch back to mod_cluster because the system is live and in production with users constantly using it, and so far I haven't been able to find a way to reproduce the issue using JMeter. I will keep trying to reproduce the issue somehow.
Today we accidentally switched back to AJP and mod_cluster from ProxyPass and the problem occurred again. It is now obvious that the MaxClients directive has been the problem all along: if there are no idle workers left, the slowdown happens (not surprisingly). It is not a mod_cluster issue, it is a pure Apache issue.
So to fix such a slowdown, increase MaxClients and, if needed, ServerLimit too in apache2.conf/httpd.conf. To calculate the right amount, have a look at how much memory one apache2 thread uses and then multiply it by the number you are willing to set for MaxClients. Don't forget to save some RAM for the OS too. Example: you have 2 GB of RAM and each Apache thread uses around 2 MB, so if you set MaxClients to 800, Apache will consume around 1.6 GB, leaving about 400 MB for other processes and the OS.
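The same calculation as a small Python sketch, in case someone wants to plug in their own numbers. The function name is made up for illustration, and the 2 GB from the example is rounded to 2000 MB to match the figures above.

```python
def max_clients_for(total_ram_mb, per_worker_mb, reserved_mb):
    """Largest MaxClients value that fits in RAM after reserving some for the OS."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Never return a negative worker count if the reserve exceeds total RAM.
    return max(0, int(usable_mb // per_worker_mb))

# Figures from the example: about 2000 MB RAM, 2 MB per worker, 400 MB kept free.
print(max_clients_for(2000, 2, 400))  # 800
```

Measure the per-worker memory on your own server first; the 2 MB figure is only the rough value from the example.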
You need to monitor 127.0.0.1/server-status on your Apache server to check whether you have any idle workers left. If you have fewer than 10 idle workers, it is time to increase MaxClients.
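The idle-worker check could be automated roughly like this. It is a sketch that assumes mod_status is enabled and reachable from localhost, and it uses the machine-readable `?auto` variant of the status page, which reports `IdleWorkers` and `BusyWorkers` lines. The function names and the default threshold of 10 are just taken from the rule of thumb above.

```python
import urllib.request

def parse_idle_busy(auto_text):
    """Extract (idle, busy) worker counts from 'server-status?auto' output."""
    values = {}
    for line in auto_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return int(values["IdleWorkers"]), int(values["BusyWorkers"])

def needs_more_workers(auto_text, min_idle=10):
    """True when fewer than min_idle workers are idle."""
    idle, _busy = parse_idle_busy(auto_text)
    return idle < min_idle

def fetch_status(url="http://127.0.0.1/server-status?auto"):
    """Fetch the status page; the URL assumes mod_status on localhost."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("ascii", "replace")
```

Calling `needs_more_workers(fetch_status())` from cron or a monitoring check would flag the condition before users notice the slowdown.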
Thank you for all your help!