"ivelin" wrote:
Here is a task that Scott and Bela thought might be good to do.
Implement a JMX HeartWatch service, which monitors the basic system resources: CPU utilization and memory. If either of them is starved for a long time, broadcast a JMX notification.
Two helper services will be interested in these notifications:
- System restart service, which will ask the JBoss kernel to unload all modules and redeploy from scratch.
- Email notifier, which will use the mailer service to send a message to the administrator.
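A minimal sketch of how the broadcast and the two listeners could fit together, using only the standard javax.management classes. The class name HeartWatchBroadcaster and the notification type strings are assumptions made for illustration, not existing JBoss names, and the listeners only print what the real services would do.

```java
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

// Hypothetical broadcaster: the class name and the type strings below are
// assumptions for this sketch, not part of JBoss.
class HeartWatchBroadcaster extends NotificationBroadcasterSupport {
    static final String CPU_STARVED = "heartwatch.cpu.starved";
    static final String MEMORY_STARVED = "heartwatch.memory.starved";

    private long sequence = 0;

    // Send one notification of the given type to every registered listener.
    synchronized void broadcast(String type, String message) {
        sendNotification(new Notification(type, this, ++sequence,
                System.currentTimeMillis(), message));
    }

    public static void main(String[] args) {
        HeartWatchBroadcaster watch = new HeartWatchBroadcaster();

        // Stand-in for the email notifier; a real one would call the mailer service.
        NotificationListener emailNotifier = (n, handback) ->
                System.out.println("would mail admin: " + n.getType() + " - " + n.getMessage());

        // Stand-in for the restart service; a real one would ask the kernel to redeploy.
        NotificationListener restartService = (n, handback) ->
                System.out.println("would restart after: " + n.getType());

        // No filter and no handback object: each listener sees every notification.
        watch.addNotificationListener(emailNotifier, null, null);
        watch.addNotificationListener(restartService, null, null);

        watch.broadcast(MEMORY_STARVED, "free heap below critical limit");
    }
}
```

In a real deployment the broadcaster would be registered as an MBean and the helper services would subscribe through the MBean server instead of holding a direct reference.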
The use cases for this service are production systems that have a slow memory leak or a rare CPU-spiking scenario which is hard to reproduce in the lab. The HeartWatch service will keep these systems going while the problem is being identified and resolved. Apache HTTPD and ASP.NET offer a similar service.
Here are the pieces that need to be implemented:
1) Scheduling-based CPU estimate. Schedule a regular heartbeat task which will measure the time between two consecutive runs. If the delay exceeds the scheduled interval for a prolonged (configurable) time, then broadcast a JMX notification.
2) Memory monitor. A similarly scheduled task which measures the available memory and, if it approaches a certain limit, sends a Warning JMX notification. If it reaches a critical limit, it sends an Alarm notification. The latter will probably cause the kernel to redeploy all modules.
3) Out-Of-Memory life saver. A soft-referenced buffer of 100K (exact size to be determined), which will give the kernel enough room to restart the modules in case of memory starvation.
4) Server restart service. Should use the Server.shutdown() method and then start(). Some refactoring of the ServerImpl may be needed to prevent shutdown() from exiting the VM.
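The scheduling-based CPU estimate could be sketched roughly as follows. HeartbeatLagDetector, its tolerance parameter and the concrete numbers are all assumptions for illustration. The sketch uses scheduleWithFixedDelay so that the gap between beats is the configured delay plus whatever scheduling latency the system adds, and it treats a configurable number of consecutive late beats as "prolonged".

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: counts consecutive late heartbeats.
class HeartbeatLagDetector {
    private final long intervalMillis;   // scheduled delay between beats
    private final long toleranceMillis;  // lateness allowed before a beat counts as late
    private final int maxLateBeats;      // consecutive late beats that signal starvation
    private boolean started = false;
    private long lastBeat;
    private int lateBeats = 0;

    HeartbeatLagDetector(long intervalMillis, long toleranceMillis, int maxLateBeats) {
        this.intervalMillis = intervalMillis;
        this.toleranceMillis = toleranceMillis;
        this.maxLateBeats = maxLateBeats;
    }

    // Record a beat at the given time. Returns true once the last
    // maxLateBeats beats have all arrived later than interval + tolerance.
    synchronized boolean beat(long nowMillis) {
        if (started) {
            long gap = nowMillis - lastBeat;
            if (gap > intervalMillis + toleranceMillis) {
                lateBeats++;
            } else {
                lateBeats = 0; // an on-time beat resets the count
            }
        }
        started = true;
        lastBeat = nowMillis;
        return lateBeats >= maxLateBeats;
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatLagDetector detector = new HeartbeatLagDetector(100, 50, 3);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(() -> {
            // nanoTime is monotonic, so wall-clock adjustments do not look like lag.
            if (detector.beat(System.nanoTime() / 1_000_000)) {
                System.out.println("CPU appears starved; a JMX notification would be sent here");
            }
        }, 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(1000);
        timer.shutdownNow();
    }
}
```

This only estimates CPU starvation indirectly: a long garbage collection pause delays the heartbeat in the same way, so the tolerance and beat count would need tuning per system.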
If there is a way to determine which module is the offending one, then the kernel should only redeploy that module. However, there does not seem to be a pure-Java way to do this currently.
If you are interested in taking on this task, please holler.
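For the memory monitor and the out-of-memory life saver, here is a rough sketch under stated assumptions. MemoryWatch, the Level enum and the 80%/95% thresholds are made up for illustration. The sketch relies on the documented guarantee that all softly reachable objects are cleared before the VM throws an OutOfMemoryError, so the reserve is freed automatically under pressure and a cleared reserve is itself a sign of starvation.

```java
import java.lang.ref.SoftReference;

// Hypothetical monitor: names and thresholds are assumptions for this sketch.
class MemoryWatch {
    enum Level { OK, WARNING, ALARM }

    // Reserve buffer that the garbage collector may reclaim when memory runs short.
    private final SoftReference<byte[]> reserve =
            new SoftReference<>(new byte[100 * 1024]);

    // Classify heap usage as a fraction of the maximum heap size.
    static Level classify(long usedBytes, long maxBytes,
                          double warnFraction, double alarmFraction) {
        double used = (double) usedBytes / maxBytes;
        if (used >= alarmFraction) return Level.ALARM;
        if (used >= warnFraction) return Level.WARNING;
        return Level.OK;
    }

    // Sample the running JVM. maxMemory() returns Long.MAX_VALUE when the
    // heap has no inherent limit, in which case the result is always OK.
    static Level sample(double warnFraction, double alarmFraction) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return classify(used, rt.maxMemory(), warnFraction, alarmFraction);
    }

    // True once the collector (or release()) has cleared the reserve.
    boolean reserveCleared() {
        return reserve.get() == null;
    }

    // Give the reserve back explicitly, for example just before a restart.
    void release() {
        reserve.clear();
    }

    public static void main(String[] args) {
        MemoryWatch watch = new MemoryWatch();
        System.out.println("heap level: " + sample(0.80, 0.95));
        System.out.println("reserve cleared: " + watch.reserveCleared());
    }
}
```

A scheduled task could call sample() on each run and broadcast a Warning or Alarm notification when the level changes.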
I will try to assist.