Unexpected behavior behind admin console causes server to hang
jeff.scott Oct 28, 2016 2:14 PM<tldr>
I've been digging into an issue that shows up only when one particular application is deployed. When we go to look at that application in the admin console, the page hangs (though you can still go to other tabs) and the application itself slows down a lot. I started logging all SQL queries and found that opening the page kicks off thousands of queries like:
SELECT * FROM JOB_EXECUTION WHERE JOBINSTANCEID=? ORDER BY JOBEXECUTIONID
This application is our only one that uses batch, so it makes sense that it's the one we'd have problems with. It has over 100k job_execution rows and the console seems to query every single one.
</tldr>
In the standalone server's admin console, going to the Runtime tab ends up triggering
ModelControllerImpl.execute() with an operation param like this:
{
    "operation" => "read-children-resources",
    "address" => undefined,
    "child-type" => "deployment",
    "include-runtime" => true,
    "recursive" => true,
    "operation-headers" => {
        "access-mechanism" => "HTTP",
        "caller-type" => "user"
    },
    "recursive-depth" => undefined,
    "proxies" => undefined,
    "include-defaults" => undefined
}
The ReadResourceHandlers seem to be getting every batch job in the deployment, and for each one they call
OperationContextImpl.readResource(), passing in this address:
[
    ("deployment" => "myear.ear"),
    ("subdeployment" => "common.jar"),
    ("subsystem" => "batch-jberet"),
    ("job" => "MyJob")
]
That makes its way down to BatchJobExecutionResource.refreshChildren(), which looks like this:
private void refreshChildren() {
    final List<JobExecution> executions = new ArrayList<>();
    final List<JobInstance> instances = jobOperator.getJobInstances(jobName, 0, jobOperator.getJobInstanceCount(jobName));
    for (JobInstance instance : instances) {
        executions.addAll(jobOperator.getJobExecutions(instance));
    }
    for (JobExecution execution : executions) {
        final String name = Long.toString(execution.getExecutionId());
        if (!children.contains(name)) {
            children.add(name);
        }
    }
}
So it first retrieves the list of job instances, then queries for the executions of each instance one at a time. That is one query per instance, which gets really slow when you can easily have 100k job instances.
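To make the cost of that loop concrete, here is a rough, self-contained sketch. It does not use the real JobOperator or JBeret classes. FakeJobRepository, its methods, and the bulk lookup findAllExecutionIds() are all hypothetical names invented for illustration, and I'm not claiming the real repository offers a bulk call like that. It only counts simulated queries to compare the per-instance pattern with a single lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExecutionLookupSketch {

    /** Hypothetical stand-in for the JOB_EXECUTION table; counts every simulated query. */
    static final class FakeJobRepository {
        private final Map<Long, List<Long>> executionsByInstance = new HashMap<>();
        int queryCount = 0;

        void addExecution(long instanceId, long executionId) {
            executionsByInstance.computeIfAbsent(instanceId, k -> new ArrayList<>()).add(executionId);
        }

        // One simulated query: list the instance ids of the job.
        List<Long> findInstanceIds() {
            queryCount++;
            return new ArrayList<>(executionsByInstance.keySet());
        }

        // One simulated query per call, like SELECT ... WHERE JOBINSTANCEID=?
        List<Long> findExecutionIds(long instanceId) {
            queryCount++;
            return executionsByInstance.getOrDefault(instanceId, new ArrayList<>());
        }

        // Hypothetical bulk lookup: a single simulated query for all execution ids.
        List<Long> findAllExecutionIds() {
            queryCount++;
            List<Long> all = new ArrayList<>();
            for (List<Long> ids : executionsByInstance.values()) {
                all.addAll(ids);
            }
            return all;
        }
    }

    // Same shape as refreshChildren(): one query for instances, then one per instance.
    static int countQueriesPerInstance(FakeJobRepository repo) {
        repo.queryCount = 0;
        List<Long> executions = new ArrayList<>();
        for (long instanceId : repo.findInstanceIds()) {
            executions.addAll(repo.findExecutionIds(instanceId));
        }
        return repo.queryCount;
    }

    // Assumed alternative: fetch every execution id in one go.
    static int countQueriesBulk(FakeJobRepository repo) {
        repo.queryCount = 0;
        repo.findAllExecutionIds();
        return repo.queryCount;
    }

    // Builds a repository with one execution per instance.
    static FakeJobRepository repositoryWith(int instances) {
        FakeJobRepository repo = new FakeJobRepository();
        for (long i = 1; i <= instances; i++) {
            repo.addExecution(i, i);
        }
        return repo;
    }

    public static void main(String[] args) {
        FakeJobRepository repo = repositoryWith(1000);
        System.out.println("per-instance queries: " + countQueriesPerInstance(repo));
        System.out.println("bulk queries: " + countQueriesBulk(repo));
    }
}
```

With N instances the per-instance pattern issues N + 1 simulated queries, while the assumed bulk lookup issues one. Whether the real job repository could support something like the bulk call is a separate question.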
I'm not sure what the proper solution is here, but the "recursive-depth" attribute on the operation param looks mighty nice, and I'd love to get that set here if it's respected down the line.
(I couldn't figure out how to open a JBoss JIRA ticket, so if you recommend I do that, please include a link since I'm apparently an idiot)