2 Replies Latest reply on Oct 31, 2016 9:49 AM by dba2

Unexpected behavior behind admin console causes server to hang

jeff.scott Oct 28, 2016 2:14 PM

<tldr>

I've been digging into an issue that only shows up when one particular application is deployed. When we open that application in the admin console, the page hangs (other tabs still work) and the application itself slows down a lot. I started logging all SQL queries and found that opening the page triggers thousands of queries like:

SELECT * FROM JOB_EXECUTION WHERE JOBINSTANCEID=? ORDER BY JOBEXECUTIONID

This application is the only one of ours that uses batch, so it makes sense that it's the one with problems. It has over 100k JOB_EXECUTION rows, and the console seems to query every single one.

</tldr>

Opening the Runtime tab in the standalone server's web console ends up calling ModelControllerImpl.execute() with an operation parameter like this:

{
    "operation" => "read-children-resources",
    "address" => undefined,
    "child-type" => "deployment",
    "include-runtime" => true,
    "recursive" => true,
    "operation-headers" => {
        "access-mechanism" => "HTTP",
        "caller-type" => "user"
    },
    "recursive-depth" => undefined,
    "proxies" => undefined,
    "include-defaults" => undefined
}

The ReadResourceHandlers seem to be getting every batch job in the deployment, and for each one they call OperationContextImpl.readResource(), passing in this address:

[
    ("deployment" => "myear.ear"),
    ("subdeployment" => "common.jar"),
    ("subsystem" => "batch-jberet"),
    ("job" => "MyJob")
]

That makes its way down to BatchJobExecutionResource.refreshChildren(), which looks like this:

private void refreshChildren() {
    final List<JobExecution> executions = new ArrayList<>();
    // One call to list every instance of the job...
    final List<JobInstance> instances = jobOperator.getJobInstances(jobName, 0, jobOperator.getJobInstanceCount(jobName));
    // ...then one lookup per instance to collect its executions.
    for (JobInstance instance : instances) {
        executions.addAll(jobOperator.getJobExecutions(instance));
    }
    for (JobExecution execution : executions) {
        final String name = Long.toString(execution.getExecutionId());
        if (!children.contains(name)) {
            children.add(name);
        }
    }
}

So it first retrieves the list of job instances, then queries for the executions of each instance one by one. That is one query per instance, which gets very slow when you can easily have 100k job instances.
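The access pattern above can be reproduced without a server. This is only a minimal sketch, assuming a hypothetical in-memory stand-in for the job repository; CountingRepository, RefreshChildrenDemo and their methods are illustrative names, not JBeret or WildFly APIs. It just counts how many lookups a loop of that shape issues:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the job repository; it only counts lookups.
class CountingRepository {
    int queries = 0;

    // A single lookup returns every instance id for the job.
    List<Long> findInstanceIds(int instanceCount) {
        queries++;
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= instanceCount; id++) {
            ids.add(id);
        }
        return ids;
    }

    // One lookup per instance, comparable to
    // SELECT * FROM JOB_EXECUTION WHERE JOBINSTANCEID=? ORDER BY JOBEXECUTIONID
    List<Long> findExecutionIds(long instanceId) {
        queries++;
        return List.of(instanceId); // assumption: each instance ran exactly once
    }
}

class RefreshChildrenDemo {
    // Same shape as refreshChildren(): list the instances, then look up
    // the executions of each instance individually.
    static int queriesFor(int instanceCount) {
        CountingRepository repo = new CountingRepository();
        List<Long> executions = new ArrayList<>();
        for (long instanceId : repo.findInstanceIds(instanceCount)) {
            executions.addAll(repo.findExecutionIds(instanceId));
        }
        return repo.queries;
    }

    public static void main(String[] args) {
        System.out.println(queriesFor(100_000)); // prints 100001
    }
}
```

With 100k instances that is one lookup for the instance list plus 100k lookups for executions, which matches the flood of queries in the SQL log.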

I'm not sure what the proper solution is here, but the "recursive-depth" attribute on the operation parameter looks mighty nice, and I'd love to get that set here if it's respected down the line.

(I couldn't figure out how to open a jboss jira ticket, so if you recommend I do that please include a link since I'm apparently an idiot)

1. Re: Unexpected behavior behind admin console causes server to hang

jamezp Oct 28, 2016 6:55 PM (in response to jeff.scott)

The executions are processed one by one because of the way the batch API works. It's possible this could be improved within JBeret to send a chunk of instance IDs and get back a chunk of executions. Even so, with 100k+ jobs that would still be slow.
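To make the chunking idea concrete, here is a rough sketch. It is not how JBeret works today: ChunkedLookupDemo, the chunk size of 1000 and the IN query are assumptions for illustration only, and only the table and column names come from the query quoted in the original post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ChunkedLookupDemo {
    // Splits ids into consecutive chunks of at most chunkSize elements.
    static List<List<Long>> chunk(List<Long> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Long>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    // Hypothetical query text with one placeholder per instance id in the chunk.
    static String inClauseSql(int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("idCount must be positive");
        }
        return "SELECT * FROM JOB_EXECUTION WHERE JOBINSTANCEID IN ("
                + String.join(", ", Collections.nCopies(idCount, "?"))
                + ") ORDER BY JOBEXECUTIONID";
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 100_000; id++) {
            ids.add(id);
        }
        // 100k ids in chunks of 1000 means 100 lookups instead of 100k.
        System.out.println(chunk(ids, 1000).size()); // prints 100
        System.out.println(inClauseSql(3));
    }
}
```

That cuts the number of round trips, but every execution row is still read, so it would not make the console fast on its own.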

Just to better understand the use-cases, is there a reason you hold on to that many jobs? Do you have some kind of purge process for them?

With regard to recursive-depth, that won't help us much here. It controls how many levels deep the management model is read recursively. For example, a CLI command like /deployment=batch.war/subsystem=batch-jberet:read-resource(recursive=true, recursive-depth=1, include-runtime=true) would return only one level deep:

[standalone@localhost:9990 /] /deployment=batch-chunk.war/subsystem=batch-jberet:read-resource(recursive=true, recursive-depth=1, include-runtime=true)
{
    "outcome" => "success",
    "result" => {
        "job-xml-names" => [
            "partition-chunk.xml",
            "retry-chunk.xml",
            "simple.xml"
        ],
        "job" => {
            "simple" => {
                "instance-count" => 2,
                "job-xml-names" => ["simple.xml"],
                "running-executions" => 0,
                "execution" => {
                    "2" => undefined,
                    "1" => undefined
                }
            },
            "chunkPartition" => {
                "instance-count" => 0,
                "job-xml-names" => [
                    "partition-chunk.xml",
                    "retry-chunk.xml"
                ],
                "running-executions" => 0,
                "execution" => undefined
            }
        }
    }
}
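As a rough illustration of what a depth limit does, here is a simplified sketch. SketchResource and RecursiveDepthDemo are hypothetical names and this is not the WildFly management model implementation; it only mimics the shape of the output above, where children past the limit are listed by name with an undefined value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified resource: plain attributes plus child resources
// grouped by child type and then by name.
class SketchResource {
    final Map<String, Object> attributes = new LinkedHashMap<>();
    final Map<String, Map<String, SketchResource>> children = new LinkedHashMap<>();
}

class RecursiveDepthDemo {
    // Reads a resource; children below 'depth' are listed by name only.
    static Map<String, Object> read(SketchResource resource, int depth) {
        Map<String, Object> result = new LinkedHashMap<>(resource.attributes);
        for (Map.Entry<String, Map<String, SketchResource>> type : resource.children.entrySet()) {
            Map<String, Object> byName = new LinkedHashMap<>();
            for (Map.Entry<String, SketchResource> child : type.getValue().entrySet()) {
                byName.put(child.getKey(),
                        depth > 0 ? read(child.getValue(), depth - 1) : "undefined");
            }
            result.put(type.getKey(), byName);
        }
        return result;
    }

    // subsystem -> job "simple" -> executions "1" and "2" (illustrative data).
    static SketchResource sample() {
        SketchResource execution = new SketchResource();
        execution.attributes.put("batch-status", "COMPLETED");

        SketchResource job = new SketchResource();
        job.attributes.put("instance-count", 2);
        Map<String, SketchResource> executions = new LinkedHashMap<>();
        executions.put("1", execution);
        executions.put("2", execution);
        job.children.put("execution", executions);

        SketchResource subsystem = new SketchResource();
        Map<String, SketchResource> jobs = new LinkedHashMap<>();
        jobs.put("simple", job);
        subsystem.children.put("job", jobs);
        return subsystem;
    }

    public static void main(String[] args) {
        // prints {job={simple={instance-count=2, execution={1=undefined, 2=undefined}}}}
        System.out.println(read(sample(), 1));
    }
}
```

In this simplified model the execution names at the cut-off level still have to be enumerated even though their contents are skipped, just as "1" and "2" still appear in the output above.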

I filed a JIRA for this as I do think this is an issue. I don't have a great solution at the moment, but put down a couple ideas in the JIRA.

James R. Perkins

2. Re: Unexpected behavior behind admin console causes server to hang

dba2 Oct 31, 2016 9:49 AM (in response to jamezp)

Thanks, that really puts things in perspective and keeps me from chasing the recursive-depth red herring.

This is somewhat legacy code, and the only reason we keep the old data is to display it in a dashboard we wrote. I'll just write something to move older data to archive tables, then use views that union the original and archive tables and point the dashboard at the views instead of the original tables. Thanks again,

Jeff