2 Replies Latest reply on Oct 31, 2016 9:49 AM by dba2

    Unexpected behavior behind admin console causes server to hang

    jeff.scott

      <tldr>

      I've been digging into an issue that only occurs when one particular application is deployed.  When we open that application in the admin console, the page hangs (though you can still go to other tabs) and the application itself slows down a lot.  I turned on logging for all SQL queries and found that opening the page triggers thousands of queries like:

       

      SELECT * FROM JOB_EXECUTION WHERE JOBINSTANCEID=? ORDER BY JOBEXECUTIONID

       

      This application is the only one of ours that uses batch, so it makes sense that it's the one we have problems with.  It has over 100k job_execution rows, and the console seems to query every single one.

      </tldr>

       

      Opening the Runtime tab in the standalone server's admin console ends up triggering

       

      ModelControllerImpl.execute() with an operation param like this:

       

       

      {

          "operation" => "read-children-resources",

          "address" => undefined,

          "child-type" => "deployment",

          "include-runtime" => true,

          "recursive" => true,

          "operation-headers" => {

              "access-mechanism" => "HTTP",

              "caller-type" => "user"

          },

          "recursive-depth" => undefined,

          "proxies" => undefined,

          "include-defaults" => undefined

      }

       

       

      The ReadResourceHandlers seem to be getting every batch job in the deployment, and for each one they call

       

       

      OperationContextImpl.readResource() passing in this address:

       

       

      [

          ("deployment" => "myear.ear"),

          ("subdeployment" => "common.jar"),

          ("subsystem" => "batch-jberet"),

          ("job" => "MyJob")

      ]

       

       

      That makes its way down to BatchJobExecutionResource.refreshChildren(), which looks like this:

       

      private void refreshChildren() {
          final List<JobExecution> executions = new ArrayList<>();
          // Load every job instance for this job in a single call.
          final List<JobInstance> instances = jobOperator.getJobInstances(jobName, 0, jobOperator.getJobInstanceCount(jobName));
          // One lookup per instance; this is where the per-instance queries seem to come from.
          for (JobInstance instance : instances) {
              executions.addAll(jobOperator.getJobExecutions(instance));
          }
          // Register each execution id as a child resource name.
          for (JobExecution execution : executions) {
              final String name = Long.toString(execution.getExecutionId());
              if (!children.contains(name)) {
                  children.add(name);
              }
          }
      }

       

       

      So it first retrieves the list of job instances, then queries for the executions of each instance one by one.  That is one query per instance, which gets really slow when you can easily have 100k job instances.
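
To make the cost concrete, here is a minimal sketch of that access pattern. It is not the WildFly or JBeret code: `FakeJobRepository` and its methods are hypothetical stand-ins that only count how many queries would be issued, and it assumes one execution per instance.

```java
import java.util.ArrayList;
import java.util.List;

public class NPlusOneSketch {

    /** Hypothetical stand-in for the job repository; it only counts queries. */
    static class FakeJobRepository {
        private final int instanceCount;
        int queriesIssued = 0;

        FakeJobRepository(int instanceCount) {
            this.instanceCount = instanceCount;
        }

        /** One query that returns every job instance id for the job. */
        List<Long> getJobInstanceIds() {
            queriesIssued++;
            List<Long> ids = new ArrayList<>();
            for (long id = 1; id <= instanceCount; id++) {
                ids.add(id);
            }
            return ids;
        }

        /** One query per call, like SELECT ... WHERE JOBINSTANCEID=?. */
        List<Long> getJobExecutionIds(long instanceId) {
            queriesIssued++;
            List<Long> executions = new ArrayList<>();
            executions.add(instanceId); // assumption: one execution per instance
            return executions;
        }
    }

    /** Mirrors the shape of refreshChildren(): one lookup per instance. */
    static int countQueriesForRefresh(int instanceCount) {
        FakeJobRepository repo = new FakeJobRepository(instanceCount);
        List<Long> executions = new ArrayList<>();
        for (long instanceId : repo.getJobInstanceIds()) {
            executions.addAll(repo.getJobExecutionIds(instanceId));
        }
        return repo.queriesIssued;
    }

    public static void main(String[] args) {
        System.out.println(countQueriesForRefresh(100_000));
    }
}
```

With 100,000 instances the counter ends at 100,001: one query for the instances plus one per instance, every time the resource is read.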

       

      I'm not sure what the proper solution is here, but the "recursive-depth" attribute on the operation param looks mighty nice, and I'd love to get that set here if it's respected down the line.

       

      (I couldn't figure out how to open a jboss jira ticket, so if you recommend I do that please include a link since I'm apparently an idiot)

        • 1. Re: Unexpected behavior behind admin console causes server to hang
          jamezp

          The executions are processed one-by-one because of the way the batch API works. It's possible this could be improved within JBeret to send a chunk of instance IDs and get back a chunk of executions. Even so, with 100k+ jobs that would still be slow.
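
As a rough illustration of the chunking idea, the sketch below splits a list of instance IDs into fixed-size chunks, each of which could back a single query such as `WHERE JOBINSTANCEID IN (...)`. This is not an existing JBeret API; the `chunk` helper and the chunk size of 1,000 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedLookupSketch {

    /** Splits ids into consecutive sublists of at most chunkSize elements. */
    static List<List<Long>> chunk(List<Long> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Long>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> instanceIds = new ArrayList<>();
        for (long id = 1; id <= 100_000; id++) {
            instanceIds.add(id);
        }
        // Each chunk would correspond to one query instead of one query per id.
        List<List<Long>> chunks = chunk(instanceIds, 1_000);
        System.out.println(chunks.size() + " queries instead of " + instanceIds.size());
    }
}
```

Chunking cuts the number of round trips, but every execution row is still loaded, which is why it would likely remain slow at this scale.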

           

          Just to better understand the use-cases, is there a reason you hold on to that many jobs? Do you have some kind of purge process for them?

           

          With regard to recursive-depth, that won't help us much here. It controls how many levels deep the management model is read recursively. For example, a CLI command like /deployment=batch.war/subsystem=batch-jberet:read-resource(recursive=true, recursive-depth=1, include-runtime=true) would return only one level deep:

           

          [standalone@localhost:9990 /] /deployment=batch-chunk.war/subsystem=batch-jberet:read-resource(recursive=true, recursive-depth=1, include-runtime=true)
          {
              "outcome" => "success",
              "result" => {
                  "job-xml-names" => [
                      "partition-chunk.xml",
                      "retry-chunk.xml",
                      "simple.xml"
                  ],
                  "job" => {
                      "simple" => {
                          "instance-count" => 2,
                          "job-xml-names" => ["simple.xml"],
                          "running-executions" => 0,
                          "execution" => {
                              "2" => undefined,
                              "1" => undefined
                          }
                      },
                      "chunkPartition" => {
                          "instance-count" => 0,
                          "job-xml-names" => [
                              "partition-chunk.xml",
                              "retry-chunk.xml"
                          ],
                          "running-executions" => 0,
                          "execution" => undefined
                      }
                  }
              }
          }
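
The cut-off behavior in that output can be sketched with a small depth-limited tree read. This is only an illustration, not how WildFly implements read-resource: the `Resource` class, the `read` method, and the "type=name" child keys are hypothetical, and `null` stands in for `undefined`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecursiveDepthSketch {

    /** Hypothetical management resource that only holds named children. */
    static class Resource {
        final Map<String, Resource> children = new LinkedHashMap<>();

        Resource child(String name, Resource r) {
            children.put(name, r);
            return this;
        }
    }

    /**
     * Reads a resource down to the given depth. Children beyond the depth
     * are still listed by name, with a null value standing in for "undefined".
     */
    static Map<String, Object> read(Resource resource, int depth) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Resource> e : resource.children.entrySet()) {
            result.put(e.getKey(), depth > 0 ? read(e.getValue(), depth - 1) : null);
        }
        return result;
    }

    public static void main(String[] args) {
        Resource simple = new Resource()
                .child("execution=1", new Resource())
                .child("execution=2", new Resource());
        Resource subsystem = new Resource().child("job=simple", simple);
        System.out.println(read(subsystem, 1));
    }
}
```

Note that the child names at the cut-off level are still enumerated, just as the execution entries are in the CLI output above, so limiting the depth does not by itself avoid listing every execution.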
          

           

          I filed a JIRA for this as I do think this is an issue. I don't have a great solution at the moment, but I put down a couple of ideas in the JIRA.

           

          --

          James R. Perkins

          • 2. Re: Unexpected behavior behind admin console causes server to hang
            dba2

            Thanks, that really puts things in perspective and prevents me from upping my recursive-depth on that red herring.

             

            This is somewhat legacy code and the only reason we keep the old data is to display it in a dashboard we wrote.  I'll just write something to move older data to archive tables, then use views to union the original and archive tables together and have the dashboard use the views instead of the original tables.  Thanks again,

             

            Jeff