
    Does WildFly distribute the messages of a queue among 2 consuming servers half and half?

    inor

      Hi,

      I'm observing some strange behavior with an MDB processing queue messages.

      In my application, a WildFly 10 server instance (I'll call it the "main server") breaks up a job submitted to it into smaller, homogeneous tasks.

      It then sends the task IDs to its local queue so that multiple threads can process the independent tasks in parallel, reducing the total time it takes to complete the job.

      The tasks are consumed and processed via an MDB.
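
      For context, the send side on the main server looks roughly like this (the queue name, class name and message format below are placeholders of mine, not the real ones):

           import java.util.List;
           import javax.annotation.Resource;
           import javax.ejb.Stateless;
           import javax.inject.Inject;
           import javax.jms.JMSContext;
           import javax.jms.Queue;

           // Rough sketch of how the job is fanned out: one message per task id.
           @Stateless
           public class JobSplitterBean {

               @Inject
               private JMSContext jms;

               @Resource(lookup = "java:/jms/queue/JobTaskQueue")   // assumed local queue name
               private Queue taskQueue;

               public void submit(List<Long> taskIds) {
                   for (Long taskId : taskIds) {
                       jms.createProducer().send(taskQueue, taskId.toString());
                   }
               }
           }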

       

      When this runs with a specific job that splits into 615 tasks, it takes 13:40 minutes to complete.

       

      When we add a second WildFly server (I'll call it the "secondary server"), which connects to the [remote] queue on the main server and also consumes messages via an MDB, both servers process the 615 tasks and complete the job in 26:50 minutes.

      Why does it take 2 servers roughly twice as long to complete the job as it takes 1 server?

       

      Now more details:

      1) The MDB on both servers is annotated with @Pool("pool-for-JobTaskMDB"), which is specified in standalone-full.xml as

           <strict-max-pool name="pool-for-JobTaskMDB" max-pool-size="10" instance-acquisition-timeout="120" instance-acquisition-timeout-unit="SECONDS"/>

      and with

           @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "10")

      (a sketch after this list shows how these fit together on the MDB class)

      2) The processing of the tasks involves DB access. A single DB instance, used by both servers, is on the same machine as the main server.

      3) It turns out that, on average, a task running on the main server takes about 10 seconds to complete and a task running on the secondary server takes about 60 seconds.
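
      To make the setup concrete, the consuming MDB on both servers looks roughly like this (the destinationLookup value, class body and message format are simplified placeholders; the pool name, maxSession and pool size are the real values from point 1):

           import javax.ejb.ActivationConfigProperty;
           import javax.ejb.MessageDriven;
           import javax.jms.JMSException;
           import javax.jms.Message;
           import javax.jms.MessageListener;
           import javax.jms.TextMessage;
           import org.jboss.ejb3.annotation.Pool;

           // Simplified sketch; processTask() stands in for the real DB-intensive work.
           @MessageDriven(activationConfig = {
               @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
               @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "java:/jms/queue/JobTaskQueue"),
               @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "10")
           })
           @Pool("pool-for-JobTaskMDB")   // bounded by the strict-max-pool above (max-pool-size="10")
           public class JobTaskMDB implements MessageListener {

               @Override
               public void onMessage(Message message) {
                   try {
                       long taskId = Long.parseLong(((TextMessage) message).getText());
                       processTask(taskId);   // ~10s on the main server, ~60s on the secondary
                   } catch (JMSException e) {
                       throw new RuntimeException(e);
                   }
               }

               private void processTask(long taskId) {
                   // application-specific processing against the shared DB
               }
           }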

       

      It's not clear to me why a task takes 6 times longer to run on the secondary server, even if it is very DB-intensive (I also understand that the queue is local to the main server and remote to the secondary server), but let's ignore that for now.

       

      So let's take that as a given: processing a queue message takes 60 seconds on the secondary server vs. 10 seconds on the main server.

       

      So when running with both servers, I would expect that:

      A) in the time it takes to process all the tasks, the faster/main server would process about 6 times as many messages/tasks as the slower server.

      B) the worst-case scenario is that the last message consumed from the queue is consumed by the secondary server, adding an extra minute.

       

      But what I found, to my astonishment, when looking at the results, is that:

      1) contrary to my expectation A above, only 315 messages/tasks were processed by the faster/main server and 300 messages/tasks were processed by the slower/secondary server! Why?

      2) Part of processing a task is logging its start time. Looking at the start times, I discovered that during the last 16 minutes of the job, no tasks were processed by the main server! Why?

       

      So my theory, and I hope I'm wrong, or that this can be controlled via configuration, is this:

      The 615 tasks in the queue were divided up front between the 2 servers, so each server was assigned and processed about 300 or so tasks (and since 300 tasks are processed by the 10 threads on the secondary server in about 30 rounds, where each round takes 1 minute, that comes out to a total of about 30 minutes!).

      Had the servers consumed tasks based on availability (and had my expectation A been met), I would have expected the job to be completed in less than 9 minutes!
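
      (The rough math behind that 9-minute figure, using the per-task times above and 10 concurrent sessions per server:

           main server:      10 threads / 10s per task = ~60 tasks per minute
           secondary server: 10 threads / 60s per task = ~10 tasks per minute
           combined:                                      ~70 tasks per minute
           615 tasks / ~70 tasks per minute             ≈  8.8 minutes)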

       

      Is there a way to configure the 2 servers to consume a message only when there is a thread available to process it, rather than grabbing half of the queue up front?

       

      Finally, in case it matters:

      The main server's MDB uses the default resource adapter.

      The secondary server's MDB uses a pooled-connection-factory with an http-connector, and it does not use a JNDI lookup.
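
      For reference, the secondary server connects to the main server's broker with configuration roughly like this in its standalone-full.xml (the connector/factory names, host, port and credentials below are placeholders, not my exact values):

           <!-- socket-binding-group -->
           <outbound-socket-binding name="remote-artemis">
               <remote-destination host="main-server-host" port="8080"/>
           </outbound-socket-binding>

           <!-- messaging-activemq subsystem -->
           <http-connector name="remote-http-connector" socket-binding="remote-artemis" endpoint="http-acceptor"/>
           <pooled-connection-factory name="remote-artemis" entries="java:/jms/remoteCF" connectors="remote-http-connector"/>

      with the MDB pointed at that resource adapter (e.g. via @ResourceAdapter("remote-artemis") from org.jboss.ejb3.annotation) rather than a JNDI lookup.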

       

      Thanks, I would really appreciate some insight on this.