5 Replies Latest reply on Jan 15, 2007 8:31 AM by Dimitrios Andreadis

    JMX Timer Performance & Strange Issues

    Stratos Pavlakis Newbie

      Hi all,

      I have a really complex issue regarding the JMX timer and i could use some help. Based on the architecture of the Simple Schedule Provider, I developed a version of a job scheduling system for academic purposes.

      This system consists of:

      - an MBean provider that listens to a session EJB for registering job schedules at runtime.
      - a manager that contains the JMX Timer used to trigger those schedules.
      - a listener class whose instances catch the timer's trigger notifications.
      - a JMS queue where we place the objects describing the logic of each schedule, so that it can be executed.
      - an MDB whose instances consume the objects from the queue and execute their corresponding logic.

      When the timer fires a schedule, the listener catches the event and places an object message into the queue. An MDB instance is then used by the container to consume the message, executing a predefined interface method on the enclosed object.
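      For reference, the trigger path described above can be sketched with the standard javax.management.timer.Timer from the JDK. This is only an illustrative sketch, not our actual code: the object name scheduler:service=Timer and the notification type job.fired are made-up names, and the JMS send is reduced to a comment inside the listener.

```java
import java.lang.management.ManagementFactory;
import java.util.Date;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServer;
import javax.management.Notification;
import javax.management.ObjectName;
import javax.management.timer.Timer;

public class TimerTriggerSketch {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical object name, for illustration only.
        ObjectName timerName = new ObjectName("scheduler:service=Timer");
        Timer timer = new Timer();
        server.registerMBean(timer, timerName);

        CountDownLatch fired = new CountDownLatch(1);
        server.addNotificationListener(timerName,
            (Notification n, Object handback) -> {
                // In the real system, the listener would wrap the schedule's
                // payload in an ObjectMessage and send it to the JMS queue here.
                System.out.println("fired: " + n.getType());
                fired.countDown();
            }, null, null);

        timer.start();
        // One-shot notification ~100 ms from now; "job.fired" is a made-up type.
        timer.addNotification("job.fired", "schedule trigger", null,
                              new Date(System.currentTimeMillis() + 100));
        fired.await(5, TimeUnit.SECONDS);
        timer.stop();
    }
}
```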

      This system makes use of JBossCache 1.4 (TreeCache) for persistence, because it is supposed to be much faster than a traditional JDBC storage engine. Every time a message object is placed in the queue, this fact is recorded in the cache; the cached entry indicates that a repetition of that job is currently being executed.

      Our scheduling system can also work in a clustered JBoss environment, where the Provider, the Manager (including the JMX Timer), and the JMS queue act as singletons running on a single node, while MDB instances on every node of the cluster are ready to consume messages pulled from the JMS queue through the cluster-wide JNDI. The cache works in synchronous replication mode. In this way we want to balance the load that comes from processing the actual logic of the various scheduled jobs across the cluster.

      I won't go into other issues such as fail-over, since they are irrelevant to my current subject, but anyone who wants more info on this can mail me.

      Anyway, this system works just fine under normal conditions (delays lower than 100 ms). However, when we started stress testing it (more than 200-300 job executions per second), we noticed several strange issues.

      I will demonstrate them through an actual example. Our jobs were simple Java classes, each execution inserting one row into an in-memory JDBC table (MySQL). Each row consists of timestamps (System.currentTimeMillis()) taken at certain points of the execution process, starting at the timer trigger firing and ending at the time the actual job started running.

      Well, having these data, we noticed that delays of several seconds (up to one minute) occurred solely because of the JMX Timer.

      row example:

      shouldFireAt:      100000 ms
      firedAt:           120000 ms
      cachedAt:          120010 ms
      MDB consumed at:   120050 ms
      started executing: 120051 ms

      As far as we know, java.util.Timer can support far more than 200-300 task executions per second; can't the JMX timer do the same? Do these JMX Timer delays make sense to you? Keep in mind that CPU and memory usage never exceeded 70%.
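      To put a rough number behind that comparison, here is a standalone sketch (the 300-task count and the timing window are arbitrary choices, not taken from our stress test) that schedules 300 one-shot java.util.Timer tasks spread across one second and reports the worst observed firing delay:

```java
import java.util.Date;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class UtilTimerLoadSketch {
    public static void main(String[] args) throws Exception {
        final int tasks = 300;                        // ~300 firings within one second
        final CountDownLatch done = new CountDownLatch(tasks);
        final AtomicLong maxDelay = new AtomicLong(0);
        Timer timer = new Timer();
        long base = System.currentTimeMillis() + 100; // start firing 100 ms from now
        for (int i = 0; i < tasks; i++) {
            final long due = base + (i * 1000L / tasks); // spread evenly over 1 s
            timer.schedule(new TimerTask() {
                public void run() {
                    // Delay between the requested and the actual firing time.
                    long delay = System.currentTimeMillis() - due;
                    maxDelay.accumulateAndGet(delay, Math::max);
                    done.countDown();
                }
            }, new Date(due));
        }
        done.await();
        timer.cancel();
        System.out.println("fired " + tasks + " tasks, max scheduling delay (ms): "
                           + maxDelay.get());
    }
}
```

      On an unloaded machine this typically reports a worst-case delay of a few milliseconds, which is what makes the multi-second JMX Timer delays so surprising.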

      An even stranger issue is that when working in a cluster, the delays under the same stress tests go 300% higher! And the only cause is still the JMX timer, not the cache's synchronous replication, which seems absolutely crazy. How can that be? The timer runs as a singleton and has no knowledge that it is working in a cluster. Shouldn't I observe exactly the same firedAt - shouldFireAt delays as in a single-node environment?

      I am ready to use JProfiler to find some answers, but I am pretty sure it is the JMX Timer's poor performance that degrades the whole system's performance.

      I could use some advice here.
      Thanks in advance.