9 Replies Latest reply on Nov 30, 2017 11:36 AM by pferraro

    Scheduled timers executed many times after restart

    lagoria

      I am using WF11, but the very same problem was in WF10 as well.

      I have a bunch of timers configured through the @Schedule annotation. Some of them fire every 10 seconds.

       

      If I stop the server for some time (this happens mostly in my dev environment, where I shut down the server often, but it also applies to production in case of downtime) and restart it, WF executes the timer not just once, but once for every timeout missed while it was down. This behaviour doesn't make sense to me; it should trigger just once.

       

      The timer is a database persistent timer (<database-data-store>). And my profile is standalone-full-ha.xml.

        • 1. Re: Scheduled timers executed many times after restart
          pferraro

          This behavior is dictated by §13.2 of the EJB specification:

          "In the event of a container crash or container shutdown, the timeout callback method for a persistent timer that has not been cancelled will be invoked on a new JVM when the container is restarted or on another JVM instance across which the container is distributed. This rule applies to both programmatically or automatically created persistent timers."

          • 2. Re: Scheduled timers executed many times after restart
            lagoria

            Sorry, but if I have a timer scheduled to run every 10 seconds:

             

            @Schedule(second = "*/10", minute = "*", hour = "*", persistent = true)

             

            and if the AS goes down for an hour, I don't think the spec meant that it should run 360 times at startup within a very short time, probably bombarding the database with hundreds of useless queries, etc.

            I think it meant to say it should be run only once.

             

            If not, could there be an option to add a flag to WF to diverge from the spec?
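
For scale, the number of missed timeouts can be worked out directly. This is a minimal sketch of the arithmetic only: the class and method names are made up for illustration, nothing here is WildFly API, and it assumes every 10-second tick missed during the outage is replayed on restart.

```java
import java.time.Duration;

public class MissedTimeouts {
    // Number of schedule ticks that fall inside the downtime window,
    // assuming one tick per interval (hypothetical helper, not an EJB API).
    static long missed(Duration downtime, Duration interval) {
        return downtime.getSeconds() / interval.getSeconds();
    }

    public static void main(String[] args) {
        // One hour of downtime with a 10-second schedule.
        System.out.println(missed(Duration.ofHours(1), Duration.ofSeconds(10))); // prints 360
    }
}
```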

            • 3. Re: Scheduled timers executed many times after restart
              pferraro

              If you don't want missed timer events to execute when the container is restarted, then, by definition, you should use a non-persistent timer.  In general, persistent timers are not appropriate for short interval timers.

              e.g.

              @Schedule(second = "*/10", minute = "*", hour = "*", persistent = false)
              • 4. Re: Scheduled timers executed many times after restart
                lagoria

                Good point, but is a non-persistent timer guaranteed to be executed only once in a cluster?

                 

                The use case is sending emails. A short interval is needed to send emails almost immediately, but on the other hand, I don't want the same timer to be triggered on two nodes of the cluster.

                • 5. Re: Scheduled timers executed many times after restart
                  wdfink

                  Persistent timers are cluster-aware. All non-persistent timers are executed on the node where they were started.

                  That means a programmatic timer will be executed on the node where it was created.

                  For a @Schedule timer, that means on every node where the bean is deployed.

                   

                  From my perspective, the EJB spec passage pferraro mentioned should ensure that a missed timer is executed, but not as many times as it was missed. The spec might not be specific enough in this case (as in others).

                  I would take into account that the timeout is not executed concurrently! That means, assuming the timeout was missed 10 times:

                  the timer is fired first and executed

                  the other 9 invocations are rejected because the timer is currently running.

                  If there are more missed timeouts, and they take longer to be scheduled, a second timeout might be executed once the first execution has completed.

                   

                  So in fact the multiple executions are useless.

                   

                  You might open a feature request to change the behaviour, or make it configurable.

                  pferraro what do you think?

                  • 6. Re: Scheduled timers executed many times after restart
                    lagoria

                    Yes, that was exactly my understanding.

                     

                    Also, I haven't looked very thoroughly, but I guess it's already skipping overlapping timers. The problem is that if I have 100 missed runs, it picks the first attempt and executes it. In the meantime it skips some other runs, let's say 10 of them (because they are overlapping). But after the first run, there are still about 90 more to be executed. Etc, etc.

                    I believe this is inconvenient. Also, timer logic should take into account the possibility of being skipped, and do the work the next time. At least I think this is the most useful behaviour 99% of the time.
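
The catch-up pattern described in this post can be sketched as a toy model. It is only an illustration under stated assumptions, not WildFly's actual timer implementation: `executedRuns` and `skippedPerRun` are hypothetical names, and the model assumes a fixed number of queued timeouts is rejected as overlapping during each run.

```java
public class CatchUpModel {
    // Counts how many callbacks actually run when `missed` timeouts are replayed
    // and up to `skippedPerRun` of the queued timeouts overlap with each running
    // callback and are therefore rejected.
    static int executedRuns(int missed, int skippedPerRun) {
        int executed = 0;
        int pending = missed;
        while (pending > 0) {
            executed++;                                   // one pending timeout starts running
            pending--;
            pending -= Math.min(pending, skippedPerRun);  // overlapping timeouts are rejected
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(executedRuns(10, 9));    // prints 1: first fires, other 9 rejected
        System.out.println(executedRuns(100, 10));  // prints 10: the work is still repeated
    }
}
```

Under these assumptions, 100 missed timeouts still produce 10 back-to-back executions, which is why the callback should be written to tolerate both skipped and repeated runs.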

                     

                    Actually, I think I can find a workaround: set these short timers as non-persistent, and switch them on/off (a flag is needed here) using an MSC service to coordinate the nodes in the cluster and ensure that only one is executing at a time. A problem arises in case of a network partition, when each node believes it is the only one executing. Amen.

                    • 7. Re: Scheduled timers executed many times after restart
                      wdfink

                      lagoria  wrote:

                       

                      Yes, that was exactly my understanding.

                       

                      Also, I haven't looked very thoroughly, but I guess it's already skipping overlapping timers. The problem is that if I have 100 missed runs, it picks the first attempt and executes it. In the meantime it skips some other runs, let's say 10 of them (because they are overlapping). But after the first run, there are still about 90 more to be executed. Etc, etc.

                      That's what I described in my explanation.

                      Another option is the HA singleton approach; there is a quickstart that shows it for scheduled timers. In older versions you need to create a SingletonService.

                      For WF11 you can use the singleton-deployment.xml descriptor to mark the application as a singleton.

                      That gets rid of the extra MSC code.

                      But you still have an issue with cluster splits.

                      • 8. Re: Scheduled timers executed many times after restart
                        wdfink
                        • 9. Re: Scheduled timers executed many times after restart
                          pferraro

                          wdfink  wrote:

                          For WF11 you can use the singleton-deployment.xml descriptor to mark the application as a singleton.

                          That gets rid of the extra MSC code.

                          But you still have an issue with cluster splits.

                          You can configure a quorum to help mitigate the network partition problem - but depending on the level of fragmentation, this can easily result in unavailability of the timers.
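
A rough way to see the trade-off, as a sketch only: assume a majority quorum (an assumption for illustration; the real quorum is whatever value you configure), so a partition may run the singleton timers only if it contains at least that many nodes. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class QuorumModel {
    // Smallest node count that forms a strict majority of the cluster.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Number of partitions large enough to be allowed to run the singleton.
    static long partitionsWithQuorum(List<Integer> partitionSizes, int quorum) {
        return partitionSizes.stream().filter(size -> size >= quorum).count();
    }

    public static void main(String[] args) {
        int quorum = majority(3); // 2 for a three-node cluster
        // Split 2 + 1: exactly one side keeps running the timers.
        System.out.println(partitionsWithQuorum(Arrays.asList(2, 1), quorum));    // prints 1
        // Split 1 + 1 + 1: no side reaches quorum, so the timers are unavailable.
        System.out.println(partitionsWithQuorum(Arrays.asList(1, 1, 1), quorum)); // prints 0
    }
}
```

With a majority quorum at most one partition can ever qualify, which avoids duplicate execution at the cost of no execution at all when the cluster fragments too far.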