9 Replies Latest reply on Nov 30, 2017 11:36 AM by pferraro

Scheduled timers executed many times after restart

lagoria Nov 29, 2017 3:21 AM

I am using WF11, but the very same problem was in WF10 as well.

I have a bunch of timers, configured through the @Schedule annotation. Some of them trigger every 10 seconds.

If I stop the server for some time (this happens especially in my dev environment, where I shut down the server often, but it still applies to production in case of downtime) and restart, WF tries to execute the timer not just once, but once for every timeout it missed while it was down. This behaviour doesn't make sense to me; it should trigger just once.

The timers are persisted to a database (<database-data-store>), and my profile is standalone-full-ha.xml.

1. Re: Scheduled timers executed many times after restart

pferraro Nov 29, 2017 8:33 AM (in response to lagoria)

This behavior is dictated by §13.2 of the EJB specification:
"In the event of a container crash or container shutdown, the timeout callback method for a persistent timer that has not been cancelled will be invoked on a new JVM when the container is restarted or on another JVM instance across which the container is distributed. This rule applies to both programmatically or automatically created persistent timers."
2. Re: Scheduled timers executed many times after restart

lagoria Nov 29, 2017 8:41 AM (in response to pferraro)

Sorry, but if I have a timer scheduled to run every 10 seconds:

@Schedule(second = "*/10", minute = "*", hour = "*", persistent = true)

and if the AS goes down for an hour, I don't think the spec meant to say it should run 360 times at startup in a very short time probably bombing the database with hundreds of useless queries, etc.
I think it meant to say it should be run only once.

If not, would it be an option to add a flag to WF to diverge from the spec?
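
For reference, the figure of 360 follows directly from the downtime and the interval. The sketch below only reproduces that arithmetic; the class and method names are hypothetical and nothing here is WildFly API.

```java
import java.time.Duration;

class MissedFirings {

    // Number of whole intervals that fit into the downtime, i.e. the number of
    // timeouts a fixed-interval timer would have missed while the server was down.
    static long missedFirings(Duration downtime, Duration interval) {
        return downtime.toMillis() / interval.toMillis();
    }

    public static void main(String[] args) {
        // One hour of downtime with a timer that fires every 10 seconds.
        System.out.println(missedFirings(Duration.ofHours(1), Duration.ofSeconds(10)));
    }
}
```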
3. Re: Scheduled timers executed many times after restart

pferraro Nov 29, 2017 9:50 AM (in response to lagoria)
If you don't want missed timer events to execute when the container is restarted, then, by definition, you should use a non-persistent timer. In general, persistent timers are not appropriate for short interval timers.
e.g.
@Schedule(second = "*/10", minute = "*", hour = "*", persistent = false)
4. Re: Scheduled timers executed many times after restart

lagoria Nov 29, 2017 9:59 AM (in response to pferraro)

Good point, but is a non-persistent timer guaranteed to be executed only once in a cluster?

Use case is sending emails. A short interval is needed to send emails almost immediately, but on the other hand, I don't want the same timer to be triggered in two nodes of the cluster.
5. Re: Scheduled timers executed many times after restart

wdfink Nov 29, 2017 4:47 PM (in response to lagoria)

Persistent timers are cluster-aware. All non-persistent timers are executed on the node where they were started.
That means a programmatic timer will be executed on the node where it was created.
For a @Schedule timer, that means on every node where the bean is deployed.

From my perspective, the part of the EJB spec that pferraro mentioned should ensure that a missed timer is executed, but not once for every missed timeout. The spec might not be specific enough in this case (as in others).
I would also take into account that the timeout is not executed concurrently! That means, assuming the timeout was missed 10 times:
the timer is fired first and executed
the other 9 invocations are rejected because the timer is currently running.
If you have more missed executions, which take longer to schedule, a second timeout might be executed once the first execution has completed.

So in fact the multiple executions are useless.
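
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not WildFly's actual timer implementation: it assumes missed firings are dispatched one after another with a fixed gap, that each execution takes a fixed time, and that a firing arriving while an execution is still running is rejected. All names are hypothetical.

```java
class CatchUpSimulation {

    // Counts how many of the missed firings actually execute when a firing
    // that arrives while the previous execution is still running is rejected.
    static int executedFirings(int missed, long dispatchGapMillis, long executionMillis) {
        int executed = 0;
        long busyUntil = Long.MIN_VALUE; // time at which the running execution finishes
        for (int i = 0; i < missed; i++) {
            long dispatchedAt = i * dispatchGapMillis;
            if (dispatchedAt >= busyUntil) {
                executed++;
                busyUntil = dispatchedAt + executionMillis;
            }
            // otherwise the firing overlaps a running execution and is skipped
        }
        return executed;
    }

    public static void main(String[] args) {
        // 10 missed firings dispatched 1 ms apart, each execution taking 100 ms.
        System.out.println(executedFirings(10, 1, 100));
        // 100 missed firings dispatched 10 ms apart, each execution taking 100 ms.
        System.out.println(executedFirings(100, 10, 100));
    }
}
```

Under these assumptions the first call reports a single execution, while the second reports ten: when dispatching the backlog takes longer than one execution, later firings get through again.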

You might open a feature request to change the behaviour, or make it configurable.
pferraro what do you think?
6. Re: Scheduled timers executed many times after restart

lagoria Nov 29, 2017 5:36 PM (in response to wdfink)

Yes, that was exactly my understanding.

Also, I haven't looked very thoroughly, but I guess it's already skipping overlapping timers. The problem is that if I have 100 missed runs, it picks the first attempt and executes it. In the meantime it skips some other runs, let's say 10 of them (because they are overlapping). But after the first run, there are 90 more to be executed. Etc., etc.
I believe this is inconvenient. Also, the timer logic should take into account the possibility of being skipped, and do the work the next time. At least I think this is the most useful behaviour 99% of the time.
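
One way to read "do the work the next time" is a callback body that drains whatever is pending, so a skipped run loses nothing. The following is a minimal sketch of that idea for the email use case. The class, the in-memory queue and the `sent` list are hypothetical stand-ins; a real bean would read pending mails from the database and call this logic from its timeout method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class PendingMailJob {

    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<String> sent = new ArrayList<>();

    void enqueue(String message) {
        pending.add(message);
    }

    // Body of a hypothetical timeout callback: sends everything queued so far
    // and returns how many messages were handled in this run.
    int runOnce() {
        int count = 0;
        String message;
        while ((message = pending.poll()) != null) {
            sent.add(message); // stand-in for the real mail delivery
            count++;
        }
        return count;
    }

    List<String> sent() {
        return sent;
    }

    public static void main(String[] args) {
        PendingMailJob job = new PendingMailJob();
        job.enqueue("welcome");
        job.enqueue("reset-password");
        System.out.println(job.runOnce()); // handles the whole backlog in one run
        System.out.println(job.runOnce()); // nothing left for a redundant run
    }
}
```

With logic of this shape, one run after a restart handles the whole backlog, and any further catch-up runs find nothing to do.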

Actually, I think I can find a workaround: set these short timers as non-persistent, and switch them on/off (a flag is needed here) using an MSC service to coordinate the nodes in the cluster and ensure there is only one executing at a time. A problem arises in case of a network partition, when both nodes believe they are the only one executing. Amen.
7. Re: Scheduled timers executed many times after restart

wdfink Nov 30, 2017 3:39 AM (in response to lagoria)

lagoria wrote:

Yes, that was exactly my understanding.

Also, I haven't looked very thoroughly, but I guess it's already skipping overlapping timers. The problem is that if I have 100 missed runs, it picks the first attempt and executes it. In the meantime it skips some other runs, let's say 10 of them (because they are overlapping). But after the first run, there are 90 more to be executed. Etc., etc.
That's what I described in my explanation.
Another approach is HASingleton; there is a quickstart that shows it for scheduled timers. Older versions need to create a SingletonService.
For WF11 you can use the singleton-deployment.xml descriptor to mark the application as a singleton.
That gets rid of the extra MSC code.
But in fact you still have an issue with cluster splits.
8. Re: Scheduled timers executed many times after restart

wdfink Nov 30, 2017 3:56 AM (in response to lagoria)

I've created [WFLY-9586] Persistent EJB timers should resume only once if missed multiple times to discuss this further
9. Re: Scheduled timers executed many times after restart

pferraro Nov 30, 2017 11:36 AM (in response to wdfink)

wdfink wrote:
For WF11 you can use the singleton-deployment.xml descriptor to mark the application as a singleton.
That gets rid of the extra MSC code.
But in fact you still have an issue with cluster splits.
You can configure a quorum to help mitigate the network partition problem - but depending on the level of fragmentation, this can easily result in unavailability of the timers.
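
The trade-off can be sketched with a simple quorum rule. This is an illustration only, not the WildFly singleton API: it assumes a partition may run the timers only if it can see at least `quorum` members, and the class and method names are hypothetical.

```java
class QuorumCheck {

    // Smallest quorum that at most one partition of the cluster can satisfy.
    static int majorityQuorum(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // A partition may run the timers only if it still sees enough members.
    static boolean mayRunTimers(int reachableMembers, int quorum) {
        return reachableMembers >= quorum;
    }

    public static void main(String[] args) {
        int quorum = majorityQuorum(3); // 2 for a three-node cluster

        // Split into 2 + 1: only the larger side keeps running the timers.
        System.out.println(mayRunTimers(2, quorum));
        System.out.println(mayRunTimers(1, quorum));

        // Split into 1 + 1 + 1: no side reaches the quorum, so no timers run.
        System.out.println(mayRunTimers(1, quorum));
    }
}
```

With a majority quorum, two partitions can never both believe they are the sole executor, but a sufficiently fragmented cluster has no partition that qualifies, and the timers stop everywhere.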