Has anyone found a reliable pattern or best practice to allow sequential execution with queuing of time-based triggered Orchestrator jobs, while still enforcing automatic timeouts for hung jobs?
We recently added Stop After and Kill After to our time-based triggers, but it seems to break the queuing behavior: if one process is running, the next scheduled job is immediately stopped instead of waiting in the queue.
For example:
Process A runs hourly (on the hour)
Process Z runs every 6 hours (at 45 minutes past the hour)
If Process Z takes longer than 15 minutes, Process A should be queued and start as soon as the robot is free. Previously, this worked — we only allow one process to run at a time, so overlapping processes were naturally queued. However, after adding the Stop/Kill After settings to handle occasional hangs, Process A is now stopped instantly whenever it overlaps with Process Z (no logs, and start/end timestamps are identical).
We already use queue-based triggers for most of our processes, but the time-based triggers are needed as a fallback mechanism for late transactions.
If anyone has found a way to allow sequential execution with queuing for time-based triggers while still enforcing automatic timeouts, we would greatly appreciate hearing how you handled it.
For Process A in my example, the trigger is set to Stop After 5 minutes and Kill After 6 minutes.
The job log shows a start time of 1:05:49 AM and a stop time of 1:05:49 AM. It looks like the job is being stopped based on the Stop After timer relative to the scheduled time, rather than waiting for the robot to become available. In this example, Process Z finished at 1:13:54 AM, but Process A never queued to run afterward.
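To make the timing concrete, here is a minimal sketch of that suspected behavior. It assumes the Stop After timer starts when the job is created by the trigger, not when the robot picks it up. The creation time of 1:00:00 AM is inferred from the hourly schedule, the date is an arbitrary placeholder, and the function name is hypothetical. This is not Orchestrator code, just the arithmetic.

```python
from datetime import datetime, timedelta


def outcome_if_timer_starts_at_creation(created, robot_free_at, stop_after):
    """Predict the job outcome under the ASSUMED model where the
    Stop After countdown begins at job creation (while still pending)."""
    stop_deadline = created + stop_after
    if robot_free_at <= stop_deadline:
        return "runs"
    return "stopped before start"


# Process A is scheduled on the hour (assumed creation time; arbitrary date).
created = datetime(2024, 1, 1, 1, 0, 0)
# Process Z finished at 1:13:54 AM, which is when the robot became free.
robot_free_at = datetime(2024, 1, 1, 1, 13, 54)

# Stop After 5 minutes: the deadline passes while A is still pending.
print(outcome_if_timer_starts_at_creation(created, robot_free_at, timedelta(minutes=5)))
# A hypothetical 20-minute Stop After would have outlasted the wait.
print(outcome_if_timer_starts_at_creation(created, robot_free_at, timedelta(minutes=20)))
```

Under this assumption the 5-minute deadline falls at about 1:05 AM, which is close to the 1:05:49 AM stop time in the job log and well before the robot was free.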
Have you set Stop After for both jobs, A and Z?
Ideally, if Process Z is the one that frequently eats into Process A's time, Stop After mainly needs to be set on Process Z.
Then either set no Stop After for Process A, or adjust it so that even on the days when Process Z overlaps with Process A, Process A still gets to run instead of never starting.
For Process Z, we currently have Stop After 1 hour and Kill After 1 hour 5 minutes, based on historic runtimes. I could shorten the timeout to 15 minutes, but then Process Z might not finish.
On the other hand, if I remove the Stop/Kill settings, Process A queues perfectly and runs after Process Z finishes — that worked well, except in cases where a process hangs.
I didn’t realize that these two requirements — enforcing timeouts and preserving queuing for overlapping time-based triggers — seem to be mutually exclusive. This is exactly the dilemma we’re trying to solve.
As I understand it, the Stop After timer starts when the job is queued for that process, not when it actually starts running. That is why the start time and stop time are identical in this case: the job had already been queued 5 minutes earlier.
How long does Process A usually take to run?
How about increasing the Stop/Kill After times for Process A so that they cover its usual runtime plus the wait caused by Process Z, without making them so long that the delay cascades to every process afterwards?
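As a rough way to size that window, the sketch below estimates the smallest Stop After for Process A that would survive a worst-case overlap. It assumes the timer counts from job creation, that Process Z starts on schedule at 45 minutes past the hour, and that the robot is freed at the latest when Z's Kill After (1 hour 5 minutes) fires. Process A's typical runtime is not stated in this thread, so the 3 minutes used here is a placeholder, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def min_stop_after(waiter_scheduled, blocker_started, blocker_kill_after, waiter_runtime):
    """Smallest Stop After that lets the waiting job start and finish,
    ASSUMING the countdown starts when the waiting job is created."""
    latest_robot_free = blocker_started + blocker_kill_after
    # No wait at all if the blocker is guaranteed to be gone already.
    worst_wait = max(latest_robot_free - waiter_scheduled, timedelta(0))
    return worst_wait + waiter_runtime


# Arbitrary placeholder date; only the clock times matter.
z_started = datetime(2024, 1, 1, 0, 45)    # Process Z at 45 past the hour
a_scheduled = datetime(2024, 1, 1, 1, 0)   # Process A on the hour
z_kill_after = timedelta(hours=1, minutes=5)
a_runtime = timedelta(minutes=3)           # placeholder, not from the thread

needed = min_stop_after(a_scheduled, z_started, z_kill_after, a_runtime)
print(needed)
```

With these numbers the worst-case wait alone is 50 minutes, so a Stop After that tolerates the overlap would be far longer than the current 5 minutes, which weakens it as a guard against a hung Process A.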
Thanks for the suggestion. I understand that increasing the Stop/Kill times for Process A would give it a larger window to start if it overlaps with Process Z. While this might reduce the chance of it being killed immediately, it doesn’t fully solve the underlying issue: we want both (1) strict timeouts to handle hung jobs and (2) reliable sequential execution with queuing. With time-based triggers, these two requirements appear to be mutually exclusive.