Apply several fixes for the seamless scheduler #939

1nf0rmagician · 2025-12-19T06:22:47Z

Summary

This PR combines multiple changes required to reproduce and resolve issues in the `SeamlessScheduler`.

Don't update activities if state decreases

Previously, the `UpdateActivity` method kept the activity's state machine directional. However, the activity data was still updated even when the method blocked the state update. As a result, `ActivityCompleted` sessions were overridden by `ActivityStart` sessions during result processing, which led to `InvalidCastException`s further down the road.

Behaviour change:

- We now receive `ActivityUpdated` events for redispatched activities. This is a positive side effect, as the resource has changed in any case.
- We now update the started timestamp of an activity when it is redispatched to a new resource.
- Checking and updating the activity state is now protected against race conditions by a single surrounding lock.

Ready jobs not correctly distributed in slots

The loop over ready jobs did not update the last job that was considered. As a result, jobs that should have shared a slot with a running or completing job were considered separately.
This can have several implications down the line, as we always assumed these jobs to share a slot.

We did not find a better solution that stays compatible with a minor release.
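The bug in the ready-jobs loop can be illustrated with a small sketch. Everything here is hypothetical: the `Job` type, `distribute_into_slots`, and especially the grouping rule (a job joins the previous job's slot when both use the same recipe) are assumptions for illustration, not the scheduler's actual criteria. The relevant line is the one that advances `last_job` on every iteration; without it, each job is only compared against the originally running job.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    id: int
    recipe_id: int


def distribute_into_slots(running: Optional[Job],
                          ready_jobs: List[Job]) -> List[List[Job]]:
    """Group ready jobs into slots behind an optional running/completing job.

    Hypothetical rule: a job shares the slot of the previously considered
    job when both use the same recipe.
    """
    slots: List[List[Job]] = []
    last_job = running
    current_slot = [running] if running is not None else None
    if current_slot is not None:
        slots.append(current_slot)

    for job in ready_jobs:
        if last_job is not None and job.recipe_id == last_job.recipe_id:
            current_slot.append(job)  # follow-up job shares the slot
        else:
            current_slot = [job]      # start a new slot
            slots.append(current_slot)
        # The fix: always advance the last considered job
        last_job = job
    return slots
```

For a running job with recipe 10 followed by two ready jobs with recipe 20, this yields two slots (`[1]` and `[2, 3]`). Dropping the `last_job = job` line would compare both ready jobs against the running job and give three separate slots instead.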

`IsPrepareOf` and `IsCleanupOf` compare recipes by id

`SchedulerExtensions.IsPrepareOf` and `SchedulerExtensions.IsCleanupOf` compared recipes using reference equality.
This works as long as all jobs are created and completed within the lifetime of the process engine, because all of them are then handed over through the `IJobManagement.Add` method.
After a restart of the process engine, however, this check fails.
When reloading recipes, the JobManagement uses the `RecipeProvider`, which in this case is the ProductManagement directly. Since the ProductManagement returns a new recipe instance on each call, the recipes of restored jobs and newly dispatched jobs cannot be compared by reference; they need to be compared by Id.
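A minimal Python sketch of why reference equality breaks after a reload. `Recipe`, `ProductManagementStub` and the two comparison helpers are hypothetical; the stub only mimics the described behaviour of returning a new recipe instance on every call.

```python
class Recipe:
    # Plain class without __eq__: two instances are never identical
    def __init__(self, id: int, name: str) -> None:
        self.id = id
        self.name = name


class ProductManagementStub:
    """Hypothetical stand-in that returns a fresh instance on every call."""

    def load_recipe(self, recipe_id: int) -> Recipe:
        return Recipe(recipe_id, f"recipe-{recipe_id}")


def same_recipe_by_reference(a: Recipe, b: Recipe) -> bool:
    # Old behaviour: only true for the very same object
    return a is b


def same_recipe_by_id(a: Recipe, b: Recipe) -> bool:
    # New behaviour: true for any two instances of the same recipe
    return a.id == b.id
```

Loading recipe 42 twice, once for a restored job and once for a newly dispatched one, gives two distinct objects: the reference check reports them as different, while the Id check correctly treats them as the same recipe.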

Refactor the `SeamlessScheduler` to increase readability

Running jobs without running processes are resumed after restart

After a job has dispatched its first process and that process has switched to the running state, the job switches to the running state as well. From then on the job is running. However, all running processes in the job can finish while another process is still waiting to be dispatched. This happens when the first activity of that process has not been dispatched yet because it is waiting for an RTW. If the process engine restarts in this state, the job is loaded from the database without processes (completed processes are not loaded by default). We therefore check whether the job has already recorded successes or failures, which indirectly tell us that a process had been started before.
Since the job cannot be completed (it would not have been loaded in that case), we schedule it to be resumed.

Note: If the job never dispatched a process that switched to running before the restart, we cannot schedule it to be resumed after the restart. A job in the `DispatchedState` drops back into the `InitialState` after the reload and stores no information indicating that it is anything other than an initial job. Hence, creating a separate setup for a job that would otherwise have been a drop-in follow-up job won't be fixed and is expected behaviour.
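The restart decision described above can be sketched in a few lines of Python. `RestoredJob`, its counter fields, `should_resume` and `classify_restored_jobs` are hypothetical names; the sketch only encodes the rule from the text: a job loaded without processes that already has recorded successes or failures must have been running, so it is resumed, while a job without any recorded outcome is indistinguishable from an initial job.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RestoredJob:
    """Hypothetical snapshot of a job loaded from the database.

    Completed processes are not loaded, so only the counters survive.
    """
    id: int
    success_count: int = 0
    failure_count: int = 0


def should_resume(job: RestoredJob) -> bool:
    # Any recorded outcome means a process ran before the restart.
    # The job was loaded at all, so it is not completed: resume it.
    return job.success_count > 0 or job.failure_count > 0


def classify_restored_jobs(
        jobs: List[RestoredJob]) -> Tuple[List[RestoredJob], List[RestoredJob]]:
    """Split restored jobs into (to be resumed, treated as initial)."""
    resumed: List[RestoredJob] = []
    initial: List[RestoredJob] = []
    for job in jobs:
        (resumed if should_resume(job) else initial).append(job)
    return resumed, initial
```

A previously dispatched job that never reached the running state has zero counters and therefore lands in the initial group, which matches the expected behaviour stated in the note.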

ToDo: Update documentation
ToDo: Update unit tests for `SeamlessScheduler`


1nf0rmagician added 6 commits December 18, 2025 14:41

Early return on no schedulable jobs

853d3ac

Check for an empty list of schedulable jobs before executing the other if-clauses, to prevent unnecessary computation.

Refactor SeamlessScheduler and add logging

98b11bc

1nf0rmagician added this to the Framework 10.x milestone Dec 19, 2025

1nf0rmagician requested review from dbeuchler and seveneleven December 19, 2025 06:22

1nf0rmagician self-assigned this Dec 19, 2025

1nf0rmagician added the bug Something isn't working label Dec 19, 2025
