@1nf0rmagician

Summary

This PR combines multiple changes required to reproduce and resolve issues in the SeamlessScheduler.

Don't update activities if state decreases

Previously, the UpdateActivity method kept the activity state machine directional by blocking state updates that would move backwards. However, the activity data was still updated even when the state update was blocked. This caused ActivityCompleted sessions to be overridden by ActivityStart sessions during result processing, leading to InvalidCastExceptions further down the road.

Behaviour change:

  • We now receive ActivityUpdated events for redispatched activities. This is a positive side effect, as the resource changed in any case.
  • We now update the started timestamp of an activity when it is redispatched to a new resource.
  • Checking and updating the activity state is now protected against race conditions by a single surrounding lock.
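
The guard described above can be sketched roughly as follows. This is a minimal Python illustration, not the code from this PR: `ActivityState`, `Activity` and `update_activity` are hypothetical names, and the ordering of the states is an assumption. The point it shows is that the state check and the data update sit inside one lock, and that the data stays untouched when the state would decrease.

```python
import threading
from dataclasses import dataclass, field
from enum import IntEnum


class ActivityState(IntEnum):
    # Hypothetical ordering; a higher value means "further along".
    INITIAL = 0
    RUNNING = 1
    COMPLETED = 2


@dataclass
class Activity:
    state: ActivityState = ActivityState.INITIAL
    data: str = ""
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def update_activity(self, new_state: ActivityState, new_data: str) -> bool:
        """Apply the update only if the state does not decrease."""
        with self._lock:  # check and update form one critical section
            if new_state < self.state:
                return False  # rejected: keep both state and data
            self.state = new_state
            self.data = new_data
            return True


activity = Activity()
activity.update_activity(ActivityState.COMPLETED, "ActivityCompleted")
# A late start message must not replace the completed data.
accepted = activity.update_activity(ActivityState.RUNNING, "ActivityStart")
```

In this sketch the late `ActivityStart` update is rejected as a whole, so the completed data can no longer be replaced by start data.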

Ready jobs not correctly distributed in slots

The loop over ready jobs did not update the last job that was considered. As a result, jobs that would otherwise take on a running/completing slot were considered separately.
This can have several implications down the line, as we always assumed these jobs share a slot.

We did not find a better solution for this that still conforms to a minor release.

IsPrepareOf and IsCleanupOf compare recipes by id

SchedulerExtensions.IsPrepareOf and SchedulerExtensions.IsCleanupOf relied on an object reference equality check.
This check works as long as all jobs are created and completed within the lifetime of the process engine, because they are all handed over through the IJobManagement.Add method.
After a restart of the process engine, however, this check fails.
When reloading recipes, the JobManagement uses the RecipeProvider, in this case the ProductManagement directly. Because the ProductManagement returns a new instance of a recipe on each call, recipes of restored and newly dispatched jobs cannot be compared by reference and need to be compared by Id instead.
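
The difference between the two comparisons can be illustrated with a small Python sketch. It is not the C# code from this PR: `Recipe`, `load_recipe` and `same_recipe` are hypothetical stand-ins, with `load_recipe` assumed to behave like a provider that returns a new instance on each call.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # keep identity-based equality, like a reference check
class Recipe:
    id: int
    name: str


def load_recipe(recipe_id: int) -> Recipe:
    # Hypothetical stand-in for a provider that returns a new instance per call.
    return Recipe(id=recipe_id, name="example")


def same_recipe(a: Recipe, b: Recipe) -> bool:
    # Compare by id so instances loaded at different times still match.
    return a.id == b.id


restored = load_recipe(42)    # e.g. recipe of a job restored after a restart
dispatched = load_recipe(42)  # e.g. recipe of a newly dispatched job

by_reference = restored is dispatched
by_id = same_recipe(restored, dispatched)
```

Here the two instances describe the same recipe but are different objects, so only the comparison by id recognises them as equal.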

Refactor the SeamlessScheduler to increase readability

Running jobs without running processes are resumed after restart

After a job has dispatched a process for the first time and the process has switched to the running state, the job switches to the running state as well. From here onwards the job is running. It can happen, however, that all running processes in the job finish while there is still a process to be dispatched. This happens when the first activity has not yet been dispatched because it is waiting for an RTW. If the process engine restarts in this state, we load a job from the database without processes (as completed processes are not loaded by default). We therefore have to check whether the job has already recorded successes or failures, which tell us indirectly that a process had been started before.
Since the job cannot be completed (it would not have been loaded in that case), we schedule it to be resumed.
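
The check described above might look roughly like the following Python sketch. It covers only this one case and is not the code from this PR: `LoadedJob`, its fields and `should_resume` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadedJob:
    # Hypothetical shape of a job restored from the database.
    success_count: int = 0
    failure_count: int = 0
    running_processes: List[str] = field(default_factory=list)


def should_resume(job: LoadedJob) -> bool:
    """True if the job has no loaded processes but had started one before."""
    had_started = job.success_count > 0 or job.failure_count > 0
    return not job.running_processes and had_started


previously_running = LoadedJob(success_count=1)  # processes finished before restart
never_started = LoadedJob()                      # no recorded results at all
```

With these assumptions, `previously_running` would be scheduled as resumed, while `never_started` is indistinguishable from an initial job.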

Note: If the job never dispatched a process that switched to running before the restart, we cannot schedule it as resumed after the restart. A job in the DispatchedState drops back into the InitialState after the reload and does not store any information indicating that it is anything but an initial job. Hence, the creation of a separate setup for a job which would otherwise have been a drop-in follow-up job won't be fixed and is expected behaviour.


ToDo: Update Documentation
ToDo: Update Unit tests for SeamlessScheduler

Check for an empty list of schedulable jobs before executing the other if-clauses, to avoid unnecessary computation.
@1nf0rmagician 1nf0rmagician added this to the Framework 10.x milestone Dec 19, 2025
@1nf0rmagician 1nf0rmagician self-assigned this Dec 19, 2025
@1nf0rmagician 1nf0rmagician added the bug Something isn't working label Dec 19, 2025