feat: stop persisting executor metadata #1291

seriousben · 2025-03-18T21:15:53Z

Context

As part of changing the protocol between executors and servers, it has become obvious that persisting executors is not required and adds complexity.

What

This PR moves executors to live only in memory. Additionally, when the server restarts, we look at persisted allocations to determine whether any of them were abandoned by executors that did not reconnect within 30s.
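The abandoned-allocation check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `abandoned_executors`, the `RECONNECT_GRACE` constant, and the shape of the `persisted` map are all assumptions made for the example.

```rust
use std::collections::{BTreeSet, HashMap};
use std::time::Duration;

/// Hypothetical grace period; the PR description mentions 30s.
const RECONNECT_GRACE: Duration = Duration::from_secs(30);

/// Returns the IDs of executors that still own persisted allocations but
/// have not reconnected once the grace period has elapsed.
/// `persisted` maps executor ID -> allocation IDs (illustrative shape only).
fn abandoned_executors(
    persisted: &HashMap<String, Vec<String>>,
    connected: &BTreeSet<String>,
    since_startup: Duration,
) -> Vec<String> {
    // Before the grace period ends, executors may still reconnect.
    if since_startup < RECONNECT_GRACE {
        return Vec::new();
    }
    let mut missing: Vec<String> = persisted
        .keys()
        .filter(|id| !connected.contains(*id))
        .cloned()
        .collect();
    missing.sort(); // deterministic order for callers
    missing
}
```

Called five seconds after startup, this sketch returns an empty list; called after the grace period, it returns every executor ID that has persisted allocations but no live connection.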

The graph processor is also now running within its own thread.
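As a rough illustration of running a processor on its own thread, the sketch below drains state changes from a channel on a dedicated `std::thread`. The `StateChange` enum and `spawn_processor` function are hypothetical stand-ins, not the server's actual types, and the real graph processor may be wired differently.

```rust
use std::collections::BTreeSet;
use std::sync::mpsc;
use std::thread;

/// Illustrative stand-in for a state change; not the server's real type.
enum StateChange {
    ExecutorAdded(String),
    ExecutorRemoved(String),
}

/// Spawns a dedicated worker thread that applies state changes from a channel.
/// The join handle yields the set of registered executor IDs once every
/// sender has been dropped.
fn spawn_processor() -> (mpsc::Sender<StateChange>, thread::JoinHandle<BTreeSet<String>>) {
    let (tx, rx) = mpsc::channel::<StateChange>();
    let handle = thread::spawn(move || {
        let mut executors = BTreeSet::new();
        // Receiving blocks only this thread; the loop ends when all senders drop.
        for change in rx {
            match change {
                StateChange::ExecutorAdded(id) => {
                    executors.insert(id);
                }
                StateChange::ExecutorRemoved(id) => {
                    executors.remove(&id);
                }
            }
        }
        executors
    });
    (tx, handle)
}
```

Callers send changes through the returned sender and drop it to shut the worker down; joining the handle then returns the final executor set.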

Testing

Contribution Checklist

If a Python package was changed, please run make fmt in the package directory.
If the server was changed, please run make fmt in server/.
Make sure all PR Checks are passing.

diptanu · 2025-03-19T06:22:11Z

server/processor/src/task_allocator.rs

-                return Ok(SchedulerUpdateRequest {
+            ChangeType::HandleAbandonedAllocations => {
+                // Get all executor IDs from allocation_by_executor that aren't in executor_ids
+                let missing_executors: Vec<String> = indexes


You could probably collect all the allocations that need to be removed in this iteration. Pass the allocations down to the unallocate function instead of the executor IDs to avoid iterating twice.

There is a map called allocations_by_fn, which is executor id -> fn_name -> Allocation. We should use that map instead of allocations_by_executor and try to remove allocations_by_executor altogether. It's not used anywhere else, I believe.
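A single-pass version of this suggestion might look like the sketch below. It assumes a nested map shaped as executor id -> fn_name -> Allocation, as described in the comment; the `Allocation` struct, the `AllocationsByFn` alias, and `take_abandoned` are illustrative names only.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative allocation record; the field is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct Allocation {
    id: String,
}

/// executor ID -> function name -> allocation, following the map shape
/// described in the review comment.
type AllocationsByFn = HashMap<String, HashMap<String, Allocation>>;

/// Removes, in one pass, every entry owned by an executor that is not in
/// `live_executors`, and returns the removed allocations.
fn take_abandoned(
    allocations_by_fn: &mut AllocationsByFn,
    live_executors: &HashSet<String>,
) -> Vec<Allocation> {
    let mut removed = Vec::new();
    allocations_by_fn.retain(|executor_id, by_fn| {
        if live_executors.contains(executor_id) {
            return true; // keep allocations of executors that are still connected
        }
        removed.extend(by_fn.drain().map(|(_fn_name, allocation)| allocation));
        false // drop the now-empty entry for the missing executor
    });
    removed
}
```

The returned allocations can then be handed directly to whatever function unallocates them, so no second lookup by executor ID is needed.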

Allocation we should use that map instead of allocations_by_executor and try to remove this map altogether

I removed allocations_by_executor in my PR. Thanks for mentioning it. /internal/allocations also returns fn allocations now.

Pass the allocation down to the unallocate function instead of the executor ids to remove 2 x iteration.

I refactored things a lot more to reallocate on deregistration. Take a look. I kept the two loops for now, but they only run on startup, and this lets us keep the same code for deregistration and for handling abandoned allocations.

diptanu

We should consider using thread pools at some point to run the task allocator and task creators so that they do not block the Tokio tasks.

https://docs.rs/tokio-threadpool/0.1.18/tokio_threadpool/struct.ThreadPool.html

diptanu · 2025-03-19T08:37:01Z

server/processor/src/task_allocator.rs

            _ => {
                error!("unhandled change type: {:?}", change);
                return Err(anyhow!("unhandled change type"));
            }
        }
    }

+    pub fn unallocate(


We could also try to re-allocate the tasks if there are any executors available. This will save some back and forth between the state machine and the task allocator.

Done! I also reduced the number of state changes happening for executor removal and fixed an issue where updated_tasks could contain the same task more than once.
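The idea discussed in this thread, re-allocating tasks during unallocation while keeping each updated task unique, could be sketched as below. The round-robin placement and the `reallocate` signature are assumptions for illustration; the PR's actual allocation strategy is not shown in this thread.

```rust
use std::collections::BTreeMap;

/// Re-assigns the given task IDs to the remaining executors (round-robin here,
/// purely for illustration) and returns each updated task exactly once.
/// A value of `None` means no executor was available for that task.
fn reallocate(
    task_ids: &[String],
    available_executors: &[String],
) -> BTreeMap<String, Option<String>> {
    // Keying by task ID guarantees a task cannot appear twice in the result.
    let mut updated_tasks: BTreeMap<String, Option<String>> = BTreeMap::new();
    let mut next = 0;
    for task_id in task_ids {
        // Skip tasks that were already handled in this pass.
        if updated_tasks.contains_key(task_id) {
            continue;
        }
        let target = if available_executors.is_empty() {
            None // nothing to run on; the task stays unallocated
        } else {
            let executor = available_executors[next % available_executors.len()].clone();
            next += 1;
            Some(executor)
        };
        updated_tasks.insert(task_id.clone(), target);
    }
    updated_tasks
}
```

Doing the reassignment in the same step as the unallocation avoids emitting a separate state change just to trigger allocation again.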

diptanu

The PR is looking good. Left a comment about not requiring an additional state change type.

diptanu · 2025-03-20T07:06:12Z

server/processor/src/graph_processor.rs

@@ -238,7 +235,8 @@ impl GraphProcessor {
            }
            ChangeType::ExecutorAdded(_) |
            ChangeType::ExecutorRemoved(_) |
-            ChangeType::TombStoneExecutor(_) => {
+            ChangeType::TombStoneExecutor(_) |
+            ChangeType::HandleAbandonedAllocations => {


Do we need this new state here? If you reconciled which executors are no longer available in the executor manager, you could simply write new Deregister (or Delete; I forget the request name) Executor requests into the state machine, and those would follow the tombstone path.

This is not batching anymore, but should be fine.
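A hedged sketch of that suggestion: the executor manager compares the executors referenced by persisted state against those currently connected and emits one deregistration request per lapsed executor. `ExecutorManager`, `StateMachineRequest::DeregisterExecutor`, and `lapsed_executor_requests` are hypothetical names, since the real request name is not given in this thread.

```rust
use std::collections::BTreeSet;

/// Hypothetical request type; the real request name is not given in this thread.
#[derive(Debug, PartialEq)]
enum StateMachineRequest {
    DeregisterExecutor { executor_id: String },
}

/// Hypothetical in-memory executor manager.
struct ExecutorManager {
    connected: BTreeSet<String>,
}

impl ExecutorManager {
    /// Returns one deregistration request per executor that is referenced by
    /// persisted state but is not currently connected. Requests are emitted
    /// individually rather than batched.
    fn lapsed_executor_requests(&self, referenced: &[String]) -> Vec<StateMachineRequest> {
        // Deduplicate so each lapsed executor yields exactly one request.
        let referenced: BTreeSet<&String> = referenced.iter().collect();
        referenced
            .into_iter()
            .filter(|id| !self.connected.contains(*id))
            .map(|id| StateMachineRequest::DeregisterExecutor {
                executor_id: id.clone(),
            })
            .collect()
    }
}
```

Because each request goes through the same path as a normal deregistration, no additional change type is needed in the processors under this design.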

diptanu · 2025-03-20T07:07:44Z

server/processor/src/task_allocator.rs

-                    }
+            ChangeType::HandleAbandonedAllocations => {
+                // Get all executor IDs of executors that haven't registered.
+                let missing_executor_ids: Vec<String> = indexes


See the comment above. You would do this logic in the executor manager.

Thanks, addressed

feat: stop persisting executor metadata

9d20c77

diptanu requested changes Mar 19, 2025

diptanu previously requested changes Mar 19, 2025

feat: reallocate tasks on deregistration

0858bd2

seriousben force-pushed the seriousben/move-executors-in-memory branch from dc138ba to 0858bd2 Compare March 20, 2025 01:31

diptanu reviewed Mar 20, 2025

feat: reuse deregister executor to remove lapsed executors

4014354

seriousben force-pushed the seriousben/move-executors-in-memory branch from e0ee000 to 4014354 Compare March 20, 2025 15:07

fix: stop logging an error if tasks arrive post invocation ctx

8a3a095

seriousben marked this pull request as ready for review March 20, 2025 18:20

seriousben merged commit 74c0999 into main Mar 20, 2025
10 checks passed

seriousben deleted the seriousben/move-executors-in-memory branch March 20, 2025 18:33
