Skip to content

Commit aaea83f

Browse files
committed
docs: Add subgraph runner refactor spec and implementation plan
Introduces architecture spec for simplifying core/src/subgraph/runner.rs: - Enum-based FSM for full runner lifecycle - Pipeline stages for block processing - TriggerRunner component to eliminate code duplication - Checkpoint-based state management for rollback
1 parent 3ff7cc1 commit aaea83f

File tree

2 files changed

+661
-0
lines changed

2 files changed

+661
-0
lines changed

docs/plans/runner-refactor.md

Lines changed: 281 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,281 @@
1+
# Runner Refactor Implementation Plan
2+
3+
This document outlines the implementation plan for the runner refactor described in [the spec](../specs/runner-refactor.md).
4+
5+
## Overview
6+
7+
The refactor transforms `core/src/subgraph/runner.rs` from a complex nested-loop structure into a cleaner state machine with explicit pipeline stages.
8+
9+
## Git Workflow
10+
11+
**Branch**: All work should be committed to the `runner-refactor` branch.
12+
13+
**Commit discipline**:
14+
15+
- Commit work in small, reviewable chunks
16+
- Each commit should be self-contained and pass all checks
17+
- Prefer many small commits over few large ones
18+
- Each commit message should clearly describe what it does
19+
- Each step in the implementation phases should correspond to one or more commits
20+
21+
**Before each commit**:
22+
23+
```bash
24+
just format
25+
just lint
26+
just test-unit
27+
just test-runner
28+
```
29+
30+
- MANDATORY: Work must be committed, a task is only done when work is committed
31+
- MANDATORY: Make sure to follow the commit discipline above
32+
- IMPORTANT: The runner tests produce output in `tests/runner-tests.log`. Use that to investigate failures.
33+
34+
## Implementation Phases
35+
36+
### Phase 1: Extract TriggerRunner Component
37+
38+
**Goal:** Eliminate duplicated trigger processing code (lines 616-656 vs 754-790).
39+
40+
**Files to modify:**
41+
42+
- `core/src/subgraph/runner.rs` - Extract logic
43+
- Create `core/src/subgraph/runner/trigger_runner.rs`
44+
45+
**Steps:**
46+
47+
1. Create `TriggerRunner` struct with execute method
48+
2. Replace first trigger loop (lines 616-656) with `TriggerRunner::execute()`
49+
3. Replace second trigger loop (lines 754-790) with same call
50+
4. Verify tests pass
51+
52+
**Verification:**
53+
54+
- `just test-unit` passes
55+
- `just test-runner` passes
56+
- No behavioral changes
57+
58+
### Phase 2: Define RunnerState Enum
59+
60+
**Goal:** Introduce explicit state machine types without changing control flow yet.
61+
62+
**Files to modify:**
63+
64+
- Create `core/src/subgraph/runner/state.rs`
65+
- `core/src/subgraph/runner.rs` - Add state field
66+
67+
**Steps:**
68+
69+
1. Define `RunnerState` enum with all variants
70+
2. Define `RestartReason` and `StopReason` enums
71+
3. Add `state: RunnerState` field to `SubgraphRunner`
72+
4. Initialize state in constructor
73+
5. Verify tests pass (no behavioral changes yet)
74+
75+
**Verification:**
76+
77+
- Code compiles
78+
- Tests pass unchanged
79+
80+
### Phase 3: Refactor run_inner to State Machine
81+
82+
**Goal:** Replace nested loops with explicit state transitions.
83+
84+
**Files to modify:**
85+
86+
- `core/src/subgraph/runner.rs` - Rewrite `run_inner`
87+
88+
**Steps:**
89+
90+
1. Extract `initialize()` method for pre-loop setup
91+
2. Extract `await_block()` method for stream event handling
92+
3. Extract `restart()` method for restart logic
93+
4. Extract `finalize()` method for cleanup
94+
5. Rewrite `run_inner` as state machine loop
95+
6. Remove nested loop structure
96+
7. Verify tests pass
97+
98+
**Verification:**
99+
100+
- `just test-unit` passes
101+
- `just test-runner` passes
102+
- Same behavior, cleaner structure
103+
104+
### Phase 4: Define Pipeline Stages
105+
106+
**Goal:** Break `process_block` into explicit stages.
107+
108+
**Files to modify:**
109+
110+
- Create `core/src/subgraph/runner/pipeline.rs`
111+
- `core/src/subgraph/runner.rs` - Refactor `process_block`
112+
113+
**Steps:**
114+
115+
1. Extract `match_triggers()` stage method
116+
2. Extract `execute_triggers()` stage method (uses `TriggerRunner`)
117+
3. Extract `process_dynamic_data_sources()` stage method
118+
4. Extract `process_offchain_triggers()` stage method
119+
5. Extract `persist_block_state()` stage method
120+
6. Rewrite `process_block` to call stages in sequence
121+
7. Verify tests pass
122+
123+
**Verification:**
124+
125+
- `just test-unit` passes
126+
- `just test-runner` passes
127+
- Same behavior, cleaner structure
128+
129+
### Phase 5: Consolidate Error Handling
130+
131+
**Goal:** Unify scattered error handling into explicit classification.
132+
133+
**Files to modify:**
134+
135+
- `graph/src/components/subgraph/error.rs` (or wherever `ProcessingError` lives)
136+
- `core/src/subgraph/runner.rs` - Use new error methods
137+
138+
**Steps:**
139+
140+
1. Add `ProcessingErrorKind` enum with Deterministic/NonDeterministic/PossibleReorg variants
141+
2. Add `kind()` method to `ProcessingError`
142+
3. Add helper methods: `should_stop_processing()`, `should_restart()`, `is_retryable()`
143+
4. Replace scattered error checks in `process_block` with unified logic
144+
5. Replace scattered error checks in dynamic DS handling
145+
6. Replace scattered error checks in `handle_offchain_triggers`
146+
7. Document error handling invariants in code comments
147+
8. Verify tests pass
148+
149+
**Verification:**
150+
151+
- `just test-unit` passes
152+
- `just test-runner` passes
153+
- Error behavior unchanged (same semantics, cleaner code)
154+
155+
### Phase 6: Add BlockState Checkpoints
156+
157+
**Goal:** Enable rollback capability with minimal overhead.
158+
159+
**Files to modify:**
160+
161+
- `graph/src/prelude.rs` or wherever `BlockState` is defined
162+
- `core/src/subgraph/runner.rs` - Use checkpoints
163+
164+
**Steps:**
165+
166+
1. Add `checkpoint()` method to `BlockState`
167+
2. Add `BlockStateCheckpoint` struct
168+
3. Add `restore()` method to `BlockState`
169+
4. Use checkpoint before dynamic DS processing
170+
5. Verify tests pass
171+
172+
**Verification:**
173+
174+
- `just test-unit` passes
175+
- No performance regression (checkpoints are lightweight)
176+
177+
### Phase 7: Module Organization
178+
179+
**Goal:** Organize code into proper module structure.
180+
181+
**Files to create/modify:**
182+
183+
- `core/src/subgraph/runner/mod.rs`
184+
- Move/organize existing extracted modules
185+
186+
**Steps:**
187+
188+
1. Create `runner/` directory
189+
2. Move `state.rs`, `pipeline.rs`, `trigger_runner.rs` into it
190+
3. Update `runner.rs` to re-export from module
191+
4. Update imports in dependent files
192+
5. Verify tests pass
193+
194+
**Verification:**
195+
196+
- `just test-unit` passes
197+
- `just test-runner` passes
198+
- `just lint` shows no warnings
199+
200+
## Completion Criteria
201+
202+
Each phase is complete when:
203+
204+
1. `just format` - Code is formatted
205+
2. `just lint` - Zero warnings
206+
3. `just check --release` - Builds in release mode
207+
4. `just test-unit` - Unit tests pass
208+
5. `just test-runner` - Runner tests pass
209+
210+
## Progress Checklist
211+
212+
### Phase 1: Extract TriggerRunner Component
213+
214+
- [ ] Create `TriggerRunner` struct with execute method
215+
- [ ] Replace first trigger loop (lines 616-656)
216+
- [ ] Replace second trigger loop (lines 754-790)
217+
- [ ] Verify tests pass
218+
219+
### Phase 2: Define RunnerState Enum
220+
221+
- [ ] Define `RunnerState` enum with all variants
222+
- [ ] Define `RestartReason` and `StopReason` enums
223+
- [ ] Add `state: RunnerState` field to `SubgraphRunner`
224+
- [ ] Initialize state in constructor
225+
- [ ] Verify tests pass
226+
227+
### Phase 3: Refactor run_inner to State Machine
228+
229+
- [ ] Extract `initialize()` method
230+
- [ ] Extract `await_block()` method
231+
- [ ] Extract `restart()` method
232+
- [ ] Extract `finalize()` method
233+
- [ ] Rewrite `run_inner` as state machine loop
234+
- [ ] Remove nested loop structure
235+
- [ ] Verify tests pass
236+
237+
### Phase 4: Define Pipeline Stages
238+
239+
- [ ] Extract `match_triggers()` stage method
240+
- [ ] Extract `execute_triggers()` stage method
241+
- [ ] Extract `process_dynamic_data_sources()` stage method
242+
- [ ] Extract `process_offchain_triggers()` stage method
243+
- [ ] Extract `persist_block_state()` stage method
244+
- [ ] Rewrite `process_block` to call stages in sequence
245+
- [ ] Verify tests pass
246+
247+
### Phase 5: Consolidate Error Handling
248+
249+
- [ ] Add `ProcessingErrorKind` enum
250+
- [ ] Add `kind()` method to `ProcessingError`
251+
- [ ] Add helper methods (`should_stop_processing()`, `should_restart()`, `is_retryable()`)
252+
- [ ] Replace scattered error checks in `process_block`
253+
- [ ] Replace scattered error checks in dynamic DS handling
254+
- [ ] Replace scattered error checks in `handle_offchain_triggers`
255+
- [ ] Document error handling invariants
256+
- [ ] Verify tests pass
257+
258+
### Phase 6: Add BlockState Checkpoints
259+
260+
- [ ] Add `BlockStateCheckpoint` struct
261+
- [ ] Add `checkpoint()` method to `BlockState`
262+
- [ ] Add `restore()` method to `BlockState`
263+
- [ ] Use checkpoint before dynamic DS processing
264+
- [ ] Verify tests pass
265+
266+
### Phase 7: Module Organization
267+
268+
- [ ] Create `runner/` directory
269+
- [ ] Move `state.rs`, `pipeline.rs`, `trigger_runner.rs` into it
270+
- [ ] Update `runner.rs` to re-export from module
271+
- [ ] Update imports in dependent files
272+
- [ ] Verify tests pass
273+
- [ ] `just lint` shows zero warnings
274+
275+
## Notes
276+
277+
- Each phase should be a separate, reviewable PR
278+
- Phases 1-4 can potentially be combined if changes are small
279+
- Phase 3 (FSM refactor of run_inner) is the most invasive and should be reviewed carefully
280+
- Phase 5 (error handling) can be done earlier if it helps simplify other phases
281+
- Preserve all existing behavior - this is a refactor, not a feature change

0 commit comments

Comments
 (0)