|
| 1 | +# Runner Refactor Implementation Plan |
| 2 | + |
| 3 | +This document outlines the implementation plan for the runner refactor described in [the spec](../specs/runner-refactor.md). |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The refactor transforms `core/src/subgraph/runner.rs` from a complex nested-loop structure into a cleaner state machine with explicit pipeline stages. |
| 8 | + |
| 9 | +## Git Workflow |
| 10 | + |
| 11 | +**Branch**: All work should be committed to the `runner-refactor` branch. |
| 12 | + |
| 13 | +**Commit discipline**: |
| 14 | + |
| 15 | +- Commit work in small, reviewable chunks |
| 16 | +- Each commit should be self-contained and pass all checks |
| 17 | +- Prefer many small commits over few large ones |
| 18 | +- Each commit message should clearly describe what it does |
| 19 | +- Each step in the implementation phases should correspond to one or more commits |
| 20 | + |
| 21 | +**Before each commit**: |
| 22 | + |
| 23 | +```bash |
| 24 | +just format |
| 25 | +just lint |
| 26 | +just test-unit |
| 27 | +just test-runner |
| 28 | +``` |
| 29 | + |
| 30 | +- MANDATORY: Work must be committed, a task is only done when work is committed |
| 31 | +- MANDATORY: Make sure to follow the commit discipline above |
| 32 | +- IMPORTANT: The runner tests produce output in `tests/runner-tests.log`. Use that to investigate failures. |
| 33 | + |
| 34 | +## Implementation Phases |
| 35 | + |
| 36 | +### Phase 1: Extract TriggerRunner Component |
| 37 | + |
| 38 | +**Goal:** Eliminate duplicated trigger processing code (lines 616-656 vs 754-790). |
| 39 | + |
| 40 | +**Files to modify:** |
| 41 | + |
| 42 | +- `core/src/subgraph/runner.rs` - Extract logic |
| 43 | +- Create `core/src/subgraph/runner/trigger_runner.rs` |
| 44 | + |
| 45 | +**Steps:** |
| 46 | + |
| 47 | +1. Create `TriggerRunner` struct with execute method |
| 48 | +2. Replace first trigger loop (lines 616-656) with `TriggerRunner::execute()` |
| 49 | +3. Replace second trigger loop (lines 754-790) with same call |
| 50 | +4. Verify tests pass |
| 51 | + |
| 52 | +**Verification:** |
| 53 | + |
| 54 | +- `just test-unit` passes |
| 55 | +- `just test-runner` passes |
| 56 | +- No behavioral changes |
| 57 | + |
| 58 | +### Phase 2: Define RunnerState Enum |
| 59 | + |
| 60 | +**Goal:** Introduce explicit state machine types without changing control flow yet. |
| 61 | + |
| 62 | +**Files to modify:** |
| 63 | + |
| 64 | +- Create `core/src/subgraph/runner/state.rs` |
| 65 | +- `core/src/subgraph/runner.rs` - Add state field |
| 66 | + |
| 67 | +**Steps:** |
| 68 | + |
| 69 | +1. Define `RunnerState` enum with all variants |
| 70 | +2. Define `RestartReason` and `StopReason` enums |
| 71 | +3. Add `state: RunnerState` field to `SubgraphRunner` |
| 72 | +4. Initialize state in constructor |
| 73 | +5. Verify tests pass (no behavioral changes yet) |
| 74 | + |
| 75 | +**Verification:** |
| 76 | + |
| 77 | +- Code compiles |
| 78 | +- Tests pass unchanged |
| 79 | + |
| 80 | +### Phase 3: Refactor run_inner to State Machine |
| 81 | + |
| 82 | +**Goal:** Replace nested loops with explicit state transitions. |
| 83 | + |
| 84 | +**Files to modify:** |
| 85 | + |
| 86 | +- `core/src/subgraph/runner.rs` - Rewrite `run_inner` |
| 87 | + |
| 88 | +**Steps:** |
| 89 | + |
| 90 | +1. Extract `initialize()` method for pre-loop setup |
| 91 | +2. Extract `await_block()` method for stream event handling |
| 92 | +3. Extract `restart()` method for restart logic |
| 93 | +4. Extract `finalize()` method for cleanup |
| 94 | +5. Rewrite `run_inner` as state machine loop |
| 95 | +6. Remove nested loop structure |
| 96 | +7. Verify tests pass |
| 97 | + |
| 98 | +**Verification:** |
| 99 | + |
| 100 | +- `just test-unit` passes |
| 101 | +- `just test-runner` passes |
| 102 | +- Same behavior, cleaner structure |
| 103 | + |
| 104 | +### Phase 4: Define Pipeline Stages |
| 105 | + |
| 106 | +**Goal:** Break `process_block` into explicit stages. |
| 107 | + |
| 108 | +**Files to modify:** |
| 109 | + |
| 110 | +- Create `core/src/subgraph/runner/pipeline.rs` |
| 111 | +- `core/src/subgraph/runner.rs` - Refactor `process_block` |
| 112 | + |
| 113 | +**Steps:** |
| 114 | + |
| 115 | +1. Extract `match_triggers()` stage method |
| 116 | +2. Extract `execute_triggers()` stage method (uses `TriggerRunner`) |
| 117 | +3. Extract `process_dynamic_data_sources()` stage method |
| 118 | +4. Extract `process_offchain_triggers()` stage method |
| 119 | +5. Extract `persist_block_state()` stage method |
| 120 | +6. Rewrite `process_block` to call stages in sequence |
| 121 | +7. Verify tests pass |
| 122 | + |
| 123 | +**Verification:** |
| 124 | + |
| 125 | +- `just test-unit` passes |
| 126 | +- `just test-runner` passes |
| 127 | +- Same behavior, cleaner structure |
| 128 | + |
| 129 | +### Phase 5: Consolidate Error Handling |
| 130 | + |
| 131 | +**Goal:** Unify scattered error handling into explicit classification. |
| 132 | + |
| 133 | +**Files to modify:** |
| 134 | + |
| 135 | +- `graph/src/components/subgraph/error.rs` (or wherever `ProcessingError` lives) |
| 136 | +- `core/src/subgraph/runner.rs` - Use new error methods |
| 137 | + |
| 138 | +**Steps:** |
| 139 | + |
| 140 | +1. Add `ProcessingErrorKind` enum with Deterministic/NonDeterministic/PossibleReorg variants |
| 141 | +2. Add `kind()` method to `ProcessingError` |
| 142 | +3. Add helper methods: `should_stop_processing()`, `should_restart()`, `is_retryable()` |
| 143 | +4. Replace scattered error checks in `process_block` with unified logic |
| 144 | +5. Replace scattered error checks in dynamic DS handling |
| 145 | +6. Replace scattered error checks in `handle_offchain_triggers` |
| 146 | +7. Document error handling invariants in code comments |
| 147 | +8. Verify tests pass |
| 148 | + |
| 149 | +**Verification:** |
| 150 | + |
| 151 | +- `just test-unit` passes |
| 152 | +- `just test-runner` passes |
| 153 | +- Error behavior unchanged (same semantics, cleaner code) |
| 154 | + |
| 155 | +### Phase 6: Add BlockState Checkpoints |
| 156 | + |
| 157 | +**Goal:** Enable rollback capability with minimal overhead. |
| 158 | + |
| 159 | +**Files to modify:** |
| 160 | + |
| 161 | +- `graph/src/prelude.rs` or wherever `BlockState` is defined |
| 162 | +- `core/src/subgraph/runner.rs` - Use checkpoints |
| 163 | + |
| 164 | +**Steps:** |
| 165 | + |
| 166 | +1. Add `checkpoint()` method to `BlockState` |
| 167 | +2. Add `BlockStateCheckpoint` struct |
| 168 | +3. Add `restore()` method to `BlockState` |
| 169 | +4. Use checkpoint before dynamic DS processing |
| 170 | +5. Verify tests pass |
| 171 | + |
| 172 | +**Verification:** |
| 173 | + |
| 174 | +- `just test-unit` passes |
| 175 | +- No performance regression (checkpoints are lightweight) |
| 176 | + |
| 177 | +### Phase 7: Module Organization |
| 178 | + |
| 179 | +**Goal:** Organize code into proper module structure. |
| 180 | + |
| 181 | +**Files to create/modify:** |
| 182 | + |
| 183 | +- `core/src/subgraph/runner/mod.rs` |
| 184 | +- Move/organize existing extracted modules |
| 185 | + |
| 186 | +**Steps:** |
| 187 | + |
| 188 | +1. Create `runner/` directory |
| 189 | +2. Move `state.rs`, `pipeline.rs`, `trigger_runner.rs` into it |
| 190 | +3. Update `runner.rs` to re-export from module |
| 191 | +4. Update imports in dependent files |
| 192 | +5. Verify tests pass |
| 193 | + |
| 194 | +**Verification:** |
| 195 | + |
| 196 | +- `just test-unit` passes |
| 197 | +- `just test-runner` passes |
| 198 | +- `just lint` shows no warnings |
| 199 | + |
| 200 | +## Completion Criteria |
| 201 | + |
| 202 | +Each phase is complete when: |
| 203 | + |
| 204 | +1. `just format` - Code is formatted |
| 205 | +2. `just lint` - Zero warnings |
| 206 | +3. `just check --release` - Builds in release mode |
| 207 | +4. `just test-unit` - Unit tests pass |
| 208 | +5. `just test-runner` - Runner tests pass |
| 209 | + |
| 210 | +## Progress Checklist |
| 211 | + |
| 212 | +### Phase 1: Extract TriggerRunner Component |
| 213 | + |
| 214 | +- [ ] Create `TriggerRunner` struct with execute method |
| 215 | +- [ ] Replace first trigger loop (lines 616-656) |
| 216 | +- [ ] Replace second trigger loop (lines 754-790) |
| 217 | +- [ ] Verify tests pass |
| 218 | + |
| 219 | +### Phase 2: Define RunnerState Enum |
| 220 | + |
| 221 | +- [ ] Define `RunnerState` enum with all variants |
| 222 | +- [ ] Define `RestartReason` and `StopReason` enums |
| 223 | +- [ ] Add `state: RunnerState` field to `SubgraphRunner` |
| 224 | +- [ ] Initialize state in constructor |
| 225 | +- [ ] Verify tests pass |
| 226 | + |
| 227 | +### Phase 3: Refactor run_inner to State Machine |
| 228 | + |
| 229 | +- [ ] Extract `initialize()` method |
| 230 | +- [ ] Extract `await_block()` method |
| 231 | +- [ ] Extract `restart()` method |
| 232 | +- [ ] Extract `finalize()` method |
| 233 | +- [ ] Rewrite `run_inner` as state machine loop |
| 234 | +- [ ] Remove nested loop structure |
| 235 | +- [ ] Verify tests pass |
| 236 | + |
| 237 | +### Phase 4: Define Pipeline Stages |
| 238 | + |
| 239 | +- [ ] Extract `match_triggers()` stage method |
| 240 | +- [ ] Extract `execute_triggers()` stage method |
| 241 | +- [ ] Extract `process_dynamic_data_sources()` stage method |
| 242 | +- [ ] Extract `process_offchain_triggers()` stage method |
| 243 | +- [ ] Extract `persist_block_state()` stage method |
| 244 | +- [ ] Rewrite `process_block` to call stages in sequence |
| 245 | +- [ ] Verify tests pass |
| 246 | + |
| 247 | +### Phase 5: Consolidate Error Handling |
| 248 | + |
| 249 | +- [ ] Add `ProcessingErrorKind` enum |
| 250 | +- [ ] Add `kind()` method to `ProcessingError` |
| 251 | +- [ ] Add helper methods (`should_stop_processing()`, `should_restart()`, `is_retryable()`) |
| 252 | +- [ ] Replace scattered error checks in `process_block` |
| 253 | +- [ ] Replace scattered error checks in dynamic DS handling |
| 254 | +- [ ] Replace scattered error checks in `handle_offchain_triggers` |
| 255 | +- [ ] Document error handling invariants |
| 256 | +- [ ] Verify tests pass |
| 257 | + |
| 258 | +### Phase 6: Add BlockState Checkpoints |
| 259 | + |
| 260 | +- [ ] Add `BlockStateCheckpoint` struct |
| 261 | +- [ ] Add `checkpoint()` method to `BlockState` |
| 262 | +- [ ] Add `restore()` method to `BlockState` |
| 263 | +- [ ] Use checkpoint before dynamic DS processing |
| 264 | +- [ ] Verify tests pass |
| 265 | + |
| 266 | +### Phase 7: Module Organization |
| 267 | + |
| 268 | +- [ ] Create `runner/` directory |
| 269 | +- [ ] Move `state.rs`, `pipeline.rs`, `trigger_runner.rs` into it |
| 270 | +- [ ] Update `runner.rs` to re-export from module |
| 271 | +- [ ] Update imports in dependent files |
| 272 | +- [ ] Verify tests pass |
| 273 | +- [ ] `just lint` shows zero warnings |
| 274 | + |
| 275 | +## Notes |
| 276 | + |
| 277 | +- Each phase should be a separate, reviewable PR |
| 278 | +- Phases 1-4 can potentially be combined if changes are small |
| 279 | +- Phase 3 (FSM refactor of run_inner) is the most invasive and should be reviewed carefully |
| 280 | +- Phase 5 (error handling) can be done earlier if it helps simplify other phases |
| 281 | +- Preserve all existing behavior - this is a refactor, not a feature change |
0 commit comments