fix(mcp): DRC-3307 fix submit_run race condition + run_check auto-approve#1342
Conversation
Two root-cause fixes for PR comment checklist showing wrong status: 1. submit_run race condition (run_func.py): update_run_result was async and scheduled via run_coroutine_threadsafe, so run.status stayed RUNNING after await future. Made it synchronous — called directly inside the executor thread fn() so run.status and run.result are set BEFORE the future resolves. Removed async sleep workarounds from tests. 2. run_check auto-approve (mcp_server.py): _tool_run_check had no auto-approve logic, so preset checks (created via API before the agent runs) stayed Unapproved forever. Added the same auto-approve policy as _tool_create_check: successful run = Approved (PM decision). DRC-3307 Signed-off-by: Kent <iamcxa@gmail.com>
probe-run-check found that _tool_run_check tests passed but never asserted auto-approve behavior — mock_run.status was MagicMock (not RunStatus.FINISHED), so the auto-approve condition silently evaluated to False. Tests would pass identically if auto-approve lines were deleted. Fix: - Set mock_run.status = RunStatus.FINISHED in row_count_diff test - Assert update_check_by_id(is_checked=True) in all 3 run_check tests (row_count_diff, lineage_diff, schema_diff) DRC-3307 Signed-off-by: Kent <iamcxa@gmail.com>
Codecov Report❌ Patch coverage is
... and 4 files with indirect coverage changes 🚀 New features to boost your workflow:
|
iamcxa
left a comment
There was a problem hiding this comment.
@iamcxa — review of DRC-3307 root-cause fix.
Overall: race-condition fix is correct in spirit and the 80% jaffle_shop_golden integration result is solid C-level evidence. Surfacing 12 inline findings; 2 HIGH worth resolving before merge (cloud exception leak + new cross-thread race window introduced by the sync fix), and a persistence-asymmetry MEDIUM that breaks the "matches _tool_create_check policy" claim. Defaulting to COMMENT because the core fix is sound but several edges need attention.
Verification Matrix (from PR body)
| # | Concern | Status |
|---|---|---|
| V1 | submit_run race fixed — sync update_run_result |
Confirmed at unit + integration. Introduces new smaller race: result-before-status store ordering visible to async-loop readers (#2) |
| V2 | All 6 submit_run callers benefit |
Confirmed; readers may now see transient RUNNING + result-present inconsistency (#2) |
| V3 | _tool_run_check auto-approve matches _tool_create_check policy |
Gate matches; persistence behavior differs — _tool_create_check calls export_persistent_state, _tool_run_check does not (#3, #4) |
| V4 | Removed asyncio.sleep(0.1) workarounds |
Confirmed; only 1 of 3 removed-sleep tests added a regression assertion (advisory B) |
Break-point Coverage
BP1 update_run_result sync (run_func.py:152-178): B (regression test) + C (jaffle_shop_golden 80%)
BP2 _tool_run_check metadata auto-approve: B (2 success tests) — no negative test (#11)
BP3 _tool_run_check task auto-approve: B (1 success test) — no negative test for error path
Residual uncertainty:
- Cross-thread race window (NEW): result-before-status store ordering creates sub-µs window where async-loop readers see
result is not Nonewhilestatus == RUNNING. Symptom likely undetectable in practice but a regression of the previous loop-serialized model. Fix: invert the store order, OR document GIL/CPython assumption (#2). RecceCloudExceptionnot subclass ofRecceException— cloud-modeupdate_check_by_idfailure surfaces unwrapped to MCP caller (#1).- Persistence asymmetry: local-mode SIGTERM after
run_checkbefore any other operation → auto-approve lost (#3). - Cloud-mode
acting_user_id=Nonefor all MCP-triggered auto-approvals — shared-instance audit attribution gap (#6).
Recommended human probe before merge:
- Decide store-order inversion in
update_run_result(status before result/error) OR add explicit GIL/CPython note in docstring. - Move
update_check_by_idoutside the metadatatry(matches_tool_create_check's structure) OR broadenexceptto includeRecceCloudException. - Decide whether to add
export_persistent_stateto_tool_run_checkfor parity with_tool_create_check, OR clarify the comment that persistence is intentionally deferred to MCP shutdown. - Add
mock_check_dao.update_check_by_id.assert_not_called()to the existing RecceException-path test intest_mcp_server.py(cheap regression guard, #11).
Advisory (not posted as inline comments)
| # | File:Line | Root | Summary | Suggested Action |
|---|---|---|---|---|
| A | run_func.py:155-163 | DOC | DRC-3307 historical narrative is load-bearing while consumer recce-cloud-infra#1245 defense exists — its derive_check_run_status workaround cites this docstring as the canonical justification |
Add sunset note: "Safe to remove once recce-cloud-infra defensive guards in derive_check_run_status are removed" (#9) |
| B | tests/apis/test_run_func.py:376, 395 | DOC | 2 of 3 removed-sleep tests don't add regression assertion (run.status == FINISHED); only test_normalized_params_propagate_to_run does |
Decide: add regression assertion to all 3, OR document why one canonical assertion is sufficient |
| C | — | NEW | Cross-repo defense-in-depth pattern: root-cause repo's docstring outlives its own fix because consumer-side defense cites it as canonical justification | Captured in recce-cloud-infra/.claude/review-lessons.md (just created in companion review) |
Verification Summary
| Check | Result |
|---|---|
| ruff (run_func.py + mcp_server.py) | clean |
| Pre-scan persistence asymmetry | 1 finding (export_persistent_state missing in _tool_run_check) |
| code-reviewer | 2 HIGH + 2 LOW |
| comment-analyzer | 1 critical + 5 improvements + 4 rot risks |
| ToB security (standard) | 2 MEDIUM + 2 LOW |
| Unit tests | not executed locally — PR claims 143 pass |
mcp_server.py — restructure _tool_run_check auto-approve:
- Move update_check_by_id OUTSIDE the metadata-branch try/except
RecceException (mirrors _tool_create_check structure). Cloud mode's
CheckDAO re-raises RecceCloudException, which is NOT a RecceException
subclass — leaving the call inside the wrapper would let cloud failures
escape unwrapped, breaking the consistent ValueError contract.
- Add `await loop.run_in_executor(None, export_persistent_state)` after
auto-approve so local-mode is_checked changes survive process exit.
Closes the persistence asymmetry vs _tool_create_check that was
miscalled "matches policy" in the original comment.
- Reframe the auto-approve comment block: drop the consumer-side
"preset checks / ⏳ Unapproved" narrative (cross-repo concept bleed),
add `See: DRC-3307` for PM-decision attribution, and explicitly
document that an empty result IS valid evidence for metadata-only
types (zero-change confirmation).
- Add `logger.info(f"Auto-approved check {check_id} (triggered_by=...)")`
so server logs preserve attribution even when the cloud audit-layer
ContextVar isn't populated (pre-existing scope, but worth surfacing
now that run_check also auto-approves).
run_func.py — preserve CANCELLED sentinel + invert store order:
- update_run_result now writes run.status BEFORE run.result/run.error
so async readers in the loop thread never observe a "result-present
+ status-RUNNING" intermediate.
- Both success and failure paths skip the status overwrite when
run.status == CANCELLED. Previously only the failure path checked
this; the success path unconditionally overwrote CANCELLED with
FINISHED, silently regressing cancel_run from another thread.
- Docstring rewritten: declares the cross-thread store-ordering
invariant explicitly (CPython GIL atomic per attribute, but multi-
store sequences are not serialized with the loop). Added DRC-3307
historical context with a sunset condition tied to the consumer-side
defensive guards in recce-cloud-infra `derive_check_run_status` —
safe to remove this sync-call requirement once those guards go.
Coverage (87.5% patch -> 100%) + tighter regression tags:
- New negative-path test
`test_tool_run_check_metadata_branch_recce_exception_no_auto_approve`:
asserts update_check_by_id and export_persistent_state are NOT called
when _tool_lineage_diff raises RecceException. Closes the metadata
branch coverage gap (codecov-flagged 2 missing lines).
- New `test_cancel_sentinel_preserved_on_success`: simulates
cancel_run setting CANCELLED before the executor finishes, asserts
update_run_result success path does NOT overwrite. Exercises the
newly-added `if run.status != RunStatus.CANCELLED` branch on the
success path.
- New `test_repr_fallback_when_str_error_is_none_string`: verifies
`repr(error)` fallback when `str(error) == "None"` records useful
failed_reason. Closes the run_func.py coverage gap (codecov-flagged
2 missing lines).
- Tighten the three "DRC-3307" tags in test_mcp_server.py to
"Regression: ... (DRC-3307 root cause)" so the meaning survives
Linear-archive of the ticket.
- Reframe the "no sleep needed" test comment as a forward-looking
regression assertion ("if this fails, update_run_result is being
scheduled async again").
Mock additions to existing tests:
- All 5 _tool_run_check call sites in tests now mock
`recce.apis.check_func.export_persistent_state`. Without this mock,
the new persistence call hits a real default_context() and fails
with AttributeError (no state_loader on None).
Tests: 146 passing (was 115; +31 includes prior 28 unrelated suite
additions and 3 new for this PR).
DRC-3307
Signed-off-by: Kent <iamcxa@gmail.com>
All 12 inline threads addressed in commit HIGH (2) — both resolved
MEDIUM (4) — all resolved
LOW + NIT (5+1) — all resolved
Coverage: codecov-flagged 4 missing patch lines closed:
Tests: 146 passing (was 115 + 31 unrelated suite additions during this fix). Manual review handoff per your request — no re-review tagging. |
The narrow except RecceException pattern silently misses cloud-mode failures because RecceCloudException inherits from Exception, not from RecceException. Documenting the rule + two acceptable fixes so future DAO-write additions catch this at review time. DRC-3307 Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>
iamcxa
left a comment
There was a problem hiding this comment.
@iamcxa — re-review after 8385e0a7.
Excellent close-out — would APPROVE if you weren't the author (GitHub blocks self-approve). All 12 v1 inline findings + 2 of 3 advisory items addressed; 10 fully fixed, 2 with documented partial trade-offs. Sign-off equivalent: ready for human reviewer + merge. The new tests (test_cancel_sentinel_preserved_on_success, test_repr_fallback_when_str_error_is_none_string, test_tool_run_check_metadata_branch_recce_exception_no_auto_approve) noticeably expand coverage beyond what v1 review asked for.
v1 → v2 Closure
| # | v1 finding | Status |
|---|---|---|
| 1 | RecceCloudException not caught (HIGH) | ✅ FIXED — auto-approve moved OUTSIDE the try / except RecceException blocks; explicit comment explains the trade-off |
| 2 | Cross-thread store ordering (HIGH) | ✅ FIXED — status BEFORE result/error; CANCELLED preserved; new regression test |
| 3 | export_persistent_state missing (MEDIUM) |
✅ FIXED — await ... run_in_executor(None, export_persistent_state) added |
| 4 | "Same policy" claim misleading (MEDIUM) | ✅ FIXED — rewritten as "same gate as _tool_create_check (line 1832)" + persistence parity stated |
| 5 | Metadata empty result auto-approve (MEDIUM) | ✅ FIXED via doc — "empty result IS valid evidence: zero changes confirms..." |
| 6 | acting_user_id=None audit gap (MEDIUM) |
logger.info(triggered_by=...) added (the minimum I suggested); cloud-mode user-context propagation not done (acceptable, pre-existing pattern) |
| 7 | "preset checks" consumer-side context (LOW) | ✅ FIXED — phrasing removed |
| 8 | PM decision ticket link (LOW) | ✅ FIXED — (See: DRC-3307) added |
| 9 | Load-bearing docstring sunset (LOW) | ✅ FIXED — explicit sunset-condition paragraph added |
| 10 | DRC-3307 tag fade in tests (NIT) | |
| 11 | Negative-path test gap (LOW) | ✅ FIXED — test_tool_run_check_metadata_branch_recce_exception_no_auto_approve added |
| 12 | Forward-looking framing in test comment (NIT) | ✅ FIXED — reframed as regression guard |
| Adv A | Sunset for historical narrative | ✅ FIXED via #9 |
| Adv B | Regression assertion in 2/3 removed-sleep tests | |
| Adv C | Cross-repo review-lessons | Captured in companion repo |
2 minor follow-ups
Neither is blocking. Both are about the new cross-thread race window's residual edges, posted as inline LOW + NIT.
Verification Matrix (final)
| # | Concern | Status |
|---|---|---|
| V1 | submit_run race fixed |
✅ Confirmed; ordering refined; cancel sentinel preserved |
| V2 | All 6 callers benefit | ✅ Confirmed; ordering protects completion-signal readers |
| V3 | _tool_run_check matches _tool_create_check policy |
✅ Confirmed; gate + persistence + comment all aligned |
| V4 | Removed sleep workarounds | ✅ Confirmed + new regression tests added |
Two docstring-only refinements to make the cross-thread guarantees match what the code actually provides: 1. Distinguish completion-signal readers (wait_run_handler polling on `result is not None`) from snapshot readers (single-render reads of both fields). The store-order guarantee only protects the former; the latter may briefly see (FINISHED, result=None) for the sub-µs gap and are expected to re-poll. 2. Document that the cancel-sentinel guard is check-then-assign, not atomic. A sub-µs GIL window remains where cancel_run can flip status between the check and the overwrite. The guard is best-effort against the macroscopic race (cancel before executor enters), which is the practically-relevant case. Note the threading.Lock / threading.Event escape hatch for future callers that need full atomicity. No behavioral change. DRC-3307 Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>
Two v2 LOW/NIT threads addressed in
No behavioral change. Tests still 31/31 on Per your sign-off equivalent — ready for human reviewer + merge. |
iamcxa
left a comment
There was a problem hiding this comment.
@iamcxa — final sign-off after e175c783 + 9eb96f17.
All clear — would APPROVE if you weren't the author (GitHub blocks self-approve). v2 findings closed; bonus project-knowledge capture in CLAUDE.md. Sign-off equivalent: ready for human reviewer + merge.
v2 → v3 Closure
| v2 finding | v3 status |
|---|---|
| LOW (cancel sentinel sub-µs GIL race) | ✅ Addressed via doc — "best-effort against the macroscopic race... use threading.Lock or threading.Event if a future caller needs full atomicity" |
| NIT (snapshot reader docstring nuance) | ✅ Addressed — completion-signal vs snapshot reader distinction documented; sub-µs (FINISHED, result=None) window acknowledged |
Bonus
9eb96f17 adds a Cloud vs Local Mode Exception Types section to CLAUDE.md capturing the cloud-vs-local exception class divergence rule (from v1 finding #1, RecceCloudException). This is exactly the project-knowledge promotion I would have proposed in D2 — captured proactively without prompting. 👍
Closure Summary
| Round | Commit | Inline | Body |
|---|---|---|---|
| v1 | 76eac07 |
12 (2 HIGH + 4 MEDIUM + 6 LOW/NIT) | full review |
| v2 | 8385e0a7 |
2 (1 LOW + 1 NIT) | closure + sign-off |
| v3 | e175c783 |
0 | final sign-off |
Verification Matrix (final)
| # | Concern | Status |
|---|---|---|
| V1 | submit_run race fixed |
✅ sync update_run_result, status-before-result, cancel sentinel preserved |
| V2 | All 6 callers benefit | ✅ ordering protects completion-signal readers |
| V3 | _tool_run_check matches _tool_create_check policy |
✅ gate + persistence + comment all aligned |
| V4 | Removed sleep workarounds | ✅ regression tests added (cancel sentinel + repr fallback + negative-path metadata) |
Ready to ship.
…ested-actions-inconsistency
Code Review — PR #1342 (DRC-3307) — Adversarial Re-reviewSummaryRe-reviewed every change with the assumption it's wrong until proven otherwise. All changes prove valid under scrutiny. No critical, high, or medium-severity bugs found. Two test gaps remain. Adversarial proofs of validityClaim 1:
|
| Caller | File:Line | Reads after await future |
Safe |
|---|---|---|---|
create_run_handler |
run_api.py:66 |
None (returns run to FastAPI) |
✅ |
rerun_check_handler |
check_api.py:182 |
None (returns run) |
✅ |
wait_run_handler |
run_api.py:85 |
Polls run.result is None and run.error is None — only sees one field |
✅ |
_tool_run_check (task branch) |
mcp_server.py:1746 |
run.status == FINISHED and not run.error |
✅ |
_tool_create_check |
mcp_server.py:1833 |
run.status == FINISHED |
✅ |
execute_preset_checks |
run.py:176 |
run_should_be_approved(run) reads run.error, run.result |
✅ |
execute_state_checks |
run.py:258 |
None | ✅ |
Status-BEFORE-result inversion proof: New order is status = FINISHED; run.result = result. A reader observing run.result is not None is guaranteed (under GIL serialization at executor thread's GIL release point) to also see run.status == FINISHED. The reverse window from the OLD code (run.result set, but status still RUNNING) was the actual DRC-3307 bug — fixed.
updated_params merge moved to top: Previously between result-write and status-set on success. Now first. This means when an external observer sees status == FINISHED, params is also already merged. Strictly safer.
Loop captured correctly: loop = asyncio.get_running_loop() at line 147 is still used at line 256 for run_in_executor. No dangling references.
Verdict: ✅ Proven safe.
Claim 2: Cancel-sentinel preservation widening is correct
Suspect: The OLD success path had NO if run.status != CANCELLED guard. Adding it changes behavior — cancelled runs that finish anyway now stay CANCELLED instead of becoming FINISHED. Could a caller depend on the old behavior?
Proof:
- Searched all callers of
cancel_run— onlycancel_run_handlerinrun_api.py:73. It does not poll for "cancelled but actually finished." - No frontend code reads cancelled-then-finished state as a special condition (frontend renders by status enum).
- Test
test_cancel_sentinel_preserved_on_successexplicitly locks in the new behavior. _tool_run_checktask branch:run_succeeded = run.status == FINISHED and not run.error—CANCELLEDcorrectly maps toFalse(no auto-approve).
Verdict: ✅ Proven correct. Behavior change is intentional and tested.
Claim 3: run_succeeded flag covers all control-flow paths through _tool_run_check
Suspect: The flag could be left in a wrong state along some control-flow path.
Proof — exhaustive path enumeration:
| Path | run_succeeded final |
Auto-approve | Correct? |
|---|---|---|---|
| Metadata branch, success | True |
Fires | ✅ |
| Metadata branch, RecceException | n/a (re-raised as ValueError) | Skipped | ✅ |
| Metadata branch, RecceCloudException | n/a (propagates, since it's not RecceException) |
Skipped (run_succeeded never set True) | ✅ |
Task branch, submit_run raises RecceException |
n/a (re-raised) | Skipped | ✅ |
Task branch, submit_run raises RecceCloudException |
n/a (propagates) | Skipped | ✅ |
| Task branch, run finishes successfully | True |
Fires | ✅ |
Task branch, run fails (status=FAILED) |
False |
Skipped | ✅ |
| Task branch, run cancelled | False (CANCELLED ≠ FINISHED) |
Skipped | ✅ |
Cloud-vs-local exception handling: Both update_check_by_id and export_persistent_state are placed OUTSIDE the try / except RecceException blocks. Confirmed: RecceCloudException extends Exception, NOT RecceException (recce/util/recce_cloud.py:31). The CLAUDE.md addition documenting this is accurate.
Verdict: ✅ Proven correct.
Claim 4: await asyncio.get_event_loop().run_in_executor(None, export_persistent_state) cost is acceptable
Suspect: Could be expensive enough to regress agent latency.
Proof:
- Local mode:
state_loader.export()writes a JSON file. Sub-ms cost. - Cloud mode: HTTPS upload of full RecceState. Cost = O(checks + runs + artifacts). For typical sessions (≤100 checks/runs), this is small (~100KB-few MB upload).
- Non-blocking: Wrapped in
run_in_executor, so event loop continues servicing other coroutines. - Idempotent and safe under failure: If
export_persistent_statefails afterupdate_check_by_idsucceeds, cloud is source of truth and remains consistent. Local state file is stale but recoverable on next session. - Same pattern as
_tool_create_check:1847— consistent.
Verdict: ✅ Cost is acceptable. Non-blocking. Could be debounced as a future optimization but not required.
Claim 5: Test removal of await asyncio.sleep(0.1) is correct
Suspect: The sleep might have been masking a real race condition that still exists.
Proof:
- OLD code:
update_run_resultwas async + scheduled viarun_coroutine_threadsafefrom the executor thread. The scheduled coroutine ran in the event loop AFTERawait futureresumed (becauserun_coroutine_threadsafeposts to the loop, which only processes it on the next iteration). The sleep gave the loop time to process the coroutine. - NEW code:
update_run_resultis called synchronously insidefn()BEFOREfn()returns. The future resolves only afterfn()returns. Therefore by the timeawait futureresumes, all writes are committed. - Regression guard:
assert run.status == RunStatus.FINISHEDintest_normalized_params_propagate_to_runwill fail (without the sleep) if anyone reverts to async scheduling.
Verdict: ✅ Sleep correctly removed. Regression guard in place.
Claim 6: CLAUDE.md "Cloud vs Local Mode Exception Types" rule is accurate
Suspect: The rule could be wrong about the inheritance hierarchy.
Proof:
- Verified
recce/exceptions.py:1:class RecceException(Exception)— base is plainException. - Verified
recce/util/recce_cloud.py:31:class RecceCloudException(Exception)— base is plainException, NOTRecceException. - Therefore
try / except RecceExceptiondoes not catchRecceCloudException. Documentation accurate.
Verdict: ✅ Accurate.
Findings
Finding 1 — [Low priority, test gap] Task-branch negative paths not asserted
File: tests/test_mcp_server.py
Issue: No test asserts that auto-approve does NOT fire when:
mock_run.status = RunStatus.FAILED, ORmock_run.error = "some error"while status is FINISHED
Proof of gap: If the gate run.status == RunStatus.FINISHED and not run.error is accidentally weakened to True (or only one of the two clauses), the existing tests still pass because they only exercise the success case.
Suggestion: Add two tests mirroring test_tool_run_check_metadata_branch_recce_exception_no_auto_approve for the task branch (one for FAILED status, one for non-None error). 5–10 lines each.
Finding 2 — [Low priority, test robustness] test_cancel_sentinel_preserved_on_success has timing dependency
File: tests/apis/test_run_func.py:399-422
Issue: The test sets run.status = RunStatus.CANCELLED AFTER submit_run returns but BEFORE awaiting the future. This relies on the test thread winning the race against the executor's fn() execution. With a fast mock task, the executor could have already completed update_run_result(run, result, ...) before the test sets CANCELLED — in which case the assertion passes for the wrong reason (the test just observes its own write).
Proof: No threading.Event or barrier blocks fn() from running ahead of the test's CANCELLED write. The test happens to work because Python's executor submission has small overhead, but this is implementation-detail timing.
Suggestion: Inject a threading.Event into the mock task's execute() and have the test set CANCELLED, then signal the event, before awaiting the future. Deterministic.
Finding 3 — [Low priority, cleanup] Redundant run.result = await future
File: recce/apis/run_api.py:66, recce/apis/check_api.py:182, recce/mcp_server.py:1745
Issue: With update_run_result now setting run.result synchronously inside fn() before the future resolves, these three lines overwrite run.result with the same value (or with None on failure, where it was already None).
Suggestion: Replace with bare await future. Not a bug — pure cleanup.
Finding 4 — [Low priority, cosmetic] Brittle line reference in comment
File: recce/mcp_server.py:1751-1752
Issue: Comment says "_tool_create_check (line 1832: ...)" but actual line is 1843. Line numbers rot.
Suggestion: Drop the line number; symbolic reference suffices.
Tests verification
pytest tests/apis/test_run_func.py tests/test_mcp_server.py→ 146 passed, 0 failedflake8 recce/apis/run_func.py recce/mcp_server.py→ clean
Verdict
✅ Approve.
Every suspicious change has a defensible justification. The race-condition fix is correctly implemented; the auto-approve flag covers all control-flow paths; the cloud-exception placement matches the documented rule. No CRITICAL or HIGH-severity issues. Findings 1–4 are LOW-priority polish that can land in a follow-up.
mcp_server.py — restructure _tool_run_check auto-approve:
- Move update_check_by_id OUTSIDE the metadata-branch try/except
RecceException (mirrors _tool_create_check structure). Cloud mode's
CheckDAO re-raises RecceCloudException, which is NOT a RecceException
subclass — leaving the call inside the wrapper would let cloud failures
escape unwrapped, breaking the consistent ValueError contract.
- Add `await loop.run_in_executor(None, export_persistent_state)` after
auto-approve so local-mode is_checked changes survive process exit.
Closes the persistence asymmetry vs _tool_create_check that was
miscalled "matches policy" in the original comment.
- Reframe the auto-approve comment block: drop the consumer-side
"preset checks / ⏳ Unapproved" narrative (cross-repo concept bleed),
add `See: DRC-3307` for PM-decision attribution, and explicitly
document that an empty result IS valid evidence for metadata-only
types (zero-change confirmation).
- Add `logger.info(f"Auto-approved check {check_id} (triggered_by=...)")`
so server logs preserve attribution even when the cloud audit-layer
ContextVar isn't populated (pre-existing scope, but worth surfacing
now that run_check also auto-approves).
run_func.py — preserve CANCELLED sentinel + invert store order:
- update_run_result now writes run.status BEFORE run.result/run.error
so async readers in the loop thread never observe a "result-present
+ status-RUNNING" intermediate.
- Both success and failure paths skip the status overwrite when
run.status == CANCELLED. Previously only the failure path checked
this; the success path unconditionally overwrote CANCELLED with
FINISHED, silently regressing cancel_run from another thread.
- Docstring rewritten: declares the cross-thread store-ordering
invariant explicitly (CPython GIL atomic per attribute, but multi-
store sequences are not serialized with the loop). Added DRC-3307
historical context with a sunset condition tied to the consumer-side
defensive guards in recce-cloud-infra `derive_check_run_status` —
safe to remove this sync-call requirement once those guards go.
Coverage (87.5% patch -> 100%) + tighter regression tags:
- New negative-path test
`test_tool_run_check_metadata_branch_recce_exception_no_auto_approve`:
asserts update_check_by_id and export_persistent_state are NOT called
when _tool_lineage_diff raises RecceException. Closes the metadata
branch coverage gap (codecov-flagged 2 missing lines).
- New `test_cancel_sentinel_preserved_on_success`: simulates
cancel_run setting CANCELLED before the executor finishes, asserts
update_run_result success path does NOT overwrite. Exercises the
newly-added `if run.status != RunStatus.CANCELLED` branch on the
success path.
- New `test_repr_fallback_when_str_error_is_none_string`: verifies
`repr(error)` fallback when `str(error) == "None"` records useful
failed_reason. Closes the run_func.py coverage gap (codecov-flagged
2 missing lines).
- Tighten the three "DRC-3307" tags in test_mcp_server.py to
"Regression: ... (DRC-3307 root cause)" so the meaning survives
Linear-archive of the ticket.
- Reframe the "no sleep needed" test comment as a forward-looking
regression assertion ("if this fails, update_run_result is being
scheduled async again").
Mock additions to existing tests:
- All 5 _tool_run_check call sites in tests now mock
`recce.apis.check_func.export_persistent_state`. Without this mock,
the new persistence call hits a real default_context() and fails
with AttributeError (no state_loader on None).
Tests: 146 passing (was 115; +31 includes prior 28 unrelated suite
additions and 3 new for this PR).
DRC-3307
Signed-off-by: Kent <iamcxa@gmail.com>
The narrow except RecceException pattern silently misses cloud-mode failures because RecceCloudException inherits from Exception, not from RecceException. Documenting the rule + two acceptable fixes so future DAO-write additions catch this at review time. DRC-3307 Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>
Problem
PR comment Checklist showed wrong status for two reasons:
submit_runrace condition:update_run_resultwasasyncand scheduled viarun_coroutine_threadsafe. Afterawait future,run.statuswas stillRUNNING— auto-approve gate (run.status == FINISHED) never triggered.run_checkmissing auto-approve:_tool_run_checkhad no auto-approve logic. Preset checks (created via API before the agent runs) stayed ⏳ Unapproved forever.Note: PR #1324 added auto-approve to
_tool_create_checkbut it was defeated by the race condition —run_executed = run.status == RunStatus.FINISHEDevaluated toFalsebecauserun.statuswas stillRUNNING.Changes
Fix 1: submit_run race condition (run_func.py)
update_run_resultchanged fromasync def+run_coroutine_threadsafeto plaindefcalled synchronously inside the executor threadfn(). Nowrun.statusandrun.resultare set BEFORE the future resolves. This fixes ALL 6 callers ofsubmit_run:run_api.pyrun.resultpopulatedcheck_api.pyrun.resultpopulatedmcp_server.pymcp_server.pyrun.pyrun_should_be_approved()sees resultrun.pyFix 2: run_check auto-approve (mcp_server.py)
Added auto-approve to
_tool_run_checkmatching_tool_create_checkpolicy:run.status == FINISHED and not run.errorTests
asyncio.sleep(0.1)workarounds (no longer needed with sync update)assert run.status == RunStatus.FINISHEDafterawait futureupdate_check_by_id(is_checked=True)assertions to all 3 run_check tests (row_count_diff, lineage_diff, schema_diff)Testing
Staging Verification
Local integration test (Docker image with this fix) on jaffle_shop_golden PR #26:
Related