fix(mcp): DRC-3307 fix submit_run race condition + run_check auto-approve by iamcxa · Pull Request #1342 · DataRecce/recce

iamcxa · 2026-04-24T14:41:11Z

Problem

PR comment Checklist showed wrong status for two reasons:

submit_run race condition: update_run_result was async and scheduled via run_coroutine_threadsafe. After await future, run.status was still RUNNING — auto-approve gate (run.status == FINISHED) never triggered.
run_check missing auto-approve: _tool_run_check had no auto-approve logic. Preset checks (created via API before the agent runs) stayed ⏳ Unapproved forever.

Note: PR #1324 added auto-approve to _tool_create_check but it was defeated by the race condition — run_executed = run.status == RunStatus.FINISHED evaluated to False because run.status was still RUNNING.

Changes

Fix 1: submit_run race condition (run_func.py)

update_run_result changed from async def + run_coroutine_threadsafe to plain def called synchronously inside the executor thread fn(). Now run.status and run.result are set BEFORE the future resolves. This fixes ALL 6 callers of submit_run:

Caller	File	Benefit
REST API (run)	`run_api.py`	`run.result` populated
REST API (check rerun)	`check_api.py`	`run.result` populated
MCP run_check	`mcp_server.py`	Auto-approve works
MCP create_check	`mcp_server.py`	Auto-approve works (fixes #1324)
Preset checks	`run.py`	`run_should_be_approved()` sees result
State checks	`run.py`	Run completes before next iteration

Fix 2: run_check auto-approve (mcp_server.py)

Added auto-approve to _tool_run_check matching _tool_create_check policy:

Metadata checks (lineage_diff, schema_diff): unconditional auto-approve after success
Task checks: conditional on run.status == FINISHED and not run.error

Tests

Removed asyncio.sleep(0.1) workarounds (no longer needed with sync update)
Added assert run.status == RunStatus.FINISHED after await future
Added update_check_by_id(is_checked=True) assertions to all 3 run_check tests (row_count_diff, lineage_diff, schema_diff)

Testing

143 tests pass (29 run_func + 114 mcp_server)
Break-point probe verified at Level B (unit tests exercise exact break-point)

Staging Verification

Local integration test (Docker image with this fix) on jaffle_shop_golden PR #26:

Metric	Before	After
✅ Approved	1/10 (10%)	8/10 (80%)
⏳ Unapproved	8/10	0/10

Two root-cause fixes for PR comment checklist showing wrong status: 1. submit_run race condition (run_func.py): update_run_result was async and scheduled via run_coroutine_threadsafe, so run.status stayed RUNNING after await future. Made it synchronous — called directly inside the executor thread fn() so run.status and run.result are set BEFORE the future resolves. Removed async sleep workarounds from tests. 2. run_check auto-approve (mcp_server.py): _tool_run_check had no auto-approve logic, so preset checks (created via API before the agent runs) stayed Unapproved forever. Added the same auto-approve policy as _tool_create_check: successful run = Approved (PM decision). DRC-3307 Signed-off-by: Kent <iamcxa@gmail.com>

probe-run-check found that _tool_run_check tests passed but never asserted auto-approve behavior — mock_run.status was MagicMock (not RunStatus.FINISHED), so the auto-approve condition silently evaluated to False. Tests would pass identically if auto-approve lines were deleted. Fix: - Set mock_run.status = RunStatus.FINISHED in row_count_diff test - Assert update_check_by_id(is_checked=True) in all 3 run_check tests (row_count_diff, lineage_diff, schema_diff) DRC-3307 Signed-off-by: Kent <iamcxa@gmail.com>

codecov · 2026-04-24T14:48:20Z

Codecov Report

❌ Patch coverage is 95.49550% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/mcp_server.py	92.30%	2 Missing ⚠️
tests/apis/test_run_func.py	95.23%	2 Missing ⚠️
recce/apis/run_func.py	91.66%	1 Missing ⚠️

Files with missing lines	Coverage Δ
tests/test_mcp_server.py	`99.85% <100.00%> (+<0.01%)`	⬆️
recce/apis/run_func.py	`64.97% <91.66%> (+6.81%)`	⬆️
recce/mcp_server.py	`87.98% <92.30%> (+0.18%)`	⬆️
tests/apis/test_run_func.py	`90.54% <95.23%> (+0.51%)`	⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

iamcxa

@iamcxa — review of DRC-3307 root-cause fix.

Overall: race-condition fix is correct in spirit and the 80% jaffle_shop_golden integration result is solid C-level evidence. Surfacing 12 inline findings; 2 HIGH worth resolving before merge (cloud exception leak + new cross-thread race window introduced by the sync fix), and a persistence-asymmetry MEDIUM that breaks the "matches _tool_create_check policy" claim. Defaulting to COMMENT because the core fix is sound but several edges need attention.

Verification Matrix (from PR body)

#	Concern	Status
V1	`submit_run` race fixed — sync `update_run_result`	Confirmed at unit + integration. Introduces new smaller race: result-before-status store ordering visible to async-loop readers (#2)
V2	All 6 `submit_run` callers benefit	Confirmed; readers may now see transient `RUNNING + result-present` inconsistency (#2)
V3	`_tool_run_check` auto-approve matches `_tool_create_check` policy	Gate matches; persistence behavior differs — `_tool_create_check` calls `export_persistent_state`, `_tool_run_check` does not (#3, #4)
V4	Removed `asyncio.sleep(0.1)` workarounds	Confirmed; only 1 of 3 removed-sleep tests added a regression assertion (advisory B)

Break-point Coverage

BP1 update_run_result sync (run_func.py:152-178):    B (regression test) + C (jaffle_shop_golden 80%)
BP2 _tool_run_check metadata auto-approve:           B (2 success tests) — no negative test (#11)
BP3 _tool_run_check task auto-approve:               B (1 success test) — no negative test for error path

Residual uncertainty:

Cross-thread race window (NEW): result-before-status store ordering creates sub-µs window where async-loop readers see result is not None while status == RUNNING. Symptom likely undetectable in practice but a regression of the previous loop-serialized model. Fix: invert the store order, OR document GIL/CPython assumption (#2).
RecceCloudException not subclass of RecceException — cloud-mode update_check_by_id failure surfaces unwrapped to MCP caller (#1).
Persistence asymmetry: local-mode SIGTERM after run_check before any other operation → auto-approve lost (#3).
Cloud-mode acting_user_id=None for all MCP-triggered auto-approvals — shared-instance audit attribution gap (#6).

Recommended human probe before merge:

Decide store-order inversion in update_run_result (status before result/error) OR add explicit GIL/CPython note in docstring.
Move update_check_by_id outside the metadata try (matches _tool_create_check's structure) OR broaden except to include RecceCloudException.
Decide whether to add export_persistent_state to _tool_run_check for parity with _tool_create_check, OR clarify the comment that persistence is intentionally deferred to MCP shutdown.
Add mock_check_dao.update_check_by_id.assert_not_called() to the existing RecceException-path test in test_mcp_server.py (cheap regression guard, #11).

Advisory (not posted as inline comments)

#	File:Line	Root	Summary	Suggested Action
A	run_func.py:155-163	DOC	DRC-3307 historical narrative is load-bearing while consumer recce-cloud-infra#1245 defense exists — its `derive_check_run_status` workaround cites this docstring as the canonical justification	Add sunset note: "Safe to remove once recce-cloud-infra defensive guards in `derive_check_run_status` are removed" (#9)
B	tests/apis/test_run_func.py:376, 395	DOC	2 of 3 removed-sleep tests don't add regression assertion (`run.status == FINISHED`); only `test_normalized_params_propagate_to_run` does	Decide: add regression assertion to all 3, OR document why one canonical assertion is sufficient
C	—	NEW	Cross-repo defense-in-depth pattern: root-cause repo's docstring outlives its own fix because consumer-side defense cites it as canonical justification	Captured in `recce-cloud-infra/.claude/review-lessons.md` (just created in companion review)

Verification Summary

Check	Result
ruff (run_func.py + mcp_server.py)	clean
Pre-scan persistence asymmetry	1 finding (export_persistent_state missing in _tool_run_check)
code-reviewer	2 HIGH + 2 LOW
comment-analyzer	1 critical + 5 improvements + 4 rot risks
ToB security (standard)	2 MEDIUM + 2 LOW
Unit tests	not executed locally — PR claims 143 pass

mcp_server.py — restructure _tool_run_check auto-approve: - Move update_check_by_id OUTSIDE the metadata-branch try/except RecceException (mirrors _tool_create_check structure). Cloud mode's CheckDAO re-raises RecceCloudException, which is NOT a RecceException subclass — leaving the call inside the wrapper would let cloud failures escape unwrapped, breaking the consistent ValueError contract. - Add `await loop.run_in_executor(None, export_persistent_state)` after auto-approve so local-mode is_checked changes survive process exit. Closes the persistence asymmetry vs _tool_create_check that was miscalled "matches policy" in the original comment. - Reframe the auto-approve comment block: drop the consumer-side "preset checks / ⏳ Unapproved" narrative (cross-repo concept bleed), add `See: DRC-3307` for PM-decision attribution, and explicitly document that an empty result IS valid evidence for metadata-only types (zero-change confirmation). - Add `logger.info(f"Auto-approved check {check_id} (triggered_by=...)")` so server logs preserve attribution even when the cloud audit-layer ContextVar isn't populated (pre-existing scope, but worth surfacing now that run_check also auto-approves). run_func.py — preserve CANCELLED sentinel + invert store order: - update_run_result now writes run.status BEFORE run.result/run.error so async readers in the loop thread never observe a "result-present + status-RUNNING" intermediate. - Both success and failure paths skip the status overwrite when run.status == CANCELLED. Previously only the failure path checked this; the success path unconditionally overwrote CANCELLED with FINISHED, silently regressing cancel_run from another thread. - Docstring rewritten: declares the cross-thread store-ordering invariant explicitly (CPython GIL atomic per attribute, but multi- store sequences are not serialized with the loop). Added DRC-3307 historical context with a sunset condition tied to the consumer-side defensive guards in recce-cloud-infra `derive_check_run_status` — safe to remove this sync-call requirement once those guards go. Coverage (87.5% patch -> 100%) + tighter regression tags: - New negative-path test `test_tool_run_check_metadata_branch_recce_exception_no_auto_approve`: asserts update_check_by_id and export_persistent_state are NOT called when _tool_lineage_diff raises RecceException. Closes the metadata branch coverage gap (codecov-flagged 2 missing lines). - New `test_cancel_sentinel_preserved_on_success`: simulates cancel_run setting CANCELLED before the executor finishes, asserts update_run_result success path does NOT overwrite. Exercises the newly-added `if run.status != RunStatus.CANCELLED` branch on the success path. - New `test_repr_fallback_when_str_error_is_none_string`: verifies `repr(error)` fallback when `str(error) == "None"` records useful failed_reason. Closes the run_func.py coverage gap (codecov-flagged 2 missing lines). - Tighten the three "DRC-3307" tags in test_mcp_server.py to "Regression: ... (DRC-3307 root cause)" so the meaning survives Linear-archive of the ticket. - Reframe the "no sleep needed" test comment as a forward-looking regression assertion ("if this fails, update_run_result is being scheduled async again"). Mock additions to existing tests: - All 5 _tool_run_check call sites in tests now mock `recce.apis.check_func.export_persistent_state`. Without this mock, the new persistence call hits a real default_context() and fails with AttributeError (no state_loader on None). Tests: 146 passing (was 115; +31 includes prior 28 unrelated suite additions and 3 new for this PR). DRC-3307 Signed-off-by: Kent <iamcxa@gmail.com>

iamcxa · 2026-04-27T02:46:27Z

Surfacing 12 inline findings; 2 HIGH worth resolving before merge (cloud exception leak + new cross-thread race window introduced by the sync fix), and a persistence-asymmetry MEDIUM that breaks the "matches _tool_create_check policy" claim.

All 12 inline threads addressed in commit 8385e0a7 (pushed). Per-thread replies posted. Summary by severity:

HIGH (2) — both resolved

Cloud exception leak: update_check_by_id moved OUTSIDE the metadata-branch try/except RecceException (mirrors _tool_create_check structure), so RecceCloudException no longer escapes unwrapped.
Cross-thread race window: update_run_result now writes run.status BEFORE run.result/run.error (no "result-present + status-RUNNING"), and both success and failure paths skip the status overwrite when run.status == CANCELLED (preserves the cancel sentinel from cancel_run). Docstring declares the cross-thread store-ordering invariant explicitly.

MEDIUM (4) — all resolved

Persistence asymmetry: added await loop.run_in_executor(None, export_persistent_state) after auto-approve. Now genuinely matches _tool_create_check policy.
Misleading "same policy" claim: closed by the structural fix above.
Empty-result auto-approve: documented explicitly that empty result IS valid evidence for metadata-only types (zero changes confirms the upstream PR did not affect lineage/schema).
Cloud audit attribution: added logger.info(... triggered_by=...) at the auto-approve site for log-side attribution. Full ContextVar injection deferred — pre-existing scope shared with _tool_create_check, warrants a separate ticket.

LOW + NIT (5+1) — all resolved

Comment block reframed to drop consumer-side concepts ("preset checks" / ⏳).
(See: DRC-3307) added inline with PM-decision attribution.
Sunset condition added to historical-narrative docstring (tied to recce-cloud-infra defensive-guard removal).
3 DRC-3307 test tags tightened to "Regression: ... (DRC-3307 root cause)".
"no sleep needed" reframed as forward-looking regression assertion.

Coverage: codecov-flagged 4 missing patch lines closed:

mcp_server.py (2 lines): new test_tool_run_check_metadata_branch_recce_exception_no_auto_approve exercises the negative path.
run_func.py (2 lines): new test_cancel_sentinel_preserved_on_success (CANCELLED-collision branch) + test_repr_fallback_when_str_error_is_none_string (repr(error) fallback when str(e) == "None").

Tests: 146 passing (was 115 + 31 unrelated suite additions during this fix). Manual review handoff per your request — no re-review tagging.

The narrow except RecceException pattern silently misses cloud-mode failures because RecceCloudException inherits from Exception, not from RecceException. Documenting the rule + two acceptable fixes so future DAO-write additions catch this at review time. DRC-3307 Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>

iamcxa

@iamcxa — re-review after 8385e0a7.

Excellent close-out — would APPROVE if you weren't the author (GitHub blocks self-approve). All 12 v1 inline findings + 2 of 3 advisory items addressed; 10 fully fixed, 2 with documented partial trade-offs. Sign-off equivalent: ready for human reviewer + merge. The new tests (test_cancel_sentinel_preserved_on_success, test_repr_fallback_when_str_error_is_none_string, test_tool_run_check_metadata_branch_recce_exception_no_auto_approve) noticeably expand coverage beyond what v1 review asked for.

v1 → v2 Closure

#	v1 finding	Status
1	RecceCloudException not caught (HIGH)	✅ FIXED — auto-approve moved OUTSIDE the `try / except RecceException` blocks; explicit comment explains the trade-off
2	Cross-thread store ordering (HIGH)	✅ FIXED — status BEFORE result/error; CANCELLED preserved; new regression test
3	`export_persistent_state` missing (MEDIUM)	✅ FIXED — `await ... run_in_executor(None, export_persistent_state)` added
4	"Same policy" claim misleading (MEDIUM)	✅ FIXED — rewritten as "same gate as `_tool_create_check` (line 1832)" + persistence parity stated
5	Metadata empty result auto-approve (MEDIUM)	✅ FIXED via doc — "empty result IS valid evidence: zero changes confirms..."
6	`acting_user_id=None` audit gap (MEDIUM)	⚠️ PARTIAL — `logger.info(triggered_by=...)` added (the minimum I suggested); cloud-mode user-context propagation not done (acceptable, pre-existing pattern)
7	"preset checks" consumer-side context (LOW)	✅ FIXED — phrasing removed
8	PM decision ticket link (LOW)	✅ FIXED — `(See: DRC-3307)` added
9	Load-bearing docstring sunset (LOW)	✅ FIXED — explicit sunset-condition paragraph added
10	DRC-3307 tag fade in tests (NIT)	⚠️ Partial — wording reframed as regression-assertion, tag retained (acceptable)
11	Negative-path test gap (LOW)	✅ FIXED — `test_tool_run_check_metadata_branch_recce_exception_no_auto_approve` added
12	Forward-looking framing in test comment (NIT)	✅ FIXED — reframed as regression guard
Adv A	Sunset for historical narrative	✅ FIXED via #9
Adv B	Regression assertion in 2/3 removed-sleep tests	⚠️ Partial — replaced with NEW dedicated tests (cancel sentinel + repr fallback) instead; net coverage improved
Adv C	Cross-repo review-lessons	Captured in companion repo

2 minor follow-ups

Neither is blocking. Both are about the new cross-thread race window's residual edges, posted as inline LOW + NIT.

Verification Matrix (final)

#	Concern	Status
V1	`submit_run` race fixed	✅ Confirmed; ordering refined; cancel sentinel preserved
V2	All 6 callers benefit	✅ Confirmed; ordering protects completion-signal readers
V3	`_tool_run_check` matches `_tool_create_check` policy	✅ Confirmed; gate + persistence + comment all aligned
V4	Removed sleep workarounds	✅ Confirmed + new regression tests added

Two docstring-only refinements to make the cross-thread guarantees match what the code actually provides: 1. Distinguish completion-signal readers (wait_run_handler polling on `result is not None`) from snapshot readers (single-render reads of both fields). The store-order guarantee only protects the former; the latter may briefly see (FINISHED, result=None) for the sub-µs gap and are expected to re-poll. 2. Document that the cancel-sentinel guard is check-then-assign, not atomic. A sub-µs GIL window remains where cancel_run can flip status between the check and the overwrite. The guard is best-effort against the macroscopic race (cancel before executor enters), which is the practically-relevant case. Note the threading.Lock / threading.Event escape hatch for future callers that need full atomicity. No behavioral change. DRC-3307 Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>

iamcxa · 2026-04-27T03:19:16Z

Excellent close-out — would APPROVE if you weren't the author (GitHub blocks self-approve). All 12 v1 inline findings + 2 of 3 advisory items addressed.

Two v2 LOW/NIT threads addressed in e175c783 (docstring-only refinements):

Cancel-sentinel guard now documents check-then-assign as best-effort against the macroscopic race; sub-µs GIL window between check and overwrite called out as practically unobservable, with threading.Lock / threading.Event noted as the escape hatch for callers that need full atomicity.
Store-order claim tightened: the guarantee protects completion-signal readers (e.g., wait_run_handler polling result is not None); snapshot readers may briefly see (FINISHED, result=None) for the sub-µs gap and are expected to re-poll. Doc now matches the actual guarantee.

No behavioral change. Tests still 31/31 on test_run_func.py.

Per your sign-off equivalent — ready for human reviewer + merge.

iamcxa

@iamcxa — final sign-off after e175c783 + 9eb96f17.

All clear — would APPROVE if you weren't the author (GitHub blocks self-approve). v2 findings closed; bonus project-knowledge capture in CLAUDE.md. Sign-off equivalent: ready for human reviewer + merge.

v2 → v3 Closure

v2 finding	v3 status
LOW (cancel sentinel sub-µs GIL race)	✅ Addressed via doc — "best-effort against the macroscopic race... use `threading.Lock` or `threading.Event` if a future caller needs full atomicity"
NIT (snapshot reader docstring nuance)	✅ Addressed — completion-signal vs snapshot reader distinction documented; sub-µs `(FINISHED, result=None)` window acknowledged

Bonus

9eb96f17 adds a Cloud vs Local Mode Exception Types section to CLAUDE.md capturing the cloud-vs-local exception class divergence rule (from v1 finding #1, RecceCloudException). This is exactly the project-knowledge promotion I would have proposed in D2 — captured proactively without prompting. 👍

Closure Summary

Round	Commit	Inline	Body
v1	`76eac07`	12 (2 HIGH + 4 MEDIUM + 6 LOW/NIT)	full review
v2	`8385e0a7`	2 (1 LOW + 1 NIT)	closure + sign-off
v3	`e175c783`	0	final sign-off

Verification Matrix (final)

#	Concern	Status
V1	`submit_run` race fixed	✅ sync `update_run_result`, status-before-result, cancel sentinel preserved
V2	All 6 callers benefit	✅ ordering protects completion-signal readers
V3	`_tool_run_check` matches `_tool_create_check` policy	✅ gate + persistence + comment all aligned
V4	Removed sleep workarounds	✅ regression tests added (cancel sentinel + repr fallback + negative-path metadata)

Ready to ship.

…ested-actions-inconsistency

wcchang1115 · 2026-04-28T03:15:38Z

Code Review — PR #1342 (DRC-3307) — Adversarial Re-review

Summary

Re-reviewed every change with the assumption it's wrong until proven otherwise. All changes prove valid under scrutiny. No critical, high, or medium-severity bugs found. Two test gaps remain.

Adversarial proofs of validity

Claim 1: `update_run_result` async→sync is safe across ALL 6 callers

Suspect: Removing async and run_coroutine_threadsafe could break callers that depended on async scheduling timing or expected the old "result-set-before-status" ordering.

Proof:

Memory ordering: After loop.run_in_executor(None, fn), the future resolves only when fn() returns. By Python's asyncio spec, await future resumes via call_soon_threadsafe, which acts as a memory barrier — all writes from the executor thread (including run.status, run.result, run.error) are visible to the awaiter.

All 6 callers verified:

Caller	File:Line	Reads after `await future`	Safe
`create_run_handler`	`run_api.py:66`	None (returns `run` to FastAPI)	✅
`rerun_check_handler`	`check_api.py:182`	None (returns `run`)	✅
`wait_run_handler`	`run_api.py:85`	Polls `run.result is None and run.error is None` — only sees one field	✅
`_tool_run_check` (task branch)	`mcp_server.py:1746`	`run.status == FINISHED and not run.error`	✅
`_tool_create_check`	`mcp_server.py:1833`	`run.status == FINISHED`	✅
`execute_preset_checks`	`run.py:176`	`run_should_be_approved(run)` reads `run.error`, `run.result`	✅
`execute_state_checks`	`run.py:258`	None	✅

Status-BEFORE-result inversion proof: New order is status = FINISHED; run.result = result. A reader observing run.result is not None is guaranteed (under GIL serialization at executor thread's GIL release point) to also see run.status == FINISHED. The reverse window from the OLD code (run.result set, but status still RUNNING) was the actual DRC-3307 bug — fixed.
updated_params merge moved to top: Previously between result-write and status-set on success. Now first. This means when an external observer sees status == FINISHED, params is also already merged. Strictly safer.
Loop captured correctly: loop = asyncio.get_running_loop() at line 147 is still used at line 256 for run_in_executor. No dangling references.

Verdict: ✅ Proven safe.

Claim 2: Cancel-sentinel preservation widening is correct

Suspect: The OLD success path had NO if run.status != CANCELLED guard. Adding it changes behavior — cancelled runs that finish anyway now stay CANCELLED instead of becoming FINISHED. Could a caller depend on the old behavior?

Proof:

Searched all callers of cancel_run — only cancel_run_handler in run_api.py:73. It does not poll for "cancelled but actually finished."
No frontend code reads cancelled-then-finished state as a special condition (frontend renders by status enum).
Test test_cancel_sentinel_preserved_on_success explicitly locks in the new behavior.
_tool_run_check task branch: run_succeeded = run.status == FINISHED and not run.error — CANCELLED correctly maps to False (no auto-approve).

Verdict: ✅ Proven correct. Behavior change is intentional and tested.

Claim 3: `run_succeeded` flag covers all control-flow paths through `_tool_run_check`

Suspect: The flag could be left in a wrong state along some control-flow path.

Proof — exhaustive path enumeration:

Path	`run_succeeded` final	Auto-approve	Correct?
Metadata branch, success	`True`	Fires	✅
Metadata branch, RecceException	n/a (re-raised as ValueError)	Skipped	✅
Metadata branch, RecceCloudException	n/a (propagates, since it's not `RecceException`)	Skipped (run_succeeded never set True)	✅
Task branch, `submit_run` raises RecceException	n/a (re-raised)	Skipped	✅
Task branch, `submit_run` raises RecceCloudException	n/a (propagates)	Skipped	✅
Task branch, run finishes successfully	`True`	Fires	✅
Task branch, run fails (`status=FAILED`)	`False`	Skipped	✅
Task branch, run cancelled	`False` (CANCELLED ≠ FINISHED)	Skipped	✅

Cloud-vs-local exception handling: Both update_check_by_id and export_persistent_state are placed OUTSIDE the try / except RecceException blocks. Confirmed: RecceCloudException extends Exception, NOT RecceException (recce/util/recce_cloud.py:31). The CLAUDE.md addition documenting this is accurate.

Verdict: ✅ Proven correct.

Claim 4: `await asyncio.get_event_loop().run_in_executor(None, export_persistent_state)` cost is acceptable

Suspect: Could be expensive enough to regress agent latency.

Proof:

Local mode: state_loader.export() writes a JSON file. Sub-ms cost.
Cloud mode: HTTPS upload of full RecceState. Cost = O(checks + runs + artifacts). For typical sessions (≤100 checks/runs), this is small (~100KB-few MB upload).
Non-blocking: Wrapped in run_in_executor, so event loop continues servicing other coroutines.
Idempotent and safe under failure: If export_persistent_state fails after update_check_by_id succeeds, cloud is source of truth and remains consistent. Local state file is stale but recoverable on next session.
Same pattern as _tool_create_check:1847 — consistent.

Verdict: ✅ Cost is acceptable. Non-blocking. Could be debounced as a future optimization but not required.

Claim 5: Test removal of `await asyncio.sleep(0.1)` is correct

Suspect: The sleep might have been masking a real race condition that still exists.

Proof:

OLD code: update_run_result was async + scheduled via run_coroutine_threadsafe from the executor thread. The scheduled coroutine ran in the event loop AFTER await future resumed (because run_coroutine_threadsafe posts to the loop, which only processes it on the next iteration). The sleep gave the loop time to process the coroutine.
NEW code: update_run_result is called synchronously inside fn() BEFORE fn() returns. The future resolves only after fn() returns. Therefore by the time await future resumes, all writes are committed.
Regression guard: assert run.status == RunStatus.FINISHED in test_normalized_params_propagate_to_run will fail (without the sleep) if anyone reverts to async scheduling.

Verdict: ✅ Sleep correctly removed. Regression guard in place.

Claim 6: CLAUDE.md "Cloud vs Local Mode Exception Types" rule is accurate

Suspect: The rule could be wrong about the inheritance hierarchy.

Proof:

Verified recce/exceptions.py:1: class RecceException(Exception) — base is plain Exception.
Verified recce/util/recce_cloud.py:31: class RecceCloudException(Exception) — base is plain Exception, NOT RecceException.
Therefore try / except RecceException does not catch RecceCloudException. Documentation accurate.

Verdict: ✅ Accurate.

Findings

Finding 1 — [Low priority, test gap] Task-branch negative paths not asserted

File: tests/test_mcp_server.py

Issue: No test asserts that auto-approve does NOT fire when:

mock_run.status = RunStatus.FAILED, OR
mock_run.error = "some error" while status is FINISHED

Proof of gap: If the gate run.status == RunStatus.FINISHED and not run.error is accidentally weakened to True (or only one of the two clauses), the existing tests still pass because they only exercise the success case.

Suggestion: Add two tests mirroring test_tool_run_check_metadata_branch_recce_exception_no_auto_approve for the task branch (one for FAILED status, one for non-None error). 5–10 lines each.

Finding 2 — [Low priority, test robustness] `test_cancel_sentinel_preserved_on_success` has timing dependency

File: tests/apis/test_run_func.py:399-422

Issue: The test sets run.status = RunStatus.CANCELLED AFTER submit_run returns but BEFORE awaiting the future. This relies on the test thread winning the race against the executor's fn() execution. With a fast mock task, the executor could have already completed update_run_result(run, result, ...) before the test sets CANCELLED — in which case the assertion passes for the wrong reason (the test just observes its own write).

Proof: No threading.Event or barrier blocks fn() from running ahead of the test's CANCELLED write. The test happens to work because Python's executor submission has small overhead, but this is implementation-detail timing.

Suggestion: Inject a threading.Event into the mock task's execute() and have the test set CANCELLED, then signal the event, before awaiting the future. Deterministic.

Finding 3 — [Low priority, cleanup] Redundant `run.result = await future`

File: recce/apis/run_api.py:66, recce/apis/check_api.py:182, recce/mcp_server.py:1745

Issue: With update_run_result now setting run.result synchronously inside fn() before the future resolves, these three lines overwrite run.result with the same value (or with None on failure, where it was already None).

Suggestion: Replace with bare await future. Not a bug — pure cleanup.

Finding 4 — [Low priority, cosmetic] Brittle line reference in comment

File: recce/mcp_server.py:1751-1752

Issue: Comment says "_tool_create_check (line 1832: ...)" but actual line is 1843. Line numbers rot.

Suggestion: Drop the line number; symbolic reference suffices.

Tests verification

pytest tests/apis/test_run_func.py tests/test_mcp_server.py → 146 passed, 0 failed
flake8 recce/apis/run_func.py recce/mcp_server.py → clean

Verdict

✅ Approve.

Every suspicious change has a defensible justification. The race-condition fix is correctly implemented; the auto-approve flag covers all control-flow paths; the cloud-exception placement matches the documented rule. No CRITICAL or HIGH-severity issues. Findings 1–4 are LOW-priority polish that can land in a follow-up.

mcp_server.py — restructure _tool_run_check auto-approve: - Move update_check_by_id OUTSIDE the metadata-branch try/except RecceException (mirrors _tool_create_check structure). Cloud mode's CheckDAO re-raises RecceCloudException, which is NOT a RecceException subclass — leaving the call inside the wrapper would let cloud failures escape unwrapped, breaking the consistent ValueError contract. - Add `await loop.run_in_executor(None, export_persistent_state)` after auto-approve so local-mode is_checked changes survive process exit. Closes the persistence asymmetry vs _tool_create_check that was miscalled "matches policy" in the original comment. - Reframe the auto-approve comment block: drop the consumer-side "preset checks / ⏳ Unapproved" narrative (cross-repo concept bleed), add `See: DRC-3307` for PM-decision attribution, and explicitly document that an empty result IS valid evidence for metadata-only types (zero-change confirmation). - Add `logger.info(f"Auto-approved check {check_id} (triggered_by=...)")` so server logs preserve attribution even when the cloud audit-layer ContextVar isn't populated (pre-existing scope, but worth surfacing now that run_check also auto-approves). run_func.py — preserve CANCELLED sentinel + invert store order: - update_run_result now writes run.status BEFORE run.result/run.error so async readers in the loop thread never observe a "result-present + status-RUNNING" intermediate. - Both success and failure paths skip the status overwrite when run.status == CANCELLED. Previously only the failure path checked this; the success path unconditionally overwrote CANCELLED with FINISHED, silently regressing cancel_run from another thread. - Docstring rewritten: declares the cross-thread store-ordering invariant explicitly (CPython GIL atomic per attribute, but multi- store sequences are not serialized with the loop). Added DRC-3307 historical context with a sunset condition tied to the consumer-side defensive guards in recce-cloud-infra `derive_check_run_status` — safe to remove this sync-call requirement once those guards go. Coverage (87.5% patch -> 100%) + tighter regression tags: - New negative-path test `test_tool_run_check_metadata_branch_recce_exception_no_auto_approve`: asserts update_check_by_id and export_persistent_state are NOT called when _tool_lineage_diff raises RecceException. Closes the metadata branch coverage gap (codecov-flagged 2 missing lines). - New `test_cancel_sentinel_preserved_on_success`: simulates cancel_run setting CANCELLED before the executor finishes, asserts update_run_result success path does NOT overwrite. Exercises the newly-added `if run.status != RunStatus.CANCELLED` branch on the success path. - New `test_repr_fallback_when_str_error_is_none_string`: verifies `repr(error)` fallback when `str(error) == "None"` records useful failed_reason. Closes the run_func.py coverage gap (codecov-flagged 2 missing lines). - Tighten the three "DRC-3307" tags in test_mcp_server.py to "Regression: ... (DRC-3307 root cause)" so the meaning survives Linear-archive of the ticket. - Reframe the "no sleep needed" test comment as a forward-looking regression assertion ("if this fails, update_run_result is being scheduled async again"). Mock additions to existing tests: - All 5 _tool_run_check call sites in tests now mock `recce.apis.check_func.export_persistent_state`. Without this mock, the new persistence call hits a real default_context() and fails with AttributeError (no state_loader on None). Tests: 146 passing (was 115; +31 includes prior 28 unrelated suite additions and 3 new for this PR). DRC-3307 Signed-off-by: Kent <iamcxa@gmail.com>

The narrow except RecceException pattern silently misses cloud-mode failures because RecceCloudException inherits from Exception, not from RecceException. Documenting the rule + two acceptable fixes so future DAO-write additions catch this at review time. DRC-3307 Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Kent <iamcxa@gmail.com>

iamcxa added 2 commits April 24, 2026 19:32

iamcxa commented Apr 27, 2026

View reviewed changes

Comment thread recce/apis/run_func.py

Comment thread recce/apis/run_func.py Outdated

iamcxa commented Apr 27, 2026

View reviewed changes

iamcxa requested a review from wcchang1115 April 27, 2026 09:11

iamcxa self-assigned this Apr 27, 2026

Merge branch 'main' into feature/drc-3307-fix-checklist-sync-and-sugg…

ee0d5ee

…ested-actions-inconsistency

wcchang1115 approved these changes Apr 28, 2026

View reviewed changes

iamcxa merged commit 072b313 into main Apr 28, 2026
22 checks passed

iamcxa deleted the feature/drc-3307-fix-checklist-sync-and-suggested-actions-inconsistency branch April 28, 2026 09:28

even-wei mentioned this pull request May 9, 2026

chore(claude): add recce-pr-verifier subagent #1364

Merged

2 tasks

Conversation

iamcxa commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Fix 1: submit_run race condition (run_func.py)

Fix 2: run_check auto-approve (mcp_server.py)

Tests

Testing

Staging Verification

Related

Uh oh!

codecov Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

iamcxa left a comment

Choose a reason for hiding this comment

Verification Matrix (from PR body)

Break-point Coverage

Advisory (not posted as inline comments)

Verification Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

iamcxa commented Apr 27, 2026

Uh oh!

iamcxa left a comment

Choose a reason for hiding this comment

v1 → v2 Closure

2 minor follow-ups

Verification Matrix (final)

Uh oh!

Uh oh!

Uh oh!

iamcxa commented Apr 27, 2026

Uh oh!

iamcxa left a comment

Choose a reason for hiding this comment

v2 → v3 Closure

Bonus

Closure Summary

Verification Matrix (final)

Uh oh!

wcchang1115 commented Apr 28, 2026

Code Review — PR #1342 (DRC-3307) — Adversarial Re-review

Summary

Adversarial proofs of validity

Claim 1: update_run_result async→sync is safe across ALL 6 callers

Claim 2: Cancel-sentinel preservation widening is correct

Claim 3: run_succeeded flag covers all control-flow paths through _tool_run_check

Claim 4: await asyncio.get_event_loop().run_in_executor(None, export_persistent_state) cost is acceptable

Claim 5: Test removal of await asyncio.sleep(0.1) is correct

Claim 6: CLAUDE.md "Cloud vs Local Mode Exception Types" rule is accurate

Findings

Finding 1 — [Low priority, test gap] Task-branch negative paths not asserted

Finding 2 — [Low priority, test robustness] test_cancel_sentinel_preserved_on_success has timing dependency

Finding 3 — [Low priority, cleanup] Redundant run.result = await future

Finding 4 — [Low priority, cosmetic] Brittle line reference in comment

Tests verification

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

iamcxa commented Apr 24, 2026 •

edited

Loading

codecov Bot commented Apr 24, 2026 •

edited

Loading

Claim 1: `update_run_result` async→sync is safe across ALL 6 callers

Claim 3: `run_succeeded` flag covers all control-flow paths through `_tool_run_check`

Claim 4: `await asyncio.get_event_loop().run_in_executor(None, export_persistent_state)` cost is acceptable

Claim 5: Test removal of `await asyncio.sleep(0.1)` is correct

Finding 2 — [Low priority, test robustness] `test_cancel_sentinel_preserved_on_success` has timing dependency

Finding 3 — [Low priority, cleanup] Redundant `run.result = await future`