Skip to content

fix(mcp): DRC-3307 fix submit_run race condition + run_check auto-approve#1342

Merged
iamcxa merged 6 commits into
mainfrom
feature/drc-3307-fix-checklist-sync-and-suggested-actions-inconsistency
Apr 28, 2026
Merged

fix(mcp): DRC-3307 fix submit_run race condition + run_check auto-approve#1342
iamcxa merged 6 commits into
mainfrom
feature/drc-3307-fix-checklist-sync-and-suggested-actions-inconsistency

Conversation

@iamcxa
Copy link
Copy Markdown
Contributor

@iamcxa iamcxa commented Apr 24, 2026

Problem

PR comment Checklist showed wrong status for two reasons:

  1. submit_run race condition: update_run_result was async and scheduled via run_coroutine_threadsafe. After await future, run.status was still RUNNING — auto-approve gate (run.status == FINISHED) never triggered.
  2. run_check missing auto-approve: _tool_run_check had no auto-approve logic. Preset checks (created via API before the agent runs) stayed ⏳ Unapproved forever.

Note: PR #1324 added auto-approve to _tool_create_check but it was defeated by the race condition — run_executed = run.status == RunStatus.FINISHED evaluated to False because run.status was still RUNNING.

Changes

Fix 1: submit_run race condition (run_func.py)

update_run_result changed from async def + run_coroutine_threadsafe to plain def called synchronously inside the executor thread fn(). Now run.status and run.result are set BEFORE the future resolves. This fixes ALL 6 callers of submit_run:

Caller File Benefit
REST API (run) run_api.py run.result populated
REST API (check rerun) check_api.py run.result populated
MCP run_check mcp_server.py Auto-approve works
MCP create_check mcp_server.py Auto-approve works (fixes #1324)
Preset checks run.py run_should_be_approved() sees result
State checks run.py Run completes before next iteration

Fix 2: run_check auto-approve (mcp_server.py)

Added auto-approve to _tool_run_check matching _tool_create_check policy:

  • Metadata checks (lineage_diff, schema_diff): unconditional auto-approve after success
  • Task checks: conditional on run.status == FINISHED and not run.error

Tests

  • Removed asyncio.sleep(0.1) workarounds (no longer needed with sync update)
  • Added assert run.status == RunStatus.FINISHED after await future
  • Added update_check_by_id(is_checked=True) assertions to all 3 run_check tests (row_count_diff, lineage_diff, schema_diff)

Testing

  • 143 tests pass (29 run_func + 114 mcp_server)
  • Break-point probe verified at Level B (unit tests exercise exact break-point)

Staging Verification

Local integration test (Docker image with this fix) on jaffle_shop_golden PR #26:

Metric Before After
✅ Approved 1/10 (10%) 8/10 (80%)
⏳ Unapproved 8/10 0/10

Related

iamcxa added 2 commits April 24, 2026 19:32
Two root-cause fixes for PR comment checklist showing wrong status:

1. submit_run race condition (run_func.py): update_run_result was async
   and scheduled via run_coroutine_threadsafe, so run.status stayed
   RUNNING after await future. Made it synchronous — called directly
   inside the executor thread fn() so run.status and run.result are
   set BEFORE the future resolves. Removed async sleep workarounds
   from tests.

2. run_check auto-approve (mcp_server.py): _tool_run_check had no
   auto-approve logic, so preset checks (created via API before the
   agent runs) stayed Unapproved forever. Added the same auto-approve
   policy as _tool_create_check: successful run = Approved (PM decision).

DRC-3307

Signed-off-by: Kent <iamcxa@gmail.com>
probe-run-check found that _tool_run_check tests passed but never
asserted auto-approve behavior — mock_run.status was MagicMock (not
RunStatus.FINISHED), so the auto-approve condition silently evaluated
to False. Tests would pass identically if auto-approve lines were
deleted.

Fix:
- Set mock_run.status = RunStatus.FINISHED in row_count_diff test
- Assert update_check_by_id(is_checked=True) in all 3 run_check tests
  (row_count_diff, lineage_diff, schema_diff)

DRC-3307

Signed-off-by: Kent <iamcxa@gmail.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 95.49550% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/mcp_server.py 92.30% 2 Missing ⚠️
tests/apis/test_run_func.py 95.23% 2 Missing ⚠️
recce/apis/run_func.py 91.66% 1 Missing ⚠️
Files with missing lines Coverage Δ
tests/test_mcp_server.py 99.85% <100.00%> (+<0.01%) ⬆️
recce/apis/run_func.py 64.97% <91.66%> (+6.81%) ⬆️
recce/mcp_server.py 87.98% <92.30%> (+0.18%) ⬆️
tests/apis/test_run_func.py 90.54% <95.23%> (+0.51%) ⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor Author

@iamcxa iamcxa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iamcxa — review of DRC-3307 root-cause fix.

Overall: race-condition fix is correct in spirit and the 80% jaffle_shop_golden integration result is solid C-level evidence. Surfacing 12 inline findings; 2 HIGH worth resolving before merge (cloud exception leak + new cross-thread race window introduced by the sync fix), and a persistence-asymmetry MEDIUM that breaks the "matches _tool_create_check policy" claim. Defaulting to COMMENT because the core fix is sound but several edges need attention.

Verification Matrix (from PR body)

# Concern Status
V1 submit_run race fixed — sync update_run_result Confirmed at unit + integration. Introduces new smaller race: result-before-status store ordering visible to async-loop readers (#2)
V2 All 6 submit_run callers benefit Confirmed; readers may now see transient RUNNING + result-present inconsistency (#2)
V3 _tool_run_check auto-approve matches _tool_create_check policy Gate matches; persistence behavior differs_tool_create_check calls export_persistent_state, _tool_run_check does not (#3, #4)
V4 Removed asyncio.sleep(0.1) workarounds Confirmed; only 1 of 3 removed-sleep tests added a regression assertion (advisory B)

Break-point Coverage

BP1 update_run_result sync (run_func.py:152-178):    B (regression test) + C (jaffle_shop_golden 80%)
BP2 _tool_run_check metadata auto-approve:           B (2 success tests) — no negative test (#11)
BP3 _tool_run_check task auto-approve:               B (1 success test) — no negative test for error path

Residual uncertainty:

  • Cross-thread race window (NEW): result-before-status store ordering creates sub-µs window where async-loop readers see result is not None while status == RUNNING. Symptom likely undetectable in practice but a regression of the previous loop-serialized model. Fix: invert the store order, OR document GIL/CPython assumption (#2).
  • RecceCloudException not subclass of RecceException — cloud-mode update_check_by_id failure surfaces unwrapped to MCP caller (#1).
  • Persistence asymmetry: local-mode SIGTERM after run_check before any other operation → auto-approve lost (#3).
  • Cloud-mode acting_user_id=None for all MCP-triggered auto-approvals — shared-instance audit attribution gap (#6).

Recommended human probe before merge:

  1. Decide store-order inversion in update_run_result (status before result/error) OR add explicit GIL/CPython note in docstring.
  2. Move update_check_by_id outside the metadata try (matches _tool_create_check's structure) OR broaden except to include RecceCloudException.
  3. Decide whether to add export_persistent_state to _tool_run_check for parity with _tool_create_check, OR clarify the comment that persistence is intentionally deferred to MCP shutdown.
  4. Add mock_check_dao.update_check_by_id.assert_not_called() to the existing RecceException-path test in test_mcp_server.py (cheap regression guard, #11).

Advisory (not posted as inline comments)

# File:Line Root Summary Suggested Action
A run_func.py:155-163 DOC DRC-3307 historical narrative is load-bearing while consumer recce-cloud-infra#1245 defense exists — its derive_check_run_status workaround cites this docstring as the canonical justification Add sunset note: "Safe to remove once recce-cloud-infra defensive guards in derive_check_run_status are removed" (#9)
B tests/apis/test_run_func.py:376, 395 DOC 2 of 3 removed-sleep tests don't add regression assertion (run.status == FINISHED); only test_normalized_params_propagate_to_run does Decide: add regression assertion to all 3, OR document why one canonical assertion is sufficient
C NEW Cross-repo defense-in-depth pattern: root-cause repo's docstring outlives its own fix because consumer-side defense cites it as canonical justification Captured in recce-cloud-infra/.claude/review-lessons.md (just created in companion review)

Verification Summary

Check Result
ruff (run_func.py + mcp_server.py) clean
Pre-scan persistence asymmetry 1 finding (export_persistent_state missing in _tool_run_check)
code-reviewer 2 HIGH + 2 LOW
comment-analyzer 1 critical + 5 improvements + 4 rot risks
ToB security (standard) 2 MEDIUM + 2 LOW
Unit tests not executed locally — PR claims 143 pass

Comment thread recce/mcp_server.py Outdated
Comment thread recce/apis/run_func.py
Comment thread recce/mcp_server.py Outdated
Comment thread recce/mcp_server.py Outdated
Comment thread recce/mcp_server.py Outdated
Comment thread recce/mcp_server.py Outdated
Comment thread recce/apis/run_func.py
Comment thread tests/test_mcp_server.py Outdated
Comment thread tests/test_mcp_server.py Outdated
Comment thread tests/apis/test_run_func.py Outdated
mcp_server.py — restructure _tool_run_check auto-approve:

- Move update_check_by_id OUTSIDE the metadata-branch try/except
  RecceException (mirrors _tool_create_check structure). Cloud mode's
  CheckDAO re-raises RecceCloudException, which is NOT a RecceException
  subclass — leaving the call inside the wrapper would let cloud failures
  escape unwrapped, breaking the consistent ValueError contract.
- Add `await loop.run_in_executor(None, export_persistent_state)` after
  auto-approve so local-mode is_checked changes survive process exit.
  Closes the persistence asymmetry vs _tool_create_check that was
  miscalled "matches policy" in the original comment.
- Reframe the auto-approve comment block: drop the consumer-side
  "preset checks / ⏳ Unapproved" narrative (cross-repo concept bleed),
  add `See: DRC-3307` for PM-decision attribution, and explicitly
  document that an empty result IS valid evidence for metadata-only
  types (zero-change confirmation).
- Add `logger.info(f"Auto-approved check {check_id} (triggered_by=...)")`
  so server logs preserve attribution even when the cloud audit-layer
  ContextVar isn't populated (pre-existing scope, but worth surfacing
  now that run_check also auto-approves).

run_func.py — preserve CANCELLED sentinel + invert store order:

- update_run_result now writes run.status BEFORE run.result/run.error
  so async readers in the loop thread never observe a "result-present
  + status-RUNNING" intermediate.
- Both success and failure paths skip the status overwrite when
  run.status == CANCELLED. Previously only the failure path checked
  this; the success path unconditionally overwrote CANCELLED with
  FINISHED, silently regressing cancel_run from another thread.
- Docstring rewritten: declares the cross-thread store-ordering
  invariant explicitly (CPython GIL atomic per attribute, but multi-
  store sequences are not serialized with the loop). Added DRC-3307
  historical context with a sunset condition tied to the consumer-side
  defensive guards in recce-cloud-infra `derive_check_run_status` —
  safe to remove this sync-call requirement once those guards go.

Coverage (87.5% patch -> 100%) + tighter regression tags:

- New negative-path test
  `test_tool_run_check_metadata_branch_recce_exception_no_auto_approve`:
  asserts update_check_by_id and export_persistent_state are NOT called
  when _tool_lineage_diff raises RecceException. Closes the metadata
  branch coverage gap (codecov-flagged 2 missing lines).
- New `test_cancel_sentinel_preserved_on_success`: simulates
  cancel_run setting CANCELLED before the executor finishes, asserts
  update_run_result success path does NOT overwrite. Exercises the
  newly-added `if run.status != RunStatus.CANCELLED` branch on the
  success path.
- New `test_repr_fallback_when_str_error_is_none_string`: verifies
  `repr(error)` fallback when `str(error) == "None"` records useful
  failed_reason. Closes the run_func.py coverage gap (codecov-flagged
  2 missing lines).
- Tighten the three "DRC-3307" tags in test_mcp_server.py to
  "Regression: ... (DRC-3307 root cause)" so the meaning survives
  Linear-archive of the ticket.
- Reframe the "no sleep needed" test comment as a forward-looking
  regression assertion ("if this fails, update_run_result is being
  scheduled async again").

Mock additions to existing tests:
- All 5 _tool_run_check call sites in tests now mock
  `recce.apis.check_func.export_persistent_state`. Without this mock,
  the new persistence call hits a real default_context() and fails
  with AttributeError (no state_loader on None).

Tests: 146 passing (was 115; +31 includes prior 28 unrelated suite
additions and 3 new for this PR).

DRC-3307

Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa
Copy link
Copy Markdown
Contributor Author

iamcxa commented Apr 27, 2026

Surfacing 12 inline findings; 2 HIGH worth resolving before merge (cloud exception leak + new cross-thread race window introduced by the sync fix), and a persistence-asymmetry MEDIUM that breaks the "matches _tool_create_check policy" claim.

All 12 inline threads addressed in commit 8385e0a7 (pushed). Per-thread replies posted. Summary by severity:

HIGH (2) — both resolved

  • Cloud exception leak: update_check_by_id moved OUTSIDE the metadata-branch try/except RecceException (mirrors _tool_create_check structure), so RecceCloudException no longer escapes unwrapped.
  • Cross-thread race window: update_run_result now writes run.status BEFORE run.result/run.error (no "result-present + status-RUNNING"), and both success and failure paths skip the status overwrite when run.status == CANCELLED (preserves the cancel sentinel from cancel_run). Docstring declares the cross-thread store-ordering invariant explicitly.

MEDIUM (4) — all resolved

  • Persistence asymmetry: added await loop.run_in_executor(None, export_persistent_state) after auto-approve. Now genuinely matches _tool_create_check policy.
  • Misleading "same policy" claim: closed by the structural fix above.
  • Empty-result auto-approve: documented explicitly that empty result IS valid evidence for metadata-only types (zero changes confirms the upstream PR did not affect lineage/schema).
  • Cloud audit attribution: added logger.info(... triggered_by=...) at the auto-approve site for log-side attribution. Full ContextVar injection deferred — pre-existing scope shared with _tool_create_check, warrants a separate ticket.

LOW + NIT (5+1) — all resolved

  • Comment block reframed to drop consumer-side concepts ("preset checks" / ⏳).
  • (See: DRC-3307) added inline with PM-decision attribution.
  • Sunset condition added to historical-narrative docstring (tied to recce-cloud-infra defensive-guard removal).
  • 3 DRC-3307 test tags tightened to "Regression: ... (DRC-3307 root cause)".
  • "no sleep needed" reframed as forward-looking regression assertion.

Coverage: codecov-flagged 4 missing patch lines closed:

  • mcp_server.py (2 lines): new test_tool_run_check_metadata_branch_recce_exception_no_auto_approve exercises the negative path.
  • run_func.py (2 lines): new test_cancel_sentinel_preserved_on_success (CANCELLED-collision branch) + test_repr_fallback_when_str_error_is_none_string (repr(error) fallback when str(e) == "None").

Tests: 146 passing (was 115 + 31 unrelated suite additions during this fix). Manual review handoff per your request — no re-review tagging.

The narrow except RecceException pattern silently misses cloud-mode
failures because RecceCloudException inherits from Exception, not from
RecceException. Documenting the rule + two acceptable fixes so future
DAO-write additions catch this at review time.

DRC-3307

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Copy link
Copy Markdown
Contributor Author

@iamcxa iamcxa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iamcxa — re-review after 8385e0a7.

Excellent close-out — would APPROVE if you weren't the author (GitHub blocks self-approve). All 12 v1 inline findings + 2 of 3 advisory items addressed; 10 fully fixed, 2 with documented partial trade-offs. Sign-off equivalent: ready for human reviewer + merge. The new tests (test_cancel_sentinel_preserved_on_success, test_repr_fallback_when_str_error_is_none_string, test_tool_run_check_metadata_branch_recce_exception_no_auto_approve) noticeably expand coverage beyond what v1 review asked for.

v1 → v2 Closure

# v1 finding Status
1 RecceCloudException not caught (HIGH) ✅ FIXED — auto-approve moved OUTSIDE the try / except RecceException blocks; explicit comment explains the trade-off
2 Cross-thread store ordering (HIGH) ✅ FIXED — status BEFORE result/error; CANCELLED preserved; new regression test
3 export_persistent_state missing (MEDIUM) ✅ FIXED — await ... run_in_executor(None, export_persistent_state) added
4 "Same policy" claim misleading (MEDIUM) ✅ FIXED — rewritten as "same gate as _tool_create_check (line 1832)" + persistence parity stated
5 Metadata empty result auto-approve (MEDIUM) ✅ FIXED via doc — "empty result IS valid evidence: zero changes confirms..."
6 acting_user_id=None audit gap (MEDIUM) ⚠️ PARTIAL — logger.info(triggered_by=...) added (the minimum I suggested); cloud-mode user-context propagation not done (acceptable, pre-existing pattern)
7 "preset checks" consumer-side context (LOW) ✅ FIXED — phrasing removed
8 PM decision ticket link (LOW) ✅ FIXED — (See: DRC-3307) added
9 Load-bearing docstring sunset (LOW) ✅ FIXED — explicit sunset-condition paragraph added
10 DRC-3307 tag fade in tests (NIT) ⚠️ Partial — wording reframed as regression-assertion, tag retained (acceptable)
11 Negative-path test gap (LOW) ✅ FIXED — test_tool_run_check_metadata_branch_recce_exception_no_auto_approve added
12 Forward-looking framing in test comment (NIT) ✅ FIXED — reframed as regression guard
Adv A Sunset for historical narrative ✅ FIXED via #9
Adv B Regression assertion in 2/3 removed-sleep tests ⚠️ Partial — replaced with NEW dedicated tests (cancel sentinel + repr fallback) instead; net coverage improved
Adv C Cross-repo review-lessons Captured in companion repo

2 minor follow-ups

Neither is blocking. Both are about the new cross-thread race window's residual edges, posted as inline LOW + NIT.

Verification Matrix (final)

# Concern Status
V1 submit_run race fixed ✅ Confirmed; ordering refined; cancel sentinel preserved
V2 All 6 callers benefit ✅ Confirmed; ordering protects completion-signal readers
V3 _tool_run_check matches _tool_create_check policy ✅ Confirmed; gate + persistence + comment all aligned
V4 Removed sleep workarounds ✅ Confirmed + new regression tests added

Comment thread recce/apis/run_func.py
Comment thread recce/apis/run_func.py Outdated
Two docstring-only refinements to make the cross-thread guarantees
match what the code actually provides:

1. Distinguish completion-signal readers (wait_run_handler polling on
   `result is not None`) from snapshot readers (single-render reads
   of both fields). The store-order guarantee only protects the
   former; the latter may briefly see (FINISHED, result=None) for the
   sub-µs gap and are expected to re-poll.

2. Document that the cancel-sentinel guard is check-then-assign, not
   atomic. A sub-µs GIL window remains where cancel_run can flip
   status between the check and the overwrite. The guard is
   best-effort against the macroscopic race (cancel before executor
   enters), which is the practically-relevant case. Note the
   threading.Lock / threading.Event escape hatch for future callers
   that need full atomicity.

No behavioral change.

DRC-3307

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa
Copy link
Copy Markdown
Contributor Author

iamcxa commented Apr 27, 2026

Excellent close-out — would APPROVE if you weren't the author (GitHub blocks self-approve). All 12 v1 inline findings + 2 of 3 advisory items addressed.

Two v2 LOW/NIT threads addressed in e175c783 (docstring-only refinements):

  • Cancel-sentinel guard now documents check-then-assign as best-effort against the macroscopic race; sub-µs GIL window between check and overwrite called out as practically unobservable, with threading.Lock / threading.Event noted as the escape hatch for callers that need full atomicity.
  • Store-order claim tightened: the guarantee protects completion-signal readers (e.g., wait_run_handler polling result is not None); snapshot readers may briefly see (FINISHED, result=None) for the sub-µs gap and are expected to re-poll. Doc now matches the actual guarantee.

No behavioral change. Tests still 31/31 on test_run_func.py.

Per your sign-off equivalent — ready for human reviewer + merge.

Copy link
Copy Markdown
Contributor Author

@iamcxa iamcxa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iamcxa — final sign-off after e175c783 + 9eb96f17.

All clear — would APPROVE if you weren't the author (GitHub blocks self-approve). v2 findings closed; bonus project-knowledge capture in CLAUDE.md. Sign-off equivalent: ready for human reviewer + merge.

v2 → v3 Closure

v2 finding v3 status
LOW (cancel sentinel sub-µs GIL race) ✅ Addressed via doc — "best-effort against the macroscopic race... use threading.Lock or threading.Event if a future caller needs full atomicity"
NIT (snapshot reader docstring nuance) ✅ Addressed — completion-signal vs snapshot reader distinction documented; sub-µs (FINISHED, result=None) window acknowledged

Bonus

9eb96f17 adds a Cloud vs Local Mode Exception Types section to CLAUDE.md capturing the cloud-vs-local exception class divergence rule (from v1 finding #1, RecceCloudException). This is exactly the project-knowledge promotion I would have proposed in D2 — captured proactively without prompting. 👍

Closure Summary

Round Commit Inline Body
v1 76eac07 12 (2 HIGH + 4 MEDIUM + 6 LOW/NIT) full review
v2 8385e0a7 2 (1 LOW + 1 NIT) closure + sign-off
v3 e175c783 0 final sign-off

Verification Matrix (final)

# Concern Status
V1 submit_run race fixed ✅ sync update_run_result, status-before-result, cancel sentinel preserved
V2 All 6 callers benefit ✅ ordering protects completion-signal readers
V3 _tool_run_check matches _tool_create_check policy ✅ gate + persistence + comment all aligned
V4 Removed sleep workarounds ✅ regression tests added (cancel sentinel + repr fallback + negative-path metadata)

Ready to ship.

@iamcxa iamcxa requested a review from wcchang1115 April 27, 2026 09:11
@iamcxa iamcxa self-assigned this Apr 27, 2026
@wcchang1115
Copy link
Copy Markdown
Collaborator

Code Review — PR #1342 (DRC-3307) — Adversarial Re-review

Summary

Re-reviewed every change with the assumption it's wrong until proven otherwise. All changes prove valid under scrutiny. No critical, high, or medium-severity bugs found. Two test gaps remain.


Adversarial proofs of validity

Claim 1: update_run_result async→sync is safe across ALL 6 callers

Suspect: Removing async and run_coroutine_threadsafe could break callers that depended on async scheduling timing or expected the old "result-set-before-status" ordering.

Proof:

  • Memory ordering: After loop.run_in_executor(None, fn), the future resolves only when fn() returns. By Python's asyncio spec, await future resumes via call_soon_threadsafe, which acts as a memory barrier — all writes from the executor thread (including run.status, run.result, run.error) are visible to the awaiter.

  • All 6 callers verified:

    Caller File:Line Reads after await future Safe
    create_run_handler run_api.py:66 None (returns run to FastAPI)
    rerun_check_handler check_api.py:182 None (returns run)
    wait_run_handler run_api.py:85 Polls run.result is None and run.error is None — only sees one field
    _tool_run_check (task branch) mcp_server.py:1746 run.status == FINISHED and not run.error
    _tool_create_check mcp_server.py:1833 run.status == FINISHED
    execute_preset_checks run.py:176 run_should_be_approved(run) reads run.error, run.result
    execute_state_checks run.py:258 None
  • Status-BEFORE-result inversion proof: New order is status = FINISHED; run.result = result. A reader observing run.result is not None is guaranteed (under GIL serialization at executor thread's GIL release point) to also see run.status == FINISHED. The reverse window from the OLD code (run.result set, but status still RUNNING) was the actual DRC-3307 bug — fixed.

  • updated_params merge moved to top: Previously between result-write and status-set on success. Now first. This means when an external observer sees status == FINISHED, params is also already merged. Strictly safer.

  • Loop captured correctly: loop = asyncio.get_running_loop() at line 147 is still used at line 256 for run_in_executor. No dangling references.

Verdict: ✅ Proven safe.

Claim 2: Cancel-sentinel preservation widening is correct

Suspect: The OLD success path had NO if run.status != CANCELLED guard. Adding it changes behavior — cancelled runs that finish anyway now stay CANCELLED instead of becoming FINISHED. Could a caller depend on the old behavior?

Proof:

  • Searched all callers of cancel_run — only cancel_run_handler in run_api.py:73. It does not poll for "cancelled but actually finished."
  • No frontend code reads cancelled-then-finished state as a special condition (frontend renders by status enum).
  • Test test_cancel_sentinel_preserved_on_success explicitly locks in the new behavior.
  • _tool_run_check task branch: run_succeeded = run.status == FINISHED and not run.errorCANCELLED correctly maps to False (no auto-approve).

Verdict: ✅ Proven correct. Behavior change is intentional and tested.

Claim 3: run_succeeded flag covers all control-flow paths through _tool_run_check

Suspect: The flag could be left in a wrong state along some control-flow path.

Proof — exhaustive path enumeration:

Path run_succeeded final Auto-approve Correct?
Metadata branch, success True Fires
Metadata branch, RecceException n/a (re-raised as ValueError) Skipped
Metadata branch, RecceCloudException n/a (propagates, since it's not RecceException) Skipped (run_succeeded never set True)
Task branch, submit_run raises RecceException n/a (re-raised) Skipped
Task branch, submit_run raises RecceCloudException n/a (propagates) Skipped
Task branch, run finishes successfully True Fires
Task branch, run fails (status=FAILED) False Skipped
Task branch, run cancelled False (CANCELLED ≠ FINISHED) Skipped

Cloud-vs-local exception handling: Both update_check_by_id and export_persistent_state are placed OUTSIDE the try / except RecceException blocks. Confirmed: RecceCloudException extends Exception, NOT RecceException (recce/util/recce_cloud.py:31). The CLAUDE.md addition documenting this is accurate.

Verdict: ✅ Proven correct.

Claim 4: await asyncio.get_event_loop().run_in_executor(None, export_persistent_state) cost is acceptable

Suspect: Could be expensive enough to regress agent latency.

Proof:

  • Local mode: state_loader.export() writes a JSON file. Sub-ms cost.
  • Cloud mode: HTTPS upload of full RecceState. Cost = O(checks + runs + artifacts). For typical sessions (≤100 checks/runs), this is small (~100KB-few MB upload).
  • Non-blocking: Wrapped in run_in_executor, so event loop continues servicing other coroutines.
  • Idempotent and safe under failure: If export_persistent_state fails after update_check_by_id succeeds, cloud is source of truth and remains consistent. Local state file is stale but recoverable on next session.
  • Same pattern as _tool_create_check:1847 — consistent.

Verdict: ✅ Cost is acceptable. Non-blocking. Could be debounced as a future optimization but not required.

Claim 5: Test removal of await asyncio.sleep(0.1) is correct

Suspect: The sleep might have been masking a real race condition that still exists.

Proof:

  • OLD code: update_run_result was async + scheduled via run_coroutine_threadsafe from the executor thread. The scheduled coroutine ran in the event loop AFTER await future resumed (because run_coroutine_threadsafe posts to the loop, which only processes it on the next iteration). The sleep gave the loop time to process the coroutine.
  • NEW code: update_run_result is called synchronously inside fn() BEFORE fn() returns. The future resolves only after fn() returns. Therefore by the time await future resumes, all writes are committed.
  • Regression guard: assert run.status == RunStatus.FINISHED in test_normalized_params_propagate_to_run will fail (without the sleep) if anyone reverts to async scheduling.

Verdict: ✅ Sleep correctly removed. Regression guard in place.

Claim 6: CLAUDE.md "Cloud vs Local Mode Exception Types" rule is accurate

Suspect: The rule could be wrong about the inheritance hierarchy.

Proof:

  • Verified recce/exceptions.py:1: class RecceException(Exception) — base is plain Exception.
  • Verified recce/util/recce_cloud.py:31: class RecceCloudException(Exception) — base is plain Exception, NOT RecceException.
  • Therefore try / except RecceException does not catch RecceCloudException. Documentation accurate.

Verdict: ✅ Accurate.


Findings

Finding 1 — [Low priority, test gap] Task-branch negative paths not asserted

File: tests/test_mcp_server.py

Issue: No test asserts that auto-approve does NOT fire when:

  • mock_run.status = RunStatus.FAILED, OR
  • mock_run.error = "some error" while status is FINISHED

Proof of gap: If the gate run.status == RunStatus.FINISHED and not run.error is accidentally weakened to True (or only one of the two clauses), the existing tests still pass because they only exercise the success case.

Suggestion: Add two tests mirroring test_tool_run_check_metadata_branch_recce_exception_no_auto_approve for the task branch (one for FAILED status, one for non-None error). 5–10 lines each.

Finding 2 — [Low priority, test robustness] test_cancel_sentinel_preserved_on_success has timing dependency

File: tests/apis/test_run_func.py:399-422

Issue: The test sets run.status = RunStatus.CANCELLED AFTER submit_run returns but BEFORE awaiting the future. This relies on the test thread winning the race against the executor's fn() execution. With a fast mock task, the executor could have already completed update_run_result(run, result, ...) before the test sets CANCELLED — in which case the assertion passes for the wrong reason (the test just observes its own write).

Proof: No threading.Event or barrier blocks fn() from running ahead of the test's CANCELLED write. The test happens to work because Python's executor submission has small overhead, but this is implementation-detail timing.

Suggestion: Inject a threading.Event into the mock task's execute() and have the test set CANCELLED, then signal the event, before awaiting the future. Deterministic.

Finding 3 — [Low priority, cleanup] Redundant run.result = await future

File: recce/apis/run_api.py:66, recce/apis/check_api.py:182, recce/mcp_server.py:1745

Issue: With update_run_result now setting run.result synchronously inside fn() before the future resolves, these three lines overwrite run.result with the same value (or with None on failure, where it was already None).

Suggestion: Replace with bare await future. Not a bug — pure cleanup.

Finding 4 — [Low priority, cosmetic] Brittle line reference in comment

File: recce/mcp_server.py:1751-1752

Issue: Comment says "_tool_create_check (line 1832: ...)" but actual line is 1843. Line numbers rot.

Suggestion: Drop the line number; symbolic reference suffices.


Tests verification

  • pytest tests/apis/test_run_func.py tests/test_mcp_server.py146 passed, 0 failed
  • flake8 recce/apis/run_func.py recce/mcp_server.py → clean

Verdict

Approve.

Every suspicious change has a defensible justification. The race-condition fix is correctly implemented; the auto-approve flag covers all control-flow paths; the cloud-exception placement matches the documented rule. No CRITICAL or HIGH-severity issues. Findings 1–4 are LOW-priority polish that can land in a follow-up.

@iamcxa iamcxa merged commit 072b313 into main Apr 28, 2026
22 checks passed
iamcxa added a commit that referenced this pull request Apr 28, 2026
mcp_server.py — restructure _tool_run_check auto-approve:

- Move update_check_by_id OUTSIDE the metadata-branch try/except
  RecceException (mirrors _tool_create_check structure). Cloud mode's
  CheckDAO re-raises RecceCloudException, which is NOT a RecceException
  subclass — leaving the call inside the wrapper would let cloud failures
  escape unwrapped, breaking the consistent ValueError contract.
- Add `await loop.run_in_executor(None, export_persistent_state)` after
  auto-approve so local-mode is_checked changes survive process exit.
  Closes the persistence asymmetry vs _tool_create_check that was
  miscalled "matches policy" in the original comment.
- Reframe the auto-approve comment block: drop the consumer-side
  "preset checks / ⏳ Unapproved" narrative (cross-repo concept bleed),
  add `See: DRC-3307` for PM-decision attribution, and explicitly
  document that an empty result IS valid evidence for metadata-only
  types (zero-change confirmation).
- Add `logger.info(f"Auto-approved check {check_id} (triggered_by=...)")`
  so server logs preserve attribution even when the cloud audit-layer
  ContextVar isn't populated (pre-existing scope, but worth surfacing
  now that run_check also auto-approves).

run_func.py — preserve CANCELLED sentinel + invert store order:

- update_run_result now writes run.status BEFORE run.result/run.error
  so async readers in the loop thread never observe a "result-present
  + status-RUNNING" intermediate.
- Both success and failure paths skip the status overwrite when
  run.status == CANCELLED. Previously only the failure path checked
  this; the success path unconditionally overwrote CANCELLED with
  FINISHED, silently regressing cancel_run from another thread.
- Docstring rewritten: declares the cross-thread store-ordering
  invariant explicitly (CPython GIL atomic per attribute, but multi-
  store sequences are not serialized with the loop). Added DRC-3307
  historical context with a sunset condition tied to the consumer-side
  defensive guards in recce-cloud-infra `derive_check_run_status` —
  safe to remove this sync-call requirement once those guards go.

Coverage (87.5% patch -> 100%) + tighter regression tags:

- New negative-path test
  `test_tool_run_check_metadata_branch_recce_exception_no_auto_approve`:
  asserts update_check_by_id and export_persistent_state are NOT called
  when _tool_lineage_diff raises RecceException. Closes the metadata
  branch coverage gap (codecov-flagged 2 missing lines).
- New `test_cancel_sentinel_preserved_on_success`: simulates
  cancel_run setting CANCELLED before the executor finishes, asserts
  update_run_result success path does NOT overwrite. Exercises the
  newly-added `if run.status != RunStatus.CANCELLED` branch on the
  success path.
- New `test_repr_fallback_when_str_error_is_none_string`: verifies
  `repr(error)` fallback when `str(error) == "None"` records useful
  failed_reason. Closes the run_func.py coverage gap (codecov-flagged
  2 missing lines).
- Tighten the three "DRC-3307" tags in test_mcp_server.py to
  "Regression: ... (DRC-3307 root cause)" so the meaning survives
  Linear-archive of the ticket.
- Reframe the "no sleep needed" test comment as a forward-looking
  regression assertion ("if this fails, update_run_result is being
  scheduled async again").

Mock additions to existing tests:
- All 5 _tool_run_check call sites in tests now mock
  `recce.apis.check_func.export_persistent_state`. Without this mock,
  the new persistence call hits a real default_context() and fails
  with AttributeError (no state_loader on None).

Tests: 146 passing (was 115; +31 includes prior 28 unrelated suite
additions and 3 new for this PR).

DRC-3307

Signed-off-by: Kent <iamcxa@gmail.com>
iamcxa added a commit that referenced this pull request Apr 28, 2026
The narrow except RecceException pattern silently misses cloud-mode
failures because RecceCloudException inherits from Exception, not from
RecceException. Documenting the rule + two acceptable fixes so future
DAO-write additions catch this at review time.

DRC-3307

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa iamcxa deleted the feature/drc-3307-fix-checklist-sync-and-suggested-actions-inconsistency branch April 28, 2026 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants