feat(mcp): add analyze_model tool for single-env structural analysis by wcchang1115 · Pull Request #1379 · DataRecce/recce

wcchang1115 · 2026-05-18T09:19:01Z

PR checklist

Ensure you have added or ran the appropriate tests for your PR.
DCO signed

What type of PR is this?

feat (new MCP tool)

What this PR does / why we need it:

Adds a new MCP tool analyze_model(model_id) that parses a dbt model's compiled SQL via sqlglot and returns:

Structural evidence — refs, projections (with source columns, aggregate/derived flags), filters, joins (type + condition), group_by, having, order_by, aggregations, case expressions, distinct/subquery/CTE flags.
Downstream column impact — which models and columns depend on this one, reusing the existing CLL parent/child map.

This gives agents (e.g. /recce-verify, DRC-3404) evidence they can't cheaply produce from raw SQL text alone. It is single-environment first-class — no target-base/, no target.HEAD/, no git history required.

The agent owns interpretation. Recce only exposes structure and dependencies; whether a change is "risky" is the agent's call, fed by analyze_model + the agent's own git diff.

Design note posted as a comment on DRC-3408 prior to implementation.

Which issue(s) this PR fixes:

Fixes DRC-3408

Special notes for your reviewer:

Scope adjustment from issue framing: original asked for AST diff between HEAD and working tree. After design discussion, we exposed structural analysis of current SQL only — see Linear comment for reasoning. AST diff and breaking/non-breaking classification are explicitly out of scope.
New code is intentionally pure-Python in recce/util/ast_analyze.py. The MCP handler is thin and mirrors _tool_get_cll.
Cloud-backend path is not wired for v1; cloud sessions don't proxy this tool. Local-only by design.
One non-obvious lesson surfaced during integration testing: my initial collect_downstream walked CllColumn.depends_on, which is on the type but not populated by build_full_cll_map. The real data lives in CllData.child_map + CllColumn.table_id. Unit tests passed in isolation because I authored the fixtures wrong; real-data smoke caught it. The committed version uses the correct fields and ships with tests that mirror the real shape.

Test plan

39 unit tests for analyze_sql() and helpers in tests/util/test_ast_analyze.py (one per SqlStructure field)
4 MCP handler tests in tests/test_mcp_server.py (happy path, missing id, non-dbt, missing compiled SQL)
tests/test_mcp_e2e.py expected-tools list updated
Full suite: 1249 passed, 5 skipped, 0 failed
Manual smoke against integration_tests/dbt (jaffle_shop) after dbt run + dbt docs generate: model.jaffle_shop.stg_customers returns customers as downstream model with columns customer_id, first_name, last_name.

Does this PR introduce a user-facing change?:

```release-note
Added a new `analyze_model` MCP tool that returns the structural shape (refs, filters, joins, projections, aggregations, etc.) of a dbt model's compiled SQL plus its downstream column impact. Single-environment — does not require `target-base/`. Designed for agent workflows that need structured evidence about a model without performing data diffs.
```

…(DRC-3408) Exposes a new MCP tool `analyze_model(model_id)` that parses a dbt model's compiled SQL via sqlglot and returns structured evidence (refs, projections, filters, joins, group_by, having, order_by, aggregations, case_expressions, distinct, has_subquery, has_cte) plus downstream column impact from the existing CLL parent/child map. The tool gives agents (e.g. /recce-verify) evidence they can't cheaply produce from raw SQL text alone. It is single-environment first-class — requires no target-base/ and no git history. The agent owns interpretation; Recce only exposes structure and dependencies. - recce/util/ast_analyze.py: pure-Python engine (analyze_sql, get_compiled_sql_from_manifest, collect_downstream) - recce/mcp_server.py: Tool registration, dispatch, _tool_analyze_model handler (mirrors _tool_get_cll pattern) - tests/util/test_ast_analyze.py: 39 unit tests covering every SqlStructure field plus both helpers - tests/test_mcp_server.py: MCP handler happy path + 3 error cases - tests/test_mcp_e2e.py: include analyze_model in expected tool list - scripts/smoke_analyze_model.py: optional manual harness; auto-detects single-env mode when target-base/ is absent Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>

codecov · 2026-05-18T09:29:53Z

Codecov Report

❌ Patch coverage is 99.08759% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/util/ast_analyze.py	97.43%	4 Missing ⚠️
recce/mcp_server.py	95.23%	1 Missing ⚠️

Files with missing lines	Coverage Δ
tests/test_mcp_e2e.py	`100.00% <ø> (ø)`
tests/test_mcp_server.py	`99.86% <100.00%> (+<0.01%)`	⬆️
tests/util/test_ast_analyze.py	`100.00% <100.00%> (ø)`
recce/mcp_server.py	`91.27% <95.23%> (+0.08%)`	⬆️
recce/util/ast_analyze.py	`97.43% <97.43%> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Resolve dialect from manifest.metadata.adapter_type with fallback to adapter.type(); without this, BigQuery / Snowflake / etc. silently return unparseable=true for dialect-specific syntax. - Exclude CTE alias names from refs so staging-style models don't leak internal CTE names as upstream tables. - Walk UNION/INTERSECT/EXCEPT legs via _top_level_selects and merge projections/filters/joins/group_by/having/order_by/distinct across them; add is_set_operation flag so the agent can distinguish a set-operation model from a plain SELECT. - Update analyze_model tool description with the new semantics (CTE filtering, set-op merging, 1-hop downstream). - Add tests: 10 in test_ast_analyze.py (CTE exclusion, set ops, dialect forwarding) and 2 in test_mcp_server.py (dialect from metadata, fallback to adapter.type()). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>

even-wei · 2026-05-21T06:05:40Z

Code Review: PR #1379

SHA 192ac8be · Verdict GO · Incremental (since 7da45292)

All three findings from the prior review at 7da45292 are resolved.

Resolved (verified)

BLOCKER — INTERSECT/EXCEPT silent-empty (recce/util/ast_analyze.py:112,184).
isinstance(tree, exp.Union) → isinstance(tree, exp.SetOperation) in both _top_level_selects and the is_set_operation flag. Confirmed by 9 new INTERSECT/EXCEPT/mixed tests in TestAnalyzeSqlSetOperations (all passing). Docstring updated to call out that Union/Intersect/Except each inherit from SetOperation independently.
ISSUE — get_compiled_sql_from_manifest accepts any resource type (recce/util/ast_analyze.py:201-215).
_ANALYZABLE_RESOURCE_TYPES = frozenset({"model", "seed", "snapshot"}) plus an explicit raise for anything else. Verified by test_rejects_non_analyzable_resource_types (covers test, analysis, operation, exposure) and test_tool_analyze_model_rejects_test_resource_type at the MCP-handler layer.
NOTE — cloud-mode tools/list advertises unsupported tool (recce/mcp_server.py:701-738).
Wrapped registration in if self.backend is None: with a comment explaining the rationale. Verified by test_analyze_model_advertised_in_local_mode_only, which lists tools against both a local-backend and cloud-backend server and asserts analyze_model appears only in the former.
NOTE — CTE-name exclusion drops qualified refs (recce/util/ast_analyze.py:133).
Filter is now not (t.name in cte_names and not t.db and not t.catalog) so two- and three-part qualified names (raw.orders, analytics.public.orders) survive a CTE collision. Locked in by test_qualified_table_not_dropped_by_unqualified_cte_collision and test_three_part_qualified_ref_not_dropped_by_cte_collision.
NOTE — cold-cache cost (recce/mcp_server.py:719-721).
Tool description now states: "the first call per session builds the full column-level lineage map (potentially seconds on large projects); subsequent calls reuse the cached map and are fast."

Verification

uv run pytest tests/util/test_ast_analyze.py → 61 passed.
uv run pytest tests/test_mcp_server.py -k analyze_model → 8 passed.
uv run pytest tests/util/ tests/test_mcp_server.py → 391 passed.
uv run ruff check → clean. black --check, isort --check → clean.

Ship it.

even-wei

Posted detailed review as PR comment. Verdict: NO-GO. One BLOCKER: is_set_operation and per-leg merging only cover UNION — INTERSECT and EXCEPT silently produce empty projections/filters, directly contradicting the tool description and the test class invariant. One ISSUE: get_compiled_sql_from_manifest doesn't validate resource_type. See the comment for evidence and notes.

…-env-ast-analysis-in-recce-for

…d gate - ast_analyze: cover INTERSECT/EXCEPT by switching from exp.Union to exp.SetOperation in _top_level_selects and the is_set_operation flag. UNION/INTERSECT/EXCEPT inherit from SetOperation independently, so the previous Union-only check silently produced empty projections/filters for the other two set-op kinds despite the tool description promising full support. - ast_analyze: reject non-{model,seed,snapshot} nodes in get_compiled_sql_from_manifest so analyze_model can no longer return the compiled SQL of a dbt test, analysis, operation, or exposure. - ast_analyze: scope CTE alias exclusion to unqualified tables — a CTE named `orders` no longer drops the unrelated qualified ref `raw.orders` or `analytics.public.orders`. - mcp_server: gate analyze_model registration in list_tools with `self.backend is None`. RecceMCPCloudBackend.call_tool does not dispatch analyze_model, so cloud sessions used to advertise a tool that would error on invocation. - mcp_server: document the first-call cost of analyze_model (full CLL map build, cached thereafter) in the tool description. - tests: add INTERSECT/EXCEPT coverage mirroring UNION (is_set_operation, ref/projection/filter merging, chain recursion, distinct flag, mixed UNION+INTERSECT), parametrized resource_type rejection (test/analysis/operation/exposure), three-part qualified ref vs CTE collision, MCP-level rejection of test-resource_type nodes, and a list_tools gate test confirming analyze_model is hidden in cloud mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>

…-env-ast-analysis-in-recce-for

even-wei

Incremental review at 192ac8b. All three findings from the prior review at 7da4529 are resolved (set-operation parent class, resource_type allow-list, cloud-mode advertisement gate, qualified-name CTE handling, performance note). See pinned comment for verification. GO.

wcchang1115 and others added 2 commits May 18, 2026 17:18

chore: drop smoke_analyze_model.py — manual dev tool, not for CI

c92318d

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>

wcchang1115 changed the title ~~feat(mcp): add analyze_model tool for single-env structural analysis (DRC-3408)~~ feat(mcp): add analyze_model tool for single-env structural analysis May 18, 2026

wcchang1115 marked this pull request as ready for review May 19, 2026 03:17

wcchang1115 requested a review from even-wei May 19, 2026 03:17

even-wei mentioned this pull request May 21, 2026

feat(recce): /recce-verify skill v1 for single-env dev-loop verification DataRecce/recce-claude-plugin#35

Merged

6 tasks

even-wei requested changes May 21, 2026

View reviewed changes

even-wei and others added 3 commits May 21, 2026 14:11

Merge branch 'main' into feature/drc-3408-design-and-implement-single…

bd9efb5

…-env-ast-analysis-in-recce-for

Merge branch 'main' into feature/drc-3408-design-and-implement-single…

192ac8b

…-env-ast-analysis-in-recce-for

wcchang1115 requested a review from even-wei May 22, 2026 08:00

even-wei approved these changes May 22, 2026

View reviewed changes

wcchang1115 merged commit 6ce365f into main May 22, 2026
22 checks passed

wcchang1115 deleted the feature/drc-3408-design-and-implement-single-env-ast-analysis-in-recce-for branch May 22, 2026 08:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp): add analyze_model tool for single-env structural analysis#1379

feat(mcp): add analyze_model tool for single-env structural analysis#1379
wcchang1115 merged 6 commits into
mainfrom
feature/drc-3408-design-and-implement-single-env-ast-analysis-in-recce-for

wcchang1115 commented May 18, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 18, 2026 •

edited

Loading

Uh oh!

even-wei commented May 21, 2026 •

edited

Loading

Uh oh!

even-wei left a comment

Uh oh!

even-wei left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wcchang1115 commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

even-wei commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #1379

Resolved (verified)

Verification

Uh oh!

even-wei left a comment

Choose a reason for hiding this comment

Uh oh!

even-wei left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wcchang1115 commented May 18, 2026 •

edited

Loading

codecov Bot commented May 18, 2026 •

edited

Loading

even-wei commented May 21, 2026 •

edited

Loading