Skip to content

feat(mcp): add analyze_model tool for single-env structural analysis#1379

Merged
wcchang1115 merged 6 commits into
mainfrom
feature/drc-3408-design-and-implement-single-env-ast-analysis-in-recce-for
May 22, 2026
Merged

feat(mcp): add analyze_model tool for single-env structural analysis#1379
wcchang1115 merged 6 commits into
mainfrom
feature/drc-3408-design-and-implement-single-env-ast-analysis-in-recce-for

Conversation

@wcchang1115
Copy link
Copy Markdown
Collaborator

@wcchang1115 wcchang1115 commented May 18, 2026

PR checklist

  • Ensure you have added or ran the appropriate tests for your PR.
  • DCO signed

What type of PR is this?

feat (new MCP tool)

What this PR does / why we need it:

Adds a new MCP tool analyze_model(model_id) that parses a dbt model's compiled SQL via sqlglot and returns:

  1. Structural evidence — refs, projections (with source columns, aggregate/derived flags), filters, joins (type + condition), group_by, having, order_by, aggregations, case expressions, distinct/subquery/CTE flags.
  2. Downstream column impact — which models and columns depend on this one, reusing the existing CLL parent/child map.

This gives agents (e.g. /recce-verify, DRC-3404) evidence they can't cheaply produce from raw SQL text alone. It is single-environment first-class — no target-base/, no target.HEAD/, no git history required.

The agent owns interpretation. Recce only exposes structure and dependencies; whether a change is "risky" is the agent's call, fed by analyze_model + the agent's own git diff.

Design note posted as a comment on DRC-3408 prior to implementation.

Which issue(s) this PR fixes:

Fixes DRC-3408

Special notes for your reviewer:

  • Scope adjustment from issue framing: original asked for AST diff between HEAD and working tree. After design discussion, we exposed structural analysis of current SQL only — see Linear comment for reasoning. AST diff and breaking/non-breaking classification are explicitly out of scope.
  • New code is intentionally pure-Python in recce/util/ast_analyze.py. The MCP handler is thin and mirrors _tool_get_cll.
  • Cloud-backend path is not wired for v1; cloud sessions don't proxy this tool. Local-only by design.
  • One non-obvious lesson surfaced during integration testing: my initial collect_downstream walked CllColumn.depends_on, which is on the type but not populated by build_full_cll_map. The real data lives in CllData.child_map + CllColumn.table_id. Unit tests passed in isolation because I authored the fixtures wrong; real-data smoke caught it. The committed version uses the correct fields and ships with tests that mirror the real shape.

Test plan

  • 39 unit tests for analyze_sql() and helpers in tests/util/test_ast_analyze.py (one per SqlStructure field)
  • 4 MCP handler tests in tests/test_mcp_server.py (happy path, missing id, non-dbt, missing compiled SQL)
  • tests/test_mcp_e2e.py expected-tools list updated
  • Full suite: 1249 passed, 5 skipped, 0 failed
  • Manual smoke against integration_tests/dbt (jaffle_shop) after dbt run + dbt docs generate: model.jaffle_shop.stg_customers returns customers as downstream model with columns customer_id, first_name, last_name.

Does this PR introduce a user-facing change?:

```release-note
Added a new `analyze_model` MCP tool that returns the structural shape (refs, filters, joins, projections, aggregations, etc.) of a dbt model's compiled SQL plus its downstream column impact. Single-environment — does not require `target-base/`. Designed for agent workflows that need structured evidence about a model without performing data diffs.
```

wcchang1115 and others added 2 commits May 18, 2026 17:18
…(DRC-3408)

Exposes a new MCP tool `analyze_model(model_id)` that parses a dbt model's
compiled SQL via sqlglot and returns structured evidence (refs, projections,
filters, joins, group_by, having, order_by, aggregations, case_expressions,
distinct, has_subquery, has_cte) plus downstream column impact from the
existing CLL parent/child map.

The tool gives agents (e.g. /recce-verify) evidence they can't cheaply
produce from raw SQL text alone. It is single-environment first-class —
requires no target-base/ and no git history. The agent owns interpretation;
Recce only exposes structure and dependencies.

- recce/util/ast_analyze.py: pure-Python engine (analyze_sql,
  get_compiled_sql_from_manifest, collect_downstream)
- recce/mcp_server.py: Tool registration, dispatch, _tool_analyze_model
  handler (mirrors _tool_get_cll pattern)
- tests/util/test_ast_analyze.py: 39 unit tests covering every
  SqlStructure field plus both helpers
- tests/test_mcp_server.py: MCP handler happy path + 3 error cases
- tests/test_mcp_e2e.py: include analyze_model in expected tool list
- scripts/smoke_analyze_model.py: optional manual harness; auto-detects
  single-env mode when target-base/ is absent

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>
@wcchang1115 wcchang1115 changed the title feat(mcp): add analyze_model tool for single-env structural analysis (DRC-3408) feat(mcp): add analyze_model tool for single-env structural analysis May 18, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 99.08759% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/util/ast_analyze.py 97.43% 4 Missing ⚠️
recce/mcp_server.py 95.23% 1 Missing ⚠️
Files with missing lines Coverage Δ
tests/test_mcp_e2e.py 100.00% <ø> (ø)
tests/test_mcp_server.py 99.86% <100.00%> (+<0.01%) ⬆️
tests/util/test_ast_analyze.py 100.00% <100.00%> (ø)
recce/mcp_server.py 91.27% <95.23%> (+0.08%) ⬆️
recce/util/ast_analyze.py 97.43% <97.43%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

- Resolve dialect from manifest.metadata.adapter_type with fallback to
  adapter.type(); without this, BigQuery / Snowflake / etc. silently
  return unparseable=true for dialect-specific syntax.
- Exclude CTE alias names from refs so staging-style models don't leak
  internal CTE names as upstream tables.
- Walk UNION/INTERSECT/EXCEPT legs via _top_level_selects and merge
  projections/filters/joins/group_by/having/order_by/distinct across
  them; add is_set_operation flag so the agent can distinguish a
  set-operation model from a plain SELECT.
- Update analyze_model tool description with the new semantics
  (CTE filtering, set-op merging, 1-hop downstream).
- Add tests: 10 in test_ast_analyze.py (CTE exclusion, set ops, dialect
  forwarding) and 2 in test_mcp_server.py (dialect from metadata,
  fallback to adapter.type()).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>
@even-wei
Copy link
Copy Markdown
Contributor

even-wei commented May 21, 2026

Code Review: PR #1379

SHA 192ac8be · Verdict GO · Incremental (since 7da45292)

All three findings from the prior review at 7da45292 are resolved.

Resolved (verified)

  1. BLOCKER — INTERSECT/EXCEPT silent-empty (recce/util/ast_analyze.py:112,184).
    isinstance(tree, exp.Union)isinstance(tree, exp.SetOperation) in both _top_level_selects and the is_set_operation flag. Confirmed by 9 new INTERSECT/EXCEPT/mixed tests in TestAnalyzeSqlSetOperations (all passing). Docstring updated to call out that Union/Intersect/Except each inherit from SetOperation independently.

  2. ISSUE — get_compiled_sql_from_manifest accepts any resource type (recce/util/ast_analyze.py:201-215).
    _ANALYZABLE_RESOURCE_TYPES = frozenset({"model", "seed", "snapshot"}) plus an explicit raise for anything else. Verified by test_rejects_non_analyzable_resource_types (covers test, analysis, operation, exposure) and test_tool_analyze_model_rejects_test_resource_type at the MCP-handler layer.

  3. NOTE — cloud-mode tools/list advertises unsupported tool (recce/mcp_server.py:701-738).
    Wrapped registration in if self.backend is None: with a comment explaining the rationale. Verified by test_analyze_model_advertised_in_local_mode_only, which lists tools against both a local-backend and cloud-backend server and asserts analyze_model appears only in the former.

  4. NOTE — CTE-name exclusion drops qualified refs (recce/util/ast_analyze.py:133).
    Filter is now not (t.name in cte_names and not t.db and not t.catalog) so two- and three-part qualified names (raw.orders, analytics.public.orders) survive a CTE collision. Locked in by test_qualified_table_not_dropped_by_unqualified_cte_collision and test_three_part_qualified_ref_not_dropped_by_cte_collision.

  5. NOTE — cold-cache cost (recce/mcp_server.py:719-721).
    Tool description now states: "the first call per session builds the full column-level lineage map (potentially seconds on large projects); subsequent calls reuse the cached map and are fast."

Verification

  • uv run pytest tests/util/test_ast_analyze.py → 61 passed.
  • uv run pytest tests/test_mcp_server.py -k analyze_model → 8 passed.
  • uv run pytest tests/util/ tests/test_mcp_server.py → 391 passed.
  • uv run ruff check → clean. black --check, isort --check → clean.

Ship it.

Copy link
Copy Markdown
Contributor

@even-wei even-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posted detailed review as PR comment. Verdict: NO-GO. One BLOCKER: is_set_operation and per-leg merging only cover UNION — INTERSECT and EXCEPT silently produce empty projections/filters, directly contradicting the tool description and the test class invariant. One ISSUE: get_compiled_sql_from_manifest doesn't validate resource_type. See the comment for evidence and notes.

even-wei and others added 3 commits May 21, 2026 14:11
…d gate

- ast_analyze: cover INTERSECT/EXCEPT by switching from exp.Union to
  exp.SetOperation in _top_level_selects and the is_set_operation flag.
  UNION/INTERSECT/EXCEPT inherit from SetOperation independently, so the
  previous Union-only check silently produced empty projections/filters
  for the other two set-op kinds despite the tool description promising
  full support.
- ast_analyze: reject non-{model,seed,snapshot} nodes in
  get_compiled_sql_from_manifest so analyze_model can no longer return
  the compiled SQL of a dbt test, analysis, operation, or exposure.
- ast_analyze: scope CTE alias exclusion to unqualified tables — a CTE
  named `orders` no longer drops the unrelated qualified ref `raw.orders`
  or `analytics.public.orders`.
- mcp_server: gate analyze_model registration in list_tools with
  `self.backend is None`. RecceMCPCloudBackend.call_tool does not
  dispatch analyze_model, so cloud sessions used to advertise a tool
  that would error on invocation.
- mcp_server: document the first-call cost of analyze_model (full CLL
  map build, cached thereafter) in the tool description.
- tests: add INTERSECT/EXCEPT coverage mirroring UNION (is_set_operation,
  ref/projection/filter merging, chain recursion, distinct flag, mixed
  UNION+INTERSECT), parametrized resource_type rejection
  (test/analysis/operation/exposure), three-part qualified ref vs CTE
  collision, MCP-level rejection of test-resource_type nodes, and a
  list_tools gate test confirming analyze_model is hidden in cloud mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>
@wcchang1115 wcchang1115 requested a review from even-wei May 22, 2026 08:00
Copy link
Copy Markdown
Contributor

@even-wei even-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental review at 192ac8b. All three findings from the prior review at 7da4529 are resolved (set-operation parent class, resource_type allow-list, cloud-mode advertisement gate, qualified-name CTE handling, performance note). See pinned comment for verification. GO.

@wcchang1115 wcchang1115 merged commit 6ce365f into main May 22, 2026
22 checks passed
@wcchang1115 wcchang1115 deleted the feature/drc-3408-design-and-implement-single-env-ast-analysis-in-recce-for branch May 22, 2026 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants