feat(mcp): add analyze_model tool for single-env structural analysis #1379
…(DRC-3408)

Exposes a new MCP tool `analyze_model(model_id)` that parses a dbt model's compiled SQL via sqlglot and returns structured evidence (refs, projections, filters, joins, group_by, having, order_by, aggregations, case_expressions, distinct, has_subquery, has_cte) plus downstream column impact from the existing CLL parent/child map.

The tool gives agents (e.g. /recce-verify) evidence they can't cheaply produce from raw SQL text alone. It is single-environment first-class: it requires no target-base/ and no git history. The agent owns interpretation; Recce only exposes structure and dependencies.

- recce/util/ast_analyze.py: pure-Python engine (analyze_sql, get_compiled_sql_from_manifest, collect_downstream)
- recce/mcp_server.py: Tool registration, dispatch, _tool_analyze_model handler (mirrors _tool_get_cll pattern)
- tests/util/test_ast_analyze.py: 39 unit tests covering every SqlStructure field plus both helpers
- tests/test_mcp_server.py: MCP handler happy path + 3 error cases
- tests/test_mcp_e2e.py: include analyze_model in expected tool list
- scripts/smoke_analyze_model.py: optional manual harness; auto-detects single-env mode when target-base/ is absent

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>
- Resolve dialect from manifest.metadata.adapter_type with fallback to adapter.type(); without this, BigQuery / Snowflake / etc. silently return unparseable=true for dialect-specific syntax.
- Exclude CTE alias names from refs so staging-style models don't leak internal CTE names as upstream tables.
- Walk UNION/INTERSECT/EXCEPT legs via _top_level_selects and merge projections/filters/joins/group_by/having/order_by/distinct across them; add is_set_operation flag so the agent can distinguish a set-operation model from a plain SELECT.
- Update analyze_model tool description with the new semantics (CTE filtering, set-op merging, 1-hop downstream).
- Add tests: 10 in test_ast_analyze.py (CTE exclusion, set ops, dialect forwarding) and 2 in test_mcp_server.py (dialect from metadata, fallback to adapter.type()).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>
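The dialect fallback in the first bullet can be pictured roughly as follows. This is a minimal sketch, not the code in the PR: `resolve_dialect` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for the real manifest and adapter, assumed only to expose `metadata.adapter_type` and `type()` as the commit message describes.

```python
from types import SimpleNamespace
from typing import Any, Optional


def resolve_dialect(manifest: Any, adapter: Any) -> Optional[str]:
    """Pick a SQL dialect name: manifest metadata first, adapter second.

    Hypothetical helper; the attribute names follow the commit message.
    """
    metadata = getattr(manifest, "metadata", None)
    adapter_type = getattr(metadata, "adapter_type", None)
    if adapter_type:
        return adapter_type
    # Fallback for manifests that do not record adapter_type.
    if adapter is not None:
        return adapter.type()
    # No dialect known: the caller would parse with the parser's default.
    return None


# Stand-ins for the real objects (illustrative only).
with_metadata = SimpleNamespace(metadata=SimpleNamespace(adapter_type="bigquery"))
without_metadata = SimpleNamespace(metadata=SimpleNamespace(adapter_type=None))
snowflake_adapter = SimpleNamespace(type=lambda: "snowflake")

print(resolve_dialect(with_metadata, snowflake_adapter))     # bigquery
print(resolve_dialect(without_metadata, snowflake_adapter))  # snowflake
```

Returning `None` when neither source is available leaves the choice of a default dialect to the parser, which is the situation the commit says led to `unparseable=true` on dialect-specific syntax.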
Code Review: PR #1379
SHA All three findings from the prior review at Resolved (verified)
Verification
Ship it.
even-wei left a comment
Posted detailed review as PR comment. Verdict: NO-GO. One BLOCKER: is_set_operation and per-leg merging only cover UNION — INTERSECT and EXCEPT silently produce empty projections/filters, directly contradicting the tool description and the test class invariant. One ISSUE: get_compiled_sql_from_manifest doesn't validate resource_type. See the comment for evidence and notes.
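The blocker can be reproduced with a toy AST. The sketch below uses made-up classes (`Select`, `SetOperation`, `UnionOp`, `IntersectOp`, `ExceptOp`) rather than sqlglot's own, and `top_level_selects` is a hypothetical stand-in for the PR's `_top_level_selects`. It assumes only what the review states: that a check against the union class alone does not match INTERSECT or EXCEPT nodes.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass
class Select:
    """Toy stand-in for a plain SELECT; not sqlglot's class."""
    projections: List[str]


@dataclass
class SetOperation:
    """Toy base class shared by every set-operation node."""
    left: "Node"
    right: "Node"


class UnionOp(SetOperation):
    pass


class IntersectOp(SetOperation):
    pass


class ExceptOp(SetOperation):
    pass


Node = Union[Select, SetOperation]


def top_level_selects(node: Node, set_op_type: type) -> Iterator[Select]:
    """Yield the SELECT legs of a (possibly nested) set operation, left to right."""
    if isinstance(node, set_op_type):
        yield from top_level_selects(node.left, set_op_type)
        yield from top_level_selects(node.right, set_op_type)
    elif isinstance(node, Select):
        yield node
    # Any other node type is skipped, which is how legs get lost silently.


def merged_projections(node: Node, set_op_type: type) -> List[str]:
    return [p for leg in top_level_selects(node, set_op_type) for p in leg.projections]


# a INTERSECT (b UNION c)
tree = IntersectOp(Select(["a"]), UnionOp(Select(["b"]), Select(["c"])))

print(merged_projections(tree, UnionOp))       # []
print(merged_projections(tree, SetOperation))  # ['a', 'b', 'c']
```

Matching on the shared base class is what lets one walker cover all three set-operation kinds, including mixed chains.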
…-env-ast-analysis-in-recce-for
…d gate
- ast_analyze: cover INTERSECT/EXCEPT by switching from exp.Union to
exp.SetOperation in _top_level_selects and the is_set_operation flag.
UNION/INTERSECT/EXCEPT inherit from SetOperation independently, so the
previous Union-only check silently produced empty projections/filters
for the other two set-op kinds despite the tool description promising
full support.
- ast_analyze: reject non-{model,seed,snapshot} nodes in
get_compiled_sql_from_manifest so analyze_model can no longer return
the compiled SQL of a dbt test, analysis, operation, or exposure.
- ast_analyze: scope CTE alias exclusion to unqualified tables — a CTE
named `orders` no longer drops the unrelated qualified ref `raw.orders`
or `analytics.public.orders`.
- mcp_server: gate analyze_model registration in list_tools with
`self.backend is None`. RecceMCPCloudBackend.call_tool does not
dispatch analyze_model, so cloud sessions used to advertise a tool
that would error on invocation.
- mcp_server: document the first-call cost of analyze_model (full CLL
map build, cached thereafter) in the tool description.
- tests: add INTERSECT/EXCEPT coverage mirroring UNION (is_set_operation,
ref/projection/filter merging, chain recursion, distinct flag, mixed
UNION+INTERSECT), parametrized resource_type rejection
(test/analysis/operation/exposure), three-part qualified ref vs CTE
collision, MCP-level rejection of test-resource_type nodes, and a
list_tools gate test confirming analyze_model is hidden in cloud mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Wei-Chun, Chang <wcchang@infuseai.io>
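The scoping rule for CTE alias exclusion can be illustrated without a SQL parser. This is a sketch under stated assumptions: `TableRef` and `upstream_refs` are hypothetical names, and the table references are written by hand instead of being extracted from a parsed query.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class TableRef:
    """Hand-built table reference; a real one would come from the parsed SQL."""
    name: str
    db: Optional[str] = None       # schema part, e.g. "raw"
    catalog: Optional[str] = None  # database part, e.g. "analytics"

    @property
    def qualified(self) -> bool:
        return bool(self.db or self.catalog)

    def dotted(self) -> str:
        return ".".join(p for p in (self.catalog, self.db, self.name) if p)


def upstream_refs(tables: Iterable[TableRef], cte_names: Set[str]) -> List[str]:
    """Drop references to CTE aliases, but only when the reference is unqualified."""
    refs: List[str] = []
    for table in tables:
        if not table.qualified and table.name in cte_names:
            continue  # a bare name that matches a CTE alias is internal to the query
        ref = table.dotted()
        if ref not in refs:  # keep first-seen order, no duplicates
            refs.append(ref)
    return refs


tables = [
    TableRef("orders"),                                    # the CTE itself
    TableRef("orders", db="raw"),                          # raw.orders
    TableRef("orders", db="public", catalog="analytics"),  # analytics.public.orders
    TableRef("customers"),
]
print(upstream_refs(tables, cte_names={"orders"}))
# ['raw.orders', 'analytics.public.orders', 'customers']
```

A CTE alias can only shadow a bare name, so a qualified reference with the same final part still points at a real relation and stays in the list.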
…-env-ast-analysis-in-recce-for
PR checklist
What type of PR is this?
feat (new MCP tool)
What this PR does / why we need it:
Adds a new MCP tool `analyze_model(model_id)` that parses a dbt model's compiled SQL via sqlglot and returns the model's structural shape (refs, projections, filters, joins, aggregations, etc.) plus its downstream column impact.

This gives agents (e.g. `/recce-verify`, DRC-3404) evidence they can't cheaply produce from raw SQL text alone. It is single-environment first-class: no `target-base/`, no `target.HEAD/`, no git history required.

The agent owns interpretation. Recce only exposes structure and dependencies; whether a change is "risky" is the agent's call, fed by `analyze_model` plus the agent's own `git diff`.

Design note posted as a comment on DRC-3408 prior to implementation.
Which issue(s) this PR fixes:
Fixes DRC-3408
Special notes for your reviewer:
- The engine lives in `recce/util/ast_analyze.py`. The MCP handler is thin and mirrors `_tool_get_cll`.
- An earlier draft of `collect_downstream` walked `CllColumn.depends_on`, which is on the type but not populated by `build_full_cll_map`. The real data lives in `CllData.child_map` + `CllColumn.table_id`. Unit tests passed in isolation because I authored the fixtures wrong; real-data smoke caught it. The committed version uses the correct fields and ships with tests that mirror the real shape.

Test plan
- Unit tests for `analyze_sql()` and helpers in `tests/util/test_ast_analyze.py` (one per `SqlStructure` field)
- MCP handler tests in `tests/test_mcp_server.py` (happy path, missing id, non-dbt, missing compiled SQL)
- `tests/test_mcp_e2e.py` expected-tools list updated
- Smoke run against `integration_tests/dbt` (jaffle_shop) after `dbt run` + `dbt docs generate`: `model.jaffle_shop.stg_customers` returns `customers` as downstream model with columns `customer_id`, `first_name`, `last_name`.

Does this PR introduce a user-facing change?:
```release-note
Added a new `analyze_model` MCP tool that returns the structural shape (refs, filters, joins, projections, aggregations, etc.) of a dbt model's compiled SQL plus its downstream column impact. Single-environment — does not require `target-base/`. Designed for agent workflows that need structured evidence about a model without performing data diffs.
```
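The downstream lookup described in the reviewer notes (walk `child_map`, group by `table_id`) might look roughly like the sketch below. The data shapes are assumptions made for illustration: `Column` is a hypothetical stand-in for `CllColumn`, `child_map` is taken to map a column id to the ids of its direct children, and the fixture is hand-written to mirror the jaffle_shop result quoted in the test plan, not captured from a real run.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Column:
    """Hypothetical stand-in for CllColumn; only the two fields used here."""
    table_id: str
    name: str


def one_hop_downstream(
    model_id: str,
    columns: Dict[str, Column],
    child_map: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Map each directly downstream model to the columns fed by model_id."""
    impacted: Dict[str, Set[str]] = {}
    for column_id, column in columns.items():
        if column.table_id != model_id:
            continue
        for child_id in child_map.get(column_id, ()):
            child = columns.get(child_id)
            # Skip non-column children and references within the same model.
            if child is None or child.table_id == model_id:
                continue
            impacted.setdefault(child.table_id, set()).add(child.name)
    return {table: sorted(names) for table, names in sorted(impacted.items())}


# Hand-written fixture (assumed id format: "<model id>_<column name>").
STG = "model.jaffle_shop.stg_customers"
CUSTOMERS = "model.jaffle_shop.customers"
columns: Dict[str, Column] = {}
child_map: Dict[str, Set[str]] = {}
for name in ("customer_id", "first_name", "last_name"):
    columns[f"{STG}_{name}"] = Column(STG, name)
    columns[f"{CUSTOMERS}_{name}"] = Column(CUSTOMERS, name)
    child_map[f"{STG}_{name}"] = {f"{CUSTOMERS}_{name}"}

print(one_hop_downstream(STG, columns, child_map))
# {'model.jaffle_shop.customers': ['customer_id', 'first_name', 'last_name']}
```

The walk stops after one hop by design, which matches the "1-hop downstream" semantics mentioned in the tool description update.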