fix(valuediff): normalise column case before adapter.quote() for Snowflake (DRC-3464) by iamcxa · Pull Request #1388 · DataRecce/recce

iamcxa · 2026-05-21T09:45:59Z

Summary

value_diff was failing on Snowflake with SQL compilation error: invalid identifier '"customer_id"'. The check params carry lowercase column names (from the dbt manifest convention), but Snowflake folds unquoted DDL identifiers to uppercase (CUSTOMER_ID). Calling adapter.quote("customer_id") produced the case-sensitive '"customer_id"' identifier, which Snowflake cannot resolve against its physical uppercase column.

Root cause

Two failure sites in recce/tasks/valuediff.py, both invoked from execute():

_verify_primary_key composite path (when primary_key is a list) — builds inline Jinja that calls adapter.quote(col) over the raw user-supplied list.
_query_value_diff — calls adapter.quote(field) / adapter.quote(column_to_compare) on the surrogate-key and column-comparison clauses.

The scalar primary_key path of _verify_primary_key uses adapter.dispatch('test_unique', 'dbt') which renders {{ column_name }} verbatim (unquoted) and so passes Snowflake case-insensitively. Same pattern in ValueDiffDetailTask. Other check tasks (profile_diff, histogram_diff, top_k_diff, row_count_diff, query_diff, schema_diff) are unaffected — profile_diff accidentally avoids the bug by sourcing column names from get_columns() (uppercase on Snowflake) rather than from check params.

Fix

Add a ValueDiffMixin._build_column_case_lookup() helper that calls dbt_adapter.get_columns() (base + current relations) and returns a {lower(physical_name): physical_name} lookup. _normalise_identifier() maps a user-supplied lowercase identifier to its physical catalog case, falling back to the original string when the column is unknown (preserves current behaviour for unknown columns instead of masking errors).

In ValueDiffTask.execute() and ValueDiffDetailTask.execute(), the lookup is built FIRST and the primary_key (and columns list when present) is normalised BEFORE _verify_primary_key() is called — so both _verify_primary_key's composite path AND _query_value_diff's adapter.quote() calls receive the physical catalog case. Mirrors the get_columns() pattern already used by profile_diff (recce/tasks/profile.py:222).

Tests

recce/tests/tasks/test_valuediff.py:

TestBuildColumnCaseLookup — 4 unit tests for the new helper (case-insensitive lookup, base+current relation merge, unknown-identifier passthrough, exception fallback).
test_value_diff_snowflake_uppercase_columns + test_value_diff_detail_snowflake_uppercase_columns — scalar primary_key on Snowflake: mock get_columns() returns uppercase columns; capture executed SQL; assert "CUSTOMER_ID" present, "customer_id" absent.
test_value_diff_snowflake_composite_primary_key + test_value_diff_detail_snowflake_composite_primary_key — composite primary_key on Snowflake; assert the SQL executed by _verify_primary_key's composite path contains uppercase quoted identifiers.

Pre-fix rigor (rebuild verified independently):

Revert valuediff.py to before either fix commit → all 4 Snowflake regression tests FAIL with verbatim AssertionError: '"CUSTOMER_ID"' in 'with a_query as (select "customer_id", ...)'.
Restore the fix → all 21 valuediff tests PASS in 4.64s.

Full tasks suite (pytest tests/tasks/ -q): 116 passed, 1 pre-existing portalocker warning (unrelated).

Audit trail

Investigated through DataRecce's internal debug-flywheel workflow as case 019. Stages: reproduce → discover → hypothesize → verify → fix-and-harden (cycle 1 after codex review caught the composite path gap on the first fix attempt).

Linear: DRC-3464

…flake On Snowflake with default quoting config, physical columns are stored uppercase (e.g. CUSTOMER_ID). user-supplied check params follow the dbt manifest convention (lowercase, e.g. customer_id). Passing them directly to adapter.quote() produces '"customer_id"' — a case-sensitive identifier that Snowflake cannot resolve, raising: SQL compilation error: invalid identifier '"customer_id"' Fix: add ValueDiffMixin._build_column_case_lookup() that calls get_columns() and builds a {lower(physical_name): physical_name} lookup. Before any adapter.quote() call in ValueDiffTask._query_value_diff and ValueDiffDetailTask._query_value_diff, each user-supplied identifier is mapped through the lookup so adapter.quote() receives the physical name ("CUSTOMER_ID") and produces a valid quoted identifier. Pass-through for identifiers not found in the catalog preserves existing behaviour for unknown columns rather than masking errors. Mirrors the pattern profile_diff already uses via get_columns(). Regression tests: test_value_diff_snowflake_uppercase_columns and test_value_diff_detail_snowflake_uppercase_columns patch get_columns() to return uppercase names (simulating Snowflake) and assert the generated SQL contains '"CUSTOMER_ID"' and NOT '"customer_id"'. Closes DRC-3464. Signed-off-by: Kent <kentchen@reccehq.com> Signed-off-by: Kent <iamcxa@gmail.com>

…y_key (DRC-3464 cycle 1) When primary_key is a list, execute() now builds the column-case lookup and normalises keys BEFORE calling _verify_primary_key, so the composite path's adapter.quote() calls receive physical catalog names (e.g. CUSTOMER_ID on Snowflake) instead of user-supplied lowercase identifiers. Also adds regression tests test_value_diff_snowflake_composite_primary_key and test_value_diff_detail_snowflake_composite_primary_key that patch dbt_adapter.adapter.execute (the low-level adapter used by _verify_primary_key) to capture and assert the generated SQL uses uppercase quoted identifiers. PRE-FIX: both tests FAIL (sql contains lowercase quoted identifiers) POST-FIX: both tests PASS (sql contains uppercase quoted identifiers) Signed-off-by: Kent <iamcxa@gmail.com>

Copilot

Pull request overview

Fixes Snowflake value_diff failures caused by quoting lowercase identifiers (from dbt manifest conventions) against physically uppercase columns, by normalizing user-supplied column identifiers to catalog/physical case before adapter.quote() is applied.

Changes:

Added a catalog-driven {lower(physical): physical} lookup helper and identifier normalization in ValueDiffTask and ValueDiffDetailTask.
Normalized primary_key before _verify_primary_key() to fix the composite primary-key quoting path.
Added Snowflake-focused regression tests that patch get_columns() to return uppercase columns and assert generated SQL uses uppercase quoted identifiers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`recce/tasks/valuediff.py`	Adds catalog case lookup + normalizes PK/columns before quoting to avoid Snowflake invalid identifier errors.
`tests/tasks/test_valuediff.py`	Adds unit tests for the lookup/normalization helpers and Snowflake regression tests capturing executed SQL.

Comments suppressed due to low confidence (1)

recce/tasks/valuediff.py:431

As in ValueDiffTask._query_value_diff, case_lookup is built unconditionally (two get_columns() macro executions) even when columns is None and the function already calls get_columns() again to compute common columns. Since value_diff can run frequently and get_columns() may hit INFORMATION_SCHEMA, consider avoiding redundant catalog calls by building the lookup only when normalising user-supplied identifiers, or reusing the already-fetched column lists to construct the lookup.

        # Normalise user-supplied identifiers to physical catalog names before quoting.
        # On Snowflake (default quoting config), physical columns are stored uppercase;
        # adapter.quote("customer_id") → '"customer_id"' which Snowflake cannot resolve.
        case_lookup = self._build_column_case_lookup(dbt_adapter, model)

        if columns is None or len(columns) == 0:
            base_columns = [column.column for column in dbt_adapter.get_columns(model, base=True)]
            curr_columns = [column.column for column in dbt_adapter.get_columns(model, base=False)]
            columns = [column for column in base_columns if column in curr_columns]
        else:
            columns = [self._normalise_identifier(c, case_lookup) for c in columns]

codecov · 2026-05-21T09:53:31Z

Codecov Report

❌ Patch coverage is 98.51852% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
recce/tasks/valuediff.py	93.93%	2 Missing ⚠️

Files with missing lines	Coverage Δ
tests/tasks/test_valuediff.py	`99.64% <100.00%> (+0.19%)`	⬆️
recce/tasks/valuediff.py	`81.93% <93.93%> (+1.73%)`	⬆️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…umns() (DRC-3464 PR #1388 review) Signed-off-by: Kent <iamcxa@gmail.com>

wcchang1115

Code Review: PR #1388

SHA 543dbf5d · Verdict GO

Notes

recce/tasks/valuediff.py:142-160, 417-434 — _query_value_diff re-runs
primary_key normalisation that execute() already performed (lines 336-340,
541-545). Idempotent because lookup[upper().lower()] == upper, so not a
bug, but the duplication makes the data flow harder to follow.
Evidence: every callable path goes through execute() first (grep confirms
no other caller of _query_value_diff); the case_lookup= parameter is
the only reason _query_value_diff could safely skip re-normalisation,
and it already accepts that parameter. Pass F.
recce/tasks/valuediff.py:48 — except Exception: pass in
_build_column_case_lookup silently swallows every failure (connection
errors, missing relations, permission denials). On a real outage the
lookup is empty, primary_key/columns pass through unchanged, and the
user gets the original "invalid identifier" Snowflake error this fix
was meant to prevent — with no log line explaining why the fix was
bypassed. Consider logging at debug/warning level so the bypass is
diagnosable. Pass D.
recce/tasks/valuediff.py:142-149, 417-424 — When columns is None,
the function calls get_columns() again (twice) even though
_build_column_case_lookup just called it twice. That's 4
get_columns() round-trips per execute() in the default path.
case_lookup already contains the union of base+curr physical names;
the columns is None branch could derive the column list from
case_lookup (or case_lookup could cache the per-relation lists).
Pass G.

What I verified

Local data contract preserved. On case-preserving warehouses
(Postgres, DuckDB, BigQuery), _build_column_case_lookup returns
{lower(name): name} where name already equals the user input,
so _normalise_identifier is a no-op. Test suite (116 tasks tests)
passes unchanged.
Frontend contract preserved. js/.../toValueDiffGrid.ts:8-11
("primary_keys match actual column casing") relies on
self.params.primary_key aligning with result column keys. The fix
feeds normalised (physical-case) primary_key into the
post-execution normalize_keys_to_columns(primary_key, columns) call
at valuediff.py:312, 518 — columns is also normalised, so the
output keeps matching the result-set casing. No frontend regression.
QueryDiffTask (inherits ValueDiffMixin) untouched. It only
consumes _verify_primary_key indirectly via its own path; it does
not call the new helpers. Other check types (profile_diff,
histogram_diff, row_count_diff, query_diff, schema_diff) are not
modified.
Regression tests are evidence-grade. Both '"CUSTOMER_ID"' in sql
and '"customer_id"' not in sql assertions — catches partial fixes.
Composite-PK tests catch the cycle-1 regression where
_verify_primary_key's composite Jinja path was missed on the first
attempt; without those, a future refactor could silently re-break it.
No new SQL-injection surface. Identifiers are looked up in a
warehouse-derived dict before being passed to adapter.quote();
unknown identifiers fall through to the existing quoting path — same
trust boundary as before.

flake8: clean. pytest tests/tasks/ 116 passed.

iamcxa added 2 commits May 20, 2026 14:44

iamcxa self-assigned this May 21, 2026

iamcxa requested a review from Copilot May 21, 2026 09:48

Copilot started reviewing on behalf of iamcxa May 21, 2026 09:48 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread recce/tasks/valuediff.py Outdated

Comment thread recce/tasks/valuediff.py Outdated

refactor(valuediff): pass case_lookup down to avoid redundant get_col…

bc1fcb0

…umns() (DRC-3464 PR #1388 review) Signed-off-by: Kent <iamcxa@gmail.com>

iamcxa requested a review from wcchang1115 May 25, 2026 07:34

Merge branch 'main' into fix/drc-3464-value-diff-snowflake-quoting

543dbf5

wcchang1115 approved these changes May 25, 2026

View reviewed changes

Merge branch 'main' into fix/drc-3464-value-diff-snowflake-quoting

aaf5368

wcchang1115 merged commit 832d9a1 into main May 25, 2026
22 checks passed

wcchang1115 deleted the fix/drc-3464-value-diff-snowflake-quoting branch May 25, 2026 10:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(valuediff): normalise column case before adapter.quote() for Snowflake (DRC-3464)#1388

fix(valuediff): normalise column case before adapter.quote() for Snowflake (DRC-3464)#1388
wcchang1115 merged 5 commits into
mainfrom
fix/drc-3464-value-diff-snowflake-quoting

iamcxa commented May 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 21, 2026 •

edited

Loading

Uh oh!

wcchang1115 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

iamcxa commented May 21, 2026

Summary

Root cause

Fix

Tests

Audit trail

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

wcchang1115 left a comment

Choose a reason for hiding this comment

Code Review: PR #1388

Notes

What I verified

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented May 21, 2026 •

edited

Loading