DataRecce/recce-dbt-package

Recce dbt Package

A dbt package that captures metadata about your dbt project and stores it in warehouse tables. This enables Recce to perform cross-environment data validation without requiring local artifact files.

Installation

Step 1: Add the package

Add to your packages.yml:

packages:
  - git: "https://github.com/DataRecce/recce-dbt-package.git"
    revision: main

Then run:

dbt deps

Step 2: Configure your project

Add to your dbt_project.yml:

vars:
  # Schema where recce metadata tables will be created
  recce_schema: recce_metadata

# Hook to capture metadata after each run
on-run-end:
  - "{{ recce.upload_metadata() }}"

Note: If recce_schema is not set, tables will be created in your target schema (e.g., main for DuckDB, or your configured schema for Snowflake/BigQuery).

Step 3: Run dbt

Run dbt to create the metadata tables and start capturing data:

# First run creates the tables
dbt run -s recce

# Subsequent runs will capture metadata via on-run-end hook
dbt run

How It Works

The on-run-end hook captures metadata after each dbt run:

  1. Invocation metadata - Timestamp, dbt version, adapter type, git info, CI context
  2. Node metadata - All models, sources, seeds, snapshots, exposures, metrics
  3. Run results - Execution status, timing, and row counts for each node

Column information is queried directly from information_schema when Recce connects to the warehouse, ensuring you always have current schema details.

Tables Created

| Table | Description |
|---|---|
| recce_invocations | Run context - invocation ID, timestamp, dbt version, adapter, git SHA/branch, CI metadata |
| recce_nodes_dbt | All nodes - unique_id, name, resource_type, depends_on, raw_code, checksum |
| recce_run_results_dbt | Run results - status, execution_time, rows_affected, message |
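
As a rough illustration of how the run results table might be inspected, the sketch below uses an in-memory SQLite database as a stand-in for the warehouse. The status, execution_time, rows_affected, and message columns come from the table description above; the invocation_id and unique_id columns, the sample rows, and both helper functions are assumptions made for this example and may not match the package's actual schema.

```python
import sqlite3


def create_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for recce_run_results_dbt (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE recce_run_results_dbt (
            invocation_id  TEXT,     -- assumed column linking to recce_invocations
            unique_id      TEXT,     -- assumed column linking to recce_nodes_dbt
            status         TEXT,
            execution_time REAL,
            rows_affected  INTEGER,
            message        TEXT
        )
        """
    )
    # Illustrative rows only; not real output from the package.
    conn.executemany(
        "INSERT INTO recce_run_results_dbt VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("inv-1", "model.demo.orders", "success", 1.2, 100, "OK"),
            ("inv-1", "model.demo.customers", "error", 0.3, None, "Database Error"),
        ],
    )
    return conn


def non_successful_nodes(conn: sqlite3.Connection) -> list:
    """Return (unique_id, status, message) for every node that did not succeed."""
    cursor = conn.execute(
        "SELECT unique_id, status, message "
        "FROM recce_run_results_dbt "
        "WHERE status <> 'success' "
        "ORDER BY unique_id"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in non_successful_nodes(create_demo_db()):
        print(row)
```

Against a real warehouse, the same SELECT would run through that warehouse's own client, with the schema set by recce_schema.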

Configuration Options

| Variable | Default | Description |
|---|---|---|
| recce_schema | target.schema | Schema for metadata tables |
| recce_database | target.database | Database for metadata tables (optional) |
| disable_recce_metadata_upload | false | Set to true to disable automatic metadata capture |

Example: Custom schema

vars:
  recce_schema: recce_metadata

Example: Cross-database (Snowflake)

vars:
  recce_schema: recce_metadata
  recce_database: ANALYTICS_DB

Example: Disable metadata upload

vars:
  disable_recce_metadata_upload: true
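
The fallback behaviour of these variables can be sketched in Python. This is only a model of the defaults listed in the table above, not the package's own code (which runs as dbt macros); the function names and the plain-dict representation of vars are assumptions for illustration.

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_metadata_location(
    project_vars: Mapping[str, Any],
    target_database: Optional[str],
    target_schema: str,
) -> Tuple[Optional[str], str]:
    """Pick the database and schema for the metadata tables.

    Falls back to the target's database/schema when the corresponding
    variable is unset or empty, mirroring the documented defaults.
    """
    database = project_vars.get("recce_database") or target_database
    schema = project_vars.get("recce_schema") or target_schema
    return database, schema


def upload_enabled(project_vars: Mapping[str, Any]) -> bool:
    """Metadata capture is on unless disable_recce_metadata_upload is true."""
    return not bool(project_vars.get("disable_recce_metadata_upload", False))


if __name__ == "__main__":
    vars_ = {"recce_schema": "recce_metadata", "recce_database": "ANALYTICS_DB"}
    print(resolve_metadata_location(vars_, "DEV_DB", "dbt_dev"))
    print(upload_enabled(vars_))
```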

CI/CD Integration

The package automatically detects CI environments and captures relevant metadata:

| CI Platform | Detection | Captured Metadata |
|---|---|---|
| dbt Cloud | DBT_CLOUD_RUN_ID | run_id, job_id, project_id |
| GitHub Actions | GITHUB_ACTIONS=true | run_id, run_number, workflow, repository |
| GitLab CI | GITLAB_CI=true | job_id, pipeline_id, project_path |
| CircleCI | CIRCLECI=true | build_num, workflow_id, project_reponame |
| Jenkins | JENKINS_URL | build_number, job_name, build_url |
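
The detection rules in the table can be expressed as a small lookup. The Python sketch below only illustrates that logic; the package itself runs inside dbt. The check order, the case-insensitive comparison, and the function name are assumptions for this example.

```python
import os
from typing import Mapping, Optional

# (platform, detection variable, required value); None means "any non-empty value".
CI_RULES = [
    ("dbt Cloud", "DBT_CLOUD_RUN_ID", None),
    ("GitHub Actions", "GITHUB_ACTIONS", "true"),
    ("GitLab CI", "GITLAB_CI", "true"),
    ("CircleCI", "CIRCLECI", "true"),
    ("Jenkins", "JENKINS_URL", None),
]


def detect_ci_platform(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first CI platform whose detection variable matches, else None."""
    for platform, variable, expected in CI_RULES:
        value = env.get(variable, "")
        if expected is None:
            if value:  # presence of the variable is enough
                return platform
        elif value.lower() == expected:
            return platform
    return None


if __name__ == "__main__":
    print(detect_ci_platform() or "not running in a recognised CI environment")
```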

Git information is captured from environment variables:

  • GIT_SHA, GITHUB_SHA, or CI_COMMIT_SHA
  • GIT_BRANCH, GITHUB_REF_NAME, or CI_COMMIT_BRANCH
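
A minimal Python sketch of this lookup follows. It assumes the variables are tried in the order listed and that the first non-empty one wins; that precedence is a guess rather than documented behaviour, and the helper names are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

SHA_VARS = ("GIT_SHA", "GITHUB_SHA", "CI_COMMIT_SHA")
BRANCH_VARS = ("GIT_BRANCH", "GITHUB_REF_NAME", "CI_COMMIT_BRANCH")


def first_set(names: Sequence[str], env: Mapping[str, str]) -> Optional[str]:
    """Return the value of the first variable in names that is set and non-empty."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def git_info(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect the commit SHA and branch from whichever variables are available."""
    return {
        "sha": first_set(SHA_VARS, env),
        "branch": first_set(BRANCH_VARS, env),
    }


if __name__ == "__main__":
    print(git_info())
```

If none of these variables is set in your environment, exporting GIT_SHA and GIT_BRANCH before running dbt is one way to supply the values.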

Warehouse Support

| Warehouse | Status |
|---|---|
| DuckDB | Supported |
| Snowflake | Supported |
| PostgreSQL | Supported |
| BigQuery | Experimental |
| Redshift | Experimental |

Usage with Recce

Once you have installed this package and run dbt run, Recce can read metadata from the warehouse instead of local artifact files:

# Instead of:
recce server --base-manifest target-base/manifest.json

# Use:
recce server --warehouse-metadata

License

Apache 2.0
