This repository now includes a config-driven QA framework that validates medallion flows, contracts, orchestration behavior, and governance controls end-to-end.
- Orchestrator: Airflow DAGs in `dags/`.
- Transformation engine: dbt project in `dbt/` for the canonical `odp_staffing_demand` use case; during migration, physical source tables remain in `odp_staffing_demand` and serving models in `odp_staffing_demand_dbt`.
- Warehouse/Serving: PostgreSQL warehouse for ODP Staffing Demand, with transitional physical schema `odp_staffing_demand`, plus Superset SQL templates.
- Contracts: dbt model YAML + config-driven dataset contracts in `tests/configs/datasets/*.yml`.
- Governance metadata: `schema/metrics.yaml`, governance validation script, and QA governance policies in `tests/configs/policies/governance_policies.yml`.
- `tests/data_quality/`: baseline schema/null/unique/range/freshness checks.
- `tests/contracts/`: schema contract and naming/PK stability checks.
- `tests/e2e/`: pipeline execution, idempotency, incremental behavior, serving query checks.
- `tests/governance/`: metadata completeness, lineage, PII, RBAC, retention, auditability.
- `tests/helpers/`: connectors, policy engine, shared SQL assertions, environment config.
- `tests/configs/datasets/`: dataset-level contracts + expectations.
- `tests/configs/policies/`: governance policy rules.
- `tests/configs/environments.yml`: dev/test/prod isolation and mutation safety flags.
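
The mutation safety flags in `tests/configs/environments.yml` could be enforced by a guard along these lines. This is a minimal sketch, not the repository's helper: the `environments` mapping, the `schema` and `allow_mutations` keys, the schema names, and both function names are hypothetical, and the parsed YAML is represented by a plain dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentConfig:
    """One environment entry; the field names are assumptions."""
    name: str
    schema: str
    allow_mutations: bool


def load_environment(raw: dict, name: str) -> EnvironmentConfig:
    """Build an EnvironmentConfig from an already-parsed mapping."""
    entry = raw["environments"][name]
    return EnvironmentConfig(
        name=name,
        schema=entry["schema"],
        # Fall back to read-only when the flag is missing.
        allow_mutations=bool(entry.get("allow_mutations", False)),
    )


def assert_mutation_allowed(env: EnvironmentConfig, action: str) -> None:
    """Raise before a test writes to an environment that forbids it."""
    if not env.allow_mutations:
        raise PermissionError(
            f"{action!r} is blocked: environment {env.name!r} is read-only"
        )


# Stand-in for the parsed file contents (hypothetical layout and names).
RAW = {
    "environments": {
        "test": {"schema": "example_test", "allow_mutations": True},
        "prod": {"schema": "example_prod", "allow_mutations": False},
    }
}
```

A test that truncates or reloads tables would call `assert_mutation_allowed(load_environment(RAW, "prod"), "truncate")` first, which raises `PermissionError` for the read-only entry above.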
- Start platform services (warehouse + dependencies) and install deps:

      make dev-install
      docker compose up -d

- Run full QA E2E suite with artifacts:

      make test-e2e

- Evidence and reports are written to:
  - `tests/e2e/evidence/latest/results/report.html`
  - `tests/e2e/evidence/latest/results/qa_report.md`
  - `tests/e2e/evidence/latest/results/qa_report.json`
  - `tests/e2e/evidence/latest/results/junit.xml`
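
For a quick pass/fail summary without opening the HTML report, the JUnit file can be read with the standard library. The sketch below assumes `junit.xml` follows the common JUnit layout, with `tests`, `failures`, `errors`, and `skipped` attributes on each `<testsuite>` element and no nested suites. The function name `summarize_junit` and the sample document are illustrative only.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Sum the counters across every <testsuite> element in the report."""
    root = ET.fromstring(xml_text)
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    # Element.iter() also yields the root itself, so this handles both a
    # <testsuites> wrapper and a bare <testsuite> root.
    for suite in root.iter("testsuite"):
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    totals["passed"] = (
        totals["tests"] - totals["failures"] - totals["errors"] - totals["skipped"]
    )
    return totals


# Illustrative sample, not real output from this repository.
SAMPLE = (
    '<testsuites>'
    '<testsuite name="example" tests="4" failures="1" errors="0" skipped="1"/>'
    '</testsuites>'
)
```

To run it against a real report, pass in the file contents, for example `summarize_junit(Path("tests/e2e/evidence/latest/results/junit.xml").read_text())` with `Path` imported from `pathlib`.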
The GitHub workflow `.github/workflows/e2e-data-platform.yml` runs on pull requests affecting pipeline/test/governance assets and executes:

    ./scripts/testing/run_e2e_tests.sh

It uploads `tests/e2e/evidence/latest` as the evidence artifact.
- Add a dataset contract file in `tests/configs/datasets/<dataset>.yml`.

  Minimum required fields:

      dataset: schema.table
      owner: team@example.com
      description: "What this dataset is for"
      domain: odp_staffing_demand
      layer: gold
      classification: confidential
      sensitivity: internal
      product_tag: labor-market
      pii_columns: []
      pii_classifications: {}
      retention_days: 365
      timestamp_column: loaded_at
      primary_key: [id]
      upstreams: [source_schema.source_table]
      tests:
        freshness:
          column: loaded_at
          format: timestamp
          max_age_hours: 24
        schema:
          required_columns: [id, loaded_at]
          column_types:
            id: text
        constraints:
          unique: [id]
          not_null: [id]
        reconciliations:
          - type: row_count_ratio
            upstream: source_schema.source_table
            min_ratio: 0.95
            max_ratio: 1.05
      governance:
        require_lineage: true
        require_classification: true
        require_rbac: false
        pii_masking_required: false
        allowed_roles_read: []

- If policy behavior needs to change, update `tests/configs/policies/governance_policies.yml`.
- Run:

      make qa-test

- If needed, add a focused test in one of:
  - `tests/data_quality/`
  - `tests/contracts/`
  - `tests/e2e/`
  - `tests/governance/`
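
As a rough illustration of what a focused contract test might assert, the sketch below checks a parsed contract for the minimum fields listed above and evaluates a `row_count_ratio` reconciliation. The helper names are hypothetical. Dividing dataset rows by upstream rows is an assumption about how the ratio is defined, and the row counts are passed in rather than queried from the warehouse.

```python
from typing import List

# Top-level keys from the "Minimum required fields" example.
REQUIRED_FIELDS = (
    "dataset", "owner", "description", "domain", "layer", "classification",
    "sensitivity", "product_tag", "pii_columns", "pii_classifications",
    "retention_days", "timestamp_column", "primary_key", "upstreams",
    "tests", "governance",
)


def missing_fields(contract: dict) -> List[str]:
    """Return the required top-level keys absent from a parsed contract."""
    return [field for field in REQUIRED_FIELDS if field not in contract]


def row_count_ratio_ok(rule: dict, upstream_rows: int, dataset_rows: int) -> bool:
    """Check dataset_rows / upstream_rows against the rule's bounds.

    The direction of the ratio is an assumption, not taken from the repo.
    """
    if upstream_rows <= 0:
        # An empty upstream cannot be reconciled; treat it as a failure.
        return False
    ratio = dataset_rows / upstream_rows
    return rule["min_ratio"] <= ratio <= rule["max_ratio"]


# The reconciliation rule from the example contract.
RULE = {
    "type": "row_count_ratio",
    "upstream": "source_schema.source_table",
    "min_ratio": 0.95,
    "max_ratio": 1.05,
}
```

With these bounds, 980 rows against an upstream of 1000 passes, while 900 rows against 1000 fails because the ratio 0.90 is below `min_ratio`.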