[CT-707] Expose check_relations_equal + consolidate with dbt-utils generic tests

To assert that two relations are identical, we have similar logic expressed in a few places today:

1. Custom generic tests in `dbt-utils`: the [`equality`](https://github.com/dbt-labs/dbt-utils/blob/44eeb1978c1c720baca2e2cfe07671e94db02c7c/macros/generic_tests/equality.sql#L37-L73), [`cardinality_equality`](https://github.com/dbt-labs/dbt-utils/blob/44eeb1978c1c720baca2e2cfe07671e94db02c7c/macros/generic_tests/cardinality_equality.sql#L10-L51), and (to a lesser extent) [`equal_rowcount`](https://github.com/dbt-labs/dbt-utils/blob/44eeb1978c1c720baca2e2cfe07671e94db02c7c/macros/generic_tests/equal_rowcount.sql#L15-L36)
2. `COLUMNS_EQUAL_SQL` + `get_rows_different_sql`, which are used extensively (but exclusively) in our functional testing framework today:
https://github.com/dbt-labs/dbt-core/blob/3996a69861d5ba9a460092c93b7e08d8e2a63f88/core/dbt/adapters/base/impl.py#L1093-L1149

### Straightforward change

Could we consolidate the logic in both those places, so that we're not duplicating the same SQL (and requiring adapter maintainers to do the same)?

### Related improvements

There was some good feedback on the latter in https://github.com/dbt-labs/dbt-core/discussions/4455#discussioncomment-2850130, a thread exploring the question "Could/should the functional testing framework also enable unit testing models/macros for end users?" The feedback is that the [assertion message produced by `check_relations_equal`](https://github.com/dbt-labs/dbt-core/blob/3996a69861d5ba9a460092c93b7e08d8e2a63f88/core/dbt/tests/util.py#L334-L348) could be more useful:

> When a test does fail we get this message:
`AssertionError: Got 1 different rows between DEV_CEREBRO.test16539256090777326094_test_complex_model.actual and DEV_CEREBRO.test16539256090777326094_test_complex_model.expected` which isn't informative enough to workout what's different between the expected and actual
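
One possible direction, sketched below under stated assumptions: have the assertion fetch a sample of the differing rows and include them in the message. `DIFF_ROWS_SQL` and `assert_relations_equal` are hypothetical names for illustration, not existing dbt-core APIs. SQLite stands in for a real adapter, and column names are interpolated unquoted for brevity.

```python
import sqlite3

# Hypothetical query, not part of dbt-core: returns the rows that appear in
# only one of the two relations, tagged with the side they came from.
DIFF_ROWS_SQL = """
select 'only_in_a' as side, * from (
    select {columns} from {relation_a}
    except
    select {columns} from {relation_b}
)
union all
select 'only_in_b' as side, * from (
    select {columns} from {relation_b}
    except
    select {columns} from {relation_a}
)
"""


def assert_relations_equal(conn, relation_a, relation_b, column_names, limit=5):
    """Raise AssertionError listing up to `limit` of the differing rows."""
    sql = DIFF_ROWS_SQL.format(
        columns=", ".join(column_names),
        relation_a=relation_a,
        relation_b=relation_b,
    )
    rows = conn.execute(sql).fetchall()
    if rows:
        sample = "\n".join(f"  {side}: {values}" for side, *values in rows[:limit])
        raise AssertionError(
            f"Got {len(rows)} different rows between "
            f"{relation_a} and {relation_b}:\n{sample}"
        )


# Demo data: the two tables disagree on the row with id = 2.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table actual (id integer, name text);
    create table expected (id integer, name text);
    insert into actual values (1, 'a'), (2, 'b');
    insert into expected values (1, 'a'), (2, 'c');
    """
)

message = ""
try:
    assert_relations_equal(conn, "actual", "expected", ["id", "name"])
except AssertionError as exc:
    message = str(exc)
    print(message)
```

With this shape, the failure message names the offending rows on each side instead of only reporting a count. A real implementation would presumably reuse the adapter's quoting and `except_operator` handling.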

	def get_rows_different_sql(
	    self,
	    relation_a: BaseRelation,
	    relation_b: BaseRelation,
	    column_names: Optional[List[str]] = None,
	    except_operator: str = "EXCEPT",
	) -> str:
	    """Generate SQL for a query that returns a single row with two
	    columns: the difference in row counts between the two relations
	    and the number of mismatched rows.
	    """
	    # This method only really exists for test reasons.
	    names: List[str]
	    if column_names is None:
	        columns = self.get_columns_in_relation(relation_a)
	        names = sorted((self.quote(c.name) for c in columns))
	    else:
	        names = sorted((self.quote(n) for n in column_names))
	    columns_csv = ", ".join(names)

	    sql = COLUMNS_EQUAL_SQL.format(
	        columns=columns_csv,
	        relation_a=str(relation_a),
	        relation_b=str(relation_b),
	        except_op=except_operator,
	    )

	    return sql


	COLUMNS_EQUAL_SQL = """
	with diff_count as (
	    SELECT
	        1 as id,
	        COUNT(*) as num_missing FROM (
	            (SELECT {columns} FROM {relation_a} {except_op}
	             SELECT {columns} FROM {relation_b})
	             UNION ALL
	            (SELECT {columns} FROM {relation_b} {except_op}
	             SELECT {columns} FROM {relation_a})
	        ) as a
	), table_a as (
	    SELECT COUNT(*) as num_rows FROM {relation_a}
	), table_b as (
	    SELECT COUNT(*) as num_rows FROM {relation_b}
	), row_count_diff as (
	    select
	        1 as id,
	        table_a.num_rows - table_b.num_rows as difference
	    from table_a, table_b
	)
	select
	    row_count_diff.difference as row_count_difference,
	    diff_count.num_missing as num_mismatched
	from row_count_diff
	join diff_count using (id)
	""".strip()

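To make the two output columns concrete, here is a small worked run. It is a sketch, not the dbt-core code path: `COLUMNS_EQUAL_SQLITE` is a hypothetical SQLite-adapted variant of `COLUMNS_EQUAL_SQL`, because SQLite (to my knowledge) rejects the parenthesized set operands used above, so each `EXCEPT` is wrapped in a subquery instead. The table names and data are made up.

```python
import sqlite3

# Hypothetical SQLite-friendly variant of COLUMNS_EQUAL_SQL: same CTEs and
# output columns, but each EXCEPT lives in a subquery rather than parentheses.
COLUMNS_EQUAL_SQLITE = """
with diff_count as (
    select
        1 as id,
        count(*) as num_missing from (
            select * from (select {columns} from {relation_a} except
                           select {columns} from {relation_b})
            union all
            select * from (select {columns} from {relation_b} except
                           select {columns} from {relation_a})
        ) as a
), table_a as (
    select count(*) as num_rows from {relation_a}
), table_b as (
    select count(*) as num_rows from {relation_b}
), row_count_diff as (
    select
        1 as id,
        table_a.num_rows - table_b.num_rows as difference
    from table_a, table_b
)
select
    row_count_diff.difference as row_count_difference,
    diff_count.num_missing as num_mismatched
from row_count_diff
join diff_count using (id)
"""

db = sqlite3.connect(":memory:")
db.executescript(
    """
    create table rel_a (id integer, name text);
    create table rel_b (id integer, name text);
    insert into rel_a values (1, 'a'), (2, 'b'), (3, 'c');
    insert into rel_b values (1, 'a'), (2, 'x');
    """
)

sql = COLUMNS_EQUAL_SQLITE.format(
    columns="id, name", relation_a="rel_a", relation_b="rel_b"
)
result = db.execute(sql).fetchone()

# rel_a has one more row than rel_b, and three rows exist on only one side:
# (2, 'b') and (3, 'c') in rel_a, (2, 'x') in rel_b.
print(result)  # (1, 3)
```

Note that the query reports only counts, which is why the resulting assertion message cannot say which rows differ.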