Skip to content

[CT-707] Expose check_relations_equal + consolidate with dbt-utils generic tests #5318

Open
@jtcohen6

Description

To assert that two relations are identical, we have similar logic expressed in a few places today:

  1. Custom generic tests in dbt-utils: the equality, cardinality_equality, and (to a lesser extent) equal_rowcount
  2. COLUMNS_EQUAL_SQL + get_rows_different_sql, which are used extensively (but exclusively) in our functional testing framework today:
    def get_rows_different_sql(
    self,
    relation_a: BaseRelation,
    relation_b: BaseRelation,
    column_names: Optional[List[str]] = None,
    except_operator: str = "EXCEPT",
    ) -> str:
    """Generate SQL for a query that returns a single row with a two
    columns: the number of rows that are different between the two
    relations and the number of mismatched rows.
    """
    # This method only really exists for test reasons.
    names: List[str]
    if column_names is None:
    columns = self.get_columns_in_relation(relation_a)
    names = sorted((self.quote(c.name) for c in columns))
    else:
    names = sorted((self.quote(n) for n in column_names))
    columns_csv = ", ".join(names)
    sql = COLUMNS_EQUAL_SQL.format(
    columns=columns_csv,
    relation_a=str(relation_a),
    relation_b=str(relation_b),
    except_op=except_operator,
    )
    return sql
    COLUMNS_EQUAL_SQL = """
    with diff_count as (
    SELECT
    1 as id,
    COUNT(*) as num_missing FROM (
    (SELECT {columns} FROM {relation_a} {except_op}
    SELECT {columns} FROM {relation_b})
    UNION ALL
    (SELECT {columns} FROM {relation_b} {except_op}
    SELECT {columns} FROM {relation_a})
    ) as a
    ), table_a as (
    SELECT COUNT(*) as num_rows FROM {relation_a}
    ), table_b as (
    SELECT COUNT(*) as num_rows FROM {relation_b}
    ), row_count_diff as (
    select
    1 as id,
    table_a.num_rows - table_b.num_rows as difference
    from table_a, table_b
    )
    select
    row_count_diff.difference as row_count_difference,
    diff_count.num_missing as num_mismatched
    from row_count_diff
    join diff_count using (id)
    """.strip()

Straightforward change

Could we consolidate the logic in both those places, so that we're not duplicating the same SQL (and requiring adapter maintainers to do the same)?

Related improvements

Some good feedback on the latter, in #4455 (reply in thread) — trying to answer the question, "Could/should the functional testing framework also enable unit testing models/macros for end users?" — that the output of check_relations_equal assertion statements could be more useful:

When a test does fail we get this message:
AssertionError: Got 1 different rows between DEV_CEREBRO.test16539256090777326094_test_complex_model.actual and DEV_CEREBRO.test16539256090777326094_test_complex_model.expected which isn't informative enough to workout what's different between the expected and actual

Metadata

Assignees

No one assigned

    Labels

    Team:AdaptersIssues designated for the adapter area of the codetech_debtBehind-the-scenes changes, with little direct impact on end-user functionalityutilsCross-database building blocks

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions