This repository was archived by the owner on May 17, 2024. It is now read-only.

Assertions prevent using joindiff on bigquery tables in different projects, even though it works fine. #302

Closed
@segv

Description

Request: If I explicitly call data_diff.diff_tables and pass in algorithm=data_diff.Algorithm.JOINDIFF, it'd be wonderful if the tool would let me use it, even if it thinks I'm applying it to two different DBs (maybe just a warning message that I can ignore?).

Context:

In BigQuery you can always (permissions aside) join tables in different datasets and projects. However, since data_diff considers two different projects to be two different databases, a few of the assertions in JoinDiff fail.

Unlike in Snowflake or Postgres, where prod and dev would be different schemas in the same DB, with BigQuery I've found it pretty common to have a prod project and then a separate playground project for each dev.

I went in and commented out three assertions (and had to change the default value of SegmentInfo.rowcounts to {1: 0, 2: 0}), but once that was done the code worked great and was substantially faster than the hash diff.
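
To make the "warning I can ignore" idea concrete, here is a rough sketch of what a softened check could look like. This is not data_diff's actual code: check_joinable, CrossDatabaseJoinWarning, and the strict flag are all hypothetical names invented for illustration, and the real assertions in JoinDiff may compare something other than plain database names.

```python
import warnings


class CrossDatabaseJoinWarning(UserWarning):
    """Hypothetical warning category; not part of data_diff."""


def check_joinable(db1_name: str, db2_name: str, strict: bool = False) -> bool:
    """Return True when both tables appear to live in the same database.

    Hypothetical helper: with strict=True it raises, much like a hard
    assertion would; otherwise it only warns and lets the caller proceed.
    """
    if db1_name == db2_name:
        return True

    message = (
        f"JOINDIFF requested across {db1_name!r} and {db2_name!r}; "
        "this only works if the engine can join them "
        "(for example, two BigQuery projects)."
    )
    if strict:
        raise ValueError(message)

    # Non-strict mode: surface the concern but do not block the diff.
    warnings.warn(message, CrossDatabaseJoinWarning, stacklevel=2)
    return False
```

With something like this, a caller who knows the two projects are joinable could silence the warning via warnings.simplefilter("ignore", CrossDatabaseJoinWarning) and carry on, while the default behaviour for everyone else could stay strict.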

Metadata

    Labels

    enhancement (New feature or request), stale (Issues/PRs that have gone stale)
