
Produce a diff of two arbitrary queries #6343

@timsehn

Two people in the past couple of weeks have asked for the ability to diff the results of an arbitrary SQL query between two Dolt commits.

We previously had this feature, first as dolt query-diff (#732) and then as dolt diff -q (#776). It was removed in early 2021 (#1321). After internal discussion, the rationale for removing it was:

  • Little used
  • No Dolt-specific logic. It was equivalent to writing your own Go program that ran both queries and iterated through the result rows.
  • Potentially very slow, which may give users the false impression that dolt diff itself is slow.
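The brute-force approach from the second bullet can be sketched in Go. This is a minimal, hypothetical sketch, not the removed dolt diff -q code: it assumes both query results have already been scanned into memory as rows of strings, and it compares them as multisets because an arbitrary query may have no key. Row, rowKey, and BruteForceDiff are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Row is one result row with every column rendered as a string.
// Assumption: a real tool would scan these from database/sql rows.
type Row []string

// rowKey encodes a row as one string so it can be used as a map key.
// Assumption: the unit separator (\x1f) never appears in column values.
func rowKey(r Row) string {
	return strings.Join(r, "\x1f")
}

// BruteForceDiff compares two materialized result sets as multisets and
// returns rows only present in from (removed) and only in to (added).
// With no key, a modified row shows up as one removal plus one addition.
func BruteForceDiff(from, to []Row) (removed, added []Row) {
	counts := make(map[string]int)
	rows := make(map[string]Row)
	for _, r := range from {
		k := rowKey(r)
		counts[k]++
		rows[k] = r
	}
	for _, r := range to {
		k := rowKey(r)
		counts[k]--
		rows[k] = r
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	for _, k := range keys {
		n := counts[k]
		for ; n > 0; n-- {
			removed = append(removed, rows[k])
		}
		for ; n < 0; n++ {
			added = append(added, rows[k])
		}
	}
	return removed, added
}

func main() {
	from := []Row{{"1", "alice"}, {"2", "bob"}}
	to := []Row{{"1", "alice"}, {"2", "bobby"}, {"3", "carol"}}
	removed, added := BruteForceDiff(from, to)
	fmt.Println("removed:", removed)
	fmt.Println("added:", added)
}
```

With the sample data this prints removed: [[2 bob]] and added: [[2 bobby] [3 carol]]. Both result sets are held in memory, so the cost tracks the full size of both results rather than the size of the change, which is the slowness concern above.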

This issue proposes re-implementing the feature as a SQL table function called dolt_query_diff() or dolt_query_delta() that takes (<sql query>, fromCommit, toCommit).

Algorithmically, we can:

  1. Brute force, as the previous implementation did
  2. Bisect both query results using checksums, as datafold's data-diff does (https://github.com/datafold/data-diff)
  3. Use relational algebra against our current diff tables
