Add TableSchema.diff method to understand the difference between two TableSchema objects #1670

Open
@tamargrey

Description

  • As a user, I wish I had an easy way to tell the difference between two Woodwork TableSchemas.

When passing around Woodwork dataframes, it is easy to lose track of some of the Woodwork typing information, such as feature origins or metadata. Because the table schema repr only shows column names, logical types, and semantic tags, it is hard to tell whether other typing information has changed without going through every relevant field and comparing the values directly. It would be great if there were a Woodwork method to make this easier.

Code Example

schema_1.diff(schema_2)

We would need to settle on a design for the output. The simplest option is to display every field that is not equal and output its entire value, leaving it up to the user to determine what exactly is different. A more involved option would be to isolate the difference and display only that.
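To make the two output options concrete, here is a minimal sketch that treats a schema as a plain dict of fields. The `diff_fields` helper, the `isolate` flag, and the field names are all hypothetical and not part of Woodwork; a real implementation would need to read the actual TableSchema attributes.

```python
MISSING = "<missing>"  # placeholder for a field present on only one side


def diff_fields(left, right, isolate=False):
    """Return a dict describing the fields that differ between two dicts.

    isolate=False: report the entire (left, right) values of each unequal field.
    isolate=True:  recurse into nested dicts and report only the parts that differ.
    """
    changes = {}
    for key in sorted(set(left) | set(right)):
        l_val = left.get(key, MISSING)
        r_val = right.get(key, MISSING)
        if l_val == r_val:
            continue
        if isolate and isinstance(l_val, dict) and isinstance(r_val, dict):
            changes[key] = diff_fields(l_val, r_val, isolate=True)
        else:
            changes[key] = (l_val, r_val)
    return changes


# Hypothetical stand-ins for the fields of two schemas.
schema_1 = {"name": "customers", "metadata": {"owner": "a", "version": 1}}
schema_2 = {"name": "customers", "metadata": {"owner": "a", "version": 2}}

simple = diff_fields(schema_1, schema_2)
isolated = diff_fields(schema_1, schema_2, isolate=True)
```

In this sketch `simple` holds both full `metadata` dicts, while `isolated` narrows the report down to the `version` key that changed.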

For consistency's sake, we should use this function to implement the TableSchema.__eq__ method, which will make sure that these two always stay in sync.
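One way this could look, sketched with a hypothetical `SimpleSchema` stand-in class rather than the real TableSchema (the attribute names and the return shape of `diff` are assumptions for illustration only):

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False so the hand-written __eq__ below is the only one
class SimpleSchema:
    """Hypothetical stand-in for a table schema; not the Woodwork class."""

    name: str = ""
    columns: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def diff(self, other):
        """Return {attribute: (self_value, other_value)} for unequal attributes."""
        changes = {}
        for attr in ("name", "columns", "metadata"):
            mine, theirs = getattr(self, attr), getattr(other, attr)
            if mine != theirs:
                changes[attr] = (mine, theirs)
        return changes

    def __eq__(self, other):
        if not isinstance(other, SimpleSchema):
            return NotImplemented
        # Equal exactly when diff finds nothing, so the two cannot drift apart.
        return not self.diff(other)


a = SimpleSchema(name="t", metadata={"v": 1})
b = SimpleSchema(name="t", metadata={"v": 1})
c = SimpleSchema(name="t", metadata={"v": 2})
```

Because `__eq__` simply checks for an empty diff, any field added to `diff` later is automatically included in equality checks.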

Labels: new feature (suggestions for new functionality)
