
[SPIKE] - Determine how to handle overwriting DataFrame methods on Accessor #639

Open
@tamargrey

Description

Our current implementation of the WoodworkTableAccessor has several methods that overwrite pandas methods of the same name: drop, pop, rename, value_counts, index, and the serialization methods are all examples.

Some of these methods add functionality beyond the pandas operations (the serialization methods add the Schema to the output, index refers to the Schema index, and value_counts only counts values for categorical columns). Others perform the same operation as their pandas counterparts while maintaining typing information that would otherwise be invalidated (rename, pop, and drop).

For the latter group of overwritten names, it might be a problem if Woodwork's methods differ from those of pandas, Dask, or Koalas. df.ww.drop, for example, can only drop columns, while df.drop can drop rows as well. Before we implemented drop on the accessor, users could drop rows through the accessor; by adding our own method, we took that ability away. It's worth noting that users can still call the method on the DataFrame directly and then initialize Woodwork afterwards.
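The shadowing described above can be sketched in plain Python. This is a minimal illustration, not Woodwork's code: ToyFrame, DelegatingAccessor, and ShadowingAccessor are hypothetical stand-ins, and the sketch assumes the accessor forwards unknown attributes to the wrapped DataFrame via `__getattr__`.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame with a pandas-like drop."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts, one dict per row

    def drop(self, index=None, columns=None):
        index = set(index or [])
        columns = set(columns or [])
        return ToyFrame([
            {k: v for k, v in row.items() if k not in columns}
            for i, row in enumerate(self.rows)
            if i not in index
        ])


class DelegatingAccessor:
    """Forwards any attribute it does not define to the wrapped frame."""

    def __init__(self, frame):
        self._frame = frame

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. when the
        # accessor itself has no attribute with this name.
        return getattr(self._frame, name)


class ShadowingAccessor(DelegatingAccessor):
    """Defines its own columns-only drop, which hides the frame's drop."""

    def drop(self, columns):
        return self._frame.drop(columns=columns)


frame = ToyFrame([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# Without an accessor-level drop, a row drop falls through to the frame.
fewer_rows = DelegatingAccessor(frame).drop(index=[0])

# With an accessor-level drop, the same call is rejected because the
# accessor's signature has no `index` parameter.
try:
    ShadowingAccessor(frame).drop(index=[0])
    row_drop_supported = True
except TypeError:
    row_drop_supported = False
```

Once the accessor defines drop, `__getattr__` is never consulted for that name, so any parameter the accessor's signature omits becomes unreachable through the accessor.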

Since we ask users to go through the accessor for DataFrame operations, we're asking them to know when we've implemented our own method that overwrites the DataFrame one, and that its input parameters might be different.

To help mitigate this, we could:

  • Make sure our parameters match pandas' as closely as possible (though Dask and Koalas don't always match pandas)
  • Allow extra kwargs in these methods, beyond the ones necessary for schema updates, and pass them to the DataFrame call that happens inside the method
  • Avoid overwriting pandas methods in these cases and instead choose method names that don't overlap: df.ww.drop_columns in this case
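The second and third options can be sketched side by side on a toy pandas accessor. This is a rough illustration under stated assumptions, not Woodwork's implementation: the accessor name toy, the types dict standing in for the Schema, and the helper _with_types are all hypothetical. Only register_dataframe_accessor and DataFrame.drop are real pandas API.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("toy")  # hypothetical name
class ToyAccessor:
    def __init__(self, df):
        self._df = df
        # Hypothetical stand-in for the Schema: column name -> type name.
        self.types = {col: str(dtype) for col, dtype in df.dtypes.items()}

    def _with_types(self, new_df):
        # Carry typing information over for the columns that survive.
        new_df.toy.types = {
            col: t for col, t in self.types.items() if col in new_df.columns
        }
        return new_df

    def drop(self, columns, **kwargs):
        """Option 2: keep the pandas name and forward extra kwargs."""
        if kwargs.get("inplace"):
            # DataFrame.drop returns None when inplace=True, which this
            # sketch does not handle.
            raise ValueError("inplace=True is not supported in this sketch")
        return self._with_types(self._df.drop(columns=columns, **kwargs))

    def drop_columns(self, columns):
        """Option 3: a non-overlapping name with a columns-only contract."""
        return self._with_types(self._df.drop(columns=columns))


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
df.toy.types["b"] = "Categorical"  # pretend the user set a custom type

# Option 2: the extra `index` kwarg reaches DataFrame.drop, so row
# dropping works through the accessor again.
via_kwargs = df.toy.drop(columns=["a"], index=[0])

# Option 3: df.drop keeps its full pandas behavior, and the accessor
# method only ever drops columns.
via_new_name = df.toy.drop_columns(["a"])
```

With option 2, the accessor has to decide what to do with kwargs that conflict with schema handling (inplace above is one such case). Option 3 avoids that question at the cost of a name users must learn.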
