[SPIKE] - Determine how to handle overwriting DataFrame methods on Accessor #639
Description
Our current implementation of the WoodworkTableAccessor has several methods that overwrite pandas methods: `drop`, `pop`, `rename`, `value_counts`, `index`, and the serialization methods are all examples.
Some of these methods add additional functionality beyond the pandas operations (serialization adding the Schema to the output, index referring to the Schema index, value_counts only counting for categorical columns); however, others serve the purpose of performing the same operation while maintaining typing information that would otherwise be invalidated (rename, pop, and drop).
For the latter examples of overwritten names, it might be a problem if Woodwork's methods differ from pandas', Dask's, or Koalas'. `df.ww.drop`, for example, only offers the ability to drop columns, while `df.drop` allows dropping rows as well. This means that before we implemented `drop` on the accessor, users could reach the DataFrame's row-dropping behavior through the accessor, and we took that away by adding our own method. It's worth noting that users can still call the method on the DataFrame directly and initialize Woodwork afterwards.
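To make the shadowing concrete, here is a minimal sketch. It assumes the accessor forwards unknown attributes to the DataFrame through `__getattr__`; the classes `PassThroughAccessor` and `ShadowingAccessor` are hypothetical stand-ins for illustration, not Woodwork's actual implementation.

```python
import pandas as pd


class PassThroughAccessor:
    """Hypothetical accessor that forwards unknown attributes to the DataFrame."""

    def __init__(self, df):
        self._df = df

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so any method defined on the
        # accessor class shadows the DataFrame method of the same name.
        if name == "_df":
            raise AttributeError(name)
        return getattr(self._df, name)


class ShadowingAccessor(PassThroughAccessor):
    """Same accessor, but with its own column-only drop (an assumption)."""

    def drop(self, columns):
        # A real accessor would also update its typing information here.
        return self._df.drop(columns=columns)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Without an accessor-level drop, the call reaches DataFrame.drop unchanged.
rows_removed = PassThroughAccessor(df).drop(index=[0])

# With one, the row-oriented call no longer matches the signature.
try:
    ShadowingAccessor(df).drop(index=[0])
    row_drop_error = None
except TypeError as err:
    row_drop_error = err

# Dropping columns still works through the shadowing method.
cols_removed = ShadowingAccessor(df).drop(columns=["b"])
```

In this sketch the row drop that used to work through the accessor fails as soon as the accessor defines its own `drop`, which is the regression described above.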
Since we ask users to go through the accessor for DataFrame operations, we're asking them to know which methods we've implemented ourselves in place of the DataFrame ones, and that the input parameters might differ.
To help mitigate this, we could:
- Make sure our parameters match Pandas' as closely as possible (though Dask and Koalas don't always match)
- Allow extra kwargs beyond the ones necessary for schema updates in these methods, and pass them through to the DataFrame call that happens inside the method
- Avoid overwriting pandas methods in these cases and instead choose method names that don't overlap: `df.ww.drop_columns` in this case.
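A rough sketch of how the second and third options could combine, under stated assumptions: `SketchAccessor` is a hypothetical class, its `schema` is a plain dict of column name to type label standing in for Woodwork's Schema, and the `drop_columns` signature is only a suggestion.

```python
import pandas as pd


class SketchAccessor:
    """Hypothetical accessor; `schema` is a plain dict, not Woodwork's Schema."""

    def __init__(self, df, schema):
        self._df = df
        self.schema = dict(schema)

    def __getattr__(self, name):
        # Guard the instance attributes so lookups before __init__ finishes
        # (for example during copying) do not recurse.
        if name in ("_df", "schema"):
            raise AttributeError(name)
        # Anything not defined here, including drop, goes to the DataFrame.
        return getattr(self._df, name)

    def drop_columns(self, columns, **kwargs):
        """Drop columns, forwarding extra kwargs to DataFrame.drop."""
        if "inplace" in kwargs:
            # In-place drops return None, which this sketch does not handle.
            raise ValueError("inplace is not supported in this sketch")
        new_df = self._df.drop(columns=columns, **kwargs)
        # Keep typing information only for the columns that remain.
        new_schema = {c: t for c, t in self.schema.items() if c in new_df.columns}
        return SketchAccessor(new_df, new_schema)


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "x"]})
acc = SketchAccessor(df, {"a": "Integer", "b": "Categorical"})

# The extra kwarg `errors` is passed straight through to DataFrame.drop.
trimmed = acc.drop_columns(["b", "missing"], errors="ignore")

# drop is not shadowed, so row drops still reach the DataFrame; the result is
# a plain DataFrame, and typing information would need to be re-initialized.
rows_removed = acc.drop(index=[0])
```

Because the accessor-specific method has its own name, the full `DataFrame.drop` signature stays reachable, at the cost of users needing to learn a second method name for the schema-preserving variant.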