Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

depr(python): Deprecate overwrite_schema parameter for DataFrame.write_delta #14879

Merged
merged 3 commits into from
Mar 6, 2024

Conversation

stinodego
Copy link
Member

@stinodego stinodego commented Mar 6, 2024

deltalake changed their public API. They no longer have an overwrite_schema parameter. We exposed this parameter directly in our API.

I removed this parameter to defer directly to the deltalake API through delta_write_options. Users can pass delta_write_options={"schema_mode": "overwrite"} to get the same result.

Also had to update some tests as deltalake handles timezones differently now.

@ion-elgreco FYI

@github-actions github-actions bot added deprecation Add a deprecation warning to outdated functionality python Related to Python Polars labels Mar 6, 2024
@stinodego stinodego marked this pull request as ready for review March 6, 2024 11:22
@ion-elgreco
Copy link
Contributor

ion-elgreco commented Mar 6, 2024

Sorry forgetting to give some heads up, just did the release in the train:P Since this release was breaking for sure, I've added a feature to finally handled datetimes with timezones properly.

Maybe it could make sense to do it the same way as we did, add schema_mode as additional parameter in the function, in stead of passing it through delta_write_options. I expect schema_mode to be quite commonly used.

@stinodego
Copy link
Member Author

Maybe it could make sense to do it the same way as we did, add schema_mode as additional parameter in the function, in stead of passing it through delta_write_options. I expect schema_mode to be quite commonly used.

I considered this, as it was the original reason for including the parameter directly in our API. However, since there are now multiple options (merge, overwrite, None), possibly more added in the future, and a tie-in to the underlying engine (merge doesn't work with the pyarrow-engine), this is now getting too much of a responsibility to put on our own API.

@stinodego stinodego merged commit 6a181f2 into main Mar 6, 2024
15 of 16 checks passed
@stinodego stinodego deleted the fix-tests branch March 6, 2024 11:41
@ion-elgreco
Copy link
Contributor

@stinodego alright! Shall we maybe bring it back once the pyarrow engine gets deprecated and the overwrite_schema parameter in write_deltalake?

These things are likely to happen in deltalake 1.0 release

@stinodego
Copy link
Member Author

stinodego commented Mar 6, 2024

@stinodego alright! Shall we maybe bring it back once the pyarrow engine gets deprecated and the overwrite_schema parameter in write_deltalake?

These things are likely to happen in deltalake 1.0 release

Sounds good to me! We can revisit when the 1.0 release is out.

However, I am very wary of tying our public API to that of a dependency, exactly for the reason that caused this PR.

@c-peters c-peters added the accepted Ready for implementation label Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation deprecation Add a deprecation warning to outdated functionality python Related to Python Polars
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants