
fix: remove deprecated overwrite_schema configuration which has incorrect behavior #2554

Merged
merged 2 commits into from
May 29, 2024

Conversation

rtyler
Member

@rtyler rtyler commented May 29, 2024

Using mode='append' together with overwrite_schema=True leads to inconsistent behavior between the Rust and PyArrow engines for write_deltalake. With the PyArrow engine the parameter is silently ignored, so users may see unexpected behavior because the schema is not actually overwritten.

Users who set this parameter most likely want schema_mode='merge', which allows schema evolution on appends to a Delta table.
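
For callers that still pass the removed flag, a migration shim could look roughly like the sketch below. `migrate_write_kwargs` is a hypothetical helper, not part of `deltalake`. It follows the suggestion above for appends (`schema_mode='merge'`); mapping `mode='overwrite'` to `schema_mode='overwrite'` is an assumption about what such callers intended.

```python
from typing import Any, Dict


def migrate_write_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with overwrite_schema translated to schema_mode.

    Hypothetical helper for illustration only; it is not a deltalake API.
    """
    migrated = dict(kwargs)
    # Always drop the removed flag so it never reaches the writer.
    overwrite_schema = migrated.pop("overwrite_schema", None)

    # Nothing to translate, or the caller already chose a schema_mode.
    if not overwrite_schema or "schema_mode" in migrated:
        return migrated

    mode = migrated.get("mode")
    if mode == "append":
        # Appends most likely want schema evolution, per the PR description.
        migrated["schema_mode"] = "merge"
    elif mode == "overwrite":
        # Assumption: replacing the data was meant to replace the schema too.
        migrated["schema_mode"] = "overwrite"
    return migrated
```

The resulting dict would then be passed on as keyword arguments to the write call. Other modes simply lose the flag, since this sketch makes no guess about what the caller wanted there.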

Fixes #2553

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels May 29, 2024
…elease

Many of the subcrates are unaffected by any changes here, so I'm just
expanding their compatibility range.
@rtyler rtyler marked this pull request as ready for review May 29, 2024 16:46
@rtyler rtyler merged commit b05d7e9 into delta-io:main May 29, 2024
22 of 23 checks passed
@rtyler rtyler deleted the fix-2553 branch May 29, 2024 18:07
github-merge-queue bot pushed a commit to Unstructured-IO/unstructured that referenced this pull request Jun 13, 2024
### Summary

Closes #3173. Removes the `overwrite_schema` kwarg from the Delta Table
connector and bumps the `deltalake` version. Per [this
PR](delta-io/delta-rs#2554) in the `deltalake`
repo, the `overwrite_schema` kwarg is deprecated as of version `0.18.0`.
Users can specify `schema_mode="merge"` to obtain the same behavior.

- `schema_mode="merge"` is equivalent to `overwrite_schema=False`
- `schema_mode="overwrite"` is equivalent to `overwrite_schema=True`

Also adds an `engine` parameter that you can use to set `"rust"` or
`"pyarrow"` as the engine. `engine` defaults to `"pyarrow"` and
`schema_mode` defaults to `None`, which is consistent with the behavior
in `deltalake` documented
[here](https://delta-io.github.io/delta-rs/api/delta_writer/).
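
As a rough sketch of the defaults described above, the options could be modelled as below. `DeltaWriteOptions` and `to_kwargs` are hypothetical names chosen for illustration and are not the connector's actual classes; the allowed values are only the ones mentioned in this summary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Values mentioned in the summary; treated here as the only valid ones.
_ENGINES = ("rust", "pyarrow")
_SCHEMA_MODES = (None, "merge", "overwrite")


@dataclass
class DeltaWriteOptions:
    """Hypothetical container for the two writer options described above."""

    engine: str = "pyarrow"            # default engine per the summary
    schema_mode: Optional[str] = None  # default: no schema mode passed

    def __post_init__(self) -> None:
        if self.engine not in _ENGINES:
            raise ValueError(f"unsupported engine: {self.engine!r}")
        if self.schema_mode not in _SCHEMA_MODES:
            raise ValueError(f"unsupported schema_mode: {self.schema_mode!r}")

    def to_kwargs(self) -> Dict[str, Any]:
        """Build keyword arguments, omitting schema_mode when it is unset."""
        kwargs: Dict[str, Any] = {"engine": self.engine}
        if self.schema_mode is not None:
            kwargs["schema_mode"] = self.schema_mode
        return kwargs
```

Validating at construction time means a stale `overwrite_schema`-style value fails early instead of being silently dropped.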

### Testing

The Delta Table ingest tests should pass on this PR.

---------

Co-authored-by: Ahmet Melek <39141206+ahmetmeleq@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

write_deltalake with rust engine fails when mode is append and overwrite schema is enabled