-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Remove dict_id from schema and make it an IPC concern only #7467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ac99e01 to
f690fc8
Compare
|
It seems unfortunate to have methods requiring multiple different types of schema passed to them. I'm afraid I don't have the capacity at the moment to disentangle what this PR is actually changing, the flight changes in particular I think will lead to a breaking change that hasn't been communicated via a prior deprecation. I'm going to have to leave this one to other maintainers to look into, I no longer have the necessary context to follow what the IPC APIs are doing anymore. |
|
In all cases we have the option to remove the “regular” schema, but it would have to mean that we construct it wherever we need the arrow-schema schema, which we can do but it’s a schema ref almost everywhere so it seemed like that was intentional for performance or lifetime reasons. I can definitely make that change if that’s the path we want to try there’s a danger though that it is perf sensitive and then we need to end up breaking the api twice. |
|
@brancz I think it would help if we could break this PR up into smaller pieces. I came here looking to change |
|
I guess we theoretically could, but that seems inconsistent with the rest of the codebase and it could cause unexpected failure and behavior without the remaining changes of this PR. @alamb how do you propose we continue with this? I'm happy to rebase and fix the remaining things, if we can agree on the general trajectory of continuing with this strategy. |
I like @adriangb 's suggestion to try to break this PR up into smaller changes as it will be easier to review. Specifically I think it will be easier to achieve what @tustvold describes as:
I am finding this PR hard to review because I feel it is hard to understand the implications of the change (on compatibility, performance, etc). Smaller PRs might make that easier However, there might also be no way to break it up smaller . Some thoughts I had:
|
# Which issue does this PR close? - Closes #7810. # Rationale for this change Removes the last batch of functions that can be removed in 56.0.0. Some deprecated functions remain (esp in arrow-schema and arrow-ipc), but those will be dealt with elsewhere (#7467) # What changes are included in this PR? # Are these changes tested? Covered by existing tests # Are there any user-facing changes? Yes, public functions are removed
|
I split out just the IPC layer changes in #7929. That would have to be merged first in order to do the remaining changes that rely on the IPC layer no longer using the |
|
I was able to split this PR into 5 separate PRs so I'm closing this one. |
…7968) # Which issue does this PR close? Closes #6356 # Rationale for this change Now that #7940 is merged, nothing useful can be done with the `dict_id` field, therefore, it is now safe to be removed from this requirement. This was also split out from: #7467 # What changes are included in this PR? No longer require the `dict_id` fields of two `Field`s of schemas being merged to be equal, as at this point the `dict_id` is only an IPC concern, and the fact that it is still in the struct definition is just legacy, marked for removal, we're just going through the proper procedure of deprecating and replacing the APIs that use it. # Are these changes tested? Tests passing. # Are there any user-facing changes? No API changes, just a behavior change, that was to be expected and desired due to the deprecations around the `dict_id` field. @alamb @adriangb @tustvold
Thank you very much for doing so 🤗 |
Which issue does this PR close?
dict_idequality in field merging #6356dict_idfromarrow_schema::field::Fieldand make dictionary IDs an internal implementation detail of flight encoding/decoding #5981Rationale for this change
See the above issues. And this is a follow up to
dict_id#6873.What changes are included in this PR?
Removal of
dict_idfrom the canonicalSchema'sFieldand the resulting repercussions of that.Are there any user-facing changes?
Yes, the previously declared deprecations and a few more resulting changes that we didn't foresee previously.
The ones we did not previously foresee involve those places where we now need to pipe through an IPC schema where we previously only had the regular Schema (which used to be enough since that carried the
dict_id):flight_data_to_arrow_batch(arrow-flight)arrow_data_from_flight_data(arrow-flight; sql)read_record_batch(arrow-ipc)read_dictionary(arrow-ipc)FileDecoder::new(arrow-ipc)I think these don't have stability guarantees but for the sake of completeness:
field_to_json(arrow-integration-test)record_batch_from_json(arrow-integration-test)array_from_json(arrow-integration-test)dictionary_array_from_json(arrow-integration-test)@tustvold @alamb @thinkharderdev