-
Couldn't load subscription status.
- Fork 3.9k
ARROW-11414: [Rust] Reduce copies in Schema::try_merge #9347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// use arrow::datatypes::*; | ||
| /// | ||
| /// let merged = Schema::try_merge(&vec![ | ||
| /// let merged = Schema::try_merge(vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change in this example shows a pretty good example of how I think the usability (as well as performance) of this API is improved by this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. In general, i think this is the pattern we would want to apply to the rest of the code base whenever applicable. i.e. if a function takes reference but does internal clones, then it's better to let caller manage the ownership instead.
Yes I like this approach a lot |
I was looking at this code yesterday while using it in IOx -- https://github.com/influxdata/influxdb_iox/pull/703
Rationale:
Even though
Schema::try_mergerequires a slice ofSchemas (not schema refs) ownership of its inputs, it copies all of its fields. This is inefficient ideal in the common case where most of the fields in the mergedSchemawill be the sameChanges:
This PR proposes to change the implementation so that
try_mergetakes something (like aVec) that can iterate over the Schemas passed in and consume them, avoiding at least one copy per unique field. I intend no algorithmic changes, only performance improvement.