Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Jan 28, 2021

I was looking at this code yesterday while using it in IOx -- https://github.com/influxdata/influxdb_iox/pull/703

Rationale:

Even though Schema::try_merge requires a slice of Schemas (not schema refs) ownership of its inputs, it copies all of its fields. This is inefficient ideal in the common case where most of the fields in the merged Schema will be the same

Changes:

This PR proposes to change the implementation so that try_merge takes something (like a Vec) that can iterate over the Schemas passed in and consume them, avoiding at least one copy per unique field. I intend no algorithmic changes, only performance improvement.

@github-actions
Copy link

/// use arrow::datatypes::*;
///
/// let merged = Schema::try_merge(&vec![
/// let merged = Schema::try_merge(vec![
Copy link
Contributor Author

@alamb alamb Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in this example shows a pretty good example of how I think the usability (as well as performance) of this API is improved by this PR

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@houqp / @nevi-me -- any concerns about this PR?

(I think it was introduced here 494e7a9)

Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. In general, i think this is the pattern we would want to apply to the rest of the code base whenever applicable. i.e. if a function takes reference but does internal clones, then it's better to let caller manage the ownership instead.

@alamb
Copy link
Contributor Author

alamb commented Feb 2, 2021

i.e. if a function takes reference but does internal clones, then it's better to let caller manage the ownership instead.

Yes I like this approach a lot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants