ARROW-11414: [Rust] Reduce copies in Schema::try_merge #9347

alamb · 2021-01-28T11:49:11Z

I was looking at this code yesterday while using it in IOx -- https://github.com/influxdata/influxdb_iox/pull/703

Rationale:

Schema::try_merge takes a slice of Schemas (not schema refs), so even when the caller could give up ownership of its inputs, it copies all of their fields. This is less than ideal in the common case where most of the fields in the merged Schema will be the same.

Changes:

This PR proposes to change the implementation so that try_merge takes something (like a Vec) that can iterate over the Schemas passed in and consume them, avoiding at least one copy per unique field. I intend no algorithmic changes, only a performance improvement.
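A minimal sketch of the idea, not the actual arrow implementation: `Field` and `Schema` below are simplified, hypothetical stand-ins (the real types carry more information), and the merge rule shown (same name must have the same type) is an assumption made for illustration. The point is only that an input which can be consumed lets each unique field be moved into the result instead of cloned.

```rust
// Hypothetical, simplified stand-ins for arrow's Field and Schema so the
// sketch compiles on its own. They are NOT the real arrow types.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
}

/// Merge schemas by consuming them. Each unique field is moved into the
/// result, so no Field is cloned along the way.
pub fn try_merge(schemas: impl IntoIterator<Item = Schema>) -> Result<Schema, String> {
    let mut merged: Vec<Field> = Vec::new();
    for schema in schemas {
        // `schema.fields` is owned here, so iterating yields owned Fields.
        for field in schema.fields {
            let existing = merged.iter().position(|f| f.name == field.name);
            match existing {
                // Assumed rule for this sketch: same name, different type is an error.
                Some(i) if merged[i].data_type != field.data_type => {
                    return Err(format!("conflicting types for field '{}'", field.name));
                }
                // Already present with the same type: drop the duplicate.
                Some(_) => {}
                // First time this field is seen: move it in, no copy.
                None => merged.push(field),
            }
        }
    }
    Ok(Schema { fields: merged })
}
```

With this shape a caller passes `vec![schema_a, schema_b]` directly; a caller that still needs its schemas afterwards has to clone them explicitly before the call.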

github-actions · 2021-01-28T11:49:34Z

https://issues.apache.org/jira/browse/ARROW-11414

alamb · 2021-01-28T11:50:06Z

rust/arrow/src/datatypes.rs

    /// use arrow::datatypes::*;
    ///
-    /// let merged = Schema::try_merge(&vec![
+    /// let merged = Schema::try_merge(vec![


The change to this doc example is a pretty good illustration of how I think the usability (as well as performance) of this API is improved by this PR

jorgecarleitao

LGTM. 👍

alamb

@houqp / @nevi-me -- any concerns about this PR?

(I think it was introduced here 494e7a9)

houqp

LGTM. In general, I think this is the pattern we would want to apply to the rest of the code base whenever applicable, i.e. if a function takes a reference but does internal clones, then it's better to let the caller manage the ownership instead.

alamb · 2021-02-02T19:33:51Z

> i.e. if a function takes a reference but does internal clones, then it's better to let the caller manage the ownership instead.

Yes I like this approach a lot
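A small sketch of the pattern discussed above, using hypothetical functions that are not part of arrow: the borrowing version has to clone every value it keeps, while the owning version moves them and leaves the decision to clone with the caller.

```rust
/// Borrowing version: the function only sees `&String`, so every name it
/// keeps must be cloned internally, whether or not the caller needed that.
pub fn dedup_borrowed(names: &[String]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for n in names {
        if !out.contains(n) {
            out.push(n.clone()); // hidden allocation per unique name
        }
    }
    out
}

/// Owning version: the caller hands over the Vec, and unique names are
/// moved into the result without any clone.
pub fn dedup_owned(names: Vec<String>) -> Vec<String> {
    let mut out: Vec<String> = Vec::with_capacity(names.len());
    for n in names {
        if !out.contains(&n) {
            out.push(n); // moved, not cloned
        }
    }
    out
}
```

A caller that still needs the original list writes `dedup_owned(names.clone())`, which makes the cost visible at the call site instead of hiding it inside the function.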

ARROW-11414: [Rust] Reduce copies in Schema::try_merge

7014e7b

github-actions bot added the Component: Rust label Jan 28, 2021

jorgecarleitao approved these changes Jan 28, 2021

houqp approved these changes Feb 2, 2021

alamb closed this in 660c81c Feb 3, 2021

houqp mentioned this pull request Mar 13, 2021

ARROW-11790: [Rust][DataFusion] RFC: Change builder signatures to take Vec<Expr> rather than &[Expr] #9692

Closed

asfimport mentioned this pull request Feb 3, 2021

[Rust] Reduce copies in Schema::try_merge #27302

Closed
