Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extract datafusion-physical-plan to its own crate #7432

Merged
merged 21 commits into from
Sep 15, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 28, 2023

Which issue does this PR close?

Closes #1754

Rationale for this change

DataFusion is getting bigger and so to keep the code more manageable we need to divide it into smaller crates

What changes are included in this PR?

  1. New datafusion-physical-plan crate
  2. Move / update includes and tests

Note I did copy several functions / testing structs from the core crate to the datafusion-physical-plan crate.

Are these changes tested?

Yes

Are there any user-facing changes?

I don't think so -- everything should be backwards compatible

@github-actions github-actions bot added the core Core DataFusion crate label Aug 28, 2023
use tempfile::TempDir;

#[tokio::test]
async fn simple_predicate() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predicate evaluation in filters is well covered by sqllogictests and since this test used CsvExec (which is in datasource) I simply removed the test instead of trying to rewrite it

@@ -502,88 +500,14 @@ mod tests {
binary(l, op, r, input_schema).unwrap()
}

#[tokio::test]
async fn project_first_column() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basic projections are well covered by end to end / sqllogic tests -- this test used csvexec which isn't int he physical plan crate, so I removed this test rather than trying to port it

@@ -509,143 +442,6 @@ mod tests {
Ok(())
}

#[tokio::test]
async fn window_function_with_udaf() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

#[tokio::test]
async fn window_function() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let schema = csv.schema();

let sort_exec = Arc::new(SortExec::new(
vec![
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of this is simply copy/paste and relied on the CsvExec unnecessarily. I updated the tests to only depend on generated data

@alamb alamb marked this pull request as ready for review September 11, 2023 20:24
@alamb alamb requested a review from tustvold September 13, 2023 16:36
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably fine, however, I think we probably shouldn't make fields of *Exec public, this is a very strong API commitment and quite hard to later evolve/retract. I additionally think some of the duplication of test utilities could be reduced by adding them to datafusion-common, perhaps with #[doc(hidden)].

@@ -242,45 +242,6 @@ pub fn with_new_children_if_necessary(
/// Return a [wrapper](DisplayableExecutionPlan) around an
/// [`ExecutionPlan`] which can be displayed in various easier to
/// understand ways.
///
/// ```
/// use datafusion::prelude::*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this has been removed because the frontend is not a dependency of physical-plan. Perhaps we could manually construct the physical plan? Don't feel strongly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is correct. I don't think this API needs an end to end example (even though I think I initially wrote it)

datafusion/physical-plan/src/repartition/mod.rs Outdated Show resolved Hide resolved
datafusion/physical-plan/src/repartition/mod.rs Outdated Show resolved Hide resolved
datafusion/physical-plan/src/test.rs Outdated Show resolved Hide resolved
.unwrap()
/// A mock execution plan that simply returns the provided statistics
#[derive(Debug, Clone)]
pub struct StatisticsExec {
Copy link
Contributor

@tustvold tustvold Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find where this was moved from. Possibly this is a duplicate? This seems generally useful for testing, and therefore not the end of the world were it be made public. We could always #[doc(hidden)]'ify it if that is a concern

Copy link
Contributor Author

@alamb alamb Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure -- there are definitely some areas that would be nice to clean up with respect to testing

@alamb
Copy link
Contributor Author

alamb commented Sep 14, 2023

I think this is probably fine, however, I think we probably shouldn't make fields of *Exec public, this is a very strong API commitment and quite hard to later evolve/retract. I additionally think some of the duplication of test utilities could be reduced by adding them to datafusion-common, perhaps with #[doc(hidden)].

I have made both these changes -- I think the PR is substantially nicer because of it. Thank you again for the review

@alamb
Copy link
Contributor Author

alamb commented Sep 15, 2023

Clippy failures are due to #7572

@github-actions github-actions bot added physical-expr Physical Expressions optimizer Optimizer rules labels Sep 15, 2023
@github-actions github-actions bot removed physical-expr Physical Expressions optimizer Optimizer rules labels Sep 15, 2023
@alamb
Copy link
Contributor Author

alamb commented Sep 15, 2023

This PR has been open for quite and while -- I am merging it in and I am happy to make any additional changes as follow on PRs.

@alamb alamb merged commit 7b12666 into apache:main Sep 15, 2023
21 checks passed
@alamb
Copy link
Contributor Author

alamb commented Sep 15, 2023

FYI @jimexist I finally did it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

split datafusion-physical-plan sub-module
2 participants