Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TPCH-DS planning benchmark #9907

Merged
merged 3 commits into from
Apr 3, 2024
Merged

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Apr 2, 2024

Which issue does this PR close?

Part of faster planning #5637

Rationale for this change

As we work on improving planning speed (e.g #9637)
it would help to have more planner benchmarks to see how we are doing

What changes are included in this PR?

  1. Add TPC-DC planning benchmarks, similarly to the existing TPCH query benchmarks
  2. Refactor the tpch planning benchmark to follow the same pattern

Are these changes tested?

I tested them manually

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Apr 2, 2024
let schema = Arc::new(create_schema(column_prefix, num_columns));
MemTable::try_new(schema, vec![]).map(Arc::new).unwrap()
}

pub fn create_tpch_schemas() -> [(String, Schema); 8] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved this code into test_utils.

@@ -236,40 +144,83 @@ fn criterion_benchmark(c: &mut Criterion) {
})
});

// --- TPC-H ---

let tpch_ctx = register_defs(SessionContext::new(), tpch_schemas());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using the same context for all tests, I made separate contexts for TPCH and TPC-DS

use arrow::datatypes::{DataType, Field, Schema};

/// Schemas for the TPCH tables
pub fn tpch_schemas() -> Vec<TableDef> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved out of sql_planner

// specific language governing permissions and limitations
// under the License.

use crate::TableDef;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was moved from tpcds_planning

@@ -1066,577 +1066,3 @@ async fn regression_test(query_no: u8, create_physical: bool) -> Result<()> {

Ok(())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was just moved to test_utils


// --- TPC-DS ---

let tpcds_ctx = register_defs(SessionContext::new(), tpcds_schemas());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the new code

@alamb alamb marked this pull request as ready for review April 2, 2024 10:23
@alamb alamb requested a review from andygrove April 2, 2024 10:23
@alamb
Copy link
Contributor Author

alamb commented Apr 2, 2024

FYI @andygrove and @matthewmturner -- I hope this also helps improve the planning performance of DataFusion

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this benchmark can be started?

@alamb
Copy link
Contributor Author

alamb commented Apr 2, 2024

how this benchmark can be started?

It can be run with the existing sql_planner suite, like this for example:

cargo bench --bench sql_planner

To test with just the new tests, something like

cargo bench --bench sql_planner -- tpcds

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @alamb this will give more accurate details on the planner time

@alamb
Copy link
Contributor Author

alamb commented Apr 3, 2024

I have a (now not) secret plan to run these benchmarks against the 37.0.0 and report how much better planning time is for 38.0.0 with the work in #9595 from @haohuaijin / @matthewmturner and #9824 from you and some of the improvements I have cooking with @peter-toth / @jayzhan211 in #9637

@alamb
Copy link
Contributor Author

alamb commented Apr 3, 2024

(thank you @comphead )

@alamb alamb merged commit daf182d into apache:main Apr 3, 2024
24 checks passed
alamb added a commit to alamb/datafusion that referenced this pull request Apr 3, 2024
* Move TPCH schema definition

* Move schema definitions into test_util

* Add tpcds planning benchmark
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants