Benchmark for sort preserving merge #2431

Merged: 5 commits merged on May 20, 2022


@alamb (Contributor) commented May 3, 2022

Which issue does this PR close?

Part of #2427

Rationale for this change

Add benchmarks for the cases I intended to optimize for in #2427

What changes are included in this PR?

A new merge benchmark. To run it:

cargo bench --bench merge

Here is an example flamegraph produced via:

TODO
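For context, the benchmark times a sort-preserving merge: a k-way merge of several individually sorted inputs into one sorted output. The sketch below illustrates that operation with plain Vec<i64> "streams" and a binary min-heap, using only the standard library. It is an illustration under assumptions, not DataFusion's SortPreservingMergeExec (which merges Arrow RecordBatches and may be implemented differently); the function name kway_merge is hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: merge individually sorted streams into one sorted Vec.
/// A min-heap holds the current head of every non-exhausted stream, keyed by
/// (value, stream index), so equal values are emitted in stream order.
fn kway_merge(streams: &[Vec<i64>]) -> Vec<i64> {
    let total: usize = streams.iter().map(|s| s.len()).sum();
    let mut out = Vec::with_capacity(total);
    // cursors[i] is the position of the next unread row in streams[i]
    let mut cursors = vec![0usize; streams.len()];
    let mut heap = BinaryHeap::new();

    // Seed the heap with the first row of each non-empty stream.
    for (i, s) in streams.iter().enumerate() {
        if let Some(&v) = s.first() {
            heap.push(Reverse((v, i)));
        }
    }

    // Repeatedly emit the smallest head, then refill from the same stream.
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        cursors[i] += 1;
        if let Some(&next) = streams[i].get(cursors[i]) {
            heap.push(Reverse((next, i)));
        }
    }
    out
}

fn main() {
    let streams = vec![vec![1, 4, 7], vec![2, 2, 8], vec![], vec![3, 9]];
    let merged = kway_merge(&streams);
    println!("{:?}", merged); // [1, 2, 2, 3, 4, 7, 8, 9]
}
```

With k streams, each emitted row costs O(log k) heap work, which is the kind of per-row overhead a merge benchmark is meant to expose.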

Are there any user-facing changes?

No

github-actions bot added the datafusion label (Changes in the datafusion crate) on May 3, 2022
@alamb changed the title from "WIP Add benchmark for sort preserving merge" to "Benchmark for sort preserving merge" on May 18, 2022
@alamb marked this pull request as ready for review on May 18, 2022 18:15
@alamb requested a review from tustvold on May 18, 2022 18:57
@alamb (Contributor, Author) commented May 18, 2022

cc @tustvold @yjshen @richox

Comment on lines +32 to +67
//!                            Rows are randomly
//!                          divided into separate
//!                          RecordBatch "streams",
//! ┌────┐ ┌────┐ ┌────┐      preserving the order      ┌────┐ ┌────┐ ┌────┐
//! │    │ │    │ │    │                                │    │ │    │ │    │
//! │    │ │    │ │    │ ─────────────┐                 │    │ │    │ │    │
//! │    │ │    │ │    │              └───────────────▶ │ C1 │ │... │ │ CN │
//! │    │ │    │ │    │ ──────────────┐                │    │ │    │ │    │
//! │    │ │    │ │    │              ┌┼──────────────▶ │    │ │    │ │    │
//! │    │ │    │ │    │              ││                │    │ │    │ │    │
//! │    │ │    │ │    │              ││                └────┘ └────┘ └────┘
//! │    │ │    │ │    │              ││                ┌────┐ ┌────┐ ┌────┐
//! │    │ │    │ │    │              │└──────────────▶ │    │ │    │ │    │
//! │    │ │    │ │    │              │                 │    │ │    │ │    │
//! │    │ │    │ │    │     ...      │                 │ C1 │ │... │ │ CN │
//! │    │ │    │ │    │ ─────────────┘                 │    │ │    │ │    │
//! │    │ │    │ │    │               ┌──────────────▶ │    │ │    │ │    │
//! │ C1 │ │... │ │ CN │               │                │    │ │    │ │    │
//! │    │ │    │ │    │ ─────────────┐│                └────┘ └────┘ └────┘
//! │    │ │    │ │    │              ││
//! │    │ │    │ │    │              ││
//! │    │ │    │ │    │              ││                        ...
//! │    │ │    │ │    │ ─────────────┼┼┐
//! │    │ │    │ │    │              │││
//! │    │ │    │ │    │              │││               ┌────┐ ┌────┐ ┌────┐
//! │    │ │    │ │    │ ─────────────┼┘│               │    │ │    │ │    │
//! │    │ │    │ │    │              │ │               │    │ │    │ │    │
//! │    │ │    │ │    │              │ │               │ C1 │ │... │ │ CN │
//! │    │ │    │ │    │              └─┼─────────────▶ │    │ │    │ │    │
//! │    │ │    │ │    │                │               │    │ │    │ │    │
//! │    │ │    │ │    │                └─────────────▶ │    │ │    │ │    │
//! └────┘ └────┘ └────┘                                └────┘ └────┘ └────┘
//!  Input RecordBatch                                   NUM_STREAMS input
//!     Columns 1..N                                       RecordBatches
//! INPUT_SIZE sorted rows                            (still INPUT_SIZE total
//!   ~10% duplicates                                          rows)
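The setup in the diagram (sorted input with ~10% duplicate rows, randomly divided into NUM_STREAMS streams that each stay sorted) can be sketched in std-only Rust. This is a simplification under stated assumptions: a single i64 column stands in for Columns 1..N, a tiny xorshift generator stands in for a real RNG crate, and the helper names (XorShift, make_sorted_input, split_into_streams) are hypothetical rather than taken from the benchmark.

```rust
// Std-only sketch of the benchmark's input setup. All names are hypothetical;
// a single i64 column stands in for "Columns 1..N".

/// Tiny xorshift64 PRNG (stand-in for a real RNG crate). Seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Build `input_size` sorted rows; about 10% of rows repeat the previous value.
fn make_sorted_input(input_size: usize, rng: &mut XorShift) -> Vec<i64> {
    let mut rows: Vec<i64> = Vec::with_capacity(input_size);
    let mut value = 0i64;
    for _ in 0..input_size {
        // With probability ~1/10 keep the previous value, creating a duplicate.
        if rows.is_empty() || rng.next_u64() % 10 != 0 {
            value += 1;
        }
        rows.push(value);
    }
    rows
}

/// Send each row to a randomly chosen stream. Rows are visited in input
/// order, so every stream is a sorted subsequence of the input.
fn split_into_streams(rows: &[i64], num_streams: usize, rng: &mut XorShift) -> Vec<Vec<i64>> {
    assert!(num_streams > 0, "need at least one stream");
    let mut streams: Vec<Vec<i64>> = vec![Vec::new(); num_streams];
    for &row in rows {
        let i = (rng.next_u64() % num_streams as u64) as usize;
        streams[i].push(row);
    }
    streams
}

fn main() {
    const INPUT_SIZE: usize = 1000;
    const NUM_STREAMS: usize = 8;
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);

    let input = make_sorted_input(INPUT_SIZE, &mut rng);
    let streams = split_into_streams(&input, NUM_STREAMS, &mut rng);

    // Still INPUT_SIZE rows in total, and every stream is still sorted.
    let total: usize = streams.iter().map(|s| s.len()).sum();
    assert_eq!(total, INPUT_SIZE);
    assert!(streams.iter().all(|s| s.windows(2).all(|w| w[0] <= w[1])));
    println!("rows per stream: {:?}", streams.iter().map(|s| s.len()).collect::<Vec<_>>());
}
```

Only the destination stream is random; the visiting order is not. That is why each stream remains a sorted subsequence of the input, and a sort-preserving merge of the streams yields the original sorted rows.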
Review comment (Member):

❤️ Love the diagram!

@andygrove merged commit c327983 into apache:master on May 20, 2022
@alamb deleted the alamb/merge_benchmark branch on May 20, 2022 17:26
@alamb (Contributor, Author) commented May 20, 2022

Thanks @andygrove

@alamb mentioned this pull request on May 29, 2022