Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs: Extend PruningPredicate with background and implementation info #9184

Merged
merged 12 commits into from
Feb 12, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Feb 9, 2024

Draft as it builds on #9183

Which issue does this PR close?

related to #9171

Rationale for this change

We use the PruningPredicate logic extensively in InfluxDB and we have found it is not an easy concept to understand. I have struggled to explain it face to face concisely so I wrote it down with the help of @appletreeisyellow

I would like to add this content to help both our future engineers as well as the broader community understand how this code works (perhaps so they can extend it)

I also harbor an ulterior motive that anyone who works on / with DataFusion can get an accessable master class in practical database implementation techniques by reading our docs

What changes are included in this PR?

Add background and algorithm description to PruningPredicate docs

Are these changes tested?

By doc test

Are there any user-facing changes?

More docs. No code changes

@alamb alamb added the documentation Improvements or additions to documentation label Feb 9, 2024
@github-actions github-actions bot added core Core DataFusion crate and removed documentation Improvements or additions to documentation labels Feb 9, 2024
@alamb alamb changed the title Docs: Extend PruningPredicate with background and implementation info… @alamb Docs: Extend PruningPredicate with background and implementation info Feb 9, 2024
datafusion/core/src/physical_optimizer/pruning.rs Outdated Show resolved Hide resolved
datafusion/core/src/physical_optimizer/pruning.rs Outdated Show resolved Hide resolved
///
/// If, for some other container, we knew `y` was between the values `4` and
/// `15`, then the rewritten predicate evaluates to `true` (verifying this is
/// left as an exercise to the reader -- are you still here?), and the container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤣

///
/// * Rows that evaluate to `true` are returned in the query results
/// * Rows that evaluate to `false` are not returned (“filtered out” or “pruned” or “skipped”).
/// * Rows that evaluate to `NULL` are **NOT** returned (also “filtered out”) – *this property appears many times in the discussion below*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rows that evaluate to NULL are NOT returned (also “filtered out”)

Seems like this is the opposite of what discussed below.

For example, the quote below says -- "containers that evaluate to NULL are returned (also 'kept')"

/// * NULL: there MAY be rows that pass the predicate, KEEPS the container
/// Note that rewritten predicate can evaluate to NULL when some of
/// the min/max values are not known.

Did I misunderstand something here? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good observation.

Yes the NULL semantics in filter expressions is the opposite of what the rewritten pruning predicate does with NULL (which is quite confusing). I added a note trying to clarify this: 74261e8

@alamb alamb marked this pull request as ready for review February 11, 2024 11:54
@alamb alamb changed the title Docs: Extend PruningPredicate with background and implementation info Docs: Extend PruningPredicate with background and implementation info Feb 11, 2024
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, thanks for the great explanation. I'm not familiar with the PruningPredicate code much, but from the documentation alone here it looks good (reviewing from phone so it's a little hard to check out the code 😅)

datafusion/core/src/physical_optimizer/pruning.rs Outdated Show resolved Hide resolved
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
@alamb alamb merged commit dc41ab5 into apache:main Feb 12, 2024
22 checks passed
@alamb alamb deleted the alamb/pruning_predicate_essay branch February 12, 2024 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants