@adriangb
Contributor

No description provided.

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation physical-plan Changes to the physical-plan crate labels Jan 27, 2026
@adriangb adriangb force-pushed the column-path-functions-logical branch 2 times, most recently from 73474e9 to 8f00c20 Compare January 27, 2026 20:50
@adriangb adriangb force-pushed the column-path-functions-logical branch from 8f00c20 to 36dda08 Compare January 27, 2026 22:22
adriangb added a commit to pydantic/datafusion that referenced this pull request Jan 29, 2026
… decisions

This extracts the ExpressionPlacement enum from PR apache#20036 to provide a
mechanism for expressions to indicate where they should be placed in
the query plan for optimal execution.

Changes:
- Add ExpressionPlacement enum with variants: Literal, Column,
  PlaceAtLeaves, PlaceAtRoot
- Add placement() method to Expr, ScalarUDF, ScalarUDFImpl traits
- Add placement() method to PhysicalExpr trait and implementations
- Implement placement() for GetFieldFunc to return PlaceAtLeaves when
  accessing struct fields with literal keys
- Replace is_expr_trivial() checks with placement() in optimizer and
  physical-plan projection code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
adriangb added a commit to pydantic/datafusion that referenced this pull request Jan 31, 2026
github-merge-queue bot pushed a commit that referenced this pull request Feb 2, 2026
… decisions (#20065)

## Summary

This PR is part of work towards #19387.

Extracts the `ExpressionPlacement` enum from #20036 to
provide a mechanism for expressions to indicate where they should be
placed in the query plan for optimal execution.

I've opted to have expressions declare their placement via a new
method on `enum Expr` and `trait PhysicalExpr`:

```rust
impl Expr {
    pub fn placement(&self) -> ExpressionPlacement {
        // ...
    }
}
```
And:

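```rust
trait PhysicalExpr {
    // ... existing methods ...
    /// Default; overridden by `Column`, `Literal`, and `ScalarFunctionExpr`.
    fn placement(&self) -> ExpressionPlacement {
        ExpressionPlacement::PlaceAtRoot
    }
}
```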

Where `ExpressionPlacement` is defined as:

```rust
enum ExpressionPlacement {
    /// Argument is a literal constant value or an expression that can be
    /// evaluated to a constant at planning time.
    Literal,
    /// Argument is a simple column reference.
    Column,
    /// Argument is a complex expression that can be safely placed at leaf nodes.
    /// For example, if `get_field(struct_col, 'field_name')` is implemented as a
    /// leaf-pushable expression, then it would return this variant.
    /// Then `other_leaf_function(get_field(...), 42)` could also be classified as
    /// leaf-pushable using the knowledge that `get_field(...)` is leaf-pushable.
    PlaceAtLeaves,
    /// Argument is a complex expression that should be placed at root nodes.
    /// For example, `min(col1 + col2)` is not leaf-pushable because it requires per-row computation.
    PlaceAtRoot,
}
```
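To make the composition rule in these doc comments concrete, here is a minimal, self-contained sketch of bottom-up classification over a toy expression tree. `ToyExpr` and its `leaf_pushable` flag are hypothetical stand-ins, not DataFusion's `Expr`, and the sketch skips constant folding, so a leaf-pushable call over only literals is still classified as `PlaceAtLeaves` rather than `Literal`.

```rust
/// Local copy of the placement enum so this sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExpressionPlacement {
    Literal,
    Column,
    PlaceAtLeaves,
    PlaceAtRoot,
}

/// Hypothetical toy expression tree (not DataFusion's `Expr`).
#[allow(dead_code)] // the column/literal payloads are only for readability
enum ToyExpr {
    Column(&'static str),
    Literal(&'static str),
    /// A function call. `leaf_pushable` marks functions such as `get_field`
    /// that are cheap to evaluate as long as their inputs are cheap.
    Call {
        leaf_pushable: bool,
        args: Vec<ToyExpr>,
    },
}

impl ToyExpr {
    /// Classify bottom-up: leaves classify themselves, and a call is
    /// leaf-pushable only if the function is and no argument must stay at the root.
    fn placement(&self) -> ExpressionPlacement {
        match self {
            ToyExpr::Column(_) => ExpressionPlacement::Column,
            ToyExpr::Literal(_) => ExpressionPlacement::Literal,
            ToyExpr::Call { leaf_pushable, args } => {
                let args_ok = args
                    .iter()
                    .all(|arg| arg.placement() != ExpressionPlacement::PlaceAtRoot);
                if *leaf_pushable && args_ok {
                    ExpressionPlacement::PlaceAtLeaves
                } else {
                    ExpressionPlacement::PlaceAtRoot
                }
            }
        }
    }
}
```

With this rule, `get_field(s, 'a')` (modeled as a leaf-pushable call over a column and a literal) is `PlaceAtLeaves`, and so is a leaf-pushable call wrapping it together with `42`; wrapping a call marked non-pushable forces `PlaceAtRoot` up the tree.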

We arrived at `ExpressionPlacement` after iterating on an earlier version
that had:

```rust
enum ArgTriviality {
    Literal,
    Column,
    Trivial,
    NonTrivial,
}
```

This terminology came from existing concepts scattered across the
logical and physical layers of the codebase. Some examples:


https://github.com/apache/datafusion/blob/f819061833d0ee4d7899ed6a0a431c584533b241/datafusion/physical-plan/src/projection.rs#L282-L290


https://github.com/apache/datafusion/blob/f819061833d0ee4d7899ed6a0a431c584533b241/datafusion/physical-plan/src/projection.rs#L1120-L1125


https://github.com/apache/datafusion/blob/f819061833d0ee4d7899ed6a0a431c584533b241/datafusion/optimizer/src/optimize_projections/mod.rs#L589-L592

The new API adds a distinction for cases like `get_field(col, 'a')`,
which is neither a column nor a literal but is still trivial.

It also lets scalar functions classify themselves.
This part was a bit tricky because `ScalarUDFImpl` (the scalar function
trait that users implement) lives in `datafusion-expr`, which cannot
reference `datafusion-physical-expr-common` (where `PhysicalExpr` is
defined).
But in the physical layer, scalar functions are represented
as `func: ScalarUDFImpl, args: Vec<Arc<dyn PhysicalExpr>>`.
Since a `ScalarUDFImpl` method cannot mention `PhysicalExpr`, there
would be no way to ask a function to classify itself in the physical
layer.

Additionally, even if `ScalarUDFImpl` could refer to `PhysicalExpr`, we
would need two methods with similar but divergent logic (matching on the
`Expr` enum in one, downcasting to various known types in the physical
version), which adds boilerplate for implementers.

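The `ExpressionPlacement` enum solves this problem: a single method,
`ScalarUDFImpl::placement(args: &[ExpressionPlacement])`, receives the
placements of the function's arguments rather than the arguments themselves.
The expression that wraps the `ScalarUDFImpl` computes those placements by
calling either `Expr::placement` or `PhysicalExpr::placement`, depending on
which layer it lives in.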
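A minimal sketch of this single-method design, assuming simplified stand-ins for the real types: `UdfPlacement` plays the role of `ScalarUDFImpl`, while `LogicalExpr` and `PhysicalNode` play the roles of `Expr` and `PhysicalExpr`. All of these names are hypothetical, and the `get_field` rule used here (struct input is a column or already leaf-pushable, key is a literal) is an assumption loosely based on this PR's description of `GetFieldFunc`, not its actual code.

```rust
use std::sync::Arc;

/// Local copy of the placement enum so this sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExpressionPlacement {
    Literal,
    Column,
    PlaceAtLeaves,
    PlaceAtRoot,
}

/// Hypothetical stand-in for `ScalarUDFImpl`: the function sees only the
/// placements of its arguments, never the argument expressions themselves.
trait UdfPlacement {
    fn placement(&self, _args: &[ExpressionPlacement]) -> ExpressionPlacement {
        ExpressionPlacement::PlaceAtRoot // conservative default
    }
}

/// Hypothetical `get_field`-like function (assumed rule, see lead-in).
struct GetField;
impl UdfPlacement for GetField {
    fn placement(&self, args: &[ExpressionPlacement]) -> ExpressionPlacement {
        use ExpressionPlacement::*;
        match args {
            [Column | PlaceAtLeaves, Literal] => PlaceAtLeaves,
            _ => PlaceAtRoot,
        }
    }
}

/// A function that keeps the default classification.
struct Expensive;
impl UdfPlacement for Expensive {}

/// Logical side: an enum tree, in the spirit of `Expr`.
enum LogicalExpr {
    Column,
    Literal,
    Call(Arc<dyn UdfPlacement>, Vec<LogicalExpr>),
}

impl LogicalExpr {
    fn placement(&self) -> ExpressionPlacement {
        match self {
            LogicalExpr::Column => ExpressionPlacement::Column,
            LogicalExpr::Literal => ExpressionPlacement::Literal,
            LogicalExpr::Call(func, args) => {
                // Classify the arguments first, then ask the function.
                let arg_placements: Vec<_> = args.iter().map(|a| a.placement()).collect();
                func.placement(&arg_placements)
            }
        }
    }
}

/// Physical side: trait objects, in the spirit of `Arc<dyn PhysicalExpr>`.
trait PhysicalNode {
    fn placement(&self) -> ExpressionPlacement {
        ExpressionPlacement::PlaceAtRoot
    }
}

struct PhysColumn;
impl PhysicalNode for PhysColumn {
    fn placement(&self) -> ExpressionPlacement {
        ExpressionPlacement::Column
    }
}

struct PhysLiteral;
impl PhysicalNode for PhysLiteral {
    fn placement(&self) -> ExpressionPlacement {
        ExpressionPlacement::Literal
    }
}

struct PhysCall {
    func: Arc<dyn UdfPlacement>,
    args: Vec<Arc<dyn PhysicalNode>>,
}
impl PhysicalNode for PhysCall {
    fn placement(&self) -> ExpressionPlacement {
        // Same function method as the logical side; only the argument walk differs.
        let arg_placements: Vec<_> = self.args.iter().map(|a| a.placement()).collect();
        self.func.placement(&arg_placements)
    }
}
```

`GetField::placement` is written once and never matches on `LogicalExpr` or downcasts a `PhysicalNode`; each tree only needs to know how to classify its own children.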

## Changes

- Add `ExpressionPlacement` enum in `datafusion-expr-common` with four variants:
  - `Literal` - constant values
  - `Column` - simple column references
  - `PlaceAtLeaves` - cheap expressions (like `get_field`) that can be pushed to leaf nodes
  - `PlaceAtRoot` - expensive expressions that should stay at the root

- Add `placement()` method to:
  - `Expr` enum
  - `ScalarUDF` struct and `ScalarUDFImpl` trait (with default returning `PlaceAtRoot`)
  - `PhysicalExpr` trait (with default returning `PlaceAtRoot`)
  - Physical expression implementations for `Column`, `Literal`, and `ScalarFunctionExpr`

- Implement `placement()` for `GetFieldFunc`, returning `PlaceAtLeaves` when accessing struct fields with literal keys

- Replace `is_expr_trivial()` function checks with `placement()` checks in:
  - `datafusion/optimizer/src/optimize_projections/mod.rs`
  - `datafusion/physical-plan/src/projection.rs`
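As a hypothetical illustration of what swapping `is_expr_trivial()` for `placement()` changes, here is a sketch of a projection-merge guard. It assumes the old check treated only bare columns and literals as trivial and that the new check treats everything except `PlaceAtRoot` as trivial (consistent with `get_field(col, 'a')` being called trivial above). `is_trivial_old`, `is_trivial`, `can_inline`, and the reference-count rule are illustrative names and conditions that may not match the exact logic in `optimize_projections` or `projection.rs`.

```rust
/// Local copy of the placement enum so this sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExpressionPlacement {
    Literal,
    Column,
    PlaceAtLeaves,
    PlaceAtRoot,
}

/// Hypothetical reconstruction of the old check: only bare columns and
/// literals count as trivial.
fn is_trivial_old(p: ExpressionPlacement) -> bool {
    matches!(p, ExpressionPlacement::Column | ExpressionPlacement::Literal)
}

/// Placement-based check: anything not pinned to the root is cheap to
/// recompute, which now also covers `get_field(col, 'a')`.
fn is_trivial(p: ExpressionPlacement) -> bool {
    p != ExpressionPlacement::PlaceAtRoot
}

/// Hypothetical merge guard: inlining a lower projection's expression into an
/// upper projection duplicates it once per reference, so only allow that when
/// it is referenced at most once or is cheap to recompute.
fn can_inline(placement: ExpressionPlacement, reference_count: usize) -> bool {
    reference_count <= 1 || is_trivial(placement)
}
```

Under this sketch, an expression classified as `PlaceAtLeaves` that is referenced several times can now be inlined, whereas the old column/literal-only check would have blocked it.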

## Test Plan

- [x] `cargo check` passes on all affected packages
- [x] `cargo test -p datafusion-optimizer` passes
- [x] `cargo test -p datafusion-physical-plan` passes (except an unrelated zstd feature test)
- [x] `cargo test -p datafusion-functions --lib getfield` passes

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>


1 participant