Properly documenting ExprIR #3497

@dangotbanned

Description

(#2572) and, more recently, (expr-ir/logical-plan) add a lot of code, but only a little documentation.

My thinking was to document only the deviations from main, since very big chunks are the same and already well-documented.
I was also concerned about sinking too much time into things while they were still changing.

Well now, enough of the core is stable (enough) to start blabbing about, so it's time!

Documenting

Classes/docstrings

There are a lot, but many are tiny and only really need an understanding of a base class:

Look how smol

class _MomentAggExpr(AggExpr, dtype=map_first(dtm.moment_dtype)): ...
class Count(AggExpr, dtype=dtm.IDX_DTYPE):
    """Non-null count."""
class Len(AggExpr, dtype=dtm.IDX_DTYPE):
    """Null-inclusive count."""
class Max(AggExpr, dtype=same_dtype()): ...
class Mean(_MomentAggExpr): ...
class Median(_MomentAggExpr): ...
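
For readers wondering how a one-line class can carry so much, here is a minimal sketch of the general Python mechanism: class keyword arguments handled in `__init_subclass__`. Everything in it is an assumption for illustration (the string dtypes, the `resolve_dtype` method, and this simplified `same_dtype`); it is not the actual `AggExpr` implementation.

```python
from __future__ import annotations

from typing import Any, Callable, ClassVar

# Hypothetical stand-ins; dtypes are plain strings to keep the sketch small.
DTypeRule = Callable[["AggExpr"], str]


def same_dtype() -> DTypeRule:
    """Rule: the output dtype is whatever the input dtype was."""
    return lambda expr: expr.input_dtype


class AggExpr:
    # Either a fixed dtype name, or a rule that computes one per instance.
    _dtype: ClassVar[str | DTypeRule | None] = None

    def __init_subclass__(
        cls, dtype: str | DTypeRule | None = None, **kwds: Any
    ) -> None:
        super().__init_subclass__(**kwds)
        # Only override when given, so `class Mean(_MomentAggExpr): ...`
        # inherits the rule of its parent.
        if dtype is not None:
            cls._dtype = dtype

    def __init__(self, input_dtype: str) -> None:
        self.input_dtype = input_dtype

    def resolve_dtype(self) -> str:
        # Look up on the class so a stored function is not bound as a method.
        rule = type(self)._dtype
        if rule is None:
            raise NotImplementedError(f"{type(self).__name__} has no dtype rule")
        return rule(self) if callable(rule) else rule


class Count(AggExpr, dtype="u32"):  # "u32" is an assumed index dtype
    """Non-null count."""


class Max(AggExpr, dtype=same_dtype()): ...


class _MomentAggExpr(AggExpr, dtype="f64"): ...  # assumed moment dtype


class Mean(_MomentAggExpr): ...
```

With this shape, the docstring on the base class does most of the explaining, and each subclass only needs a line saying what it computes.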

So these are my picks for where better docstrings could be most beneficial in narwhals._plan.

Reading this back, okay this still looks like a lot 🤦‍♂️

Expressions

  • _expr_ir.ExprIR (priority)
  • expressions.aggregation.AggExpr
  • expressions.expr.Literal
  • expressions.name.KeepName
  • expressions.name.RenameAlias
  • options.ExprIROptions

Selectors

  • _expr_ir.SelectorIR
  • expressions.expr.RootSelector
  • expressions.expr.BinarySelector
  • expressions.expr.InvertSelector
  • expressions.selectors.Selector (priority)
  • expressions.selectors.DTypeSelector (priority)

Functions

  • _function.Function (priority)
  • expressions.expr.FunctionExpr (priority)
  • options.FunctionFlags
  • options.FunctionOptions (priority)
  • options.FunctionExprOptions
  • _dispatch.Dispatcher (priority)
  • _dtype.ResolveDType (new in logical-plan)

Expansion

  • _expr_ir.NamedIR
  • _expansion.Expander
  • meta.MetaNamespace
  • schema.FrozenSchema (+ freeze_schema)

Misc

  • _immutable.Immutable (priority)

Compliant

Narrative

Docstrings can only go so far.

How do the pieces come together, why do they, and what even is an IR in the first place?

questions asked by someone, probably

Note

Section needs more fluff

And now for something completely different

expr-ir/logical-plan adds (among other things) a new package, /plans/.

While it is still a work-in-progress (read: expect things to change), the overall
idea is coming together.

The journey of a query looks like this:

LazyFrame -> LogicalPlan -> LogicalToResolved -> ResolvedPlan -> ResolvedToCompliant -> (CompliantLazyFrame | CompliantDataFrame | None)

Or more visually:

stateDiagram-v2
    [*] --> init_lp
    
    init_lp: Starting a LogicalPlan
    extend_lp: Extending a LogicalPlan

    init_lp --> extend_lp
    extend_lp --> LogicalToResolved

    %% Resolving the plan
    %% Doesn't look good when in a nested state
    LogicalToResolved --> ResolvedPlan


    state fork_resolve <<fork>>
  

    ResolvedPlan --> fork_resolve
    fork_resolve --> Schema: collect_schema
    fork_resolve --> ResolvedToCompliant: collect/sink_parquet

    ResolvedToCompliant --> CompliantLazyFrame
    CompliantLazyFrame --> CompliantDataFrame: collect
    CompliantDataFrame --> DataFrame
    CompliantLazyFrame --> None: sink_parquet
    
    state init_lp {
        [*] --> ScanCsv:  scan_csv
        --
        [*] --> ScanParquet:  scan_parquet
        --
        [*] --> ScanDataFrame: DataFrame.lazy(None)
        --
        [*] --> ScanLazyFrame: LazyFrame.from_native
        [*] --> ScanLazyFrame: DataFrame.lazy(backend)
    }
    
    state extend_lp {
        [*] --> LazyFrame
        LazyFrame --> LogicalPlan
        LogicalPlan --> LazyFrame
        LazyFrame --> Collect: collect
        LazyFrame --> SinkParquet: sink_parquet
        LazyFrame --> [*] : collect_schema

    }


    note left of init_lp
            First node has a schema.
            Children store unresolved
            ExprIR/SelectorIR(s)
    end note


    note right of LogicalToResolved
        A Protocol with a builtin 
        implementation (Resolver) 
        based *heavily* on polars.
    end note


    note left of ResolvedPlan
        Nodes that alter the 
        schema store an 
        output_schema.
        ExprIR resolve to NamedIR.
        SelectorIR expand to 
        tuple[str, ...]
    end note

    note right of Schema
        collect_schema() didn't 
        need to go through
        CompliantLazyFrame
    end note

    note right of ResolvedToCompliant
        collect() means we need
        *more than just a Schema*.
        Time to evaluate our plan!
    end note

    note right of ResolvedToCompliant
        Another Protocol, (like 
        LogicalToResolved)
        but backend-dependent
    end note
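
To make the journey above a bit more concrete, here is a rough sketch of the two protocol hops and the `collect_schema` shortcut. It is a toy under stated assumptions: `Scan`, `Select`, `DictBackend`, the method names `resolve` and `to_compliant`, and the string dtypes are all hypothetical stand-ins, and plain column names stand in for ExprIR/SelectorIR. Only the names `LogicalToResolved`, `Resolver`, `ResolvedPlan`, `ResolvedToCompliant` and `output_schema` come from the diagram.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol

Schema = dict[str, str]  # column name -> dtype name (simplified)


@dataclass(frozen=True)
class Scan:
    """First node of a plan: it already has a schema."""

    schema: Schema


@dataclass(frozen=True)
class Select:
    """Child node: stores unresolved column names."""

    input: Scan | Select
    columns: tuple[str, ...]


@dataclass(frozen=True)
class ResolvedPlan:
    """Schema-altering nodes store an output_schema once resolved."""

    output_schema: Schema


class LogicalToResolved(Protocol):
    def resolve(self, plan: Scan | Select) -> ResolvedPlan: ...


class Resolver:
    """Backend-independent implementation of LogicalToResolved."""

    def resolve(self, plan: Scan | Select) -> ResolvedPlan:
        if isinstance(plan, Scan):
            return ResolvedPlan(dict(plan.schema))
        upstream = self.resolve(plan.input).output_schema
        missing = [name for name in plan.columns if name not in upstream]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        return ResolvedPlan({name: upstream[name] for name in plan.columns})


class ResolvedToCompliant(Protocol):
    def to_compliant(self, plan: ResolvedPlan) -> object: ...


class DictBackend:
    """Toy backend: 'evaluates' a resolved plan into empty columns."""

    def to_compliant(self, plan: ResolvedPlan) -> dict[str, list]:
        return {name: [] for name in plan.output_schema}


def collect_schema(plan: Scan | Select, resolver: LogicalToResolved) -> Schema:
    # Only the resolved plan is needed; no backend is involved.
    return resolver.resolve(plan).output_schema


def collect(
    plan: Scan | Select, resolver: LogicalToResolved, backend: ResolvedToCompliant
) -> object:
    # Evaluation needs more than a schema, so the backend takes over here.
    return backend.to_compliant(resolver.resolve(plan))


plan = Select(Scan({"a": "i64", "b": "str"}), ("b",))
schema = collect_schema(plan, Resolver())
frame = collect(plan, Resolver(), DictBackend())
```

The point of the split is visible in the two functions at the bottom: `collect_schema` never touches a backend, while `collect` hands the same resolved plan to whichever backend implements the second protocol.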

Labels: documentation, internal
