Description
(#2572) and, more recently, (expr-ir/logical-plan) add a lot of code, but only a little documentation.
My thinking was to document the deviations from main, since very big chunks are the same and already well-documented.
I was also concerned about sinking much time into things while they were changing.
Well now, enough of the core is stable (enough) to start blabbing about - so it's time!
Documenting
Classes/docstrings
There are a lot, but many are tiny and only really need an understanding of a base class:
Look how smol
narwhals/narwhals/_plan/expressions/aggregation.py
Lines 46 to 53 in 51cebab
    class _MomentAggExpr(AggExpr, dtype=map_first(dtm.moment_dtype)): ...
    class Count(AggExpr, dtype=dtm.IDX_DTYPE):
        """Non-null count."""
    class Len(AggExpr, dtype=dtm.IDX_DTYPE):
        """Null-inclusive count."""
    class Max(AggExpr, dtype=same_dtype()): ...
    class Mean(_MomentAggExpr): ...
    class Median(_MomentAggExpr): ...
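For anyone unfamiliar with the pattern, a class keyword like `dtype=` can be consumed by `__init_subclass__` on the base class, which is why the subclasses can stay this small. The following is a minimal sketch of that idea only, not the actual `narwhals._plan` implementation: `constant_dtype`, the string dtypes, and the `resolve_dtype` method are hypothetical stand-ins, and `same_dtype` here is a simplified guess at what the real helper does.

```python
from __future__ import annotations

from typing import Any, Callable, ClassVar, Optional

# Hypothetical: dtypes are plain strings, a resolver maps input dtype -> output dtype.
DTypeResolver = Callable[[str], str]


def same_dtype() -> DTypeResolver:
    """Resolver that returns the input dtype unchanged (simplified stand-in)."""
    return lambda input_dtype: input_dtype


def constant_dtype(dtype: str) -> DTypeResolver:
    """Resolver that ignores the input dtype (hypothetical helper)."""
    return lambda _input_dtype: dtype


class AggExpr:
    """Base class that stores the resolver passed as a class keyword."""

    _resolve_dtype: ClassVar[Optional[DTypeResolver]] = None

    def __init_subclass__(
        cls, *, dtype: Optional[DTypeResolver] = None, **kwds: Any
    ) -> None:
        super().__init_subclass__(**kwds)
        # Only override when a subclass passes `dtype=`; otherwise inherit.
        if dtype is not None:
            cls._resolve_dtype = dtype

    def resolve_dtype(self, input_dtype: str) -> str:
        # Look up on the class so the stored function is not bound to `self`.
        resolver = type(self)._resolve_dtype
        if resolver is None:
            raise NotImplementedError(f"{type(self).__name__} has no dtype rule")
        return resolver(input_dtype)


class _MomentAggExpr(AggExpr, dtype=constant_dtype("f64")): ...


class Count(AggExpr, dtype=constant_dtype("u32")):
    """Non-null count."""


class Max(AggExpr, dtype=same_dtype()): ...


class Mean(_MomentAggExpr): ...  # inherits the rule from _MomentAggExpr
```

With this shape, understanding the base class is enough to read every subclass: `Mean` carries no body at all and still resolves through the rule declared on `_MomentAggExpr`.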
So these guys are my picks for what could be most beneficial for better docstrings in narwhals._plan.
Reading this back, okay this still looks like a lot 🤦♂️
Expressions
- _expr_ir.ExprIR (priority)
- expressions.aggregation.AggExpr
- expressions.expr.Literal
- expressions.name.KeepName
- expressions.name.RenameAlias
- options.ExprIROptions
Selectors
- _expr_ir.SelectorIR
- expressions.expr.RootSelector
- expressions.expr.BinarySelector
- expressions.expr.InvertSelector
- expressions.selectors.Selector (priority)
- expressions.selectors.DTypeSelector (priority)
Functions
- _function.Function (priority)
- expressions.expr.FunctionExpr (priority)
- options.FunctionFlags
- options.FunctionOptions (priority)
- options.FunctionExprOptions
- _dispatch.Dispatcher (priority)
- _dtype.ResolveDType (new in logical-plan)
Expansion
- _expr_ir.NamedIR
- _expansion.Expander
- meta.MetaNamespace
- schema.FrozenSchema (+ freeze_schema)
Misc
- _immutable.Immutable (priority)
Compliant
- compliant.expr.CompliantExpr
- compliant.scalar.CompliantScalar
- compliant.{concat,io,ranges,translate}.* (in flux in logical-plan)
  - All follow the same boilerplate
  - Happy with the functionality, but maybe PEP 747 can reduce verbosity?
- compliant.group_by.(Eager)DataFrameGroupBy (in flux in logical-plan)
  - Big picture is they're the love-child of ...
Narrative
Docstrings can only go so far.
How the pieces come together, why they do and what even is an IR in the first place?
questions asked by someone, probably
Note
Section needs more fluff
And now for something completely different
expr-ir/logical-plan adds (among other things) a new package, /plans/.
While it is still a work-in-progress (read: expect things to change), the overall
idea is coming together.
The journey of a query looks like this:
LazyFrame -> LogicalPlan -> LogicalToResolved -> ResolvedPlan -> ResolvedToCompliant -> (CompliantLazyFrame | CompliantDataFrame | None)
Or more visually:
stateDiagram-v2
[*] --> init_lp
init_lp: Starting a LogicalPlan
extend_lp: Extending a LogicalPlan
init_lp --> extend_lp
extend_lp --> LogicalToResolved
%% Resolving the plan
%% Doesn't look good when in a nested state
LogicalToResolved --> ResolvedPlan
state fork_resolve <<fork>>
ResolvedPlan --> fork_resolve
fork_resolve --> Schema: collect_schema
fork_resolve --> ResolvedToCompliant: collect/sink_parquet
ResolvedToCompliant --> CompliantLazyFrame
CompliantLazyFrame --> CompliantDataFrame: collect
CompliantDataFrame --> DataFrame
CompliantLazyFrame --> None: sink_parquet
state init_lp {
[*] --> ScanCsv: scan_csv
--
[*] --> ScanParquet: scan_parquet
--
[*] --> ScanDataFrame: DataFrame.lazy(None)
--
[*] --> ScanLazyFrame: LazyFrame.from_native
[*] --> ScanLazyFrame: DataFrame.lazy(backend)
}
state extend_lp {
[*] --> LazyFrame
LazyFrame --> LogicalPlan
LogicalPlan --> LazyFrame
LazyFrame --> Collect: collect
LazyFrame --> SinkParquet: sink_parquet
LazyFrame --> [*] : collect_schema
}
note left of init_lp
First node has a schema.
Children store unresolved
ExprIR/SelectorIR(s)
end note
note right of LogicalToResolved
A Protocol with a builtin
implementation (Resolver)
based *heavily* on polars.
end note
note left of ResolvedPlan
Nodes that alter the
schema store an
output_schema.
ExprIR resolve to NamedIR.
SelectorIR expand to
tuple[str, ...]
end note
note right of Schema
collect_schema() didn't
need to go through
CompliantLazyFrame
end note
note right of ResolvedToCompliant
collect() means we need
*more than just a Schema*.
Time to evaluate our plan!
end note
note right of ResolvedToCompliant
Another Protocol, (like
LogicalToResolved)
but backend-dependent
end note