Closed
Labels
bug (Something isn't working)
Description
Describe the bug
When no window frame is specified in the Python implementation, we default to unbounded preceding to current row. To follow the PostgreSQL implementation, we should use that default only when order_by is specified, and otherwise default to unbounded preceding to unbounded following.
To Reproduce
from datafusion import SessionContext, WindowFrame, col, lit, functions as F
import pyarrow as pa
ctx = SessionContext()
# create a RecordBatch and a new DataFrame from it
batch = pa.RecordBatch.from_arrays(
[pa.array([1.0, 10.0, 20.0])],
names=["a"],
)
df = ctx.create_dataframe([[batch]])
window_frame = WindowFrame("rows", None, None)
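df = df.select(
    col("a"),
    # no window frame given: falls back to the current default
    F.window("avg", [col("a")]).alias("no_frame"),
    # explicit frame: unbounded preceding to unbounded following
    F.window("avg", [col("a")], window_frame=window_frame).alias("with_frame"),
)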
df.show()
Produces:
DataFrame()
+------+--------------------+--------------------+
| a | no_frame | with_frame |
+------+--------------------+--------------------+
| 1.0 | 1.0 | 10.333333333333334 |
| 10.0 | 5.5 | 10.333333333333334 |
| 20.0 | 10.333333333333334 | 10.333333333333334 |
+------+--------------------+--------------------+
Expected behavior
When order_by is not specified, the frame should default to unbounded preceding to unbounded following, so that no_frame matches with_frame in the example above.
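The expected defaulting rule can be sketched in plain Python, independent of datafusion. This is only an illustration under assumptions: `default_frame` and `windowed_avg` are hypothetical helpers, not part of the datafusion API, and the ordered case is simplified to row offsets (peer rows under RANGE are ignored).

```python
from typing import List, Optional, Sequence, Tuple

# (units, start_bound, end_bound); None means unbounded, 0 means current row.
# This tuple layout is an assumption made for the sketch.
Frame = Tuple[str, Optional[int], Optional[int]]


def default_frame(has_order_by: bool) -> Frame:
    """Hypothetical helper: pick a PostgreSQL-style default frame."""
    if has_order_by:
        # With ORDER BY: unbounded preceding to current row.
        return ("range", None, 0)
    # Without ORDER BY: the whole partition.
    return ("rows", None, None)


def windowed_avg(values: Sequence[float], frame: Frame) -> List[float]:
    """Average over the frame, treating bounds as row offsets (simplified)."""
    _, start, end = frame
    n = len(values)
    out = []
    for i in range(n):
        lo = 0 if start is None else max(0, i - start)
        hi = n if end is None else min(n, i + end + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


values = [1.0, 10.0, 20.0]
running = windowed_avg(values, default_frame(has_order_by=True))
whole = windowed_avg(values, default_frame(has_order_by=False))
```

Here `running` corresponds to the current no_frame column (a running average), while `whole` gives the same value on every row, as in the with_frame column.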
Additional context
The offending line of code appears to be here:
https://github.com/apache/datafusion-python/blob/main/src/functions.rs#L230
Michael-J-Ward