Creation of CI artifacts for cudf-polars wheels #16680

Merged
merged 48 commits into from
Sep 17, 2024
7742b8b
Avoid GPU initialisation during import
wence- Jul 26, 2024
ef0b49f
Require polars >= 1.3
wence- Jul 29, 2024
e9fd96d
Adapt to IR changes
wence- Jul 25, 2024
9d69621
Use new IR versioning to report if we don't support an IR version
wence- Jul 26, 2024
918a40e
Use new GPUEngine config object to set things up
wence- Jul 22, 2024
f8f2d0d
Plausibly provide useful error message if driver is too old
wence- Jul 26, 2024
bcedb6b
Update overview docs
wence- Jul 24, 2024
6f2d406
Support right join
wence- Jul 29, 2024
f3bbd3f
Test that invalid GPUEngine config raises
wence- Jul 29, 2024
1d4c30c
More coverage for gpuengine config
wence- Jul 29, 2024
abcf22b
Versioned handling of PythonScan translation
wence- Jul 30, 2024
62a5dbd
Merge pull request #16347 from wence-/wence/fea/polars-engine-config
wence- Aug 2, 2024
7d0c7ad
Adapt to IR changes in polars 1.4 (#16494)
lithomas1 Aug 5, 2024
5de29b3
Implement polars string Replace and ReplaceMany (#16039)
lithomas1 Aug 6, 2024
7f6b00f
Use a key column rather than a placeholder for count agg
wence- Aug 19, 2024
822e7d0
Backport: Remove cuDF dependency from pylibcudf column from_device te…
lithomas1 Aug 20, 2024
152111b
Implement scan-based whole-frame aggregations for cudf-polars (#16509)
lithomas1 Aug 20, 2024
13a1493
Merge pull request #16599 from wence/fix/remove-placeholder-column
wence- Aug 21, 2024
7cf3289
Implement order preserving groupby in cudf-polars (#16555)
lithomas1 Aug 22, 2024
f6c938f
Fix integer overflow in indexalator pointer logic
davidwendt Aug 22, 2024
4ded370
use std::ptrdiff_t
davidwendt Aug 23, 2024
edabb67
Correctly export empty column names in DataFrame.to_polars (#16596)
wence- Aug 27, 2024
a4c35e9
Forward-merge 24.08
wence- Aug 27, 2024
0a95b2c
Add more `cudf-polars` unaryops (#16579)
brandon-b-miller Aug 27, 2024
cc892fc
Merge pull request #16667 from wence-/wence/merge-2408
wence- Aug 27, 2024
41a3a95
Add `pylibcudf`/`cudf-polars` string `strip` (#16504)
brandon-b-miller Aug 27, 2024
0bf68d4
`cudf-polars`/`pylibcudf` string -> date parsing (#16306)
brandon-b-miller Aug 28, 2024
40d33cb
Support quantile in cudf_polars (#16093)
lithomas1 Aug 29, 2024
95da2c5
Implement handlers for first/last in groupby (#16688)
wence- Aug 30, 2024
434afab
Ensure IR validation always checks for empty columns
wence- Aug 30, 2024
385ae98
Need to check for nulls in nested dtypes
wence- Aug 30, 2024
1cf1146
Add test reading nested Null column
wence- Aug 30, 2024
de445a3
Move creation of regex program to initialisation
wence- Aug 30, 2024
f39713e
Merge pull request #16703 from wence-/wence/fea/polars-reject-invalid…
wence- Aug 30, 2024
ad364c6
Include failing node in error message
wence- Aug 30, 2024
d158b22
Merge pull request #16702 from wence-/wence/fea/polars-no-empty-columns
wence- Sep 2, 2024
b550645
Partially reject dynamic groupby (#16720)
wence- Sep 3, 2024
eb2a23e
Implement Kleene logic handling for Any/All and bitwise Or/And (#16476)
wence- Sep 4, 2024
ebc3bbe
Some fixes for unary functions (#16719)
wence- Sep 4, 2024
5d262df
Implement unpivot in cudf-polars (#16689)
wence- Sep 4, 2024
c76e90b
Small scan-handler fixes (#16721)
wence- Sep 4, 2024
ccb8061
Implement cudf-polars datetime extraction methods (#16500)
lithomas1 Sep 5, 2024
feb2e63
Polars 1.7 will change a minor thing in the IR, adapt to that (#16755)
wence- Sep 6, 2024
6d2e455
Run polars test suite (defaulting to GPU) in CI (#16710)
wence- Sep 6, 2024
1b5cb1a
skip test_groupby_literal_in_agg if polars>=1.7.1
brandon-b-miller Sep 16, 2024
b6a110e
API Doc for Polars GPU Engine (#16753)
singhmanas1 Sep 16, 2024
3b7ffb8
test in polars 1.7.0 environment
brandon-b-miller Sep 16, 2024
9428154
Revert "skip test_groupby_literal_in_agg if polars>=1.7.1"
brandon-b-miller Sep 16, 2024
Some fixes for unary functions (#16719)
Correctly handle `pow` and `log` by translating to binary expressions
when we observe the node.

Upgrade our minimum supported polars version (so that we see all these
function names from the rust IR).

Also tighten check for which groupby-aggs are supported when the
expression contains a unary function.
wence- authored Sep 4, 2024
commit ebc3bbe4eefc07dcf917b659418740b85b1a1e4a
2 changes: 1 addition & 1 deletion dependencies.yaml
@@ -631,7 +631,7 @@ dependencies:
     common:
       - output_types: [conda, requirements, pyproject]
         packages:
-          - polars>=1.3
+          - polars>=1.6
   run_dask_cudf:
     common:
       - output_types: [conda, requirements, pyproject]
20 changes: 3 additions & 17 deletions python/cudf_polars/cudf_polars/dsl/expr.py
@@ -1003,6 +1003,7 @@ class UnaryFunction(Expr):
     _non_child = ("dtype", "name", "options")
     children: tuple[Expr, ...]

+    # Note: log, and pow are handled via translation to binops
     _OP_MAPPING: ClassVar[dict[str, plc.unary.UnaryOperator]] = {
         "sin": plc.unary.UnaryOperator.SIN,
         "cos": plc.unary.UnaryOperator.COS,
@@ -1017,7 +1018,6 @@ class UnaryFunction(Expr):
         "arccosh": plc.unary.UnaryOperator.ARCCOSH,
         "arctanh": plc.unary.UnaryOperator.ARCTANH,
         "exp": plc.unary.UnaryOperator.EXP,
-        "log": plc.unary.UnaryOperator.LOG,
         "sqrt": plc.unary.UnaryOperator.SQRT,
         "cbrt": plc.unary.UnaryOperator.CBRT,
         "ceil": plc.unary.UnaryOperator.CEIL,
@@ -1034,7 +1034,6 @@ class UnaryFunction(Expr):
             "round",
             "set_sorted",
             "unique",
-            "pow",
         }
     )
     _supported_cum_aggs = frozenset(
@@ -1169,21 +1168,6 @@ def do_evaluate(
             )
             arg = evaluated.obj_scalar if evaluated.is_scalar else evaluated.obj
             return Column(plc.replace.replace_nulls(column.obj, arg))
-        elif self.name == "pow":
-            (base, exponent) = (
-                c.evaluate(df, context=context, mapping=mapping) for c in self.children
-            )
-            base_obj = (
-                base.obj_scalar
-                if (base.is_scalar and not exponent.is_scalar)
-                else base.obj
-            )
-            exponent_obj = exponent.obj_scalar if exponent.is_scalar else exponent.obj
-            return Column(
-                plc.binaryop.binary_operation(
-                    base_obj, exponent_obj, plc.binaryop.BinaryOperator.POW, self.dtype
-                )
-            )
         elif self.name in self._OP_MAPPING:
             column = self.children[0].evaluate(df, context=context, mapping=mapping)
             if column.obj.type().id() != self.dtype.id():
@@ -1241,6 +1225,8 @@ def do_evaluate(

     def collect_agg(self, *, depth: int) -> AggInfo:
         """Collect information about aggregations in groupbys."""
+        if self.name in {"unique", "drop_nulls"} | self._supported_cum_aggs:
+            raise NotImplementedError(f"{self.name} in groupby")
         if depth == 1:
             # inside aggregation, need to pre-evaluate, groupby
             # construction has checked that we don't have nested aggs,
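The guard added to `collect_agg` can be illustrated with a minimal, self-contained sketch. `FakeUnary`, `is_supported`, and most of `SUPPORTED_CUM_AGGS` are hypothetical stand-ins, not the cudf-polars classes. Only the rejected names `unique` and `drop_nulls`, the presence of `cum_max` among the rejected functions (per the new test), and the error message format come from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: an illustrative set of cumulative aggregations. The real
# contents of _supported_cum_aggs are not shown in this diff.
SUPPORTED_CUM_AGGS = frozenset({"cum_min", "cum_max", "cum_sum", "cum_prod"})

# Functions whose result for a row depends on other rows, so they cannot be
# pre-evaluated pointwise before the groupby runs.
NON_POINTWISE = frozenset({"unique", "drop_nulls"}) | SUPPORTED_CUM_AGGS


@dataclass(frozen=True)
class FakeUnary:
    """Hypothetical stand-in for a unary function expression node."""

    name: str

    def collect_agg(self, *, depth: int) -> str:
        # Reject first, before any depth-dependent handling, as in the diff.
        if self.name in NON_POINTWISE:
            raise NotImplementedError(f"{self.name} in groupby")
        return f"{self.name} accepted at depth {depth}"


def is_supported(name: str) -> bool:
    """Return True if collect_agg accepts the named function."""
    try:
        FakeUnary(name).collect_agg(depth=0)
    except NotImplementedError:
        return False
    return True
```

Raising during translation, rather than at evaluation time, lets the caller see the unsupported query before any GPU work starts, which is what `test_groupby_unary_non_pointwise_raises` checks.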
52 changes: 20 additions & 32 deletions python/cudf_polars/cudf_polars/dsl/translate.py
@@ -76,25 +76,11 @@ def _translate_ir(
 def _(
     node: pl_ir.PythonScan, visitor: NodeTraverser, schema: dict[str, plc.DataType]
 ) -> ir.IR:
-    if visitor.version()[0] == 1:
-        # https://github.com/pola-rs/polars/pull/17939
-        # Versioning can be dropped once polars 1.4 is lowest
-        # supported version.
-        scan_fn, with_columns, source_type, predicate, nrows = node.options
-        options = (scan_fn, with_columns, source_type, nrows)
-        predicate = (
-            translate_named_expr(visitor, n=predicate)
-            if predicate is not None
-            else None
-        )
-    else:  # pragma: no cover; CI tests 1.4
-        # version == 0
-        options = node.options
-        predicate = (
-            translate_named_expr(visitor, n=node.predicate)
-            if node.predicate is not None
-            else None
-        )
+    scan_fn, with_columns, source_type, predicate, nrows = node.options
+    options = (scan_fn, with_columns, source_type, nrows)
+    predicate = (
+        translate_named_expr(visitor, n=predicate) if predicate is not None else None
+    )
     return ir.PythonScan(schema, options, predicate)


@@ -115,13 +101,8 @@ def _(
         n_rows = -1  # All rows
         skip_rows = 0  # Don't skip
     else:
-        if visitor.version() >= (1, 0):
-            # Polars 1.4 n_rows property is (skip, nrows)
-            skip_rows, n_rows = n_rows
-        else:  # pragma: no cover; CI tests 1.4
-            # Polars 1.3 n_rows property is integer, skip rows was
-            # always zero because it was not pushed down to reader.
-            skip_rows = 0
+        # TODO: with versioning, rename on the rust side
+        skip_rows, n_rows = n_rows

     row_index = file_options.row_index
     return ir.Scan(
@@ -445,12 +426,19 @@ def _(node: pl_expr.Function, visitor: NodeTraverser, dtype: plc.DataType) -> ex
             *(translate_expr(visitor, n=n) for n in node.input),
         )
     elif isinstance(name, str):
-        return expr.UnaryFunction(
-            dtype,
-            name,
-            options,
-            *(translate_expr(visitor, n=n) for n in node.input),
-        )
+        children = (translate_expr(visitor, n=n) for n in node.input)
+        if name == "log":
+            (base,) = options
+            (child,) = children
+            return expr.BinOp(
+                dtype,
+                plc.binaryop.BinaryOperator.LOG_BASE,
+                child,
+                expr.Literal(dtype, pa.scalar(base, type=plc.interop.to_arrow(dtype))),
+            )
+        elif name == "pow":
+            return expr.BinOp(dtype, plc.binaryop.BinaryOperator.POW, *children)
+        return expr.UnaryFunction(dtype, name, options, *children)
     raise NotImplementedError(
         f"No handler for Expr function node with {name=}"
     )  # pragma: no cover; polars raises on the rust side for now
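The rewrite performed in this hunk (turning `log` and `pow` function nodes into binary operations at translation time) can be sketched without polars or pylibcudf. Every class below (`Col`, `Literal`, `BinOp`, `UnaryFunction`) and the function `translate_function` are hypothetical miniatures for illustration. The operator names `LOG_BASE` and `POW`, and the fact that the log base arrives through `options` while the operand arrives as the single child, are taken from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Col:
    """Hypothetical column reference."""

    name: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical scalar literal."""

    value: float


@dataclass(frozen=True)
class BinOp:
    """Hypothetical binary operation node: op(left, right)."""

    op: str
    left: Any
    right: Any


@dataclass(frozen=True)
class UnaryFunction:
    """Hypothetical fallback node for every other named function."""

    name: str
    options: tuple
    children: tuple


def translate_function(name: str, options: tuple, children: tuple) -> Any:
    """Translate a named function node, special-casing log and pow."""
    if name == "log":
        # The base is carried in the node's options, not as a child, so it
        # is lifted into a literal to become the right-hand operand.
        (base,) = options
        (child,) = children
        return BinOp("LOG_BASE", child, Literal(base))
    if name == "pow":
        # Both base and exponent are already children of the node.
        base, exponent = children
        return BinOp("POW", base, exponent)
    return UnaryFunction(name, options, children)
```

For example, `translate_function("log", (10.0,), (Col("a"),))` yields `BinOp("LOG_BASE", Col("a"), Literal(10.0))`. Doing the rewrite once in the translator means the evaluator for unary functions no longer needs a special case for either name, which matches the deletions in `expr.py`.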
10 changes: 5 additions & 5 deletions python/cudf_polars/cudf_polars/utils/versions.py
@@ -12,11 +12,11 @@

 POLARS_VERSION = parse(__version__)

-POLARS_VERSION_GE_13 = POLARS_VERSION >= parse("1.3")
-POLARS_VERSION_GT_13 = POLARS_VERSION > parse("1.3")
-POLARS_VERSION_LT_13 = POLARS_VERSION < parse("1.3")
+POLARS_VERSION_GE_16 = POLARS_VERSION >= parse("1.6")
+POLARS_VERSION_GT_16 = POLARS_VERSION > parse("1.6")
+POLARS_VERSION_LT_16 = POLARS_VERSION < parse("1.6")

-if POLARS_VERSION < parse("1.3"):
+if POLARS_VERSION_LT_16:
     raise ImportError(
-        "cudf_polars requires py-polars v1.3 or greater."
+        "cudf_polars requires py-polars v1.6 or greater."
     )  # pragma: no cover
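The version gate compares parsed versions rather than raw strings. The sketch below shows why that matters, using only the standard library. `release_tuple` and `require_minimum` are hypothetical helpers and a deliberate simplification: the module in the diff calls a `parse` function whose import lies outside the hunk, and a full version parser also orders pre-releases, which this sketch does not.

```python
from __future__ import annotations

from itertools import takewhile

# Minimum supported release, as set by this commit.
MINIMUM = (1, 6)


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse leading numeric components, e.g. "1.10.0" -> (1, 10, 0).

    Simplification: suffixes such as "rc1" are dropped rather than ordered.
    """
    parts: list[int] = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def require_minimum(version: str, minimum: tuple[int, ...] = MINIMUM) -> None:
    """Raise ImportError if version is older than minimum."""
    if release_tuple(version) < minimum:
        wanted = ".".join(str(n) for n in minimum)
        raise ImportError(f"cudf_polars requires py-polars v{wanted} or greater.")


# String comparison gets multi-digit components wrong: "1.10.0" sorts
# before "1.6.0" character by character, while the tuples compare correctly.
string_order_is_wrong = "1.10.0" < "1.6.0"
tuple_order_is_right = release_tuple("1.10.0") > release_tuple("1.6.0")
```

With these definitions, `require_minimum("1.6.0")` returns silently and `require_minimum("1.5.2")` raises `ImportError`.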
2 changes: 1 addition & 1 deletion python/cudf_polars/pyproject.toml
@@ -20,7 +20,7 @@ license = { text = "Apache 2.0" }
 requires-python = ">=3.9"
 dependencies = [
     "cudf==24.8.*,>=0.0.0a0",
-    "polars>=1.3",
+    "polars>=1.6",
 ] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
 classifiers = [
     "Intended Audience :: Developers",
15 changes: 13 additions & 2 deletions python/cudf_polars/tests/expressions/test_numeric_unaryops.py
@@ -24,8 +24,7 @@
         "arcsinh",
         "arccosh",
         "arctanh",
-        # "exp", Missing rust side impl
-        # "log", Missing rust side impl
+        "exp",
         "sqrt",
         "cbrt",
         "ceil",
@@ -78,3 +77,15 @@ def test_pow(ldf, base_literal, exponent_literal):
     q = ldf.select(base.pow(exponent))

     assert_gpu_result_equal(q, check_exact=False)
+
+
+@pytest.mark.parametrize("natural", [True, False])
+def test_log(ldf, natural):
+    if natural:
+        expr = pl.col("a").log()
+    else:
+        expr = pl.col("a").log(10)
+
+    q = ldf.select(expr)
+
+    assert_gpu_result_equal(q, check_exact=False)
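`test_log` compares results with `check_exact=False`. One plausible reason, stated here as an assumption rather than something the commit documents, is that a logarithm in an arbitrary base is a floating-point computation whose last bits can differ between implementations. The plain-Python sketch below (no polars, no GPU) shows the change-of-base identity and an approximate comparison. `log_base` is a hypothetical helper, and treating the natural log as base `e` is an assumption consistent with the translation always emitting `LOG_BASE`.

```python
import math


def log_base(x: float, base: float) -> float:
    """Logarithm of x in the given base via the change-of-base identity."""
    return math.log(x) / math.log(base)


values = [0.5, 1.0, 2.0, 1000.0]

# Natural log expressed as a log with base e, as a LOG_BASE operation would.
natural = [log_base(x, math.e) for x in values]

# Base-10 log, as in the parametrized test case log(10).
decimal = [log_base(x, 10.0) for x in values]

# Compare approximately, not bit for bit.
natural_matches = all(
    math.isclose(got, math.log(x), rel_tol=1e-12, abs_tol=1e-12)
    for got, x in zip(natural, values)
)
decimal_matches = all(
    math.isclose(got, math.log10(x), rel_tol=1e-12, abs_tol=1e-12)
    for got, x in zip(decimal, values)
)
```

Both flags are true here, but only because the comparison tolerates rounding differences between the quotient form and the dedicated `math.log10` routine.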
9 changes: 9 additions & 0 deletions python/cudf_polars/tests/test_groupby.py
@@ -178,3 +178,12 @@ def test_groupby_literal_in_agg(df, key, expr):
     # so just sort by the group key
     q = df.group_by(key).agg(expr).sort(key, maintain_order=True)
     assert_gpu_result_equal(q)
+
+
+@pytest.mark.parametrize(
+    "expr",
+    [pl.col("int").unique(), pl.col("int").drop_nulls(), pl.col("int").cum_max()],
+)
+def test_groupby_unary_non_pointwise_raises(df, expr):
+    q = df.group_by("key1").agg(expr)
+    assert_ir_translation_raises(q, NotImplementedError)