fix: Return the truth values of `ne_missing` and `eq_missing` operations for struct instead of `null` #18930

barak1412 · 2024-09-25T15:27:37Z

barak1412 · 2024-09-25T15:53:54Z

I saw the failures, will work on a fix.

barak1412 · 2024-09-26T16:20:43Z

@ritchie46 I found the problem - the root of the problem is not in my code:
Before my change, the function assert_frame_equal was buggy due to the issue this PR attempts to solve.

For instance, this code won't fail:

import polars as pl
from polars.testing import assert_frame_equal


df1 = pl.DataFrame({
    "a": [None, {"foo": 0, "bar": "1"}],
})
df2 = pl.DataFrame({
    "a": [{"foo": 0, "bar": "1"}, {"foo": 0, "bar": "1"}],
})

assert_frame_equal(df1, df2)

So although the code in main branch passes, there is a bug in when().then().else() expression.
Do you want me to try solve the issue in this PR or open a new issue (and temporary remove the struct from the failed test) and try to solve it first?

EDIT:
I need some further investigation, although I am sure assert_frame_equal has bug due to this PR issue.

barak1412 · 2024-09-26T17:08:08Z

It looks like a regression bug (took code from my failed test):

import polars as pl


dtype = pl.Struct({"foo": pl.Int32, "bar": pl.String})
len = 2

broadcast = (False, False, True)
mask = [False, True]
if_true = [None, {'foo': 0, 'bar': '1'}]
if_false = [{'foo': 0, 'bar': '1'}, {'foo': 0, 'bar': '1'}]

py_mask, py_true, py_false = (
    [c[0]] * len if b else c
    for b, c in zip(broadcast, [mask, if_true, if_false])
)
pl_mask, pl_true, pl_false = (
    c.first() if b else c
    for b, c in zip(broadcast, [pl.col.mask, pl.col.if_true, pl.col.if_false])
)

expected = pl.DataFrame(
    {"if_true": [t if m else f for m, t, f in zip(py_mask, py_true, py_false)]},
    schema={"if_true": dtype},
)
df = pl.DataFrame(
    {
        "mask": mask,
        "if_true": if_true,
        "if_false": if_false,
    },
    schema={"mask": pl.Boolean, "if_true": dtype, "if_false": dtype},
)

ans = df.select(pl.when(pl.col.mask).then(pl.col.if_true).otherwise(pl.col.if_false.first()))

print(expected)
print(ans)

On main branch:

shape: (2, 1)
┌───────────┐
│ if_true   │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {0,"1"}   │
│ {0,"1"}   │
└───────────┘
shape: (2, 1)
┌───────────┐
│ if_true   │
│ ---       │
│ struct[2] │
╞═══════════╡
│ null      │
│ {0,"1"}   │
└

On 1.7.0:

shape: (2, 1)
┌───────────┐
│ if_true   │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {0,"1"}   │
│ {0,"1"}   │
└───────────┘
shape: (2, 1)
┌───────────┐
│ if_true   │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {0,"1"}   │
│ {0,"1"}   │
└───────────┘

Can someone reproduce?

barak1412 · 2024-09-26T18:50:57Z

To wrap all things up, this is what happend:

Since some version after 1.7.0 and current main, a regression bug has occured in when().then().otherwise() expression.
Due to a bug in ne_missing which influences assert_frame_equal, the regression bug wasn't exposed.
When this PR fixed the ne_missing bug, assert_frame_equal was also fixed and the regression bug was exposed and made the test fail.

I plan tommorow to open an issue of the regression bug, and when it will get fixed (I hope I will have time to), come back to this PR.

cmdlineluser · 2024-09-26T19:01:03Z

I can reproduce the test behaviour.

On 1.8.1 I get {0,"1"}

On 1.8.2 I get null

Update: This seems to be where the behaviour changes: #18886

It seems there is a problem once you add another expression e.g. first/last/etc

df = pl.DataFrame({"foo": [dict(a=1), dict(a=1)]})

df.select(pl.col.foo == pl.col.foo)
# shape: (2, 1)
# ┌──────┐
# │ foo  │
# │ ---  │
# │ bool │
# ╞══════╡
# │ true │
# │ true │
# └──────┘

df.select(pl.col.foo == pl.col.foo.last())
# shape: (2, 1)
# ┌───────┐
# │ foo   │
# │ ---   │
# │ bool  │
# ╞═══════╡
# │ false │
# │ false │
# └───────┘

ritchie46 · 2024-09-27T11:33:40Z

@coastalwhite can you take a look?

…truct_missing_comparision_ops

codecov · 2024-09-29T14:23:58Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.88%. Comparing base (d0963db) to head (9b45bbd).

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #18930      +/-   ##
==========================================
- Coverage   79.89%   79.88%   -0.01%     
==========================================
  Files        1524     1524              
  Lines      207655   207661       +6     
  Branches     2904     2904              
==========================================
- Hits       165898   165896       -2     
- Misses      41210    41218       +8     
  Partials      547      547

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coastalwhite · 2024-09-29T14:52:08Z

crates/polars-core/src/chunked_array/comparison/mod.rs

@@ -801,6 +804,7 @@ impl ChunkCompareEq<&StructChunked> for StructChunked {
            |l, r| l.equal(r).unwrap(),


Actually, this is not what this PR is trying to solve, but we made a policy that nested equality is always eq_missing. Maybe, this should changed here too.

Make sense, thanks.

I think not all types support eq_missing, so the code reaches panic.
Maybe it is better to get fixed in separate PR?

coastalwhite · 2024-09-29T14:53:09Z

crates/polars-core/src/chunked_array/comparison/mod.rs

@@ -759,6 +759,7 @@ fn struct_helper<F, R>(
    op: F,
    reduce: R,
    value: bool,
+    apply_null_validity: bool,


nit: this is named is_missing in other helpers. Maybe that should be here as well. Not really a blocker though

Sure, I will rename to is_missing

ritchie46 · 2024-09-29T15:22:46Z

Thank you @barak1412 and thanks for the review @coastalwhite

ritchie46 · 2024-09-29T17:14:38Z

Failure is unrelated.

barak1412 · 2024-09-29T17:17:49Z

@ritchie46 Thanks! (by the way make pre-commit is broken?)

first working version with test

d0b0385

barak1412 requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa and orlp as code owners September 25, 2024 15:27

github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Sep 25, 2024

barak1412 added 2 commits September 25, 2024 19:32

second trial

434b124

third trial

e90a30f

ritchie46 changed the title ~~fix: Return the trurth values of ne_missing and eq_missing operations for struct instead of null~~ fix: Return the truth values of ne_missing and eq_missing operations for struct instead of null Sep 26, 2024

debug

7ddee06

barak1412 mentioned this pull request Sep 27, 2024

Regression - when().then().else() wrong result on struct #18961

Closed

2 tasks

barak1412 mentioned this pull request Sep 27, 2024

fix: Return correct value for when().then().else() on structs when using first()\last() #18969

Merged

barak1412 added 3 commits September 27, 2024 17:44

working version

bc51878

Merge branch 'main' of https://github.com/barak1412/polars into fix_s…

87559e6

…truct_missing_comparision_ops

Merge branch 'main' of https://github.com/barak1412/polars into fix_s…

9b45bbd

…truct_missing_comparision_ops

coastalwhite approved these changes Sep 29, 2024

View reviewed changes

barak1412 added 2 commits September 29, 2024 18:03

cr fix

9d0e073

cr fix

b26829f

ritchie46 merged commit c23266b into pola-rs:main Sep 29, 2024
24 of 25 checks passed

barak1412 mentioned this pull request Sep 30, 2024

fix: Force nested struct missing equality #19031

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Return the truth values of `ne_missing` and `eq_missing` operations for struct instead of `null` #18930

fix: Return the truth values of `ne_missing` and `eq_missing` operations for struct instead of `null` #18930

barak1412 commented Sep 25, 2024 •

edited

Loading

barak1412 commented Sep 25, 2024

barak1412 commented Sep 26, 2024 •

edited

Loading

barak1412 commented Sep 26, 2024

barak1412 commented Sep 26, 2024

cmdlineluser commented Sep 26, 2024 •

edited

Loading

ritchie46 commented Sep 27, 2024

codecov bot commented Sep 29, 2024

coastalwhite Sep 29, 2024

barak1412 Sep 29, 2024

barak1412 Sep 29, 2024

coastalwhite Sep 29, 2024

barak1412 Sep 29, 2024

ritchie46 commented Sep 29, 2024

ritchie46 commented Sep 29, 2024

barak1412 commented Sep 29, 2024

		@@ -801,6 +804,7 @@ impl ChunkCompareEq<&StructChunked> for StructChunked {
		\|l, r\| l.equal(r).unwrap(),

fix: Return the truth values of ne_missing and eq_missing operations for struct instead of null #18930

fix: Return the truth values of ne_missing and eq_missing operations for struct instead of null #18930

Conversation

barak1412 commented Sep 25, 2024 • edited Loading

barak1412 commented Sep 25, 2024

barak1412 commented Sep 26, 2024 • edited Loading

barak1412 commented Sep 26, 2024

barak1412 commented Sep 26, 2024

cmdlineluser commented Sep 26, 2024 • edited Loading

ritchie46 commented Sep 27, 2024

codecov bot commented Sep 29, 2024

Codecov Report

coastalwhite Sep 29, 2024

Choose a reason for hiding this comment

barak1412 Sep 29, 2024

Choose a reason for hiding this comment

barak1412 Sep 29, 2024

Choose a reason for hiding this comment

coastalwhite Sep 29, 2024

Choose a reason for hiding this comment

barak1412 Sep 29, 2024

Choose a reason for hiding this comment

ritchie46 commented Sep 29, 2024

ritchie46 commented Sep 29, 2024

barak1412 commented Sep 29, 2024

fix: Return the truth values of `ne_missing` and `eq_missing` operations for struct instead of `null` #18930

fix: Return the truth values of `ne_missing` and `eq_missing` operations for struct instead of `null` #18930

barak1412 commented Sep 25, 2024 •

edited

Loading

barak1412 commented Sep 26, 2024 •

edited

Loading

cmdlineluser commented Sep 26, 2024 •

edited

Loading