BUG: NA value doesn't match mask condition, still masked #52955

Open
@shobsi

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> s = pd.Series([123456789, -987654321, 314159, pd.NA, -234892, 55555], name='int64_col', dtype=pd.Int64Dtype())
>>> s
0     123456789
1    -987654321
2        314159
3          <NA>
4       -234892
5         55555
Name: int64_col, dtype: Int64
>>> s.mask(s%2 == 1)
0       <NA>
1       <NA>
2       <NA>
3       <NA>
4    -234892
5       <NA>
Name: int64_col, dtype: Int64
>>> s.mask(s%2 == 1, -1)
0         -1
1         -1
2         -1
3         -1
4    -234892
5         -1
Name: int64_col, dtype: Int64
>>> pd.__version__
'2.0.1'

Issue Description

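Series.mask replaces the <NA> element at index 3 even though the mask condition (s%2 == 1) is not True for it: the condition itself evaluates to <NA> at that position. With a replacement value (s.mask(s%2 == 1, -1)), the <NA> is overwritten with -1. See the reproducible example above.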
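To make the report easier to follow, here is a minimal sketch that inspects the condition on its own. It only shows that the condition is missing, rather than True, at index 3. That `Series.mask` then treats the missing entry as a match is an inference from the output above, not something confirmed against the pandas source.

```python
import pandas as pd

s = pd.Series(
    [123456789, -987654321, 314159, pd.NA, -234892, 55555],
    name="int64_col",
    dtype=pd.Int64Dtype(),
)

# The comparison propagates the missing value: the result has the nullable
# "boolean" dtype and is <NA> (neither True nor False) at index 3.
cond = s % 2 == 1
print(cond.dtype)            # boolean
print(cond.isna().tolist())  # [False, False, False, True, False, False]
```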

Expected Behavior

>>> import pandas as pd
>>> s = pd.Series([123456789, -987654321, 314159, pd.NA, -234892, 55555], name='int64_col', dtype=pd.Int64Dtype())
>>> s
0     123456789
1    -987654321
2        314159
3          <NA>
4       -234892
5         55555
Name: int64_col, dtype: Int64
>>> s.mask(s%2 == 1)
0       <NA>
1       <NA>
2       <NA>
3       <NA>
4    -234892
5       <NA>
Name: int64_col, dtype: Int64
>>> s.mask(s%2 == 1, -1)
0         -1
1         -1
2         -1
3       <NA>
4    -234892
5         -1
Name: int64_col, dtype: Int64
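Until the behavior changes, one possible workaround (an assumption on my side, not an official recommendation) is to fill the missing entries of the condition with False before calling `Series.mask`, so that "unknown" is treated as "no match". The sketch below also writes the expected result as an explicit Series and compares against it.

```python
import pandas as pd

s = pd.Series(
    [123456789, -987654321, 314159, pd.NA, -234892, 55555],
    name="int64_col",
    dtype=pd.Int64Dtype(),
)

# Assumed workaround: replace <NA> in the condition with False so that
# only rows where the condition is definitely True are masked.
cond = (s % 2 == 1).fillna(False)
result = s.mask(cond, -1)

# The expected output from this report, written as an explicit Series.
expected = pd.Series(
    [-1, -1, -1, pd.NA, -234892, -1],
    name="int64_col",
    dtype=pd.Int64Dtype(),
)

# Raises AssertionError if values, dtype, index, or name differ.
pd.testing.assert_series_equal(result, expected)
```

The same `expected` Series could serve as the target of a regression test for `s.mask(s % 2 == 1, -1)` once a fix is in place.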

Installed Versions

pd.__version__
'2.0.1'

Labels

  • Bug

  • Conditionals (e.g. where, mask, case_when)

  • NA - MaskedArrays (related to pd.NA and nullable extension arrays)
