Commit

BUG: AttributeError: 'BooleanArray' object has no attribute 'sum' whi…
zhangxiaoxing authored Nov 20, 2021
1 parent 63ebf77 commit c676844
Showing 3 changed files with 26 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -646,6 +646,7 @@ I/O
- Bug in :func:`read_csv` with :code:`float_precision="round_trip"` which did not skip initial/trailing whitespace (:issue:`43713`)
- Bug in dumping/loading a :class:`DataFrame` with ``yaml.dump(frame)`` (:issue:`42748`)
- Bug in :func:`read_csv` raising ``ValueError`` when ``parse_dates`` was used with ``MultiIndex`` columns (:issue:`8991`)
- Bug in :func:`read_csv` raising ``AttributeError`` when attempting to read a .csv file and infer index column dtype from a nullable integer type (:issue:`44079`)
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly. (:issue:`39465`)

Period
2 changes: 1 addition & 1 deletion pandas/io/parsers/base_parser.py
@@ -705,7 +705,7 @@ def _infer_types(self, values, na_values, try_num_bool=True):
# error: Argument 2 to "isin" has incompatible type "List[Any]"; expected
# "Union[Union[ExtensionArray, ndarray], Index, Series]"
mask = algorithms.isin(values, list(na_values)) # type: ignore[arg-type]
- na_count = mask.sum()
+ na_count = mask.astype("uint8", copy=False).sum()
if na_count > 0:
if is_integer_dtype(values):
values = values.astype(np.float64)
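
The one-line change above can be illustrated outside the parser. The following is a minimal sketch, not the pandas implementation: `count_true` is a hypothetical helper, and the example assumes masks with no missing values. It shows that casting the mask to `uint8` before summing gives the same count whether the mask is a NumPy boolean array or a pandas `BooleanArray`.

```python
import numpy as np
import pandas as pd


def count_true(mask):
    """Count True entries by casting the mask to uint8 before summing.

    Hypothetical helper mirroring the expression used in the fix; it
    assumes the mask contains no missing values.
    """
    return int(mask.astype("uint8", copy=False).sum())


# A plain NumPy boolean mask.
np_mask = np.array([True, False, True])

# A pandas nullable boolean mask (BooleanArray) without missing values.
ea_mask = pd.array([True, False, True], dtype="boolean")

np_count = count_true(np_mask)
ea_count = count_true(ea_mask)
print(np_count, ea_count)
```

Casting first means the sum is taken on an integer array in both cases, so the code does not rely on the mask type providing its own `sum` method.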
24 changes: 24 additions & 0 deletions pandas/tests/io/parser/test_index_col.py
@@ -297,3 +297,27 @@ def test_multiindex_columns_index_col_with_data(all_parsers):
index=Index(["data"]),
)
tm.assert_frame_equal(result, expected)


@skip_pyarrow
def test_infer_types_boolean_sum(all_parsers):
# GH#44079
parser = all_parsers
result = parser.read_csv(
StringIO("0,1"),
names=["a", "b"],
index_col=["a"],
dtype={"a": "UInt8"},
)
expected = DataFrame(
data={
"a": [
0,
],
"b": [1],
}
).set_index("a")
# Not checking index type now, because the C parser will return an
# index column of dtype 'object', and the Python parser will return an
# index column of dtype 'int64'.
tm.assert_frame_equal(result, expected, check_index_type=False)
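
The scenario covered by the test can also be tried as a standalone script. This is a sketch under two assumptions: the installed pandas version includes this fix, and the default parser engine is used in place of the `all_parsers` fixture from the test suite.

```python
from io import StringIO

import pandas as pd

# Read a single row, using column "a" as the index and requesting the
# nullable UInt8 dtype for it (the combination reported in GH#44079).
result = pd.read_csv(
    StringIO("0,1"),
    names=["a", "b"],
    index_col=["a"],
    dtype={"a": "UInt8"},
)

print(result)
print(result.index.dtype)  # varies by parser engine and pandas version
```

The index dtype is printed rather than asserted because, as the test comment notes, it differs between parser engines.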
