BUG: histogram weights aren't dropped if NaN values in data #48888

Merged
merged 40 commits into from
Jan 3, 2023
1c9854f
drop weights corresponding to nan y
AdamOrmondroyd Sep 30, 2022
ce78d4a
Merge branch 'main' into weights
AdamOrmondroyd Sep 30, 2022
43db4be
remove unused/nonexistent import from previous experiment
AdamOrmondroyd Sep 30, 2022
3d9393c
correct handling of weights kwd (each loop makes a new copy of self.k…
AdamOrmondroyd Sep 30, 2022
2c56296
Merge branch 'main' into weights
AdamOrmondroyd Sep 30, 2022
7419c70
no need to set kwds[weights] unless weights is None
AdamOrmondroyd Sep 30, 2022
2b75c3b
formatted with black
AdamOrmondroyd Sep 30, 2022
e23efe1
create test
AdamOrmondroyd Oct 1, 2022
8ab15d2
Merge branch 'main' into weights
AdamOrmondroyd Oct 1, 2022
f62f18f
tidy slicing
AdamOrmondroyd Oct 1, 2022
cb02106
Merge branch 'main' into weights
AdamOrmondroyd Oct 2, 2022
7028e75
remove square brackets
AdamOrmondroyd Oct 4, 2022
ab2941d
don't pick rows at random
AdamOrmondroyd Oct 4, 2022
a506972
set df nan elements individually
AdamOrmondroyd Oct 4, 2022
6ff09b3
clearer comment
AdamOrmondroyd Oct 4, 2022
9ee8759
instead of random df and weights, manually create a 3x3 df etc
AdamOrmondroyd Oct 6, 2022
1f5734a
Merge branch 'pandas-dev:main' into weights
AdamOrmondroyd Oct 6, 2022
b29b50d
Merge branch 'main' into weights
mroeschke Oct 6, 2022
f0054d6
Merge branch 'main' into weights
AdamOrmondroyd Oct 7, 2022
7f3fd9a
nested if to save checking None twice
AdamOrmondroyd Oct 7, 2022
42a12e3
Merge branch 'pandas-dev:main' into weights
AdamOrmondroyd Oct 7, 2022
59fdac1
add entry to docs
AdamOrmondroyd Oct 10, 2022
c9e5a36
response to comment on docs entry
AdamOrmondroyd Oct 10, 2022
58e8da3
Merge branch 'pandas-dev:main' into weights
AdamOrmondroyd Oct 12, 2022
599359b
specify DataFrame.plot.hist
AdamOrmondroyd Oct 12, 2022
e946b50
change test name to test_hist_with_nans_and_weights
AdamOrmondroyd Oct 12, 2022
a20b8ad
remove comments from test
AdamOrmondroyd Oct 12, 2022
53228b7
Merge branch 'main' into weights
AdamOrmondroyd Oct 14, 2022
4dd9471
Merge branch 'main' into weights
AdamOrmondroyd Nov 6, 2022
e0a2944
skip test if no mpl
AdamOrmondroyd Nov 6, 2022
d0330bc
Merge branch 'main' into weights
AdamOrmondroyd Dec 1, 2022
6823577
check that weights is the correct shape
AdamOrmondroyd Dec 1, 2022
c540381
ran pre-commit
AdamOrmondroyd Dec 1, 2022
4597844
check for IndexError rather than shape
AdamOrmondroyd Dec 2, 2022
2a3d9cc
Merge branch 'main' into weights
AdamOrmondroyd Dec 2, 2022
58621eb
Merge branch 'main' into weights
AdamOrmondroyd Dec 28, 2022
8a4e7d7
remove data shape hint from error, as data's type is unclear
AdamOrmondroyd Dec 29, 2022
8e54d95
Merge branch 'main' into weights
AdamOrmondroyd Dec 29, 2022
0feb62a
Merge branch 'main' into weights
AdamOrmondroyd Jan 3, 2023
8937c88
add out of bounds test and raise ValueError from IndexError
AdamOrmondroyd Jan 3, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -926,6 +926,7 @@ Period

 Plotting
 ^^^^^^^^
+- Bug in :meth:`DataFrame.plot.hist`, not dropping elements of ``weights`` corresponding to ``NaN`` values in ``data`` (:issue:`48884`)
 - ``ax.set_xlim`` was sometimes raising ``UserWarning`` which users couldn't address due to ``set_xlim`` not accepting parsing arguments - the converter now uses :func:`Timestamp` instead (:issue:`49148`)
 -
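
The failure mode behind this entry can be sketched with plain NumPy, assuming the plotting code drops `NaN` values from the data before binning. If the weights are not filtered with the same mask, their shape no longer matches the data and the histogram call is rejected. The arrays are illustrative, and `np.histogram` is only a stand-in for the matplotlib call that pandas makes.

```python
import numpy as np

# Illustrative data: one NaN value, one weight per original row.
values = np.array([np.nan, 0.4, 0.7])
weights = np.array([0.25, 0.3, 0.45])

# Drop the NaN from the data only (the situation before the fix).
mask = ~np.isnan(values)
clean_values = values[mask]

try:
    # Two values but three weights: NumPy rejects the shape mismatch.
    np.histogram(clean_values, bins=2, range=(0.0, 1.0), weights=weights)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Applying the same mask to the weights keeps the two arrays aligned.
counts, edges = np.histogram(
    clean_values, bins=2, range=(0.0, 1.0), weights=weights[mask]
)
```

With the mask applied to both arrays, each remaining value keeps the weight from its original row.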

17 changes: 13 additions & 4 deletions pandas/plotting/_matplotlib/hist.py
@@ -146,14 +146,23 @@ def _make_plot(self) -> None:
                 kwds["label"] = self.columns
                 kwds.pop("color")

-            y = reformat_hist_y_given_by(y, self.by)
-
             # We allow weights to be a multi-dimensional array, e.g. a (10, 2) array,
             # and each sub-array (10,) will be called in each iteration. If users only
             # provide 1D array, we assume the same weights is used for all iterations
             weights = kwds.get("weights", None)
-            if weights is not None and np.ndim(weights) != 1:
-                kwds["weights"] = weights[:, i]
+            if weights is not None:
+                if np.ndim(weights) != 1 and np.shape(weights)[-1] != 1:
+                    try:
+                        weights = weights[:, i]
+                    except IndexError as err:
+                        raise ValueError(
+                            "weights must have the same shape as data, "
+                            "or be a single column"
+                        ) from err
+                weights = weights[~isna(y)]
+                kwds["weights"] = weights
+
+            y = reformat_hist_y_given_by(y, self.by)

             artists = self._plot(ax, y, column_num=i, stacking_id=stacking_id, **kwds)
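
The per-column weight handling in this hunk can be sketched as a standalone function. This is an illustration rather than the pandas implementation: `column_weights` is a hypothetical name, and `np.isnan` stands in for the `isna` helper used in `hist.py`.

```python
import numpy as np


def column_weights(weights, y, i):
    """Hypothetical helper: pick the weights for column i, then drop NaN rows."""
    weights = np.asarray(weights)
    # Multi-column weights: take column i; 1D or single-column weights are shared.
    if np.ndim(weights) != 1 and np.shape(weights)[-1] != 1:
        try:
            weights = weights[:, i]
        except IndexError as err:
            raise ValueError(
                "weights must have the same shape as data, or be a single column"
            ) from err
    # np.isnan is an assumption standing in for pandas' isna.
    return weights[~np.isnan(y)]


# 1D weights shared by every column; the first row of y is NaN.
shared = column_weights([0.25, 0.3, 0.45], np.array([np.nan, 0.4, 0.7]), 0)

# 2D weights with too few columns: asking for column 2 raises ValueError.
try:
    column_weights(np.array([[0.3, 0.25], [0.45, 0.45]]), np.array([0.3, 0.9]), 2)
    out_of_bounds_raised = False
except ValueError:
    out_of_bounds_raised = True
```

Masking happens after the column is selected, so the mask from `y` always lines up with a one-weight-per-row array.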

30 changes: 30 additions & 0 deletions pandas/tests/plotting/test_hist_method.py
@@ -560,6 +560,36 @@ def test_hist_secondary_legend(self):
         assert ax.get_yaxis().get_visible()
         tm.close()

+    @td.skip_if_no_mpl
+    def test_hist_with_nans_and_weights(self):
+        # GH 48884
+        df = DataFrame(
+            [[np.nan, 0.2, 0.3], [0.4, np.nan, np.nan], [0.7, 0.8, 0.9]],
+            columns=list("abc"),
+        )
+        weights = np.array([0.25, 0.3, 0.45])
+        no_nan_df = DataFrame([[0.4, 0.2, 0.3], [0.7, 0.8, 0.9]], columns=list("abc"))
+        no_nan_weights = np.array([[0.3, 0.25, 0.25], [0.45, 0.45, 0.45]])
+
+        from matplotlib.patches import Rectangle
+
+        _, ax0 = self.plt.subplots()
+        df.plot.hist(ax=ax0, weights=weights)
+        rects = [x for x in ax0.get_children() if isinstance(x, Rectangle)]
+        heights = [rect.get_height() for rect in rects]
+        _, ax1 = self.plt.subplots()
+        no_nan_df.plot.hist(ax=ax1, weights=no_nan_weights)
+        no_nan_rects = [x for x in ax1.get_children() if isinstance(x, Rectangle)]
+        no_nan_heights = [rect.get_height() for rect in no_nan_rects]
+        assert all(h0 == h1 for h0, h1 in zip(heights, no_nan_heights))
+
+        idxerror_weights = np.array([[0.3, 0.25], [0.45, 0.45]])
+
+        msg = "weights must have the same shape as data, or be a single column"
+        with pytest.raises(ValueError, match=msg):
+            _, ax2 = self.plt.subplots()
+            no_nan_df.plot.hist(ax=ax2, weights=idxerror_weights)
+

 @td.skip_if_no_mpl
 class TestDataFrameGroupByPlots(TestPlotBase):
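
The `no_nan_df` and `no_nan_weights` fixtures in the test above appear to be what remains after dropping the `NaN` rows column by column, with each surviving value keeping its row's weight. The NumPy-only sketch below checks that reading; the variable names are illustrative and it does not exercise the plotting code.

```python
import numpy as np

# Same values as `df` and `weights` in the test.
data = np.array([[np.nan, 0.2, 0.3], [0.4, np.nan, np.nan], [0.7, 0.8, 0.9]])
weights = np.array([0.25, 0.3, 0.45])

kept_values = []
kept_weights = []
for i in range(data.shape[1]):
    mask = ~np.isnan(data[:, i])  # rows of column i that are not NaN
    kept_values.append(data[mask, i])
    kept_weights.append(weights[mask])

# Each column keeps exactly two rows, so the pieces stack into 2x3 arrays.
no_nan_data = np.column_stack(kept_values)
no_nan_weights = np.column_stack(kept_weights)
```

Both stacked arrays come out identical to the hand-written fixtures, which is why the test expects the two plots to have equal bar heights.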