Skip to content

BUG: set_levels set illegal levels. #14236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 10, 2016
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
BUG: set_levels set illegal levels.
`MultiIndex.set_levels`, when passed in illegal level values, raises an error.
When `inplace=True`, though, the illegal level values are still accepted. This
commit fixes that behavior by checking that the proposed level values are legal
before setting them.

added tests and docs.

lint

added tests.

force build

force build

force build

force build for osx
  • Loading branch information
Ben Kandel committed Oct 9, 2016
commit 4093ff935e4ccfa37ab385096ad2d8e397221081
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.19.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1560,6 +1560,7 @@ Bug Fixes
- Bug in ``.to_string()`` when called with an integer ``line_width`` and ``index=False`` raises an UnboundLocalError exception because ``idx`` referenced before assignment.
- Bug in ``eval()`` where the ``resolvers`` argument would not accept a list (:issue:`14095`)
- Bugs in ``stack``, ``get_dummies``, ``make_axis_dummies`` which don't preserve categorical dtypes in (multi)indexes (:issue:`13854`)
- ``PeridIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`)
- ``PeriodIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`)
- Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`)
- Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`)
- Bug in ``MultiIndex.set_levels`` where illegal level values were still set after raising an error (:issue:`13754`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.19.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

28 changes: 19 additions & 9 deletions pandas/indexes/multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -116,12 +116,22 @@ def __new__(cls, levels=None, labels=None, sortorder=None, names=None,

return result

def _verify_integrity(self):
def _verify_integrity(self, new_labels=None, new_levels=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call these labels, levels. the new is confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

"""Raises ValueError if length of levels and labels don't match or any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a Parameters section

label would exceed level bounds"""
label would exceed level bounds

Parameters
----------
new_labels : optional list
Labels to check for validity. Defaults to current labels.
new_levels : optional list
Levels to check for validity. Defaults to current levels.
"""
# NOTE: Currently does not check, among other things, that cached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a Raises section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

# nlevels matches nor that sortorder matches actually sortorder.
labels, levels = self.labels, self.levels
labels = new_labels or self.labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can new_levels or new_labels ever be a NumPy array (e.g. user passes idx.set_levels(np.array['a', 'b']), or will they have been coerced to a list by now? If they're an array this will raise a ValueError. You could use labels = new_labels if new_labels is not None else self.labels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, looks like it'll be a FrozenList by this point. Never mind then.

levels = new_levels or self.levels

if len(levels) != len(labels):
raise ValueError("Length of levels and labels must match. NOTE:"
" this index is in an inconsistent state.")
Expand Down Expand Up @@ -162,6 +172,9 @@ def _set_levels(self, levels, level=None, copy=False, validate=True,
new_levels[l] = _ensure_index(v, copy=copy)._shallow_copy()
new_levels = FrozenList(new_levels)

if verify_integrity:
self._verify_integrity(new_levels=new_levels)

names = self.names
self._levels = new_levels
if any(names):
Expand All @@ -170,9 +183,6 @@ def _set_levels(self, levels, level=None, copy=False, validate=True,
self._tuples = None
self._reset_cache()

if verify_integrity:
self._verify_integrity()

def set_levels(self, levels, level=None, inplace=False,
verify_integrity=True):
"""
Expand Down Expand Up @@ -268,13 +278,13 @@ def _set_labels(self, labels, level=None, copy=False, validate=True,
lab, lev, copy=copy)._shallow_copy()
new_labels = FrozenList(new_labels)

if verify_integrity:
self._verify_integrity(new_labels=new_labels)

self._labels = new_labels
self._tuples = None
self._reset_cache()

if verify_integrity:
self._verify_integrity()

def set_labels(self, labels, level=None, inplace=False,
verify_integrity=True):
"""
Expand Down
21 changes: 20 additions & 1 deletion pandas/tests/indexes/test_multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,7 @@ def assert_matching(actual, expected):
# as much as possible
self.assertEqual(len(actual), len(expected))
for act, exp in zip(actual, expected):
act = np.asarray(act)
act = np.asarray(act, dtype=np.object_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you need to change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the expected is changed to np.object_ and if I don't change act also the test fails because the expected and actual are not changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show me an example? it is passing now, so unclear why you had to change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed up commit e0c8897 that undoes this change and fails with the error:

E       AssertionError: numpy array are different
E       
E       Attribute "dtype" are different
E       [left]:  int8
E       [right]: object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int8 is right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when I take out the coercion to np.object_, the test test_set_levels fails with

E       AssertionError: numpy array are different
E       
E       Attribute "dtype" are different
E       [left]:  object
E       [right]: |S4

This is in the test

        # level changing [w/o mutation]
        ind2 = self.index.set_levels(new_levels)
        assert_matching(ind2.levels, new_levels)
        assert_matching(self.index.levels, levels)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like maybe _ensure_index is doing the conversion to np.object_ in that test case (https://github.com/pydata/pandas/blob/master/pandas/indexes/base.py#L3616 and https://github.com/pydata/pandas/blob/master/pandas/lib.pyx#L1029).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this issue.

exp = np.asarray(exp, dtype=np.object_)
tm.assert_numpy_array_equal(act, exp)

Expand Down Expand Up @@ -204,6 +204,25 @@ def assert_matching(actual, expected):
assert_matching(ind2.levels, new_levels)
assert_matching(self.index.levels, levels)

# illegal level changing should not change levels
# GH 13754
original_index = self.index.copy()
with assertRaisesRegexp(ValueError, "^On"):
self.index.set_levels(['c'], level=0, inplace=True)
assert_matching(self.index.levels, original_index.levels)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test set_labels (with invalid labels)

with assertRaisesRegexp(ValueError, "^On"):
self.index.set_labels([0, 1, 2, 3, 4, 5], level=0, inplace=True)
assert_matching(self.index.labels, original_index.labels)

with assertRaisesRegexp(TypeError, "^Levels"):
self.index.set_levels('c', level=0, inplace=True)
assert_matching(self.index.levels, original_index.levels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add tests for inplace=False as well (you can loop over the param I think)


with assertRaisesRegexp(TypeError, "^Labels"):
self.index.set_labels(1, level=0, inplace=True)
assert_matching(self.index.labels, original_index.labels)

def test_set_labels(self):
# side note - you probably wouldn't want to use levels and labels
# directly like this - but it is possible.
Expand Down