-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: removal of deprecation warnings for float indexers #12246
Conversation
45596d0
to
6680f1f
Compare
there are no real changes here (aside from a bug fix), except for now raising but if anyone would has comments great. |
@TomAugspurger @jorisvandenbossche @shoyer any objections? |
except TypeError: | ||
|
||
# but we will allow setting | ||
if is_setter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that series[1.0]
will TypeError, but series[1.0] = 2
will succeed? Or does that raise later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
short answer is yes, though, series[1.0] = 2
works because its a label assignment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this seems not great (on your branch)
In [10]: s = Series([1,2,3])
In [11]: s[1.] = 5
In [12]: s
Out[12]:
0 1
1 2
2 3
1 5
dtype: int64
In [13]: s.index
Out[13]: Int64Index([0, 1, 2, 1], dtype='int64')
So before this would have warned about the Floating scalar, cast to an int, and then overwritten the value at index=1
. I guess that would make this an API change, but this seems like a pretty hairy edge-case
I think I see why you would want to allow setting of floats that could be cast as integers, but maybe you could explain your thinking a bit more?
EDIT: I accidentally left out a "not" in my first line, so now I sound like a sarcastic jerk 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I was fixing this bug
In [1]: s = Series([1,2,3])
In [2]: s
Out[2]:
0 1
1 2
2 3
dtype: int64
In [3]: s.index
Out[3]: RangeIndex(start=0, stop=3, step=1)
In [4]: s['0'] = 5
In [5]: s
Out[5]:
0 1
1 2
2 3
0 5
dtype: int64
In [6]: s.index
Out[6]: Index([0, 1, 2, u'0'], dtype='object')
so s[1.0] = 1
should change this to a Float64Index
but I couldn't get it to work and it seemed innocuous enough. Let me have another look. This is a giant rabbit hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [1]: s = Series([1,2,3])
This is on current master. This should work and coerce to a Float64Index
.
In [2]: s[1.1] = 1
pandas/core/indexing.py:1095: FutureWarning: scalar indexers for index type RangeIndex should be integers and not floating point
IndexingError: 1.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I fixed everything, but had to change this:
What kind of index should this be:
Index([1., 2., 3.], dtype=int)
Index([1., 2, 3.], dtype=int)
w/o actually running it?
4ee7073
to
a702344
Compare
If setting works and changes the index to
|
We should not make using floats instead of ints for label based indexing (e.g., with
I'm totally on board with making this error in cases were it was previously doing positional indexing (e.g.,
|
Sorry about sending you down it :/ I think I'm with @shoyer here. If two items compare as equal, label getting / setting go through (assuming that's possible to implement, while still raising on positional getting/setting). The additional thing we have to decide on is what casting to do when setting. Do we cast the label to be like the rest of the Index or the Index to be like the new label? FWIW NumPy will cast the new item for arrays In [40]: x = np.array([1, 2, 3], dtype='int64')
In [41]: x[1] = 2.0
In [42]: x.dtype
Out[42]: dtype('int64') Oh, but they also do this even if the float can't be cast to an equal int. So setting |
So this comes down to: is
what should Option 1
Option 2
Option 3 and simply have
Then there is the case where we are doing |
538be0d
to
ad78ce2
Compare
Ok here's the revised setting behavior, I think its pretty natural the value is coerced (gently) to the index type and regular indexing is performed
Comparison to 0.17.1
|
6da1949
to
1584661
Compare
At a glance those all look correct. Haven't had a chance to look through the code. 👍 to your general approach of coercing the new member to the Index type if possible, otherwise coerce the index. |
# we will not create a duplicate index, rather | ||
# index to that element | ||
# e.g. 0.0 -> 0 | ||
# GH12245 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
12246
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
…nal setting, and raise a TypeError, xref pandas-dev#4892 BUG: index type coercion when setting with an integer-like closes pandas-dev#11836
Some after merge feedback: I took a look and I also like the logic now as how you explained it in your last post (for label based indexing: if the label evaluates as equal, then it is interpreted as the existing label) A few things:
|
@@ -52,6 +52,7 @@ Highlights include: | |||
- ``pd.test()`` top-level nose test runner is available (:issue:`4327`) | |||
- Adding support for a ``RangeIndex`` as a specialized form of the ``Int64Index`` for memory savings, see :ref:`here <whatsnew_0180.enhancements.rangeindex>`. | |||
- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.breaking.resample>`. | |||
- Removal of support for deprecated float indexers; these will now raise a ``TypeError``, see :ref:`here <whatsnew_0180.enhancements.float_indexers>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe mention that it only is about positional indexing? Otherwise people could think FloatIndex is removed.
Actually, I don't yet agree with the behaviour of how it is now merged in this PR (but now only saw this by looking at your other PR for the doc issues #12330). We had a discussion about what to do if a float (eg 1.0) evaluates as equal to a label (eg 1). Initially this raised, but after the discussion the behaviour was changed for setting:
This comment above gives an overview of this: #12246 (comment) But, the behaviour was not changed for getting, so the getting variant of the example above raises:
Which feels a bit inconsistent. And it is also, if I understand it correctly, what @shoyer opposed to:
@jreback That is actually the reason I gave the comment to update the whatsnew notes, as I thought they were incorrect (assuming that not only setting was changed, but also getting) |
hmm, for It comes down to whether we are actually indexing with a label or a position (and that of course is the ambiguity here). @shoyer comment is too simple for many of these cases. We have different indexers for this reason (and some legacy w.r.t. @jorisvandenbossche can you enumerate when you think we ought to coerce by indexer? maybe open a new issue. |
This just looks very odd:
And I don't think we should introduce even more differences between loc/ix/[] here. But indeed, will open a new issue |
raise a
TypeError
instead, xref #4892closes #11836
similar to numpy in 1.11 here