DEPR: removal of deprecation warnings for float indexers #12246

jreback · 2016-02-06T20:09:15Z

raise a TypeError instead, xref #4892
closes #11836

similar to numpy in 1.11 here

jreback · 2016-02-06T23:13:42Z

there are no real changes here (aside from a bug fix), except for now raising TypeError where we should have been showing the FutureWarning previously

but if anyone would has comments great.

jreback · 2016-02-11T02:59:25Z

@TomAugspurger @jorisvandenbossche @shoyer any objections?

TomAugspurger · 2016-02-11T03:14:29Z

pandas/core/indexing.py

+        except TypeError:
+
+            # but we will allow setting
+            if is_setter:


Does this mean that series[1.0] will TypeError, but series[1.0] = 2 will succeed? Or does that raise later on?

short answer is yes, though, series[1.0] = 2 works because its a label assignment.

Hmm, this seems not great (on your branch)

In [10]: s = Series([1,2,3]) In [11]: s[1.] = 5 In [12]: s Out[12]: 0 1 1 2 2 3 1 5 dtype: int64 In [13]: s.index Out[13]: Int64Index([0, 1, 2, 1], dtype='int64')

So before this would have warned about the Floating scalar, cast to an int, and then overwritten the value at index=1. I guess that would make this an API change, but this seems like a pretty hairy edge-case

I think I see why you would want to allow setting of floats that could be cast as integers, but maybe you could explain your thinking a bit more?

EDIT: I accidentally left out a "not" in my first line, so now I sound like a sarcastic jerk 😄

So I was fixing this bug

In [1]: s = Series([1,2,3]) In [2]: s Out[2]: 0 1 1 2 2 3 dtype: int64 In [3]: s.index Out[3]: RangeIndex(start=0, stop=3, step=1) In [4]: s['0'] = 5 In [5]: s Out[5]: 0 1 1 2 2 3 0 5 dtype: int64 In [6]: s.index Out[6]: Index([0, 1, 2, u'0'], dtype='object')

so s[1.0] = 1 should change this to a Float64Index

but I couldn't get it to work and it seemed innocuous enough. Let me have another look. This is a giant rabbit hole.

In [1]: s = Series([1,2,3])

This is on current master. This should work and coerce to a Float64Index.

In [2]: s[1.1] = 1 pandas/core/indexing.py:1095: FutureWarning: scalar indexers for index type RangeIndex should be integers and not floating point IndexingError: 1.1

So I fixed everything, but had to change this:

What kind of index should this be:
Index([1., 2., 3.], dtype=int)
Index([1., 2, 3.], dtype=int)

w/o actually running it?

kawochen · 2016-02-11T06:40:27Z

If setting works and changes the index to Float64Index, what should s be after this?

In [17]: s = pandas.Series([1,2,3])

In [18]: s.index = pandas.Int64Index([2**53, 2**53+1, 2**53+2])

In [19]: s
Out[19]: 
9007199254740992    1
9007199254740993    2
9007199254740994    3
dtype: int64

In [20]: s[2**53+0.0] = 4

shoyer · 2016-02-11T07:24:14Z

We should not make using floats instead of ints for label based indexing (e.g., with .loc) error. I think we need to following the precedence of the Python dictionary here:

In [16]: x = {1: 'one'}

In [17]: x[1.0]
Out[17]: 'one'

I'm totally on board with making this error in cases were it was previously doing positional indexing (e.g., .iloc), though. That would be consistent with NumPy and base Python:

In [18]: y = [1, 2]

In [19]: y[1.0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-d03a4798580f> in <module>()
----> 1 y[1.0]

TypeError: list indices must be integers or slices, not float

TomAugspurger · 2016-02-11T13:26:50Z

This is a giant rabbit hole.

Sorry about sending you down it :/

I think I'm with @shoyer here. If two items compare as equal, label getting / setting go through (assuming that's possible to implement, while still raising on positional getting/setting).

The additional thing we have to decide on is what casting to do when setting. Do we cast the label to be like the rest of the Index or the Index to be like the new label? FWIW NumPy will cast the new item for arrays

In [40]: x = np.array([1, 2, 3], dtype='int64')

In [41]: x[1] = 2.0

In [42]: x.dtype
Out[42]: dtype('int64')

Oh, but they also do this even if the float can't be cast to an equal int. So setting x[1] = 2.5 also truncates to 2. I don't think we can do that.

jreback · 2016-02-11T13:31:32Z

So this comes down to: is 1.0 a key in an Int64Index/RangeIndex or is it not found, in which case we upcast the Index to either Float64Index or object

In [1]: s = Series(range(3), range(3))

In [2]: s
Out[2]: 
0    0
1    1
2    2
dtype: int64

what should
s[1.0] = 5 do

Option 1
In this PR (this looks kind of wrong to me, essentially

In [6]: s[1.0] = 5     

In [7]: s
Out[7]: 
0.0    0
1.0    1
2.0    2
1.0    5
dtype: int64

In [8]: s.index
Out[8]: Float64Index([0.0, 1.0, 2.0, 1.0], dtype='float64')

Option 2
and alternative is to upcast this

In [10]: s2 = Series(range(4),index=Index([0,1,2,1.0],dtype=object))  

In [11]: s2.index
Out[11]: Index([0, 1, 2, 1.0], dtype='object')

In [12]: s
Out[12]: 
0.0    0
1.0    1
2.0    2
1.0    5
dtype: int64

Option 3
Third we could make 1.0 match 1 (remember this is for .loc/[], e.g. ONLY for label indexing), though .ix actually would NOT match this for positional so .ix would do the same

and simply have

In [2]: s
Out[2]: 
0    0
1    5
2    2
dtype: int64

Then there is the case where we are doing s[0.1] = 5 (and option 3 is off the table).

jreback · 2016-02-12T16:44:05Z

Ok here's the revised setting behavior, I think its pretty natural

the value is coerced (gently) to the index type and regular indexing is performed
this only actually matters in the case of a float with an integer index (e.g. 0.0 -> 0)
strings and non-coercible floats (e.g. 4.1) will force conversion to object or Float64Index as appropriate.

In [2]: s = Series(range(3))

In [3]: s2 = s.copy()

In [4]: s2[0] = 10

In [5]: s2.index
Out[5]: RangeIndex(start=0, stop=3, step=1)

In [6]: s2 = s.copy()

In [7]: s2['0'] = 10

In [8]: s2.index
Out[8]: Index([0, 1, 2, u'0'], dtype='object')

In [9]: s2 = s.copy()

In [10]: s2[0.0] = 10

In [11]: s2.index
Out[11]: RangeIndex(start=0, stop=3, step=1)

In [12]: s2 = s.copy()

In [13]: s2[4] = 10

In [14]: s2.index
Out[14]: Int64Index([0, 1, 2, 4], dtype='int64')

In [15]: s2 = s.copy()

In [16]: s2[4.0] = 10

In [17]: s2.index
Out[17]: Int64Index([0, 1, 2, 4], dtype='int64')

In [18]: s2 = s.copy()

In [19]: s2[4.1] = 10

In [20]: s2.index
Out[20]: Float64Index([0.0, 1.0, 2.0, 4.1], dtype='float64')

Comparison to 0.17.1
differences are a 2 bug fixes
and no warning now issued (though I could do that)

In [3]: s = Series(range(3))

In [4]: s2 = s.copy()

In [5]: s2[0] = 10

In [6]: s2.index
Out[6]: Int64Index([0, 1, 2], dtype='int64')

In [7]: s2 = s.copy()

# this is wrong and a bug fix
In [8]: s2['0'] = 10

In [9]: s2.index
Out[9]: Int64Index([0, 1, 2, 0], dtype='int64')

In [10]: s2 = s.copy()

# warning issued in 0.17.1
In [11]: s2[0.0] = 10
/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/indexing.py:1063: FutureWarning: scalar indexers for index type Int64Index should be integers and not floating point
  obj = self._convert_scalar_indexer(obj, axis)

In [12]: s2.index
Out[12]: Int64Index([0, 1, 2], dtype='int64')

In [13]: s2 = s.copy()

In [14]: s2[4] = 10

In [15]: s2.index
Out[15]: Int64Index([0, 1, 2, 4], dtype='int64')

In [16]: s2 = s.copy()

In [17]: s2[4.0] = 10

In [18]: s2.index
Out[18]: Int64Index([0, 1, 2, 4], dtype='int64')

In [19]: s2 = s.copy()

# this was buggy in 0.17.1
In [20]: s2[4.1] = 10
IndexingError: 4.1

jreback · 2016-02-12T19:26:34Z

cc @shoyer @TomAugspurger @jorisvandenbossche

TomAugspurger · 2016-02-12T19:29:40Z

At a glance those all look correct. Haven't had a chance to look through the code.

👍 to your general approach of coercing the new member to the Index type if possible, otherwise coerce the index.

kawochen · 2016-02-12T20:30:40Z

pandas/core/indexing.py

+                    # we will not create a duplicate index, rather
+                    # index to that element
+                    # e.g. 0.0 -> 0
+                    # GH12245


…nal setting, and raise a TypeError, xref pandas-dev#4892 BUG: index type coercion when setting with an integer-like closes pandas-dev#11836

jorisvandenbossche · 2016-02-13T16:41:49Z

Some after merge feedback:

I took a look and I also like the logic now as how you explained it in your last post (for label based indexing: if the label evaluates as equal, then it is interpreted as the existing label)

A few things:

The whatsnew needs to be updated (as it still says it raises now for all cases, as you started the PR)
(minor) When using iloc with a float (so the case it still should raise an error), the error message is not fully correct:
```
In [76]: s2 = s.copy()

In [77]: s2.iloc[2.0] = 10
TypeError: cannot do label indexing on <class 'pandas.indexes.range.RangeIndex'>
with these indexers [2.0] of <type 'float'>
```
This is not 'label' indexing, but 'integer' or 'positional' indexing

The behaviour of using a float indexer with ix on a non-numerical index has changed:

In [78]: s3 = pd.Series(range(3), index=list('abc'))

In [79]: s4 = s3.copy()

In [80]: s4.ix[1.0] = 10

In [81]: s4
Out[81]:
a       0
b       1
c       2
1.0    10
dtype: int64

In [82]: s4.index
Out[82]: Index([u'a', u'b', u'c', 1.0], dtype='object')

In [83]: pd.__version__
Out[83]: '0.17.1+350.g5f7c9e9'

and with 0.16.2

In [1]: pd.__version__
Out[1]: '0.16.2'

In [2]: s3 = pd.Series(range(3), index=list('abc'))

In [3]: s4 = s3.copy()

In [4]: s4.ix[1.0] = 10
C:\Anaconda\lib\site-packages\pandas\core\index.py:805: FutureWarning: scalar i
dexers for index type Index should be integers and not floating point
type(self).__name__),FutureWarning)

In [5]: s4
Out[5]:
a     0
b    10
c     2
dtype: int64

In [6]: s4.index
Out[6]: Index([u'a', u'b', u'c'], dtype='object')

The change is logical, as before the float was interpreted as a positional indexer (and for this the warning was raised). But now a float cannot be a positional indexer anymore, so it is interpreted as a new label.
To be clear, I think this change is OK, but just wanted to point out a case where this change will not raise an error but alter your results (worth mentioning in the whatsnew docs?)

jorisvandenbossche · 2016-02-13T16:43:15Z

doc/source/release.rst

@@ -52,6 +52,7 @@ Highlights include:
 - ``pd.test()`` top-level nose test runner is available (:issue:`4327`)
 - Adding support for a ``RangeIndex`` as a specialized form of the ``Int64Index`` for memory savings, see :ref:`here <whatsnew_0180.enhancements.rangeindex>`.
 - API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.breaking.resample>`.
+- Removal of support for deprecated float indexers; these will now raise a ``TypeError``, see :ref:`here <whatsnew_0180.enhancements.float_indexers>`.


maybe mention that it only is about positional indexing? Otherwise people could think FloatIndex is removed.

jorisvandenbossche · 2016-02-15T20:41:24Z

Actually, I don't yet agree with the behaviour of how it is now merged in this PR (but now only saw this by looking at your other PR for the doc issues #12330).

We had a discussion about what to do if a float (eg 1.0) evaluates as equal to a label (eg 1). Initially this raised, but after the discussion the behaviour was changed for setting:

In [7]: s = pd.Series([1,2,3])

In [8]: s[0.0] = 10

In [9]: s
Out[9]:
0    10
1     2
2     3
dtype: int64

In [10]: s.index
Out[10]: RangeIndex(start=0, stop=3, step=1)

This comment above gives an overview of this: #12246 (comment)

But, the behaviour was not changed for getting, so the getting variant of the example above raises:

In [11]: s[0.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.range.RangeIndex'>
 with these indexers [0.0] of <type 'float'>

Which feels a bit inconsistent. And it is also, if I understand it correctly, what @shoyer opposed to:

We should not make using floats instead of ints for label based indexing (e.g., with .loc) error. I think we need to following the precedence of the Python dictionary here:

In [16]: x = {1: 'one'}

In [17]: x[1.0]
Out[17]: 'one'

@jreback That is actually the reason I gave the comment to update the whatsnew notes, as I thought they were incorrect (assuming that not only setting was changed, but also getting)

jreback · 2016-02-15T20:47:00Z

hmm, for []/.ix I could agree with you (certainly not for .loc/.iloc though) as these are quite strict.

It comes down to whether we are actually indexing with a label or a position (and that of course is the ambiguity here).

@shoyer comment is too simple for many of these cases. We have different indexers for this reason (and some legacy w.r.t. []).

@jorisvandenbossche can you enumerate when you think we ought to coerce by indexer?

maybe open a new issue.

jorisvandenbossche · 2016-02-15T20:50:40Z

This just looks very odd:

In [11]: s = pd.Series([1,2,3])

In [12]: s.loc[0.0] = 10

In [13]: s.loc[0.0]
TypeError: cannot do label indexing on <class 'pandas.indexes.range.RangeIndex'>
 with these indexers [0.0] of <type 'float'>

And I don't think we should introduce even more differences between loc/ix/[] here.

But indeed, will open a new issue

jreback added Compat pandas objects compatability with Numpy or Python functions Deprecate Functionality to remove in pandas labels Feb 6, 2016

jreback added this to the 0.18.0 milestone Feb 6, 2016

jsexauer mentioned this pull request Feb 6, 2016

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jreback force-pushed the deprecate2 branch 3 times, most recently from 45596d0 to 6680f1f Compare February 6, 2016 22:55

jreback force-pushed the deprecate2 branch from 6680f1f to e04bed0 Compare February 10, 2016 00:27

jreback mentioned this pull request Feb 10, 2016

decide what to do about float->int conversion errors in 1.11 numpy/numpy#7162

Closed

jreback force-pushed the deprecate2 branch from e04bed0 to 9d720f4 Compare February 10, 2016 19:09

TomAugspurger reviewed Feb 11, 2016
View reviewed changes

jreback force-pushed the deprecate2 branch 2 times, most recently from 4ee7073 to a702344 Compare February 11, 2016 04:43

jreback force-pushed the deprecate2 branch 2 times, most recently from 538be0d to ad78ce2 Compare February 12, 2016 16:28

jreback force-pushed the deprecate2 branch 3 times, most recently from 6da1949 to 1584661 Compare February 12, 2016 18:29

kawochen reviewed Feb 12, 2016
View reviewed changes

DEPR: removal of deprecation warnings for float indexers in a positio…

5f7c9e9

…nal setting, and raise a TypeError, xref pandas-dev#4892 BUG: index type coercion when setting with an integer-like closes pandas-dev#11836

jreback force-pushed the deprecate2 branch from 1584661 to 5f7c9e9 Compare February 12, 2016 20:38

jreback closed this in a8be55c Feb 13, 2016

jorisvandenbossche reviewed Feb 13, 2016
View reviewed changes

jreback mentioned this pull request Feb 13, 2016

COMPAT: issues w.r.t. deprecating float indexers #12322

Closed

jreback mentioned this pull request Feb 15, 2016

API: behaviour of label indexing with floats on integer index #12333

Closed

jreback mentioned this pull request Jul 24, 2016

DEPR: deprecations log for removed issues #13777

Closed

Uh oh!

DEPR: removal of deprecation warnings for float indexers #12246

DEPR: removal of deprecation warnings for float indexers #12246

Uh oh!

Conversation

jreback commented Feb 6, 2016

Uh oh!

jreback commented Feb 6, 2016

Uh oh!

jreback commented Feb 11, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kawochen commented Feb 11, 2016

Uh oh!

shoyer commented Feb 11, 2016

Uh oh!

TomAugspurger commented Feb 11, 2016

Uh oh!

jreback commented Feb 11, 2016

Uh oh!

jreback commented Feb 12, 2016

Uh oh!

jreback commented Feb 12, 2016

Uh oh!

TomAugspurger commented Feb 12, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jorisvandenbossche commented Feb 13, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jorisvandenbossche commented Feb 15, 2016

Uh oh!

jreback commented Feb 15, 2016

Uh oh!

jorisvandenbossche commented Feb 15, 2016

Uh oh!

Uh oh!