Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add warning when setting into nonexistent attribute #16951

Merged
merged 22 commits into from
Aug 7, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
b5dde1e
ENH: adds warning when setting list-like into attribute
deniederhut Jul 15, 2017
4d3f87d
TST: Adds tests for warning behavior on set on nonexistent attr
deniederhut Jul 15, 2017
fb29857
DOC: replaces silent error for setattr on nonexistent name with UserW…
deniederhut Jul 15, 2017
af5ab38
CLN: juggles doc url to fit within 80 char per line
deniederhut Jul 15, 2017
f485e77
BUG: replaces single backticks with double backticks for rst
deniederhut Jul 15, 2017
bfb9f93
BUG: adds stacklevel to warning call
deniederhut Jul 15, 2017
61eca9d
DOC: adds 0.21.0 for appearance of setting warnings
deniederhut Jul 15, 2017
15ebdbc
ENH: Add warning when colname collides with methods
deniederhut Jul 15, 2017
30d98de
TST: Adds testing around warnings for setting attributes
deniederhut Jul 15, 2017
8d38f68
CLN: reformatting and cleanup of setting tests
deniederhut Jul 15, 2017
c9c43db
ENH: Clarifies wording of warning message for column name collision
deniederhut Jul 15, 2017
0a25a56
BUG: coerces _set_item key to a string
deniederhut Jul 15, 2017
150d586
FIX: only warn getattr when key is str
deniederhut Jul 19, 2017
02cb426
TST: adds comments about tests which should not warn
deniederhut Jul 19, 2017
562c164
DOC: adds column/attribute warnings to whatsnew
deniederhut Jul 19, 2017
f3fdd19
DOC: moves examples from whatsnew to indexing
deniederhut Jul 19, 2017
c6fe062
FIX: replaces ndim check with ABC subclass check
deniederhut Jul 19, 2017
8ad4d05
CLN: breaks line to avoid 81 char length
deniederhut Jul 23, 2017
e5cc166
TST: renames 'bool' columns in pytables test to avoid collision warning
deniederhut Jul 24, 2017
e02244d
BUG: fixes 'string_types' encoding failure with unicode cols in py2
deniederhut Jul 27, 2017
d38ecca
BUG: fixes reversion of isnull/isna produced in merge
deniederhut Jul 29, 2017
b86546e
DOC: fixes references in indexing and whatsnew
deniederhut Aug 4, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 31 additions & 4 deletions doc/source/indexing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -227,10 +227,6 @@ as an attribute:
dfa.A
panel.one

You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful;
if you try to use attribute access to create a new column, it fails silently, creating a new attribute rather than a
new column.

.. ipython:: python

sa.a = 5
Expand Down Expand Up @@ -267,6 +263,37 @@ You can also assign a ``dict`` to a row of a ``DataFrame``:
x.iloc[1] = dict(x=9, y=99)
x

You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful;
if you try to use attribute access to create a new column, it creates a new attribute rather than a
new column. In 0.21.0 and later, this will raise a ``UserWarning``:

.. code-block:: ipython

In[1]: df = pd.DataFrame({'one': [1., 2., 3.]})
In[2]: df.two = [4, 5, 6]
UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute_access
In[3]: df
Out[3]:
one
0 1.0
1 2.0
2 3.0

Similarly, it is possible to create a column with a name which collides with one of Pandas's
built-in methods or attributes, which can cause confusion later when attempting to access
that column as an attribute. This behavior now warns:

.. code-block:: ipython

In[4]: df['sum'] = [5., 7., 9.]
UserWarning: Column name 'sum' collides with a built-in method, which will cause unexpected attribute behavior
In[5]: df.sum
Out[5]:
<bound method DataFrame.sum of one sum
0 1.0 5.0
1 2.0 7.0
2 3.0 9.0>

Slicing ranges
--------------

Expand Down
46 changes: 45 additions & 1 deletion doc/source/whatsnew/v0.21.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,6 @@ New features
- Added ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to
support type inference in the presence of missing values (:issue:`17059`).


.. _whatsnew_0210.enhancements.infer_objects:

``infer_objects`` type conversion
Expand Down Expand Up @@ -62,6 +61,51 @@ using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedel
df['C'] = pd.to_numeric(df['C'], errors='coerce')
df.dtypes

.. _whatsnew_0210.enhancements.attribute_access:

Improved warnings when attempting to create columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

New users are often flummoxed by the relationship between column operations and attribute
access on ``DataFrame`` instances (:issue:`5904` & :issue:`7175`). Two specific instances
of this confusion include attempting to create a new column by setting into an attribute:

.. code-block:: ipython

In[1]: df = pd.DataFrame({'one': [1., 2., 3.]})
In[2]: df.two = [4, 5, 6]

This does not raise any obvious exceptions, but also does not create a new column:

.. code-block:: ipython

In[3]: df
Out[3]:
one
0 1.0
1 2.0
2 3.0

The second source of confusion is creating a column whose name collides with a method or
attribute already in the instance namespace:

.. code-block:: ipython

In[4]: df['sum'] = [5., 7., 9.]

This does not permit that column to be accessed as an attribute:

.. code-block:: ipython

In[5]: df.sum
Out[5]:
<bound method DataFrame.sum of one sum
0 1.0 5.0
1 2.0 7.0
2 3.0 9.0>

Both of these now raise a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.

.. _whatsnew_0210.enhancements.other:

Other Enhancements
Expand Down
12 changes: 11 additions & 1 deletion pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@
pandas_dtype)
from pandas.core.dtypes.cast import maybe_promote, maybe_upcast_putmask
from pandas.core.dtypes.missing import isna, notna
from pandas.core.dtypes.generic import ABCSeries, ABCPanel
from pandas.core.dtypes.generic import ABCSeries, ABCPanel, ABCDataFrame

from pandas.core.common import (_values_from_object,
_maybe_box_datetimelike,
Expand Down Expand Up @@ -1907,6 +1907,10 @@ def _slice(self, slobj, axis=0, kind=None):
return result

def _set_item(self, key, value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a single-point-of-contact that all (most? many?) setter methods go through? i.e. will the various loc.__setitem__ paths eventually wind through here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe that is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my understanding, yes. Here is an example of setting while using .loc:

import pandas as pd
df = pd.DataFrame({'one': [0, 1, 2]})
df.loc[:, 'sum'] = df.one.sum()
UserWarning: Column name 'sum' collides with a built-in method, which will cause unexpected attribute behavior
  self._set_item(key, value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this point is only for [] setting, e.g. setting a column on a DF or an element on a Series. .loc/.iloc are handled in core/indexing.py

if isinstance(key, str) and callable(getattr(self, key, None)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use compat.string_types rather than str here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using string_types causes failures in Python2 when creating columns that have unicode characters in their names. See c90aa22.

warnings.warn("Column name '{key}' collides with a built-in "
"method, which will cause unexpected attribute "
"behavior".format(key=key), stacklevel=3)
self._data.set(key, value)
self._clear_item_cache()

Expand Down Expand Up @@ -3357,6 +3361,12 @@ def __setattr__(self, name, value):
else:
object.__setattr__(self, name, value)
except (AttributeError, TypeError):
if isinstance(self, ABCDataFrame) and (is_list_like(value)):
warnings.warn("Pandas doesn't allow Series to be assigned "
"into nonexistent columns - see "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write this a bit more generic, as it does not only raise for Series objects (and I also find it a bit confusing, as pandas certainly allows assigning Series objects, just with another syntax).

Maybe something like "pandas doesn't allow to add a new column using attribute access" ? (can certainly be improved further)

"https://pandas.pydata.org/pandas-docs/"
"stable/indexing.html#attribute-access",
stacklevel=2)
object.__setattr__(self, name, value)

# ----------------------------------------------------------------------
Expand Down
38 changes: 38 additions & 0 deletions pandas/tests/dtypes/test_generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import numpy as np
import pandas as pd
from pandas.core.dtypes import generic as gt
from pandas.util import testing as tm


class TestABCClasses(object):
Expand Down Expand Up @@ -38,3 +39,40 @@ def test_abc_types(self):
assert isinstance(self.sparse_array, gt.ABCSparseArray)
assert isinstance(self.categorical, gt.ABCCategorical)
assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod)


def test_setattr_warnings():
# GH5904 - Suggestion: Warning for DataFrame colname-methodname clash
# GH7175 - GOTCHA: You can't use dot notation to add a column...
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

with catch_warnings(record=True) as w:
# successfully add new column
# this should not raise a warning
df['three'] = df.two + 1
assert len(w) == 0
assert df.three.sum() > df.two.sum()

with catch_warnings(record=True) as w:
# successfully modify column in place
# this should not raise a warning
df.one += 1
assert len(w) == 0
assert df.one.iloc[0] == 2

with catch_warnings(record=True) as w:
# successfully add an attribute to a series
# this should not raise a warning
df.two.not_an_index = [1, 2]
assert len(w) == 0

with tm.assert_produces_warning(UserWarning):
# warn when setting column to nonexistent name
df.four = df.two + 2
assert df.four.sum() > df.two.sum()

with tm.assert_produces_warning(UserWarning):
# warn when column has same name as method
df['sum'] = df.two
4 changes: 2 additions & 2 deletions pandas/tests/io/test_pytables.py
Original file line number Diff line number Diff line change
Expand Up @@ -2011,7 +2011,7 @@ def check(obj, comparator):
df['string'] = 'foo'
df['float322'] = 1.
df['float322'] = df['float322'].astype('float32')
df['bool'] = df['float322'] > 0
df['boolean'] = df['float322'] > 0
df['time1'] = Timestamp('20130101')
df['time2'] = Timestamp('20130102')
check(df, tm.assert_frame_equal)
Expand Down Expand Up @@ -2141,7 +2141,7 @@ def test_table_values_dtypes_roundtrip(self):
df1['string'] = 'foo'
df1['float322'] = 1.
df1['float322'] = df1['float322'].astype('float32')
df1['bool'] = df1['float32'] > 0
df1['boolean'] = df1['float32'] > 0
df1['time1'] = Timestamp('20130101')
df1['time2'] = Timestamp('20130102')

Expand Down