WIP: Refactor accessors, unify usage, make "recipe" #17042

Closed
wants to merge 21 commits into from
21 commits
b77e103
Move PandasDelegate and AccessorProperty; update imports
jbrockmendel Jul 19, 2017
dbc149d
Move apply _shared_docs to functions and attach to methods with copy
jbrockmendel Jul 20, 2017
3c77d94
Implement _make_accessor as classmethod on StringMethods
jbrockmendel Jul 20, 2017
19f7ff6
Add example/recipe
jbrockmendel Jul 20, 2017
d152421
Test to go along with example/recipe
jbrockmendel Jul 20, 2017
101e7e5
Transition to _make_accessor
jbrockmendel Jul 20, 2017
774a35d
Merge branch 'master' into accessory
jbrockmendel Jul 20, 2017
ccec595
Merge branch 'master' into accessory
jbrockmendel Jul 20, 2017
74e4539
Remove unused import that was causing a lint error
jbrockmendel Jul 20, 2017
953598a
merge pulled
jbrockmendel Jul 20, 2017
22d4892
Wrap long line
jbrockmendel Jul 22, 2017
014fae0
Refactor tests and documentation
jbrockmendel Jul 22, 2017
dd8315c
Typos, flake8 fixes, rearrange comments
jbrockmendel Jul 22, 2017
74a237b
Simplify categorical make_accessor args
jbrockmendel Jul 23, 2017
c931d4b
Rename PandasDelegate subclasses FooDelegate
jbrockmendel Jul 25, 2017
6c771b4
Revert import rearrangement; update names FooDelegate
jbrockmendel Jul 25, 2017
d3a4460
Deprecate StringAccessorMixin
jbrockmendel Jul 25, 2017
48f3b4d
Merge branch 'master' into accessory
jbrockmendel Jul 25, 2017
73a0633
lint fixes
jbrockmendel Jul 25, 2017
aa793ad
Merge branch 'accessory' of https://github.com/jbrockmendel/pandas in…
jbrockmendel Jul 25, 2017
264a7e7
Merge branch 'master' into accessory
jbrockmendel Sep 20, 2017
Refactor tests and documentation
jbrockmendel committed Jul 22, 2017
commit 014fae0165ff7b4b8db5d74662776196de7c1db8
209 changes: 46 additions & 163 deletions pandas/core/accessors.py
@@ -1,181 +1,73 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""

An example/recipe for creating a custom accessor.


The primary use case for accessors is when a Series contains instances
of a particular class and we want to access properties/methods of these
instances in Series form.

Suppose we have a custom State class representing US states:

class State(object):
def __repr__(self):
return repr(self.name)

def __init__(self, name):
self.name = name
self._abbrev_dict = {'California': 'CA', 'Alabama': 'AL'}

@property
def abbrev(self):
return self._abbrev_dict[self.name]

@abbrev.setter
def abbrev(self, value):
self._abbrev_dict[self.name] = value

def fips(self):
return {'California': 6, 'Alabama': 1}[self.name]


We can construct a series of these objects:

>>> ser = pd.Series([State('Alabama'), State('California')])
>>> ser
0 'Alabama'
1 'California'
dtype: object
from pandas.core.base import PandasObject

We would like direct access to the `abbrev` property and `fips` method.
One option is to access these manually with `apply`:

>>> ser.apply(lambda x: x.fips())
0 1
1 6
dtype: int64
class PandasDelegate(PandasObject):
""" an abstract base class for delegating methods/properties

But doing that repeatedly gets old in a hurry, so we decide to make a
custom accessor. This entails subclassing `PandasDelegate` to specify
what should be accessed and how.
Usage: To make a custom accessor, subclass `PandasDelegate`, overriding
the methods below. Then decorate this subclass with
`accessors.wrap_delegate_names` describing the methods and properties
that should be delegated.

There are four methods that *may* be defined in this subclass, one of which
*must* be defined. The mandatory method is a classmethod called
`_make_accessor`. `_make_accessor` is responsible for doing any validation on
inputs for the accessor. In this case, the inputs must be a Series
containing State objects.
Examples can be found in:

pandas.core.accessors.CategoricalAccessor
pandas.core.indexes.accessors (complicated example)
pandas.core.indexes.category.CategoricalIndex
pandas.core.strings.StringMethods
pandas.tests.test_accessors

class StateDelegate(PandasDelegate):
"""

def __init__(self, values):
"""
The subclassed constructor will generally only be called by
_make_accessor. See _make_accessor.__doc__.
"""
self.values = values

@classmethod
def _make_accessor(cls, data):
if not isinstance(data, pd.Series):
raise ValueError('Input must be a Series of States')
elif not data.apply(lambda x: isinstance(x, State)).all():
raise ValueError('All entries must be State objects')
return StateDelegate(data)


With `_make_accessor` defined, we have enough to create the accessor, but
not enough to actually do anything useful with it. In order to access
*methods* of State objects, we implement `_delegate_method`.
`_delegate_method` calls the underlying method for each object in the
series and wraps these in a new Series. The simplest version looks like:

def _delegate_method(self, name, *args, **kwargs):
state_method = lambda x: getattr(x, name)(*args, **kwargs)
return self.values.apply(state_method)

Similarly in order to access *properties* of State objects, we need to
implement `_delegate_property_get`:

def _delegate_property_get(self, name):
state_property = lambda x: getattr(x, name)
return self.values.apply(state_property)


On occasion, we may want to be able to *set* the property being accessed.
This is discouraged, but allowed (as long as the class being accessed
allows the property to be set). Doing so requires implementing
`_delegate_property_set`:

def _delegate_property_set(self, name, new_values):
for (obj, val) in zip(self.values, new_values):
setattr(obj, name, val)


With these implemented, `StateDelegate` knows how to handle methods and
properties. We just need to tell it what names and properties it is
supposed to handle. This is done by decorating the `StateDelegate`
class with `pd.accessors.wrap_delegate_names`. We apply the decorator
once with a list of all the methods the accessor should recognize and
once with a list of all the properties the accessor should recognize.


@wrap_delegate_names(delegate=State,
accessors=["fips"],
typ="method")
@wrap_delegate_names(delegate=State,
accessors=["abbrev"],
typ="property")
class StateDelegate(PandasDelegate):
[...]


We can now pin the `state` accessor to the pd.Series class (we could
alternatively pin it to the pd.Index class with a slightly different
implementation above):

pd.Series.state = accessors.AccessorProperty(StateDelegate)


>>> ser = pd.Series([State('Alabama'), State('California')])
>>> isinstance(ser.state, StateDelegate)
True

>>> ser.state.abbrev
0 AL
1 CA
dtype: object

>>> ser.state.fips()
0 1
1 6

>>> ser.state.abbrev = ['Foo', 'Bar']
>>> ser.state.abbrev
0 Foo
1 Bar
dtype: object



"""
from pandas.core.base import PandasObject
from pandas.core import common as com


class PandasDelegate(PandasObject):
""" an abstract base class for delegating methods/properties
def _make_accessor(cls, data): # pragma: no cover
"""
_make_accessor should implement any necessary validation on the
data argument to ensure that the properties/methods being
accessed will be available.

Usage: To make a custom accessor, start by subclassing `Delegate`.
See example in the module-level docstring.
_make_accessor should return cls(data). If necessary, the arguments
to the constructor can be expanded. In this case, __init__ will
need to be overridden as well.

"""
Parameters
----------
data : the underlying object being accessed, usually Series or Index

def __init__(self, values):
self.values = values
# #self._freeze()
Returns
-------
Delegate : instance of PandasDelegate or subclass

@classmethod
def _make_accessor(cls, data): # pragma: no cover
"""
raise NotImplementedError(
use AbstractMethodError
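
A minimal sketch of what this suggestion might look like. pandas provides an `AbstractMethodError` that subclasses `NotImplementedError`; to keep the snippet dependency-free, the class below is a simplified local stand-in rather than the pandas implementation, and `DemoDelegate` and `ListDelegate` are hypothetical names used only for illustration.

```python
class AbstractMethodError(NotImplementedError):
    """Simplified stand-in for the pandas exception of the same name."""

    def __init__(self, class_instance):
        self.class_instance = class_instance

    def __str__(self):
        # For a classmethod, class_instance is the class itself.
        obj = self.class_instance
        name = obj.__name__ if isinstance(obj, type) else type(obj).__name__
        return ("This method must be defined in the concrete "
                "class {name}".format(name=name))


class DemoDelegate(object):
    """Hypothetical base class; subclasses must supply _make_accessor."""

    def __init__(self, values):
        self.values = values

    @classmethod
    def _make_accessor(cls, data):
        # Replaces the long NotImplementedError message with one short raise.
        raise AbstractMethodError(cls)


class ListDelegate(DemoDelegate):
    """Hypothetical subclass that validates its input, then wraps it."""

    @classmethod
    def _make_accessor(cls, data):
        if not isinstance(data, list):
            raise ValueError("Input must be a list")
        return cls(data)
```

Because the stand-in still derives from `NotImplementedError`, callers that already catch `NotImplementedError` would keep working under this approach.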

'It is up to subclasses to implement '
'_make_accessor. This does input validation on the object to '
'which the accessor is being pinned. '
'It should return an instance of `cls`.')
# return cls(data)

def _delegate_property_get(self, name, *args, **kwargs):
raise TypeError("You cannot access the "
"property {name}".format(name=name))

def _delegate_property_set(self, name, value, *args, **kwargs):
"""
Overriding _delegate_property_set is discouraged. It is generally
better to directly interact with the underlying data than to
alter it via the accessor.

An example that ignores this advice can be found in
tests.test_accessors.TestVectorizedAccessor
"""
raise TypeError("The property {name} cannot be set".format(name=name))

def _delegate_method(self, name, *args, **kwargs):
@@ -242,14 +134,8 @@ def create_delegator_method(name, delegate):
def func(self, *args, **kwargs):
return self._delegate_method(name, *args, **kwargs)

if callable(name):
# A function/method was passed directly instead of a name
# This may also render the `delegate` arg unnecessary.
func.__name__ = name.__name__ # TODO: is this generally valid?
func.__doc__ = name.__doc__
else:
func.__name__ = name
func.__doc__ = getattr(delegate, name).__doc__
func.__name__ = name
func.__doc__ = getattr(delegate, name).__doc__
return func

@staticmethod
@@ -294,13 +180,10 @@ def add_delegate_accessors(cls):
else:
func = Delegator.create_delegator_method(name, delegate)

# Allow for a callable to be passed instead of a name.
title = com._get_callable_name(name)
title = title or name
# don't overwrite existing methods/properties unless
# specifically told to do so
if overwrite or not hasattr(cls, title):
setattr(cls, title, func)
if overwrite or not hasattr(cls, name):
setattr(cls, name, func)

return cls
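
The delegation mechanism touched by the two hunks above can be sketched without pandas. This is an illustrative simplification, not the code in `pandas/core/accessors.py`: `Delegate`, `wrap_names`, and `UpperDelegate` are hypothetical names, and the decorator only mirrors the general shape of `wrap_delegate_names` (one forwarding attribute per name, skipping names the class already defines unless `overwrite` is set).

```python
class Delegate(object):
    """Hypothetical base: every hook refuses until a subclass overrides it."""

    def __init__(self, values):
        self.values = values

    def _delegate_property_get(self, name):
        raise TypeError("You cannot access the property {}".format(name))

    def _delegate_method(self, name, *args, **kwargs):
        raise TypeError("You cannot call method {}".format(name))


def wrap_names(delegate, accessors, typ, overwrite=False):
    """Class decorator that attaches one forwarding attribute per name."""

    def make_property(name):
        def getter(self):
            return self._delegate_property_get(name)
        getter.__name__ = name
        return property(fget=getter, doc=getattr(delegate, name).__doc__)

    def make_method(name):
        def func(self, *args, **kwargs):
            return self._delegate_method(name, *args, **kwargs)
        # Names are plain strings here, so they can be copied directly.
        func.__name__ = name
        func.__doc__ = getattr(delegate, name).__doc__
        return func

    def decorator(cls):
        for name in accessors:
            if typ == "property":
                attr = make_property(name)
            else:
                attr = make_method(name)
            # Keep anything the class already defines unless told otherwise.
            if overwrite or not hasattr(cls, name):
                setattr(cls, name, attr)
        return cls

    return decorator


@wrap_names(delegate=str, accessors=["upper"], typ="method")
class UpperDelegate(Delegate):
    def _delegate_method(self, name, *args, **kwargs):
        # Call the named method on every element and collect the results.
        return [getattr(v, name)(*args, **kwargs) for v in self.values]
```

With only string names accepted, the `hasattr`/`setattr` check can use `name` directly, which is the simplification the second hunk makes.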

90 changes: 79 additions & 11 deletions pandas/tests/test_accessors.py
@@ -5,14 +5,22 @@
An example/recipe/test for implementing custom accessors.

"""
import unittest
import pandas.util.testing as tm

import pandas as pd

from pandas.core.accessors import (wrap_delegate_names,
PandasDelegate, AccessorProperty)

# Example 1:
# An accessor for attributes of custom class in a Series with object dtype.


class State(object):
"""
A dummy class for which only two states have the attributes implemented.
"""
def __repr__(self):
return repr(self.name)

@@ -72,20 +80,80 @@ def _delegate_property_set(self, name, new_values):
setattr(obj, name, val)


def test_geo_state_accessor():
import pandas.util.testing as tm
class TestVectorizedAccessor(unittest.TestCase):

@classmethod
def setup_class(cls):
pd.Series.state = AccessorProperty(StateDelegate)

cls.ser = pd.Series([State('Alabama'), State('California')])

@classmethod
def teardown_class(cls):
del pd.Series.state
# TODO: is there a nicer way to do this with `mock`?

def test_method(self):
ser = self.ser
fips = pd.Series([1, 6])
tm.assert_series_equal(ser.state.fips(), fips)

def test_property_get(self):
ser = self.ser
abbrev = pd.Series(['AL', 'CA'])
tm.assert_series_equal(ser.state.abbrev, abbrev)

def test_property_set(self):
ser = self.ser.copy()

ser.state.abbrev = ['Foo', 'Bar']
new_abbrev = pd.Series(['Foo', 'Bar'])
tm.assert_series_equal(ser.state.abbrev, new_abbrev)


pd.Series.state = AccessorProperty(StateDelegate)
@wrap_delegate_names(delegate=pd.Series,
accessors=["real", "imag"],
typ="property")
@wrap_delegate_names(delegate=pd.Series,
accessors=["abs"],
typ="method")
class ForgotToOverride(PandasDelegate):
# A case where the relevant methods were not overridden. Everything
# should raise NotImplementedError or TypeError
@classmethod
def _make_accessor(cls, data):
return cls(data)


class TestUnDelegated(unittest.TestCase):

@classmethod
def setup_class(cls):
pd.Series.forgot = AccessorProperty(ForgotToOverride)

cls.ser = pd.Series(range(-2, 2))

@classmethod
def teardown_class(cls):
del pd.Series.forgot

ser = pd.Series([State('Alabama'), State('California')])
def test_get_fails(self):
forgot = self.ser.forgot
with self.assertRaises(TypeError):
forgot.real

abbrev = pd.Series(['AL', 'CA'])
tm.assert_series_equal(ser.state.abbrev, abbrev)
with self.assertRaises(TypeError):
forgot.imag

fips = pd.Series([1, 6])
tm.assert_series_equal(ser.state.fips(), fips)
def test_set_fails(self):
forgot = self.ser.forgot
with self.assertRaises(TypeError):
forgot.real = range(5)

ser.state.abbrev = ['Foo', 'Bar']
# Check that the underlying hasn't been affected
tm.assert_series_equal(self.ser, pd.Series(range(-2, 2)))

new_abbrev = pd.Series(['Foo', 'Bar'])
tm.assert_series_equal(ser.state.abbrev, new_abbrev)
def test_method_fails(self):
forgot = self.ser.forgot
with self.assertRaises(TypeError):
forgot.abs()
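
The tests above pin an accessor to `pd.Series` in `setup_class` and delete it in `teardown_class`. A rough, pandas-free sketch of how such a pinned attribute could work is a descriptor that calls `_make_accessor` on each lookup. Everything here is an assumption made for illustration: `AccessorSketch`, `WordsDelegate`, and `Container` are hypothetical names, and the real `AccessorProperty` may differ in detail.

```python
class AccessorSketch(object):
    """Hypothetical descriptor: builds an accessor on each attribute lookup."""

    def __init__(self, accessor_cls):
        self.accessor_cls = accessor_cls

    def __get__(self, instance, owner=None):
        if instance is None:
            # Looked up on the class itself, e.g. Container.words
            return self.accessor_cls
        return self.accessor_cls._make_accessor(instance)


class WordsDelegate(object):
    """Hypothetical accessor exposing str.upper over a container's items."""

    def __init__(self, values):
        self.values = values

    @classmethod
    def _make_accessor(cls, data):
        # Validate before wrapping, as the recipe asks of _make_accessor.
        if not all(isinstance(item, str) for item in data.items):
            raise ValueError("All entries must be str objects")
        return cls(data)

    def upper(self):
        return [item.upper() for item in self.values.items]


class Container(object):
    """Hypothetical stand-in for the object being extended (e.g. a Series)."""

    def __init__(self, items):
        self.items = list(items)


# Pin the accessor, as setup_class does in the tests ...
Container.words = AccessorSketch(WordsDelegate)
result = Container(["ab", "cd"]).words.upper()

# ... and unpin it again, as teardown_class does.
del Container.words
```

Deleting the class attribute restores the original class, which is why the tests can clean up with a plain `del` instead of mocking.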