Add default repr for EAs #23601
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -47,10 +47,12 @@ class ExtensionArray(object): | |
* copy | ||
* _concat_same_type | ||
|
||
An additional method is available to satisfy pandas' internal, | ||
private block API. | ||
A default repr displaying the type, (truncated) data, length, | ||
and dtype is provided. It can be customized or replaced by | ||
overriding: | ||
|
||
* _formatting_values | ||
* __repr__ : A default repr for the ExtensionArray. | ||
* _formatter : Print scalars inside a Series or DataFrame. | ||
|
||
Some methods require casting the ExtensionArray to an ndarray of Python | ||
objects with ``self.astype(object)``, which may be expensive. When | ||
|
@@ -676,17 +678,70 @@ def copy(self, deep=False): | |
raise AbstractMethodError(self) | ||
|
||
# ------------------------------------------------------------------------ | ||
# Block-related methods | ||
# Printing | ||
# ------------------------------------------------------------------------ | ||
def __repr__(self): | ||
from pandas.io.formats.printing import format_object_summary | ||
|
||
|
||
template = ( | ||
u'{class_name}' | ||
u'{data}\n' | ||
u'Length: {length}, dtype: {dtype}' | ||
lowercase Length
Why lower? It's uppercase in the Series repr.
it's lowercase everywhere (else)
Everywhere else being index? Categorical uses a capital. I'd like to eventually use pieces of this for the categorical repr, and would rather not break that repr, so I think a capital makes more sense here.
In index it is as part of the "keywords" in the constructor-resembling repr. There it indeed makes sense to have it lowercase. But here, I think capitalized is much more logical (it's the first item on that line), and consistent with Series and Categorical.
ok, I guess if we match the EA's to Series, except for the quoting (e.g. quote in EA, but no change in Series, meaning no quoting), then Index is separate. I am still concerned with these slight differences however. E.g. even here, dtype is lowercase and Length is uppercase
Keep in mind that whether or not to quote is up to the individual array. Are you specifically talking about quoting within PeriodArray here? Do we plan to quote within DatetimeArray and TimedeltaArray?
||
) | ||
# the short repr has no trailing newline, while the truncated | ||
# repr does. So we include a newline in our template, and strip | ||
# any trailing newlines from format_object_summary | ||
data = format_object_summary(self, self._formatter(), | ||
indent_for_name=False).rstrip(', \n') | ||
class_name = u'<{}>\n'.format(self.__class__.__name__) | ||
return template.format(class_name=class_name, data=data, | ||
length=len(self), | ||
dtype=self.dtype) | ||
|
||
def _formatter(self, boxed=False): | ||
# type: (bool) -> Callable[[Any], Optional[str]] | ||
"""Formatting function for scalar values. | ||
|
||
This is used in the default '__repr__'. The returned formatting | ||
function receives instances of your scalar type. | ||
|
||
Parameters | ||
---------- | ||
boxed: bool, default False | ||
|
||
An indicator for whether or not your array is being printed | ||
within a Series, DataFrame, or Index (True), or just by | ||
Doesn't Index also use
Index is hypothetical at this point, until we support EA-backed indexes. This isn't currently called by our index reprs. So we can choose to define it how we want. Or we can make this a
For external EAs, yes. But eg PeriodArray is stored in an Index, and there
It was not. Done in 3825aeb
I would add this arg to the Index formatters as well for compatibility.
I'm not sure I follow the need for adding it to the index formatters, since indexes are boxed by definition.
||
itself (False). This may be useful if you want scalar values | ||
to appear differently within a Series versus on its own (e.g. | ||
quoted or not). | ||
|
||
Returns | ||
------- | ||
Callable[[Any], str] | ||
A callable that gets instances of the scalar type and | ||
returns a string. By default, :func:`repr` is used | ||
when ``boxed=False`` and :func:`str` is used when | ||
``boxed=True``. | ||
""" | ||
if boxed: | ||
return str | ||
return repr | ||
|
||
def _formatting_values(self): | ||
# type: () -> np.ndarray | ||
# At the moment, this has to be an array since we use result.dtype | ||
""" | ||
An array of values to be printed in, e.g. the Series repr | ||
|
||
.. deprecated:: 0.24.0 | ||
|
||
Use :meth:`ExtensionArray._formatter` instead. | ||
""" | ||
return np.array(self) | ||
|
||
# ------------------------------------------------------------------------ | ||
# Reshaping | ||
# ------------------------------------------------------------------------ | ||
|
||
@classmethod | ||
def _concat_same_type(cls, to_concat): | ||
# type: (Sequence[ExtensionArray]) -> ExtensionArray | ||
|
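The default `__repr__` and `_formatter` added in this file can be approximated without pandas. The sketch below is only an illustration: `ToyArray` is a hypothetical class, and the joined list stands in for `format_object_summary`, so it performs no truncation. It reuses the template from the diff to show the resulting shape of the output.

```python
class ToyArray(object):
    """Hypothetical stand-in for an ExtensionArray subclass (not pandas API)."""

    def __init__(self, values, dtype):
        self._values = list(values)
        self.dtype = dtype

    def __len__(self):
        return len(self._values)

    def _formatter(self, boxed=False):
        # Same default as in the diff: str when boxed, repr otherwise.
        if boxed:
            return str
        return repr

    def __repr__(self):
        template = (
            u'{class_name}'
            u'{data}\n'
            u'Length: {length}, dtype: {dtype}'
        )
        formatter = self._formatter()
        # Assumption: a plain join replaces format_object_summary here,
        # so long arrays are not truncated in this sketch.
        data = u'[' + u', '.join(formatter(x) for x in self._values) + u']'
        class_name = u'<{}>\n'.format(self.__class__.__name__)
        return template.format(class_name=class_name, data=data,
                               length=len(self), dtype=self.dtype)


arr = ToyArray(['a', 'b'], dtype='toy')
print(repr(arr))
```

With string scalars, the unboxed formatter quotes each value while the boxed one does not, which is the distinction the `boxed` argument is meant to expose.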
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ | |
import numpy as np | ||
|
||
from pandas._libs import lib | ||
from pandas.compat import range, set_function_name, string_types, u | ||
from pandas.compat import range, set_function_name, string_types | ||
from pandas.util._decorators import cache_readonly | ||
|
||
from pandas.core.dtypes.base import ExtensionDtype | ||
|
@@ -20,9 +20,6 @@ | |
from pandas.core import nanops | ||
from pandas.core.arrays import ExtensionArray, ExtensionOpsMixin | ||
|
||
from pandas.io.formats.printing import ( | ||
default_pprint, format_object_attrs, format_object_summary) | ||
|
||
|
||
class _IntegerDtype(ExtensionDtype): | ||
""" | ||
|
@@ -268,6 +265,13 @@ def _from_sequence(cls, scalars, dtype=None, copy=False): | |
def _from_factorized(cls, values, original): | ||
return integer_array(values, dtype=original.dtype) | ||
|
||
def _formatter(self, boxed=False): | ||
def fmt(x): | ||
if isna(x): | ||
return 'NaN' | ||
should
Why not? This is used for the Integer data display, where we currently use NaN. (whether we should rather use 'NA' instead of 'NaN', that's another question)
just wondering whether it would be a problem with
Ah, yes, that's a good reason. But in general, this
||
return str(x) | ||
return fmt | ||
|
||
def __getitem__(self, item): | ||
if is_integer(item): | ||
if self._mask[item]: | ||
|
@@ -301,10 +305,6 @@ def __iter__(self): | |
else: | ||
yield self._data[i] | ||
|
||
def _formatting_values(self): | ||
# type: () -> np.ndarray | ||
return self._coerce_to_ndarray() | ||
|
||
def take(self, indexer, allow_fill=False, fill_value=None): | ||
from pandas.api.extensions import take | ||
|
||
|
@@ -354,25 +354,6 @@ def __setitem__(self, key, value): | |
def __len__(self): | ||
return len(self._data) | ||
|
||
def __repr__(self): | ||
""" | ||
Return a string representation for this object. | ||
|
||
Invoked by unicode(df) in py2 only. Yields a Unicode String in both | ||
py2/py3. | ||
""" | ||
klass = self.__class__.__name__ | ||
data = format_object_summary(self, default_pprint, False) | ||
attrs = format_object_attrs(self) | ||
space = " " | ||
|
||
prepr = (u(",%s") % | ||
space).join(u("%s=%s") % (k, v) for k, v in attrs) | ||
|
||
res = u("%s(%s%s)") % (klass, data, prepr) | ||
|
||
return res | ||
|
||
@property | ||
def nbytes(self): | ||
return self._data.nbytes + self._mask.nbytes | ||
|
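The `_formatter` override in the integer array returns a closure that renders missing values as `'NaN'` and everything else via `str`. A minimal standalone sketch of that pattern follows; `make_formatter` and its missing-value check are hypothetical stand-ins (the diff uses pandas' `isna`), chosen so the example runs with the standard library only.

```python
import math


def make_formatter(boxed=False):
    # `boxed` is accepted for signature parity but unused, as in the diff.
    def fmt(x):
        # Assumption: None and float NaN count as missing in this sketch;
        # the real code delegates this check to pandas' isna.
        if x is None or (isinstance(x, float) and math.isnan(x)):
            return 'NaN'
        return str(x)
    return fmt


fmt = make_formatter()
formatted = [fmt(x) for x in [1, float('nan'), 3]]
print(formatted)
```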
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -503,7 +503,7 @@ def __array_wrap__(self, result, context=None): | |
|
||
@property | ||
def _formatter_func(self): | ||
return lambda x: "'%s'" % x | ||
return self.array._formatter(boxed=False) | ||
why is this the only index subclass that you need to do this for?
It's the only extension-array backed index that's using the formatting right now.
Datetime / Timedelta will use this too, so this will be pushed up into
||
|
||
def asof_locs(self, where, mask): | ||
""" | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -16,11 +16,12 @@ | |||||
from pandas.compat import StringIO, lzip, map, u, zip | ||||||
|
||||||
from pandas.core.dtypes.common import ( | ||||||
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype, is_float, | ||||||
is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype, | ||||||
is_list_like, is_numeric_dtype, is_period_arraylike, is_scalar, | ||||||
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype, | ||||||
is_extension_array_dtype, is_float, is_float_dtype, is_integer, | ||||||
is_integer_dtype, is_list_like, is_numeric_dtype, is_scalar, | ||||||
is_timedelta64_dtype) | ||||||
from pandas.core.dtypes.generic import ABCMultiIndex, ABCSparseArray | ||||||
from pandas.core.dtypes.generic import ( | ||||||
ABCIndexClass, ABCMultiIndex, ABCSeries, ABCSparseArray) | ||||||
from pandas.core.dtypes.missing import isna, notna | ||||||
|
||||||
from pandas import compat | ||||||
|
@@ -29,7 +30,6 @@ | |||||
from pandas.core.config import get_option, set_option | ||||||
from pandas.core.index import Index, ensure_index | ||||||
from pandas.core.indexes.datetimes import DatetimeIndex | ||||||
from pandas.core.indexes.period import PeriodIndex | ||||||
|
||||||
from pandas.io.common import _expand_user, _stringify_path | ||||||
from pandas.io.formats.printing import adjoin, justify, pprint_thing | ||||||
|
@@ -842,22 +842,18 @@ def _get_column_name_list(self): | |||||
def format_array(values, formatter, float_format=None, na_rep='NaN', | ||||||
digits=None, space=None, justify='right', decimal='.'): | ||||||
|
||||||
if is_categorical_dtype(values): | ||||||
fmt_klass = CategoricalArrayFormatter | ||||||
elif is_interval_dtype(values): | ||||||
fmt_klass = IntervalArrayFormatter | ||||||
if is_datetime64_dtype(values.dtype): | ||||||
fmt_klass = Datetime64Formatter | ||||||
elif is_timedelta64_dtype(values.dtype): | ||||||
fmt_klass = Timedelta64Formatter | ||||||
elif is_extension_array_dtype(values.dtype): | ||||||
fmt_klass = ExtensionArrayFormatter | ||||||
elif is_float_dtype(values.dtype): | ||||||
fmt_klass = FloatArrayFormatter | ||||||
elif is_period_arraylike(values): | ||||||
fmt_klass = PeriodArrayFormatter | ||||||
elif is_integer_dtype(values.dtype): | ||||||
fmt_klass = IntArrayFormatter | ||||||
elif is_datetime64tz_dtype(values): | ||||||
fmt_klass = Datetime64TZFormatter | ||||||
elif is_datetime64_dtype(values.dtype): | ||||||
fmt_klass = Datetime64Formatter | ||||||
elif is_timedelta64_dtype(values.dtype): | ||||||
fmt_klass = Timedelta64Formatter | ||||||
else: | ||||||
fmt_klass = GenericArrayFormatter | ||||||
|
||||||
|
@@ -1121,39 +1117,22 @@ def _format_strings(self): | |||||
return fmt_values.tolist() | ||||||
|
||||||
|
||||||
class IntervalArrayFormatter(GenericArrayFormatter): | ||||||
|
||||||
def __init__(self, values, *args, **kwargs): | ||||||
GenericArrayFormatter.__init__(self, values, *args, **kwargs) | ||||||
|
||||||
def _format_strings(self): | ||||||
formatter = self.formatter or str | ||||||
fmt_values = np.array([formatter(x) for x in self.values]) | ||||||
return fmt_values | ||||||
|
||||||
|
||||||
class PeriodArrayFormatter(IntArrayFormatter): | ||||||
|
||||||
class ExtensionArrayFormatter(GenericArrayFormatter): | ||||||
def _format_strings(self): | ||||||
from pandas.core.indexes.period import IncompatibleFrequency | ||||||
try: | ||||||
values = PeriodIndex(self.values).to_native_types() | ||||||
except IncompatibleFrequency: | ||||||
# periods may contains different freq | ||||||
values = Index(self.values, dtype='object').to_native_types() | ||||||
|
||||||
formatter = self.formatter or (lambda x: '{x}'.format(x=x)) | ||||||
|
||||||
fmt_values = [formatter(x) for x in values] | ||||||
return fmt_values | ||||||
|
||||||
values = self.values | ||||||
if isinstance(values, (ABCIndexClass, ABCSeries)): | ||||||
values = values._values | ||||||
|
||||||
class CategoricalArrayFormatter(GenericArrayFormatter): | ||||||
formatter = values._formatter(boxed=True) | ||||||
|
||||||
def __init__(self, values, *args, **kwargs): | ||||||
GenericArrayFormatter.__init__(self, values, *args, **kwargs) | ||||||
if is_categorical_dtype(values.dtype): | ||||||
# Categorical is special for now, so that we can preserve tzinfo | ||||||
do we need a TODO here? this is until DatetimeArray is fully pushed?
That depends on whether we're willing to change
#23569 (comment) for that.
||||||
array = values.get_values() | ||||||
else: | ||||||
array = np.asarray(values) | ||||||
|
||||||
def _format_strings(self): | ||||||
fmt_values = format_array(self.values.get_values(), self.formatter, | ||||||
fmt_values = format_array(array, | ||||||
@TomAugspurger : i'm struggling to resolve some formatting issues. what is the reason for calling
i guess, to be more succinct, why is
I am not that familiar with this code, but from a quick look: calling
Although most of those custom Formatter classes don't do much special if formatter is specified. Eg pandas/pandas/io/formats/format.py Lines 1174 to 1175 in 181f972
so if
||||||
formatter, | ||||||
float_format=self.float_format, | ||||||
na_rep=self.na_rep, digits=self.digits, | ||||||
space=self.space, justify=self.justify) | ||||||
|
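The new `ExtensionArrayFormatter` path unwraps a Series or Index to its underlying array and then asks that array for a boxed formatter. The sketch below mimics only that control flow under stated assumptions: `QuotedArray`, `Container`, and `format_extension_values` are hypothetical names, and the final list comprehension replaces the real call to `format_array`.

```python
class QuotedArray(object):
    """Hypothetical array that quotes scalars only when shown on its own."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)

    def _formatter(self, boxed=False):
        if boxed:
            return str
        return lambda x: "'{}'".format(x)


class Container(object):
    """Hypothetical stand-in for a Series or Index wrapping an array."""

    def __init__(self, array):
        self._values = array


def format_extension_values(values):
    # Unwrap the container first, as the diff does for Series and Index.
    if isinstance(values, Container):
        values = values._values
    # Inside a container the array is "boxed", so request that formatter.
    formatter = values._formatter(boxed=True)
    # Assumption: a plain loop replaces the real format_array call.
    return [formatter(x) for x in values]


arr = QuotedArray(['2000Q1', '2000Q2'])
print(format_extension_values(Container(arr)))
print([arr._formatter()(x) for x in arr])
```

Keeping the quoting decision inside the array's own `_formatter` matches the point raised in review that whether to quote is up to the individual array.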