-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: Index.append() refactoring #16236
Changes from 2 commits
5029592
d5c7d77
44800a4
528295d
554ee79
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,7 @@ | |
is_object_dtype, | ||
is_bool_dtype, | ||
is_dtype_equal, | ||
is_range, | ||
_NS_DTYPE, | ||
_TD_DTYPE) | ||
from pandas.core.dtypes.generic import ( | ||
|
@@ -45,6 +46,8 @@ def get_dtype_kinds(l): | |
# if to_concat contains different tz, | ||
# the result must be object dtype | ||
typ = str(arr.dtype) | ||
elif is_range(arr): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is very strange. why are you adding this here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you are directly calling the routine (and you don't handle typ='range' anywhere), so not sure this is even hit There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is required so that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok I see, then you should be explicit about testing what you need, e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can simply check There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm fine with using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I had missed this, and hence maybe our misunderstanding: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, if it is for a single call, I would just do the |
||
typ = 'range' | ||
elif is_datetime64_dtype(dtype): | ||
typ = 'datetime' | ||
elif is_timedelta64_dtype(dtype): | ||
|
@@ -559,3 +562,38 @@ def convert_sparse(x, axis): | |
# coerce to object if needed | ||
result = result.astype('object') | ||
return result | ||
|
||
|
||
def _concat_indexes_same_dtype_rangeindex(indexes): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe |
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add a comment about what this is doing / guarantees and example would be nice as well. |
||
start = step = next = None | ||
|
||
for obj in indexes: | ||
if not len(obj): | ||
continue | ||
|
||
if start is None: | ||
# This is set by the first non-empty index | ||
start = obj._start | ||
if step is None and len(obj) > 1: | ||
step = obj._step | ||
elif step is None: | ||
# First non-empty index had only one element | ||
if obj._start == start: | ||
return _concat_index_asobject(indexes) | ||
step = obj._start - start | ||
|
||
non_consecutive = ((step != obj._step and len(obj) > 1) or | ||
(next is not None and obj._start != next)) | ||
if non_consecutive: | ||
# Not nice... but currently what happens in NumericIndex: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this comment needed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see it as a reminder: I would have liked to use return Int64Index._append_same_dtype([ix.astype(int) for ix in indexes]) ... but then, numeric indexes currently do not special-case But I can remove it, and just take a note of this TODO. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see, OK to keep it then, but maybe make it a bit more informative (or remove the 'not nice ', just say that it is what is used by Int64Index._append_same_dtype) (I also don't think it would give that much of gain for making a special cased one of integers) |
||
return _concat_index_asobject(indexes) | ||
|
||
if step is not None: | ||
next = obj[-1] + step | ||
|
||
if start is None: | ||
start = obj._start | ||
step = obj._step | ||
stop = obj._stop if next is None else next | ||
return indexes[0].__class__(start, stop, step) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is ugly... but the only alternative I see (an import inside the function) is uglier. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. its ok here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ah no, this wouldn't play nice with the case when no range index is returned, ignore this |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1741,10 +1741,9 @@ def append(self, other): | |
names = set([obj.name for obj in to_concat]) | ||
name = None if len(names) > 1 else self.name | ||
|
||
if self.is_categorical(): | ||
# if calling index is category, don't check dtype of others | ||
from pandas.core.indexes.category import CategoricalIndex | ||
return CategoricalIndex._append_same_dtype(self, to_concat, name) | ||
return self._concat(to_concat, name) | ||
|
||
def _concat(self, to_concat, name): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you call this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, I think it would make more sense to change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since it is only use by append, I prefer using append in the name, but no strong feelings There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are right that it's currently used only by append, but usually you expect x.append(y) to concatenate x to y or to elements of y; instead this only concatenates elements of y. So since you don't object I will go with my proposal. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
in the end it is used to concatenate both y to x, just that this is passed like that in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sure, I don't object to that. We can agree it is a concat operation used to implement appending: the switch happens when |
||
|
||
typs = _concat.get_dtype_kinds(to_concat) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not similar to the other methods, which detect a type. a RangeIndex is not a type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only called in
get_dtype_kinds
, see below, so I could have just usedisinstance(arr, ABCRangeIndex)
. But I tried to be coherent withis_categorical
andis_sparse
... what exactly is a "type"?!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are conflating a type with an Index. is_categorical will detect a categorical type which could be a Categorical (or the dtype=='category'), a CategoricalIndex happens to be of this type as well.
However RangeIndex is simply an Index subclassing Int64Index. its not a type (its dtype is int64). types can be the dtype of an Index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... still,
... so the type returned by
get_dtype_kinds
is already not the dtype, and not even thedtype.kind
.But anyway, I don't particularly care about changing
get_dtype_kinds
. We just need a method which can tell us whether two indexes can be natively concatenated: this is currentlyget_dtype_kinds
, but I can write a new one if you prefer.