-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REF/DEPR: DatetimeIndex constructor #23675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
f8efbef
9d20bc9
aef3f4c
66ae42b
d0e8ee3
e1f4e17
d18e0df
a4c8c77
5dc5980
7e5587e
e94e826
7464d15
80b5dbe
3c822f1
ba7e5e8
3ba9da7
49c11e1
f1d3fd8
d44055e
1471a2b
11b5f6c
9f56d23
1c3a5aa
be4d472
145772d
7c99105
6b60da2
a7038bb
14d923b
c9dbf24
ce9914d
b3d5bb7
09c88fc
0367d6f
7cc8577
b3a096b
fd5af18
2cdd215
782ca81
03d5b35
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1503,6 +1503,83 @@ def maybe_convert_dtype(data, copy): | |
return data, copy | ||
|
||
|
||
def objects_to_datetime64ns(data, dayfirst, yearfirst, | ||
utc=False, errors="raise", | ||
require_iso8601=False, allow_object=False): | ||
""" | ||
Convert data to array of timestamps. | ||
|
||
Parameters | ||
---------- | ||
data : np.ndarray[object] | ||
dayfirst : bool | ||
yearfirst : bool | ||
utc : bool, default False | ||
Whether to convert timezone-aware timestamps to UTC | ||
errors : {'raise', 'ignore', 'coerce'} | ||
allow_object : bool | ||
Whether to return an object-dtype ndarray instead of raising if the | ||
data contains more than one timezone. | ||
|
||
Returns | ||
------- | ||
result : ndarray | ||
np.int64 dtype if returned values represent UTC timestamps | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to verify: getting an you'll get an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. correct |
||
np.datetime64[ns] if returned values represent wall times | ||
object if mixed timezones | ||
inferred_tz : tzinfo or None | ||
|
||
Raises | ||
------ | ||
ValueError : if data cannot be converted to datetimes | ||
""" | ||
assert errors in ["raise", "ignore", "coerce"] | ||
|
||
# if str-dtype, convert | ||
data = np.array(data, copy=False, dtype=np.object_) | ||
|
||
try: | ||
result, tz_parsed = tslib.array_to_datetime( | ||
data, | ||
errors=errors, | ||
utc=utc, | ||
dayfirst=dayfirst, | ||
yearfirst=yearfirst, | ||
require_iso8601=require_iso8601 | ||
) | ||
except ValueError as e: | ||
try: | ||
values, tz_parsed = conversion.datetime_to_datetime64(data) | ||
# If tzaware, these values represent unix timestamps, so we | ||
# return them as i8 to distinguish from wall times | ||
return values.view('i8'), tz_parsed | ||
except (ValueError, TypeError): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why this addtiional level of try/except here? wouldn't this just raise anyhow? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Presumably to re-raise the original exception if the fallback fails. This is the existing behavior. I think @mroeschke and I are vaguely planning to revisit this before long and combine datetime_to_datetime64 into array_to_datetime, fixing the many idiosyncracies of these calls. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe i wasn't clear. I think you can simply remove the try/except and it will work the way it is now. (and same below). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the claim you're making is that it will raise under the same conditions, I agree. If the claim is that it will raise the same exception, I don't. i.e. if the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. really? i don't think that is actually possible. the original exception is re-raised here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback did this resolve the miscommunication? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. again if datetime_to_datetime64 raises There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That sounds like a "no" on the resolving miscommunication. Did I at least accurately summarize your suggested change? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jbrockmendel is correct here. side-note, we could clarify all this with python-3 style except (ValueError, TypeError) as e2:
raise e2 from e or six's https://pythonhosted.org/six/#six.raise_from, but it's probably just easier to wait until next month. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
not resolved, but I see from @TomAugspurger which what i was getting at. i guess ok for now. |
||
raise e | ||
|
||
if tz_parsed is not None: | ||
# We can take a shortcut since the datetime64 numpy array | ||
# is in UTC | ||
# Return i8 values to denote unix timestamps | ||
return result.view('i8'), tz_parsed | ||
elif is_datetime64_dtype(result): | ||
# returning M8[ns] denotes wall-times; since tz is None | ||
# the distinction is a thin one | ||
return result, tz_parsed | ||
elif is_object_dtype(result): | ||
# GH#23675 when called via `pd.to_datetime`, returning an object-dtype | ||
# array is allowed. When called via `pd.DatetimeIndex`, we can | ||
# only accept datetime64 dtype, so raise TypeError if object-dtype | ||
# is returned, as that indicates the values can be recognized as | ||
# datetimes but they have conflicting timezones/awareness | ||
if allow_object: | ||
return result, tz_parsed | ||
raise TypeError(result) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe just remove the else and do the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think its worth distinguishing between hittable TypeError and defensive TypeError There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you comment more then. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure |
||
else: # pragma: no cover | ||
# GH#23675 this TypeError should never be hit, whereas the TypeError | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is not more clear. Can you put the cases where each of these would be hit |
||
# in the object-dtype branch above is reachable. | ||
raise TypeError(result) | ||
|
||
|
||
def _generate_regular_range(cls, start, end, periods, freq): | ||
""" | ||
Generate a range of dates with the spans between dates described by | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,16 +17,16 @@ | |
from pandas.core.dtypes.common import ( | ||
_INT64_DTYPE, _NS_DTYPE, ensure_int64, is_datetime64_dtype, | ||
is_datetime64_ns_dtype, is_datetime64tz_dtype, is_dtype_equal, is_float, | ||
is_integer, is_integer_dtype, is_list_like, is_period_dtype, is_scalar, | ||
is_string_like, pandas_dtype) | ||
is_integer, is_list_like, is_object_dtype, is_period_dtype, is_scalar, | ||
is_string_dtype, is_string_like, pandas_dtype) | ||
import pandas.core.dtypes.concat as _concat | ||
from pandas.core.dtypes.generic import ABCSeries | ||
from pandas.core.dtypes.missing import isna | ||
|
||
from pandas.core.arrays import datetimelike as dtl | ||
from pandas.core.arrays.datetimes import ( | ||
DatetimeArrayMixin as DatetimeArray, _to_m8, maybe_convert_dtype, | ||
maybe_infer_tz) | ||
maybe_infer_tz, objects_to_datetime64ns) | ||
from pandas.core.base import _shared_docs | ||
import pandas.core.common as com | ||
from pandas.core.indexes.base import Index, _index_shared_docs | ||
|
@@ -281,10 +281,19 @@ def __new__(cls, data=None, | |
|
||
# By this point we are assured to have either a numpy array or Index | ||
data, copy = maybe_convert_dtype(data, copy) | ||
if not (is_datetime64_dtype(data) or is_datetime64tz_dtype(data) or | ||
is_integer_dtype(data) or lib.infer_dtype(data) == 'integer'): | ||
data = tools.to_datetime(data, dayfirst=dayfirst, | ||
yearfirst=yearfirst) | ||
|
||
if is_object_dtype(data) or is_string_dtype(data): | ||
# TODO: We do not have tests specific to string-dtypes, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you could just write this as
might be more clear There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The issue with that is np.array(['20160405']) becomes np.array([20160405]) instead of 2016-04-05. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok sure |
||
# also complex or categorical or other extension | ||
copy = False | ||
if lib.infer_dtype(data) == 'integer': | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
data = data.astype(np.int64) | ||
else: | ||
# data comes back here as either i8 to denote UTC timestamps | ||
# or M8[ns] to denote wall times | ||
data, inferred_tz = objects_to_datetime64ns( | ||
data, dayfirst=dayfirst, yearfirst=yearfirst) | ||
tz = maybe_infer_tz(tz, inferred_tz) | ||
|
||
if is_datetime64tz_dtype(data): | ||
tz = maybe_infer_tz(tz, data.tz) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -171,7 +171,8 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None, | |
- ndarray of Timestamps if box=False | ||
""" | ||
from pandas import DatetimeIndex | ||
from pandas.core.arrays.datetimes import maybe_convert_dtype | ||
from pandas.core.arrays.datetimes import ( | ||
maybe_convert_dtype, objects_to_datetime64ns) | ||
|
||
if isinstance(arg, (list, tuple)): | ||
arg = np.array(arg, dtype='O') | ||
|
@@ -233,8 +234,9 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None, | |
|
||
tz_parsed = None | ||
result = None | ||
try: | ||
if format is not None: | ||
|
||
if format is not None: | ||
try: | ||
# shortcut formatting here | ||
if format == '%Y%m%d': | ||
try: | ||
|
@@ -266,24 +268,22 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None, | |
if errors == 'raise': | ||
raise | ||
result = arg | ||
|
||
if result is None: | ||
assert format is None or infer_datetime_format | ||
result, tz_parsed = tslib.array_to_datetime( | ||
arg, | ||
errors=errors, | ||
utc=tz == 'utc', | ||
dayfirst=dayfirst, | ||
yearfirst=yearfirst, | ||
require_iso8601=require_iso8601 | ||
) | ||
except ValueError as e: | ||
# Fallback to try to convert datetime objects | ||
try: | ||
values, tz = conversion.datetime_to_datetime64(arg) | ||
return DatetimeIndex._simple_new(values, name=name, tz=tz) | ||
except (ValueError, TypeError): | ||
raise e | ||
except ValueError as e: | ||
# Fallback to try to convert datetime objects if timezone-aware | ||
# datetime objects are found without passing `utc=True` | ||
try: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same here, why the extra try/except? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cam you be slightly more specific on 'fallback', e.g. when would this be triggered There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll flesh out the comment. The gist of it is if tz-aware datetime objects are passed and It's a mess and is a big reason why I've been making array_to_datetime PRs: #24006 is an attempt to fix it, but there are some design questions @mroeschke and I are still batting around. |
||
values, tz = conversion.datetime_to_datetime64(arg) | ||
return DatetimeIndex._simple_new(values, name=name, tz=tz) | ||
except (ValueError, TypeError): | ||
raise e | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
if result is None: | ||
assert format is None or infer_datetime_format | ||
utc = tz == 'utc' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this lowercase? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is how it is implemented in master. We'll improve this in the upcoming passes specific to |
||
result, tz_parsed = objects_to_datetime64ns( | ||
arg, dayfirst=dayfirst, yearfirst=yearfirst, | ||
utc=utc, errors=errors, require_iso8601=require_iso8601, | ||
allow_object=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i find this allow_object pretty obtuse There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The alternative is to raise unconditionally in objects_to_datetime64ns, include the result in the exception args, then catch and extract it here. That was actually what I did in the first pass, but this is much more straightforward. |
||
|
||
if tz_parsed is not None: | ||
if box: | ||
|
Uh oh!
There was an error while loading. Please reload this page.