Description
>>> import pandas as pd
>>> ts = pd.Timestamp("2016-01-01", tz="UTC")
>>> ts2 = ts.tz_convert("US/Pacific")
>>> pd.to_datetime([ts, ts2])
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True, at position 1
>>> pd.to_datetime([ts.isoformat(), ts2.isoformat()], format="mixed")
<stdin>:1: FutureWarning: In a future version of pandas, parsing datetimes with mixed time zones will raise an error unless `utc=True`. Please specify `utc=True` to opt in to the new behaviour and silence this warning. To create a `Series` with mixed offsets and `object` dtype, please use `apply` and `datetime.datetime.strptime`
Index([2016-01-01 00:00:00+00:00, 2015-12-31 16:00:00-08:00], dtype='object')
If we pass mixed-tz datetime objects, we do a tz-match check at each step of the loop inside array_to_datetime/array_strptime (specifically in state.process_datetime). If we pass mixed-tz strings, the analogous check happens outside the loop. (Per #55693, we currently don't have mixed-type checks.)
Eventually these checks should be shared, which means we need to decide between the in-loop and after-loop versions. Three differences for users are:
- the in-loop version adds the f"at position {i}" to the exception message.
- the in-loop version can be handled differently based on errors=coerce/ignore
- the in-loop version is in-loop and so presumably incurs a performance penalty
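To make the first and third differences concrete, here is a minimal pure-Python sketch of the two strategies. It is not the pandas implementation: `check_in_loop` and `check_after_loop` are hypothetical names, the inputs are assumed to be tz-aware stdlib `datetime` objects, and the exception messages are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def check_in_loop(values):
    """Hypothetical in-loop check: compare every item's UTC offset to the first."""
    expected = None
    for i, dt in enumerate(values):
        if i == 0:
            expected = dt.utcoffset()
        elif dt.utcoffset() != expected:
            # The loop index is in scope, so the message can name the position.
            raise ValueError(f"mixed timezones, at position {i}")
    return list(values)


def check_after_loop(values):
    """Hypothetical after-loop check: collect offsets, validate once at the end."""
    seen_offsets = set()
    for dt in values:
        # Per-item work is only a set insertion; no comparison or branch to raise.
        seen_offsets.add(dt.utcoffset())
    if len(seen_offsets) > 1:
        # Only the combination is known to be invalid, not a single position.
        raise ValueError("mixed timezones")
    return list(values)


utc_dt = datetime(2016, 1, 1, tzinfo=timezone.utc)
pacific_dt = utc_dt.astimezone(timezone(timedelta(hours=-8)))
```

Calling either function on `[utc_dt, pacific_dt]` raises `ValueError`, but only the in-loop version can report which position triggered it.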
The errors=coerce/ignore part is the API part of the issue (though xref #54467 for deprecating ignore). I think it is very likely that the original intent of coerce was to handle invalid individual items, not invalid combinations of items, so I would be OK with the API change that would come with moving this check outside the loop.
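A rough sketch of what that API change could look like, again in pure Python rather than pandas internals. `parse_in_loop` and `parse_after_loop` are hypothetical helpers, `None` stands in for `NaT`, and the after-loop behavior shown (raising even with `errors="coerce"`) is one assumed reading of the proposal, not a settled design.

```python
from datetime import datetime, timedelta, timezone


def parse_in_loop(values, errors="raise"):
    """Hypothetical in-loop check: a tz mismatch is treated as a per-item failure."""
    out = []
    expected = None
    for i, dt in enumerate(values):
        if i == 0:
            expected = dt.utcoffset()
        if dt.utcoffset() != expected:
            if errors == "coerce":
                out.append(None)  # stand-in for NaT
                continue
            raise ValueError(f"mixed timezones, at position {i}")
        out.append(dt)
    return out


def parse_after_loop(values, errors="raise"):
    """Hypothetical after-loop check: a tz mismatch is a property of the whole input.

    Assumption: errors="coerce" would only cover per-item failures (not modelled
    here), so mixed offsets raise regardless of the errors argument.
    """
    out = list(values)
    if len({dt.utcoffset() for dt in out}) > 1:
        raise ValueError("mixed timezones")
    return out


utc_dt = datetime(2016, 1, 1, tzinfo=timezone.utc)
pacific_dt = utc_dt.astimezone(timezone(timedelta(hours=-8)))

coerced = parse_in_loop([utc_dt, pacific_dt], errors="coerce")
```

In this sketch `coerced` keeps the first item and replaces the second with `None`, while `parse_after_loop([utc_dt, pacific_dt], errors="coerce")` raises, because no single item can be singled out as the invalid one.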