Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'foo': ['2000-01-01T00:00:00.000Z+00:00']})
xf = df.copy()
xf.foo = df.foo.astype('datetime64[ns, UTC]') # Works
xf.foo = df.foo.astype('datetime64[us]') # Works
xf.foo = df.foo.astype('datetime64[us, UTC]') # Raises TypeError
Traceback:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dargueta/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/generic.py", line 5691, in astype
**kwargs)
File "/Users/dargueta/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 531, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "/Users/dargueta/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 395, in apply
applied = getattr(b, f)(**kwargs)
File "/Users/dargueta/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 534, in astype
**kwargs)
File "/Users/dargueta/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 595, in _astype
dtype = pandas_dtype(dtype)
File "/Users/dargueta/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/dtypes/common.py", line 2017, in pandas_dtype
dtype))
TypeError: data type 'datetime64[us, UTC]' not understood
Problem description
It appears that timestamps require the resolution to be in nanoseconds if you're going to have the Series be timezone-aware. This is confusing because I can cast a string to a timestamp of any valid resolution, but I can't use anything but nanoseconds if I want a tz-aware timestamp.
As for the relevance, I'm aware of #23990 but it doesn't seem to directly apply to astype().
Context: This is a problem because there's downstream code that expects microsecond resolution and timezones and will choke otherwise (long story). I also need to use astype because the code needs to work with any dtype handed to it, and since the dtypes are being loaded from YAML files, passing in a dtype object is not an option.
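For anyone hitting the same thing with string dtypes from config files, here is a rough sketch of one possible stopgap: split the dtype string into its unit and timezone before it reaches astype, so the two can be handled separately. The helper name `split_datetime_dtype` and the regex are my own assumptions for illustration, not part of pandas, and it only covers the `datetime64[unit]` and `datetime64[unit, tz]` spellings.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches 'datetime64[us]' and 'datetime64[us, UTC]'.
_DATETIME_DTYPE = re.compile(
    r"^datetime64\[(?P<unit>[A-Za-z]+)(?:\s*,\s*(?P<tz>[^\]]+))?\]$"
)


def split_datetime_dtype(spec: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (unit, tz) for a datetime64 dtype string, or None otherwise.

    tz is None when the string carries no timezone part.
    """
    match = _DATETIME_DTYPE.match(spec.strip())
    if match is None:
        # Not a datetime64 spec (e.g. 'int64'); the caller can pass it
        # straight to astype unchanged.
        return None
    tz = match.group("tz")
    return match.group("unit"), (tz.strip() if tz else None)


print(split_datetime_dtype("datetime64[us, UTC]"))
print(split_datetime_dtype("datetime64[us]"))
print(split_datetime_dtype("int64"))
```

With the unit and timezone separated, the caller could cast to 'datetime64[ns, UTC]' (which works per the sample above) and deal with the microsecond requirement downstream. That is only a workaround and doesn't address the underlying limitation.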
Expected Output
A Series shouldn't require nanosecond resolution to be timezone-aware.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: None
pyarrow: 0.12.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: None
s3fs: 0.2.0
fastparquet: 0.2.1
pandas_gbq: None
pandas_datareader: None
gcsfs: None