BUG: Can't change datetime precision in columns/rows #57838

erezinman · 2024-03-14T12:49:04Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# ONLY WORKING CONVERSION:
df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 12:00:00', '2021-01-01 12:00:01', '2021-01-01 12:00:02']), 'value': [1, 2, 3]})
df['time'] = df['time'].astype('M8[us]') 
print(df.dtypes)
# time     datetime64[us]
# value             int64
# dtype: object

# NON-WORKING CONVERSIONS

df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 12:00:00', '2021-01-01 12:00:01', '2021-01-01 12:00:02']),
                   'value': [1, 2, 3]})
df.iloc[:, 0] = df.iloc[:, 0].astype('M8[us]')
print(df.dtypes)
# time     datetime64[ns]
# value             int64
# dtype: object


df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 12:00:00', '2021-01-01 12:00:01', '2021-01-01 12:00:02']),
                   'value': [1, 2, 3]})
df.loc[:, ['time']] = df.loc[:, ['time']].astype('M8[us]')
print(df.dtypes)
# time     datetime64[ns]
# value             int64
# dtype: object

df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 12:00:00', '2021-01-01 12:00:01', '2021-01-01 12:00:02']),
                   'value': [1, 2, 3]})
idxs = [0]
axis = 1
df.iloc(axis=axis)[idxs] = df.iloc(axis=axis)[idxs].astype('M8[us]')
print(df.dtypes)
# time     datetime64[ns]
# value             int64
# dtype: object

Issue Description

Converting columns (or rows) between datetime dtypes of different precision does not change the dtype of the columns, except in the simplest case (assigning by column name).

The absurd part is that if I had changed the dtype of the "value" column in the above example instead, all of these examples would have worked.

Expected Behavior

All printouts should be the same as the first:

time     datetime64[us]
value             int64
dtype: object

Installed Versions

INSTALLED VERSIONS

commit : bdc79c1
python : 3.9.18.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-91-generic
Version : #101~20.04.1-Ubuntu SMP Thu Nov 16 14:22:28 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_IL
LOCALE : en_IL.UTF-8
pandas : 2.2.1
numpy : 1.24.4
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3
Cython : 3.0.6
pytest : 7.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.8.6
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None


phofl · 2024-03-18T01:09:15Z

cc @MarcoGorelli, is this PDEP-6 related? It looks like a case of upcasting.

@erezinman all the cases that don't work set values in place instead of swapping out the underlying data, so different semantics apply.
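
A minimal sketch of that distinction, assuming pandas 2.x. The `make_frame` helper is hypothetical and exists only for this illustration. The starting resolution is pinned to nanoseconds so the comparison does not depend on the default.

```python
import pandas as pd


def make_frame():
    # Hypothetical helper: builds a small frame with a datetime64[ns] column.
    times = pd.to_datetime(["2021-01-01 12:00:00", "2021-01-01 12:00:01"])
    return pd.DataFrame({"time": pd.Series(times).astype("M8[ns]"), "value": [1, 2]})


# Column replacement: the frame swaps in the new array, dtype included.
replaced = make_frame()
replaced["time"] = replaced["time"].astype("M8[us]")

# Indexer assignment: values are written into the existing column, so the
# column may keep its original dtype (the behavior reported in this issue).
in_place = make_frame()
in_place.loc[:, "time"] = in_place.loc[:, "time"].astype("M8[us]")

print(replaced.dtypes["time"])
print(in_place.dtypes["time"])
```

Either way the timestamps themselves are preserved. Only the stored resolution differs.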

MarcoGorelli · 2024-03-18T08:21:36Z

thanks for the ping

looks like it's been like this since at least 2.0.2, so I don't think it's related to any PDEP-6 work (which only started in 2.1):

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'time': pd.to_datetime(['2021-01-01 12:00:00', '2021-01-01 12:00:01', '2021-01-01 12:00:02']),
   ...:                    'value': [1, 2, 3]})

In [4]: df.iloc[:, 0] = df.iloc[:, 0].astype('M8[us]')

In [5]: df.dtypes
Out[5]:
time     datetime64[ns]
value             int64
dtype: object

In [6]: pd.__version__
Out[6]: '2.0.2'

tagyieh · 2024-03-28T22:37:05Z

take

tagyieh · 2024-03-30T16:09:59Z

Hello @MarcoGorelli and @phofl

I believe I have fixed this bug. However, one of the tests (pandas/tests/copy_view/test_indexing.py::test_subset_set_column_with_loc) fails with my solution. The output is as follows:

@pytest.mark.parametrize(
        "dtype", ["int64", "float64"], ids=["single-block", "mixed-block"]
    )
    def test_subset_set_column_with_loc(backend, dtype):
        # Case: setting a single column with loc on a viewing subset
        # -> subset.loc[:, col] = value
        _, DataFrame, _ = backend
        df = DataFrame(
            {"a": [1, 2, 3], "b": [4, 5, 6], "c": np.array([7, 8, 9], dtype=dtype)}
        )
        df_orig = df.copy()
        subset = df[1:3]

        subset.loc[:, "a"] = np.array([10, 11], dtype="int64")

        subset._mgr._verify_integrity()
        expected = DataFrame(
            {"a": [10, 11], "b": [5, 6], "c": np.array([8, 9], dtype=dtype)},
            index=range(1, 3),
        )
>       tm.assert_frame_equal(subset, expected)
E       AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
E
E       Attribute "dtype" are different
E       [left]:  int64
E       [right]: Int64

If I switch the indexing method to subset["a"] = np.array([10, 11], dtype="int64") (instead of subset.loc[:, "a"]) and run the test against the original code (without my changes), the test fails with exactly the same error.

My question is: according to the issue, the only indexing method that produces the correct output is assignment by column name, i.e. subset["a"]. Since the test also fails when written that way, could the test itself be wrong?

Thank you in advance

asishm · 2024-03-30T18:42:05Z

@MarcoGorelli I think this is a duplicate of #52593 since the int equivalent of

df = pd.DataFrame({'a': [1,2,3]}, dtype='int64')
df.loc[:, 'a'] = df.loc[:, 'a'].astype('int32')
print(df.dtypes) # a is still int64

also doesn't change the dtype
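
As a possible workaround until the indexer behavior is settled, the dtype can be changed through APIs that replace the column instead of writing into it. The sketch below assumes pandas 1.5 or later (for `DataFrame.isetitem`). The frame and variable names are illustrative and do not come from the thread.

```python
import pandas as pd

# Pin the starting dtype explicitly so the example is platform independent.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}).astype({"a": "int64"})

# By label: DataFrame.astype accepts a {column: dtype} mapping and returns a new frame.
by_label = df.astype({"a": "int32"})

# By position: isetitem inserts a new array instead of setting values in place.
by_position = df.copy()
by_position.isetitem(0, by_position.iloc[:, 0].astype("int32"))

print(by_label.dtypes)
print(by_position.dtypes)
```

Both routes leave the original `df` untouched, which keeps the conversion explicit at the call site.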

erezinman added the Bug and Needs Triage labels Mar 14, 2024

MarcoGorelli added the Datetime label and removed the Needs Triage label Mar 18, 2024

github-actions bot assigned tagyieh Mar 28, 2024

tagyieh removed their assignment Apr 6, 2024
