0.23.4 changed read_csv parsing for mixed-timezone datetimes #24987

Closed
@TomAugspurger

Previously, a column in a CSV with mixed timezones would (I think) convert each value to UTC and discard the tzinfo.

import pandas as pd
import io

content = """\
Sat, 22 Apr 2017 15:11:58 -0500
Fri, 21 Apr 2017 14:20:57 -0500
Thu, 9 Mar 2017 11:15:30 -0600"""

# The comma after the weekday splits each row into two fields: 'day' and 'datetime'.
df = pd.read_csv(
    io.StringIO(content),
    parse_dates=True,
    header=None,
    names=['day', 'datetime'],
    index_col='datetime',
)

On 0.23.4 that's

In [7]: df.index
Out[7]:
DatetimeIndex(['2017-04-22 20:11:58', '2017-04-21 19:20:57',
               '2017-03-09 17:15:30'],
              dtype='datetime64[ns]', name='datetime', freq=None)

On 0.24 that's

In [7]: df.index
Out[7]:
Index([2017-04-22 15:11:58-05:00, 2017-04-21 14:20:57-05:00,
       2017-03-09 11:15:30-06:00],
      dtype='object', name='datetime')

I'm not sure what the expected behavior is here, but I think the old behavior is as good as any.
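
For anyone who needs the 0.23.4-style result in the meantime, here is a minimal sketch of one possible workaround, not a statement of what read_csv should do. It assumes it is acceptable to read the column as strings and convert afterwards with `pd.to_datetime(..., utc=True)`. The `skipinitialspace=True` argument and the names `raw`, `utc_index`, and `naive_index` are my additions for illustration and are not part of the original report.

```python
import io

import pandas as pd

content = """\
Sat, 22 Apr 2017 15:11:58 -0500
Fri, 21 Apr 2017 14:20:57 -0500
Thu, 9 Mar 2017 11:15:30 -0600"""

# Read the datetime field as plain strings (no parse_dates).
# skipinitialspace=True drops the space that follows the comma (an assumption
# made here to keep the strings clean; the original call did not use it).
raw = pd.read_csv(
    io.StringIO(content),
    header=None,
    names=["day", "datetime"],
    skipinitialspace=True,
)

# utc=True converts every offset to UTC, so the result has one tz-aware dtype
# instead of falling back to object.
utc_index = pd.DatetimeIndex(
    pd.to_datetime(raw["datetime"], utc=True), name="datetime"
)

# Dropping the timezone leaves naive UTC wall times, as in the 0.23.4 output.
naive_index = utc_index.tz_localize(None)

print(naive_index)
```

Keeping `utc_index` instead of `naive_index` preserves the fact that the values are in UTC, which may be preferable to discarding the tzinfo entirely.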

I haven't verified, but #22380 seems like a likely candidate for introducing the change.

Labels: IO CSV (read_csv, to_csv), Timezones (Timezone data dtype)