
ENH: Support for list of formats in pd.to_datetime() #55226

Open
NickiForte opened this issue Sep 21, 2023 · 2 comments
Labels: Datetime, Enhancement, Needs Discussion


NickiForte commented Sep 21, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

pd.to_datetime() is frequently needed to convert dates stored in mixed formats. It currently supports format="mixed", but, as the documentation warns, this is risky:

“mixed”, to infer the format for each element individually. This is risky, and you should probably use it along with dayfirst.

Wouldn't it be better if mixed-format parsing could be done without this risk?
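To illustrate the risk: with format="mixed" each element is parsed on its own, so two strings written in the same DD/MM/YYYY convention can be read differently. This sketch assumes pandas 2.x behaviour as I understand it (a leading field above 12 is swapped into the day position, otherwise month-first is used unless dayfirst=True). Warnings are silenced only because some versions may emit a dayfirst-related UserWarning.

```python
import warnings

import pandas as pd

# Both strings are meant as DD/MM/YYYY.
data = ["13/02/2023", "01/02/2023"]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # "13/02/2023" cannot be month-first, so it is read as 13 February,
    # while "01/02/2023" is read month-first as 2 January.
    month_first = pd.to_datetime(data, format="mixed")
    # With dayfirst=True, "01/02/2023" becomes 1 February instead.
    day_first = pd.to_datetime(data, format="mixed", dayfirst=True)
```

Nothing fails loudly in the first call; the inconsistency is silent, which is why the documentation recommends pairing "mixed" with dayfirst.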

Feature Description

Mixed formats could be parsed unambiguously by letting format also accept a list of formats, e.g. ["%Y-%m-%d", "%Y-%m-%d %H:%M:%S"]. The function would first convert every entry that matches the first format, then try the next format on the entries that are still unparsed, and so on down the list.
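The precedence rule described above can be sketched element by element with the standard library alone. `parse_with_formats` is a hypothetical helper, not a pandas API. It assumes string inputs and returns None where no format matches; an entry is assigned the first format in the list that parses it exactly, which is equivalent to the "parse what matches, then move on to the rest" procedure.

```python
from datetime import datetime


def parse_with_formats(values, formats):
    """Hypothetical helper: parse each value with the first format that matches."""
    results = []
    for value in values:
        parsed = None
        for fmt in formats:
            try:
                # strptime requires the whole string to match the format.
                parsed = datetime.strptime(value, fmt)
                break  # earlier formats take precedence over later ones
            except ValueError:
                continue
        results.append(parsed)  # None if no format matched
    return results


data = ["2023-09-21", "2023-09-22 14:30:00", "21/09/2023"]
formats = ["%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y"]
result = parse_with_formats(data, formats)
```

Because the order of the list decides precedence, an ambiguous string such as "01/02/2023" is always read the same way: with ["%d/%m/%Y", "%m/%d/%Y"] it becomes 1 February, never 2 January.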

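Example of the proposed usage:

```python
import pandas as pd

data = ['2023-09-21', '2023-09-22 14:30:00', '21/09/2023']
# Proposed API: today `format` accepts only a single string (including "ISO8601" or "mixed").
parsed_dates = pd.to_datetime(data, format=["%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y"])
```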
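Until something like this exists, a similar result can be approximated with the current API: parse the whole column once per format with errors="coerce", then keep the first non-NaT result for each entry. This is a sketch that assumes pandas 2.x, where a format such as "%Y-%m-%d" must match the whole string by default (exact=True). It parses every entry with every format, so it does more work than a loop that only retries the still-unparsed entries.

```python
from functools import reduce

import pandas as pd

data = pd.Series(["2023-09-21", "2023-09-22 14:30:00", "21/09/2023"])
formats = ["%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y"]

# One attempt per format; entries that do not match become NaT.
attempts = [pd.to_datetime(data, format=fmt, errors="coerce") for fmt in formats]

# combine_first keeps the left value and only fills its NaT gaps,
# so earlier formats take precedence over later ones.
parsed = reduce(lambda acc, nxt: acc.combine_first(nxt), attempts)
```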

Alternative Solutions

The following wrapper sketches how this could be implemented on top of the existing API:

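```python
import pandas as pd


def to_datetime(series, format=None, errors="raise", **kwargs):
    """Parse `series` with each format in `format`, in order of precedence.

    Entries parsed by an earlier format are not re-parsed by later ones.
    """
    if format is None or isinstance(format, str):
        # A single format (or none): defer to the built-in behaviour.
        return pd.to_datetime(series, format=format, errors=errors, **kwargs)

    # Start with all-NaT and fill in entries as formats match.
    result = pd.Series(pd.NaT, index=series.index, dtype="datetime64[ns]")
    for rule in format:
        # Only try entries that are still unparsed and not missing in the input.
        todo = result.isna() & series.notna()
        if not todo.any():
            break
        result.loc[todo] = pd.to_datetime(
            series.loc[todo], format=rule, errors="coerce", **kwargs
        )

    unparsed = result.isna() & series.notna()
    if unparsed.any():
        if errors == "raise":
            example_date = series[unparsed].iloc[0]
            raise ValueError(
                f"Could not parse all dates with the provided rules, for example: {example_date}"
            )
        if errors == "ignore":
            # errors="ignore": return the input unchanged.
            return series
    # errors="coerce": anything still unparsed stays NaT.
    return result


# "01.03.2020" matches neither format, so it is coerced to NaT.
to_datetime(
    pd.Series(["2020-01-01", "2020-01-02 00:00:01", "01.03.2020"]),
    format=[
        "%Y-%m-%d",
        "%Y-%m-%d %H:%M:%S",
    ],
    errors="coerce",
)
```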


NickiForte added the Enhancement and Needs Triage labels on Sep 21, 2023.
MarcoGorelli added the Needs Discussion and Datetime labels and removed the Needs Triage label on Sep 21, 2023.
MarcoGorelli (Member) commented:

thanks for the suggestion - yeah I've had something like this in the back of my mind, might be a good one

gusthavoMFS commented:

take
