
ENH: Expose date parsing arguments in read_html function #49553

Open
@csala

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The current read_html function exposes the same parse_dates argument as read_csv, but not the other arguments that let the user control how dates are parsed (infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates).
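To illustrate the asymmetry, here is a minimal sketch that compares the two signatures with the standard library's inspect module and shows read_csv honoring dayfirst. It assumes a pandas version where read_csv still accepts dayfirst; some of the other listed arguments (e.g. infer_datetime_format and date_parser) were deprecated in pandas 2.0, so the printed list of missing names depends on the installed version. The CSV data is made up for illustration.

```python
import inspect
import io

import pandas as pd

# Date-related read_csv arguments listed in this issue
DATE_ARGS = ["infer_datetime_format", "keep_date_col", "date_parser", "dayfirst", "cache_dates"]

csv_params = set(inspect.signature(pd.read_csv).parameters)
html_params = set(inspect.signature(pd.read_html).parameters)

# Names that read_csv accepts but read_html does not, on the installed pandas version
missing_in_html = [name for name in DATE_ARGS if name in csv_params and name not in html_params]
print("read_html lacks:", missing_in_html)

# read_csv lets the caller state that "01/02/2022" means 1 February, not 2 January
csv_data = "date,value\n01/02/2022,1\n15/03/2022,2\n"
df = pd.read_csv(io.StringIO(csv_data), parse_dates=["date"], dayfirst=True)
print(df["date"].tolist())
```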

Other arguments unrelated to date parsing may be in the same situation, so this issue could perhaps be extended to cover them as well.

Feature Description

These arguments, or at least some of them, could be exposed directly in read_html with little effort, which would be very convenient for users.

Alternative Solutions

Right now the only viable workaround is to skip date parsing altogether during the loading step and then parse the dates manually in the DataFrames that read_html returns (it returns a list of them, one per table).
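The workaround can be sketched as a small post-processing step. The helper name parse_date_columns is hypothetical (it is not part of pandas), and the hand-built DataFrame below stands in for the list that read_html would return, so the sketch runs without an HTML parser backend installed.

```python
import pandas as pd


def parse_date_columns(tables, columns, dayfirst=False):
    """Hypothetical helper: convert the given columns to datetimes in every table.

    `tables` is a list of DataFrames, as returned by pd.read_html.
    Columns that a table does not contain are skipped.
    """
    parsed = []
    for df in tables:
        df = df.copy()  # leave the caller's DataFrames untouched
        for col in columns:
            if col in df.columns:
                df[col] = pd.to_datetime(df[col], dayfirst=dayfirst)
        parsed.append(df)
    return parsed


# Stand-in for the output of pd.read_html(...) called without parse_dates
tables = [pd.DataFrame({"date": ["01/02/2022", "15/03/2022"], "value": [1, 2]})]
result = parse_date_columns(tables, ["date"], dayfirst=True)
print(result[0])
```

With a real page, the call would be `parse_date_columns(pd.read_html(url), ["date"], dayfirst=True)`, which is exactly the "function call plus postprocessing" pattern described below.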

The problem with this is that it breaks API uniformity with read_csv: integrations with different input sources have to be implemented differently depending on the data format (a function call with arguments versus a function call with arguments followed by postprocessing). It may also bypass optimizations implemented in the read_csv workflow.

Additional Context

From what I could tell from skimming the code, read_html adds only a few layers on top of the underlying parser, which already supports all of the arguments mentioned above. parse_dates is simply passed down to it unchanged, so the parser falls back to its default values for all the other arguments in the list above.
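The claim that the underlying parser already supports these arguments can be checked with a small sketch. It assumes that read_html builds its DataFrames through pandas' internal TextParser helper (importable from pandas.io.parsers in pandas 1.x and 2.x); this is an implementation detail, not public API, and may change between versions. The rows are made-up stand-ins for parsed table cells.

```python
import pandas as pd
from pandas.io.parsers import TextParser  # internal helper, not public API

# Rows in the shape a table body is handed to the parser: header row first, then cell strings
rows = [
    ["date", "value"],
    ["01/02/2022", "1"],
    ["15/03/2022", "2"],
]

# read_html forwards parse_dates today; dayfirst is the kind of extra
# keyword this issue proposes to forward as well
with TextParser(rows, header=0, parse_dates=["date"], dayfirst=True) as reader:
    df = reader.read()

print(df)
```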

Metadata


Assignees

No one assigned

    Labels

    Datetime (Datetime data dtype), Enhancement, IO HTML (read_html, to_html, Styler.apply, Styler.applymap)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
