read_html - how to prevent the conversion of numerical fields #21379

Open
@vzelen

Hi everyone,

Is it possible to use pandas.read_html in such a way that it does not convert the numbers in HTML tables and instead returns them as they are (as strings)?

Assume that there is an HTML table which contains the value "60,00".

Reading that table using pandas.read_html will lead to an integer 6000.
Adding the flag thousands='.' will result in a string "60.00".
Adding both flags thousands='.' and decimal=',' will result in a float 60.0.

Is it possible to ask pandas.read_html to stop converting numbers by itself based on the "thousands" and "decimal" logic? I ask because the file may contain data in both EU and US formats.
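For illustration, here is a minimal sketch of one possible workaround, not a confirmed answer from the pandas maintainers. It assumes the column names are known in advance, passes `thousands=None` so that separators are not stripped, and uses `converters` to keep the chosen column as text. The sample table and the helper name `read_raw` are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table mixing an EU-style and a US-style number.
HTML = """
<table>
  <tr><th>Label</th><th>Amount</th></tr>
  <tr><td>eu</td><td>60,00</td></tr>
  <tr><td>us</td><td>1,234.50</td></tr>
</table>
"""


def read_raw(html: str) -> pd.DataFrame:
    """Read the first table, keeping the 'Amount' column as raw strings."""
    tables = pd.read_html(
        StringIO(html),
        thousands=None,             # do not strip any thousands separator
        converters={"Amount": str}, # keep this column's cell text as-is
    )
    return tables[0]


df = read_raw(HTML)
print(df["Amount"].tolist())
```

Columns without a converter are still subject to type inference, so under this assumption every column that must stay textual needs its own entry in `converters`.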

It would be amazing to use pandas as a tool that just exports the data from HTML to a dataframe as it is and leaves data postprocessing, conversion, etc. to later steps (which could also be implemented with the help of pandas).
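To make the idea of a separate postprocessing step concrete, here is a small sketch under the assumption that the cells were obtained as raw strings and that the format of each value (EU or US) is known from context. The function name `parse_number` is hypothetical and not part of pandas.

```python
def parse_number(text: str, thousands: str, decimal: str) -> float:
    """Convert a raw numeric string using explicitly stated separators."""
    # Drop the thousands separator first, then normalize the decimal mark.
    cleaned = text.strip().replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


# EU-style value: '.' groups thousands, ',' marks decimals.
eu_value = parse_number("60,00", thousands=".", decimal=",")

# US-style value: ',' groups thousands, '.' marks decimals.
us_value = parse_number("1,234.50", thousands=",", decimal=".")

print(eu_value, us_value)
```

Because the separators are passed explicitly, the same raw string is never silently reinterpreted. On a dataframe column, such a function could be applied with `Series.map`.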

I've described a similar issue here: https://stackoverflow.com/questions/47327966/pandas-converting-numbers-to-strings-unexpected-results

Thanks a lot in advance. Please let me know if it would help to attach examples.

    Labels

    Dtype Conversions: Unexpected or buggy dtype conversions
    Enhancement
    IO HTML: read_html, to_html, Styler.apply, Styler.applymap
