read_html - how to prevent the conversion of numerical fields

Hi everyone,

Is it possible to use the pandas.read_html function in a way that it won't convert the numbers in HTML tables, and instead exports them as they are (as strings)?

Assume there is an HTML table that contains the value "60,00".

Reading that table with pandas.read_html yields the integer 6000.
Adding the flag _thousands='.'_ results in the string "60.00".
Adding both flags _thousands='.'_ and _decimal=','_ results in the float 60.0.

Is it possible to ask pandas.read_html to stop converting numbers by itself based on the "thousands" and "decimal" logic? This matters because the file may contain data in both EU and US formats.

It would be great to use pandas as a tool that just loads the data from HTML into a DataFrame _as it is_ and leaves postprocessing, conversion, etc. to later steps (which could also be implemented with the help of pandas).
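To illustrate what I mean by _as it is_, here is a minimal sketch of the behaviour I'm after. It does not use pandas.read_html at all; it relies only on the standard-library `html.parser`, and the names `TableTextParser` and `table_rows_as_strings` are made up for this example. It assumes a single, simple table (no nested tables, no colspan/rowspan handling).

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td>/<th> cell as a plain string."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of strings
        self._row = None   # row currently being filled, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_rows_as_strings(html):
    """Return all table rows in `html` as lists of untouched strings."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Example table mixing EU and US number formats (made-up data).
html = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>A</td><td>60,00</td></tr>
  <tr><td>B</td><td>1.234,56</td></tr>
  <tr><td>C</td><td>1,234.56</td></tr>
</table>
"""

rows = table_rows_as_strings(html)
for row in rows:
    print(row)
```

The rows could then be handed to `pandas.DataFrame(rows[1:], columns=rows[0])`, which keeps the cells as strings, so any EU/US number handling can be done afterwards, column by column.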

I've described a similar issue here: https://stackoverflow.com/questions/47327966/pandas-converting-numbers-to-strings-unexpected-results

Thanks a lot in advance. Please let me know if it would help to attach examples.


read_html - how to prevent the conversion of numerical fields #21379
