Description
Is there a way to convert the field from int to str?
I have looked through related issues such as #10534 and #21379, as well as this test: https://github.com/gte620v/pandas/blob/5cb8243f2dd31cc2155627f29cfc89bbf6d4b84b/pandas/io/tests/test_html.py#L715
I do not think the converters argument fits our use case: the table is updated every day and may gain a new column, in which case we would have to manually add a new key to the parameter each time.
Here is the entire stack trace from calling the function:
PS C:\Users\Zhenye.na\Desktop> python3 .\dash-prod.py
.\dash-prod.py:4: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
from collections import Mapping, Iterable
Traceback (most recent call last):
File ".\dash-prod.py", line 59, in <module>
df = pd.read_html(response.text, skiprows=1)
File "C:\Users\Zhenye.na\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 1105, in read_html
displayed_only=displayed_only,
File "C:\Users\Zhenye.na\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 915, in _parse
for table in tables:
File "C:\Users\Zhenye.na\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 213, in <genexpr>
return (self._parse_thead_tbody_tfoot(table) for table in tables)
File "C:\Users\Zhenye.na\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 411, in _parse_thead_tbody_tfoot
header = self._expand_colspan_rowspan(header_rows)
File "C:\Users\Zhenye.na\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\html.py", line 459, in _expand_colspan_rowspan
colspan = int(self._attr_getter(td, "colspan") or 1)
ValueError: invalid literal for int() with base 10: '\\"1\\"'
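Looking at the last frame, the literal that int() rejects appears to be the colspan attribute of a cell rather than a cell value. The backslashes in '\\"1\\"' may indicate that the markup arrives with escaped quotes (for example, HTML embedded in a JSON string), although that is only a guess. Below is a minimal sketch of that idea using only the standard library. The HTML fragment and the helper names (unescape_quotes, collect_colspans, ColspanCollector) are hypothetical and not part of pandas.

```python
from html.parser import HTMLParser

# The exact value int() rejected in the traceback:
# backslash, quote, 1, backslash, quote.
raw_colspan = '\\"1\\"'
try:
    int(raw_colspan)
    raw_fails = False
except ValueError:
    raw_fails = True  # same failure as in the traceback


class ColspanCollector(HTMLParser):
    # Collects the colspan attribute of every <td>/<th> cell.
    def __init__(self):
        super().__init__()
        self.colspans = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            for name, value in attrs:
                if name == "colspan":
                    self.colspans.append(value)


def unescape_quotes(markup):
    # Turn backslash-escaped double quotes back into plain double quotes.
    return markup.replace('\\"', '"')


def collect_colspans(markup):
    parser = ColspanCollector()
    parser.feed(markup)
    parser.close()
    return parser.colspans


# Hypothetical fragment shaped like the escaped markup the traceback suggests.
escaped = '<table><tr><td colspan=\\"1\\">03/04</td></tr></table>'
cleaned = unescape_quotes(escaped)
cleaned_colspans = collect_colspans(cleaned)
colspan_values = [int(value) for value in cleaned_colspans]
```

If this guess is right, passing the cleaned string to read_html in place of response.text might avoid the ValueError. I have not verified this against the real API response.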
The core usage of the read_html function is as follows:
import pandas as pd
import requests
# url and hdrs are defined elsewhere in the script
response = requests.get(url, headers=hdrs)
df = pd.read_html(response.text, skiprows=1)[0]
print(df)
I would like to use the read_html function to extract the table from the response returned by the REST API. I have tested the function on a small table that contains only digits, and it works. However, the data returned from the REST API contains both characters and digits.
Here is a demo of what the table looks like (assume DC 1 and Location 1 are separated by a single '\n' character):
Date | DC 1 Location 1 | DC 2 Location 2 | DC 3 Location 3 |
---|---|---|---|
03/04 | 1.23.4 | 1.23.4 | 1.23.4 |
04/05 | 1.23.4 | 1.23.4 | 1.23.4 |
I assume the error may be caused by the '.' symbol in fields like 1.23.4, but I am not sure how to fix it.
Any ideas or thoughts are appreciated!
Thanks!