BUG: 3.4 with current versions of bs4/lxml/html5lib minor parsing errors #7229

Closed
@jreback

Description

======================================================================
ERROR: test_thousands_macau_index_col (pandas.io.tests.test_html.TestReadHtml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 410, in test_thousands_macau_index_col
    dfs = self.read_html(macau_data, index_col=0, header=0)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 96, in read_html
    return read_html(*args, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 840, in read_html
    parse_dates, tupleize_cols, thousands, attrs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 709, in _parse
    raise_with_traceback(retained)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\compat\__init__.py", line 705, in raise_with_traceback
    raise exc.with_traceback(traceback)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6275: character maps to <undefined>

======================================================================
ERROR: test_thousands_macau_stats (pandas.io.tests.test_html.TestReadHtml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 401, in test_thousands_macau_stats
    attrs={'class': 'style1'})
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\tests\test_html.py", line 96, in read_html
    return read_html(*args, **kwargs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 840, in read_html
    parse_dates, tupleize_cols, thousands, attrs)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\io\html.py", line 709, in _parse
    raise_with_traceback(retained)
  File "c:\Users\Jeff Reback\Documents\GitHub\pandas\build\lib.win-amd64-3.4\pandas\compat\__init__.py", line 705, in raise_with_traceback
    raise exc.with_traceback(traceback)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6275: character maps to <undefined>

----------------------------------------------------------------------
Ran 7205 tests in 459.919s
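Both failures end in the same `UnicodeDecodeError`. Python's Western Windows code page codec, `cp1252`, is implemented as a "charmap" codec and has no character assigned to byte 0x81, so any text-mode read that decodes such a byte with it fails this way. The following minimal, stdlib-only sketch reproduces the error under two assumptions. First, it assumes the test fixture was read in text mode with the machine's default code page, taken here to be cp1252, which is typical for a US-English Windows 7 install but is not stated in the report. Second, purely for illustration, it assumes the file is really UTF-8. The report does not say what encoding the Macau fixture uses. The fixture string and the helper names are hypothetical.

```python
import os
import tempfile

# Hypothetical stand-in for an HTML fixture: "Á" (U+00C1) encodes in UTF-8
# as the two bytes 0xC3 0x81, and 0x81 is unmapped in cp1252.
HTML = "<table><tr><td>Á</td></tr></table>"


def write_fixture():
    # Write the fixture as UTF-8 bytes to a temporary file and return its path.
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "wb") as fh:
        fh.write(HTML.encode("utf-8"))
    return path


def read_with_locale_default(path):
    # Emulates text-mode open() on a Windows machine whose ANSI code page
    # is cp1252 (an assumption about the test machine).
    with open(path, encoding="cp1252") as fh:
        return fh.read()


def read_with_explicit_encoding(path, encoding="utf-8"):
    # Read raw bytes and decode with a caller-chosen encoding, so the
    # result no longer depends on the platform's default code page.
    with open(path, "rb") as fh:
        return fh.read().decode(encoding)


if __name__ == "__main__":
    path = write_fixture()
    try:
        try:
            read_with_locale_default(path)
        except UnicodeDecodeError as exc:
            print("cp1252 read failed:", exc)
        print("explicit UTF-8 read:", read_with_explicit_encoding(path))
    finally:
        os.remove(path)
```

Decoding the raw bytes with the encoding the document actually uses, or the one its `<meta charset>` declares, avoids depending on the machine's code page. The sketch hard-codes UTF-8 only because that is how its own fixture was written.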
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: None
nose: 1.3.0
Cython: 0.20.1
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: None
sphinx: None
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.3
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3
matplotlib: 1.3.1
openpyxl: 2.0.3
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.2
pymysql: None
psycopg2: None
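The version dump shows `LC_ALL` and `LANG` as `None`. On Windows, however, those variables do not select the default encoding for text-mode `open()`. In Python 3.4 that default comes from `locale.getpreferredencoding(False)`, which on Windows reports the system ANSI code page. A small check like the one below could be added to a report like this. It is a suggestion only and was not part of pandas' version printer at the time. The helper name is hypothetical.

```python
import locale
import platform
import sys


def default_text_encoding():
    # Encoding used by text-mode open() when no encoding= argument is passed
    # (per the Python 3.4-era open() documentation).
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    print("platform:", platform.platform())
    print("default text encoding:", default_text_encoding())
```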

Labels: Bug, IO HTML (read_html, to_html, Styler.apply, Styler.applymap)
