Python Pandas read_html fails when reading tables from Wikipedia #21499

Open
@dscience7

I am trying to read the tables from a Wikipedia page using the following code:

import pandas as pd
pd.read_html('https://en.wikipedia.org/wiki/2013–14_Premier_League')

Doing that generates the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\u2013' in position 14: ordinal not in range(128)

I have also tried passing encoding='utf-8':

pd.read_html('https://en.wikipedia.org/wiki/2013–14_Premier_League', encoding='utf-8')

But I still get the same error. The following works:

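import pandas as pd
import requests

# Fetch the page with requests, then hand the raw HTML to pandas
r = requests.get('https://en.wikipedia.org/wiki/2017–18_Premier_League')
c = r.content  # response body as bytes
dfs = pd.read_html(c)  # list of DataFrames, one per table found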

What I want to know is how to get pd.read_html() to work directly on the URL, without requests. Is there something I don't understand about encoding, or is this a problem with pandas?

I am running an Anaconda distribution of Pandas 0.21.1 and Python 3.5.4. Thanks for any help.
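One possible workaround, offered as a sketch rather than a confirmed fix: the error reports the en dash (`\u2013`) at position 14, which matches its offset in a request line of the form `GET /wiki/2013–14_...`. That suggests the failure happens when the URL is sent as ASCII, not when the page is decoded, which would also explain why `encoding='utf-8'` has no effect. Under that assumption, percent-encoding the URL before passing it to `pd.read_html()` should avoid the problem. The helper name `percent_encode_url` is hypothetical and not part of pandas.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url):
    """Return `url` with non-ASCII characters in the path and query percent-encoded.

    Hypothetical helper for illustration; not a pandas API.
    """
    parts = urlsplit(url)
    # Keep "/" and any existing "%" escapes intact; encode everything else as UTF-8.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url = percent_encode_url('https://en.wikipedia.org/wiki/2013–14_Premier_League')
print(url)
```

The resulting string contains only ASCII characters, so it can be passed straight to pandas as `pd.read_html(url)`. I have not verified this against every pandas version.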

Labels: Bug, IO HTML (read_html, to_html, Styler.apply, Styler.applymap)
