[Bug] Understat: KeyError 'statData' on Google Colab (IP Blocking/Cloudflare) #905

@GiorgioMerolla

Description

Describe the bug
When attempting to fetch Understat data using Google Colab, the scraper fails with a KeyError: 'statData'.
This appears to be caused by Understat blocking Google Cloud IPs (Cloudflare 403 Forbidden or Challenge Page), resulting in the scraper failing to find the expected JSON data variable.

To Reproduce
Run the following code in a standard Google Colab environment:

import soccerdata as sd
leagues = ['ENG-Premier League']
seasons = ['24-25']
us = sd.Understat(leagues=leagues, seasons=seasons)
df = us.read_team_match_stats()

INFO     Saving cached data to /root/soccerdata/data/Understat
KeyError: 'statData'

Context

OS: Linux (Google Colab Standard Runtime)

Soccerdata Version: [Insert your version here, e.g., 1.8.7]

Observations:
* The issue persists even after clearing the cache.
* Using proxy='tor' or no_cache=True does not always resolve the issue, suggesting the block is aggressive against Colab IPs.
* FBref scraper also returns 403 Forbidden on the same environment.

Suggested Solution / Feature Request

Since running scrapers on Colab is a common use case, could we:

* Improve the error handling to raise a specific AccessDeniedError instead of KeyError when the response is a Cloudflare block page?
* Add documentation on using proxies (or Tor) specifically for Colab users?
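
To make the first suggestion concrete, here is a minimal sketch of the kind of check I have in mind. Everything in it is hypothetical: `AccessDeniedError`, `check_blocked`, and `BLOCK_MARKERS` are not part of soccerdata, and the marker strings are my guess at what a Cloudflare block or challenge page contains, so they would need to be verified against real responses.

```python
class AccessDeniedError(RuntimeError):
    """Hypothetical exception for a refused request (not a soccerdata class)."""


# Assumed substrings of a Cloudflare block/challenge page (lowercase).
# These are guesses and should be checked against actual blocked responses.
BLOCK_MARKERS = ("just a moment", "attention required", "cf-chl")


def check_blocked(status_code: int, body: str, url: str = "") -> None:
    """Raise AccessDeniedError if the response looks like a block page.

    Returns None when the response does not look blocked, so the caller
    can continue parsing the page as usual.
    """
    lowered = body.lower()
    looks_blocked = status_code == 403 or any(m in lowered for m in BLOCK_MARKERS)
    if looks_blocked:
        target = url or "the requested page"
        raise AccessDeniedError(
            f"Access denied (HTTP {status_code}) for {target}; "
            "the host may be blocking this IP range."
        )
```

If something like this ran before the scraper looks up the expected JSON variable, a blocked request would surface as a clear access error rather than KeyError: 'statData'.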


### **Why this is better than a "Google Drive" request**
Asking them to "add Google Drive support" might get rejected because the library *already* supports it!
* **The Library's View:** "We already gave you the `data_dir` parameter. You can set that to your Drive folder. We don't need to add code for that."
* **The Real Problem:** The real problem is the **Blocking**. By reporting the `KeyError`, you help them fix the *real* bug (the crash).

### **If you specifically want to propose the "Tor" workaround**
If you want to be very helpful, you can comment on your own Issue saying:
*"I found a workaround for Colab users: Installing `tor` in the notebook and passing `
