
CBBpy Issue: get_games_season Hangs or Runs Extremely Slow for 2025 Season #64

@jgamblin

Description

I'm using cbbpy.mens_scraper.get_games_season(2025, box=False, pbp=False) to scrape NCAA Men's Basketball games for the 2025 season. The script starts scraping, but after a few days' worth of games it hangs or slows to a crawl and never completes. Example output:

Scraping NCAA Men's Basketball games for 2025 season...
Scraping 100 games on 11/04/24:   2%|▎                   | 3 of 182 days scraped in 2.2 sec

After this, the process stalls and does not progress.

Steps to Reproduce

  1. Install the latest CBBpy and its dependencies.
  2. Run the following code:
    import pandas as pd
    import cbbpy.mens_scraper as s
    
    df_2025 = s.get_games_season(2025, box=False, pbp=False)
  3. Observe that scraping hangs after a few days.

Environment

  • Python 3.13
  • CBBpy 2.1.2
  • macOS (also tested on GitHub Actions Ubuntu 24.04 runner)

Expected Behavior

Scraping should either complete for every day of the season or raise an error identifying the specific date or game that causes the problem.
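A minimal sketch of that expected behavior, assuming the season is scraped one day at a time: each day runs under a timeout, so a stalled date surfaces as an error naming that date instead of an indefinite hang. scrape_days, scrape_day, and DayTimeoutError are hypothetical names, not part of CBBpy; scrape_day stands in for whatever per-date scraping call is used.

```python
import concurrent.futures as cf
import datetime as dt
from typing import Any, Callable


class DayTimeoutError(RuntimeError):
    """Raised when scraping a single day exceeds the allowed time."""


def scrape_days(
    start: dt.date,
    end: dt.date,
    scrape_day: Callable[[dt.date], Any],
    timeout_s: float = 60.0,
) -> dict[dt.date, Any]:
    """Scrape each date in [start, end], failing loudly on the first stalled day.

    `scrape_day` is a hypothetical per-day callable supplied by the caller.
    """
    results: dict[dt.date, Any] = {}
    # A single worker keeps days sequential, but lets us stop waiting on one.
    pool = cf.ThreadPoolExecutor(max_workers=1)
    try:
        day = start
        while day <= end:
            future = pool.submit(scrape_day, day)
            try:
                results[day] = future.result(timeout=timeout_s)
            except cf.TimeoutError:
                # We only stop waiting; the worker thread itself cannot be killed.
                raise DayTimeoutError(
                    f"scraping {day.isoformat()} did not finish within {timeout_s} s"
                ) from None
            day += dt.timedelta(days=1)
    finally:
        # Don't block on a hung worker, and drop any work still queued.
        pool.shutdown(wait=False, cancel_futures=True)
    return results
```

One caveat of the thread-based design: Python cannot kill a stalled thread, so it keeps running in the background and can delay interpreter exit. Terminating a stuck day outright would require running it in a subprocess instead.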

Actual Behavior

Scraping hangs, or slows to a crawl, after the first few days (3 of 182 in the output above).

Additional Info

  • No error messages are shown.
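Because no error messages are shown, one way to see where the process is stuck is Python's standard-library faulthandler module, which can print every thread's stack after a delay without stopping the program. A sketch, where run_with_stack_dump is a hypothetical helper name:

```python
import faulthandler
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_stack_dump(fn: Callable[[], T], after_s: float = 120.0) -> T:
    """Run fn(); while it is still running, print every thread's Python stack
    to stderr every `after_s` seconds, without interrupting it."""
    # faulthandler writes through a file descriptor, so use the real stderr.
    faulthandler.dump_traceback_later(after_s, repeat=True, file=sys.__stderr__)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

For example, run_with_stack_dump(lambda: s.get_games_season(2025, box=False, pbp=False), after_s=300) would print stacks every five minutes while the scrape runs, which should show which function the scraper is sitting in when it stalls.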

Related Upstream Issue

CBBpy is based on the R package ncaahoopR. Over the summer, ncaahoopR was updated to fix a schedule-related bug for the 2024-25 season (commit 7fbf214). The fix was made in the get_master_schedule function, which was updated to handle ESPN schedule data that had changed format or contained unexpected values, which had been causing scraping to hang or fail.

Details of the ncaahoopR Fix

  • The bug was caused by changes in ESPN's schedule data, which led to incorrect indexing and sometimes infinite loops or hangs.
  • The fix added logic to check the bounds of the schedule index and ensure only valid games are processed:
    • If the index of scheduled games exceeded the length of the schedule data, the function now slices the schedule array to the available length, avoiding out-of-bounds errors or hangs.
    • This change was made in the file R/get_master_schedule.R and released in version 1.8.7.
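To illustrate the kind of guard described above in Python terms, here is a minimal sketch. It is not ncaahoopR's R code or CBBpy's actual implementation; scheduled_games, n_scheduled, and the "id" field are hypothetical names. The idea is to clamp the expected game count to what ESPN actually returned before slicing, and to skip entries that are not game records.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def scheduled_games(schedule: Sequence[Any], n_scheduled: int) -> list[Mapping[str, Any]]:
    """Return up to `n_scheduled` usable game entries from `schedule`.

    The "id" key is a hypothetical field name used to recognize game records.
    """
    # Clamp to [0, len(schedule)] so an inflated count never reads past the end.
    n = max(0, min(n_scheduled, len(schedule)))
    games = []
    for entry in schedule[:n]:
        # Skip malformed rows (e.g. None, strings, or records without an id).
        if isinstance(entry, Mapping) and entry.get("id"):
            games.append(entry)
    return games
```

Iterating over a clamped slice, rather than indexing with `range(n_scheduled)`, means a stale or oversized count from the source data can neither raise IndexError nor leave a loop waiting on entries that never arrive.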

Reference:

Request

Please review the schedule scraping logic in CBBpy, especially for the 2025 season, and consider implementing a similar fix to handle changes in ESPN's schedule data format or unexpected values. This may resolve the hanging issue.

Metadata

  • Assignees: none
  • Labels: bug (Something isn't working)
  • Projects: none
  • Relationships: none yet
  • Development: no branches or pull requests
