Description
I'm using cbbpy.mens_scraper.get_games_season(2025, box=False, pbp=False) to scrape NCAA Men's Basketball games for the 2025 season. The script starts scraping, but after a few days' worth of games it either hangs or runs extremely slowly, and it never completes. Example output:
Scraping NCAA Men's Basketball games for 2025 season...
Scraping 100 games on 11/04/24: 2%|▎ | 3 of 182 days scraped in 2.2 sec
After this, the process stalls and does not progress.
Steps to Reproduce
- Install latest CBBpy and dependencies.
- Run the following code:
    import pandas as pd
    import cbbpy.mens_scraper as s
    df_2025 = s.get_games_season(2025, box=False, pbp=False)
- Observe that scraping hangs after a few days.
Environment
- Python 3.13
- CBBpy 2.1.2
- macOS (also tested on GitHub Actions Ubuntu 24.04 runner)
Expected Behavior
Scraping should complete for all days in the season or provide an error if a specific date or game is causing the issue.
Actual Behavior
Scraping hangs or runs extremely slowly after a few days.
Additional Info
- No error messages are shown.
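To narrow down which date stalls, one option is to scrape one day at a time under a time limit. The sketch below is not part of CBBpy: `call_with_timeout` and `find_stalling_dates` are hypothetical helpers, and `scrape_day` stands for any callable that takes a date, for example a thin wrapper around whichever per-date CBBpy call is in use (that wrapper is an assumption and is not shown here).

```python
import threading
from datetime import date, timedelta


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a daemon thread.

    Returns (finished, outcome). outcome is the return value, or the
    exception object if fn raised. finished is False if fn was still
    running after timeout_s seconds.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = fn(*args)
        except Exception as exc:  # keep scraper errors visible to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False, None  # still running: treat this call as a stall
    return True, outcome.get("error", outcome.get("value"))


def find_stalling_dates(scrape_day, start, end, timeout_s=60.0):
    """Call scrape_day(d) for each date in [start, end]; return dates that timed out."""
    stalled = []
    day = start
    while day <= end:
        finished, _ = call_with_timeout(scrape_day, timeout_s, day)
        if not finished:
            stalled.append(day)
        day += timedelta(days=1)
    return stalled
```

A timed-out worker thread is abandoned rather than killed, so this is only suitable for a one-off diagnostic run. Calling `find_stalling_dates(scrape_day, date(2024, 11, 4), date(2024, 11, 10))` would return the dates in that range on which `scrape_day` did not finish within the limit.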
Related Upstream Issue
CBBpy is based on the R package ncaahoopR. Over the summer, ncaahoopR was updated to fix a schedule-related bug for the 2024-25 season (commit 7fbf214). The fix was made in the get_master_schedule function, which was updated to handle cases where the schedule data from ESPN had changed format or contained unexpected values, causing scraping to hang or fail.
Details of the ncaahoopR Fix
- The bug was caused by changes in ESPN's schedule data, which led to incorrect indexing and sometimes infinite loops or hangs.
- The fix added logic to check the bounds of the schedule index and ensure only valid games are processed:
  - If the index of scheduled games exceeds the length of the schedule data, the schedule array is now sliced correctly, which avoids out-of-bounds errors or hangs.
- This change was made in the file R/get_master_schedule.R and released in version 1.8.7.
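For illustration, the kind of bounds check described above could look roughly like this in Python. This is only a sketch of the idea, not the actual ncaahoopR code (which is written in R) and not CBBpy's internals; `select_valid_games`, `schedule_rows`, and `game_indices` are hypothetical names.

```python
def select_valid_games(schedule_rows, game_indices):
    """Return the schedule rows for indices that fall inside the schedule.

    Indices that are negative or beyond the end of schedule_rows are
    dropped instead of raising an error or being retried indefinitely.
    """
    n_rows = len(schedule_rows)
    valid = [i for i in game_indices if 0 <= i < n_rows]
    return [schedule_rows[i] for i in valid]


# Hypothetical data: three scheduled games, but index 5 points past the end.
rows = ["game_a", "game_b", "game_c"]
print(select_valid_games(rows, [0, 2, 5]))
```

Whether out-of-range indices should be dropped silently or reported is a design choice; logging them would make a format change on ESPN's side easier to spot.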
Reference:
Request
Please review the schedule scraping logic in CBBpy, especially for the 2025 season, and consider implementing a similar fix to handle changes in ESPN's schedule data format or unexpected values. This may resolve the hanging issue.