Scraping always pauses and doesn't finish #51

@jnmiller

Every time I try to scrape a men's season, the process hangs. Pressing Ctrl-C always produces the same stack trace:

Getting data for season 2022
No games on 11/08/21:   4%|███▌                                                                              | 8 of 182 days scraped in 3.3 sec
Scraping 184 games on 11/09/21:   4%|███                                                                   | 8 of 182 days scraped in 204.8 sec
Traceback (most recent call last):
  File "<SNIP>/./scrape.py", line 30, in <module>
    infos, box_scores, pbps = scraper.get_games_season(season, info=True, box=False, pbp=False)
  File "<SNIP>/lib/python3.10/site-packages/cbbpy/mens_scraper.py", line 80, in get_games_season
    return _get_games_season(season, "mens", info, box, pbp)
  File "<SNIP>/lib/python3.10/site-packages/cbbpy/cbbpy_utils.py", line 233, in _get_games_season
    info = _get_games_range(
  File "<SNIP>/lib/python3.10/site-packages/cbbpy/cbbpy_utils.py", line 186, in _get_games_range
    result = Parallel(n_jobs=cpus)(
  File "<SNIP>/lib/python3.10/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
  File "<SNIP>/lib/python3.10/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "<SNIP>/lib/python3.10/site-packages/joblib/parallel.py", line 1707, in _retrieve
    time.sleep(0.01)
KeyboardInterrupt

Is the source site just detecting the scraping and blocking my IP address? Or is something else going on?

I can sometimes scrape a very short date range (such as a weekend) successfully, but the very next attempt hangs again.
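Since short ranges sometimes succeed, one way to narrow this down might be to split the season into small date chunks and give each chunk a time limit, so that a stalled chunk is reported instead of blocking the whole run. The following is only a rough sketch under assumptions: `fetch_range` is a hypothetical callable that takes a start and end date and returns whatever the scraper returns for that range, and `date_chunks`, `run_with_timeout`, and `scrape_in_chunks` are illustrative names, not part of cbbpy. A chunk that times out is abandoned, not killed; its daemon thread may keep running in the background.

```python
import threading
from datetime import date, timedelta


def date_chunks(start, end, days=3):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive."""
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


def run_with_timeout(func, args, timeout_s):
    """Run func(*args) in a daemon thread and wait at most timeout_s seconds.

    Returns ("ok", value), ("error", exception) or ("timeout", None).
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:  # report the failure instead of losing it
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return "timeout", None
    if "error" in outcome:
        return "error", outcome["error"]
    return "ok", outcome["value"]


def scrape_in_chunks(fetch_range, start, end, days=3, timeout_s=120.0):
    """Call the hypothetical fetch_range(chunk_start, chunk_end) per chunk.

    Returns (results, stalled): results holds the values of chunks that
    finished, stalled holds (chunk_start, chunk_end, status) for the rest.
    """
    results, stalled = [], []
    for chunk_start, chunk_end in date_chunks(start, end, days):
        status, value = run_with_timeout(
            fetch_range, (chunk_start, chunk_end), timeout_s
        )
        if status == "ok":
            results.append(value)
        else:
            stalled.append((chunk_start, chunk_end, status))
    return results, stalled
```

If the same dates stall on every run, that would point at specific games or pages; if the stalled chunks move around between runs, rate limiting or blocking by the source site seems more likely.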


Labels: bug (Something isn't working)
