Gracefully handle PRs that contain over 100 changed files #1
Description
The GitHub GraphQL API limits how many items can be returned when fetching groups of objects. Each group can contain at most 100 items per request, and the rest must be fetched through the pagination mechanism (which is why we fetch the PRs themselves through several requests rather than one big request).
This means that when fetching the list of files affected by each PR, we only get the first 100 of them. That is fine for most PRs, but some mega PRs touch more than that. The solution seems simple enough:
- Fetch all PRs normally.
- Record all PRs that contain exactly 100 files (can't be more, less doesn't matter).
- For each of those PRs, which should be just a handful, make a series of requests to get the complete list of changed files.
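
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `find_truncated_prs`, `fetch_all_files`, the `fetch_page` callback, and the shape of the PR dictionaries are all hypothetical names chosen for the example. The query string only shows what a paginated request for one PR's files might look like; the sketch itself performs no network calls and leaves the HTTP layer to the caller.

```python
from typing import Callable, Iterable, Optional

PAGE_LIMIT = 100  # maximum number of items the API returns per page

# Illustrative query for one page of a PR's changed files (not sent by this sketch).
FILES_QUERY = """
query($owner: String!, $name: String!, $number: Int!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      files(first: 100, after: $cursor) {
        pageInfo { hasNextPage endCursor }
        nodes { path }
      }
    }
  }
}
"""

# Hypothetical page shape: (file paths, cursor of the next page or None).
FilesPage = tuple[list[str], Optional[str]]


def find_truncated_prs(prs: Iterable[dict]) -> list[int]:
    """Return the numbers of PRs whose initial file list hit the page limit."""
    return [pr["number"] for pr in prs if len(pr["files"]) >= PAGE_LIMIT]


def fetch_all_files(
    pr_number: int,
    fetch_page: Callable[[int, Optional[str]], FilesPage],
) -> list[str]:
    """Follow pagination cursors until the complete file list is collected.

    `fetch_page` is an assumed callback that runs one request and returns
    the page's file paths plus the next cursor (None on the last page).
    """
    files: list[str] = []
    cursor: Optional[str] = None
    while True:
        page, cursor = fetch_page(pr_number, cursor)
        files.extend(page)
        if cursor is None:
            return files
```

Passing the request function in as a callback keeps the pagination loop independent of whichever HTTP client is used, and makes it easy to exercise with a fake fetcher. Note that a PR with exactly 100 files will cost one extra request that returns nothing new, which seems an acceptable price for not missing any files.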
This should keep the number of requests relatively low, so we should stay well within our API budget. Of course, if some PR affects several thousand files, it will take a hot moment to gather that information, but that should be a rare and temporary occurrence (such PRs are hard to rebase and are typically made only by core maintainers doing big passes on something).