fix: improve account analysis and reporting #15174
Conversation
Pull request overview
This pull request significantly modifies the bot detection workflow to increase detection frequency and improve accuracy. The workflow now runs hourly instead of daily, and analyzes a shorter 6-hour time window instead of 3 days. The changes include improved API pagination, better filtering logic, enhanced error handling, and issue deduplication to prevent creating multiple alert issues per day.
Changes:
- Changed execution frequency from daily to hourly with reduced time window (3 days → 6 hours)
- Added pagination support for API calls to handle large datasets
- Implemented issue deduplication to update existing daily reports instead of creating duplicates
- Added filtering for closed issues/PRs to reduce alert fatigue
- Fixed API endpoint (listByRepo → listForRepo) and added a missing let declaration
Comments suppressed due to low confidence (4)
.github/workflows/bot-detection.yml:105
- The pagination logic for review comments fetches all pages without any limit. For PRs with thousands of review comments, this could:
- Cause performance issues and long execution times
- Significantly increase API usage
- Potentially hit API rate limits when combined with hourly execution
Consider adding a reasonable limit on the number of review comments fetched per PR (e.g., first 500 comments) to balance thoroughness with performance. This is especially important given the hourly execution schedule.
const reviewComments = [];
if (github.paginate?.iterator) {
  for await (const response of github.paginate.iterator(github.rest.pulls.listReviewComments, {
    owner: context.repo.owner,
    repo: context.repo.repo,
    pull_number: pr.number,
    per_page: 100,
  })) {
    reviewComments.push(...response.data);
  }
} else {
  const { data } = await github.rest.pulls.listReviewComments({
    owner: context.repo.owner,
    repo: context.repo.repo,
    pull_number: pr.number,
    per_page: 100,
  });
  reviewComments.push(...data);
}
.github/workflows/bot-detection.yml:6
- The cron schedule has changed from daily ("17 3 * * *") to hourly ("0 * * * *"), which represents a 24x increase in workflow execution frequency. This change should be carefully considered for its impact on:
- API rate limits: Running hourly will make 24x more API calls to GitHub's API, which could hit rate limits, especially with the pagination logic fetching all comments
- Cost implications: More workflow runs means higher GitHub Actions minutes usage
- Alert fatigue: Hourly bot detection reports could create excessive noise if many new accounts are detected
Consider whether hourly detection is necessary, or if a less frequent schedule (e.g., every 6 hours) would be more appropriate. Also ensure this aligns with GitHub API rate limit constraints given the number of API calls made per run.
- cron: "0 * * * *"
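If a six-hour cadence is preferred, the schedule could look like this (an illustrative fragment, not part of the PR):

```yaml
on:
  schedule:
    # Run at minute 0 every 6 hours (00:00, 06:00, 12:00, 18:00 UTC)
    - cron: "0 */6 * * *"
```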
.github/workflows/bot-detection.yml:274
- The issue deduplication logic searches only the first 100 open issues (per_page: 100, line 268) without pagination. In repositories with more than 100 open issues, this could fail to find an existing bot detection issue with today's title, leading to duplicate issues being created.
Consider either:
- Adding pagination to ensure all open issues are checked
- Using the Search API with a query filter for the specific title
- Adding a distinctive label (e.g., "bot-detection-daily") and searching for issues with that label to reduce the search space
try {
  const { data: existingIssues } = await github.rest.issues.listForRepo({
    owner: context.repo.owner,
    repo: context.repo.repo,
    state: 'open',
    per_page: 100,
  });
  const existing = existingIssues.find(i => i.title === title);
  if (existing?.number) {
    existingIssueNumber = existing.number;
  }
.github/workflows/bot-detection.yml:25
- The time window has been reduced from 3 days (72 hours) to 6 hours, which is a 92% reduction. Combined with the change to hourly execution, this represents a fundamental shift in detection strategy:
Previous approach: Daily scan of 3-day activity window
New approach: Hourly scan of 6-hour activity window
This change has significant implications:
- Coverage: The new approach only looks at recent 6-hour windows, potentially missing bot activity that occurs outside those windows
- Detection delay: Bots that are active but space their activity beyond 6 hours apart could be missed
- False negatives: Coordinated bot campaigns that operate in waves could slip through the narrower time window
Consider whether a 6-hour window is sufficient for detecting bot patterns, or if a longer window (e.g., 12-24 hours) would provide better detection while still supporting hourly execution.
const HOURS_BACK = 6;
const MAX_PR = 50;
const MIN_ACCOUNT_AGE_DAYS = 14;
const cutoff = new Date(Date.now() - HOURS_BACK * 60 * 60 * 1000);
- const MAX_PR = 200;
- const MIN_ACCOUNT_AGE_DAYS = 7;
+ const HOURS_BACK = 6;
+ const MAX_PR = 50;
The MAX_PR limit has been reduced from 200 to 50, which is a 75% reduction in the number of PRs scanned. Combined with the 6-hour time window, this significantly limits the scope of bot detection:
Impact: In active repositories, 50 PRs might only represent a small fraction of activity in a 6-hour window. Bots operating on PRs outside this limit won't be detected.
Consider whether 50 PRs is sufficient for your repository's activity level. If the repository receives more than 50 PR updates in 6 hours, you should increase this limit or implement a more sophisticated filtering strategy (e.g., prioritize newly created PRs over updated ones).
This issue also appears in the following locations of the same file:
- line 6
- line 21
// Skip alerting if everything found is already closed.
let hasAnyOpenItem = false;
for (const [, data] of highRiskAccounts) {
  if (data.issues?.some(i => i.state === 'open')) {
    hasAnyOpenItem = true;
    break;
  }
  if (data.prs?.some(p => p.state === 'open')) {
    hasAnyOpenItem = true;
    break;
  }
}

if (!hasAnyOpenItem) {
  console.log('No open issues or PRs from new accounts; skipping alert issue.');
  appendSummary('Bot Detection: flagged new accounts, but all related issues/PRs are closed. No alert issue created.');
  return;
}
The logic to skip alerting when all issues/PRs are closed (lines 198-215) is a good improvement that reduces alert fatigue. However, consider that accounts with closed items might still warrant investigation - a bot that quickly creates and closes spam issues could bypass detection with this logic.
Consider adding a threshold or alternative check, such as:
- Only skip if items were closed by the account owner themselves (not by moderators)
- Only skip if items have been closed for more than a certain time period
- Track the closure patterns in the report even if not creating an alert
This would help identify sophisticated bots that clean up after themselves.
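The closed-for-long-enough check could be sketched as below. The 24-hour grace period, the helper names, and the item shape (`state`, `closed_at`) are illustrative assumptions, not code from the PR.

```javascript
// Sketch: treat a closed item as "settled" only once it has been closed
// longer than a grace period; recently self-closed items still alert.
const CLOSED_GRACE_MS = 24 * 60 * 60 * 1000; // 24h, illustrative threshold

function isSettled(item, now = Date.now()) {
  if (item.state === 'open') return false; // open items always count toward alerting
  if (!item.closed_at) return false;       // closed with no timestamp: err on alerting
  return now - new Date(item.closed_at).getTime() > CLOSED_GRACE_MS;
}

// An account only stops alerting when every one of its issues and PRs is settled.
function allSettled(data, now = Date.now()) {
  const items = [...(data.issues ?? []), ...(data.prs ?? [])];
  return items.every(item => isSettled(item, now));
}
```

With this shape, `hasAnyOpenItem` would be replaced by checking `!allSettled(data)`, so a bot that closes its own spam within the grace window still triggers the alert.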
const title = `🚨 HIGH RISK: Brand New Accounts — ${today}`;
let existingIssueNumber;
Inconsistent indentation: lines 260-261 are indented with additional spaces compared to line 259. These lines should align with the same indentation level as line 259 since they are all part of the same code block. This creates inconsistent formatting and could be confusing for maintainers.
const issueComments = [];
if (github.paginate?.iterator) {
  for await (const response of github.paginate.iterator(github.rest.issues.listComments, {
    owner: context.repo.owner,
    repo: context.repo.repo,
    issue_number: pr.number,
    per_page: 100,
  })) {
    issueComments.push(...response.data);
  }
} else {
  const { data } = await github.rest.issues.listComments({
    owner: context.repo.owner,
    repo: context.repo.repo,
    issue_number: pr.number,
    per_page: 100,
  });
  issueComments.push(...data);
}
The pagination logic for issue comments and review comments fetches all pages without any limit. For PRs with thousands of comments, this could:
- Cause performance issues and long execution times
- Significantly increase API usage
- Potentially hit API rate limits when combined with hourly execution
Consider adding a reasonable limit on the number of comments fetched per PR (e.g., first 500 comments) to balance thoroughness with performance. This is especially important given the hourly execution schedule.
This issue also appears on line 87 of the same file.
@@ -135,21 +182,41 @@ jobs:
    per_page: 100,
  });
  data.prs = prList
-   .filter(p => p.user?.login === login)
+   .filter(p => p.user?.login === login && new Date(p.created_at) >= cutoff)
    .map(p => ({
      number: p.number,
      title: p.title,
      state: p.state,
      created_at: p.created_at,
      html_url: p.html_url,
    }));
The PR fetching logic fetches all PRs in the repository with only per_page: 100, then client-side filters for the specific user. This approach has two problems:
- It only fetches the first 100 PRs total (not paginated), so it will miss PRs from the user if they're not in the most recent 100 PRs
- It's inefficient - fetching all PRs when you only need one user's PRs
The API doesn't support filtering by creator for pulls.list, but you could use the Search API instead: github.rest.search.issuesAndPullRequests with query repo:owner/repo type:pr author:login which supports pagination and filtering server-side. Alternatively, if the user typically has few PRs, consider using github.rest.pulls.list with pagination and filtering client-side, but fetch more than 100 results.
This issue also appears on line 263 of the same file.
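The Search API variant could be sketched as follows. The helper name is an assumption; `search.issuesAndPullRequests` is the real endpoint, and the `repo:... type:pr author:...` qualifiers follow GitHub's search syntax.

```javascript
// Sketch: fetch one user's PRs server-side via the Search API instead of
// listing every PR in the repository and filtering client-side.
async function searchUserPrs(github, owner, repo, login) {
  const q = `repo:${owner}/${repo} type:pr author:${login}`;
  const { data } = await github.rest.search.issuesAndPullRequests({
    q,
    sort: 'created',
    order: 'desc',
    per_page: 100,
  });
  return data.items.map(p => ({
    number: p.number,
    title: p.title,
    state: p.state,
    created_at: p.created_at,
    html_url: p.html_url,
  }));
}
```

One caveat worth noting: the Search API has its own, lower rate limit, so with hourly execution the per-run number of search calls would still need watching.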
} catch (e) {
  console.log('Issue create/update with labels failed; retrying without labels...');
  if (existingIssueNumber) {
    await github.rest.issues.update({
      owner: context.repo.owner,
      repo: context.repo.repo,
      issue_number: existingIssueNumber,
      body,
    });
  } else {
    await github.rest.issues.create({
      owner: context.repo.owner,
      repo: context.repo.repo,
      title,
      body,
    });
  }
}
The error handling in the retry logic (catch block at line 296) doesn't prevent errors from propagating. If the retry attempt also fails, the error will be unhandled and could crash the workflow. Consider:
- Wrapping the retry attempts in their own try-catch blocks
- Logging the final error if all retry attempts fail
- Using appendSummary to report the failure in the workflow summary
For example, the inner attempts (lines 299-311) should have their own try-catch to ensure failures are logged properly.
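The nested try/catch could be structured roughly like this. The helper name and the callback-based shape are illustrative; in the workflow the two attempts would be the existing create/update calls with and without labels, and the summarize callback would be the workflow's own appendSummary.

```javascript
// Sketch: run the labeled attempt, and on failure retry without labels
// inside its own try/catch so a second failure is logged and summarized
// instead of crashing the workflow run.
async function createOrUpdateWithFallback(attemptWithLabels, attemptWithoutLabels, log, summarize) {
  try {
    await attemptWithLabels();
  } catch (e) {
    log(`Create/update with labels failed (${e.message}); retrying without labels...`);
    try {
      await attemptWithoutLabels();
    } catch (retryError) {
      log(`Retry without labels also failed: ${retryError.message}`);
      summarize('Bot Detection: failed to create or update the alert issue.');
    }
  }
}
```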
@@ -1,10 +1,9 @@
 name: Bot Detection
The workflow description line with DOI reference has been removed. This removes valuable academic attribution and context about the bot detection methodology. Consider keeping this reference as a comment within the workflow or in accompanying documentation, as it provides scientific credibility and helps maintainers understand the underlying detection approach.
name: Bot Detection
# Bot detection heuristics used in this workflow are based on published research.
# For the full methodology and DOI reference, see docs/bot-detection-methodology.md.