Skip to content

Conversation

@shruthilayaj
Copy link
Member

@shruthilayaj shruthilayaj commented Jan 20, 2026

Flagpole evaluations take about 10ms, which lets us evaluate about 60k projects
in 10 min. This is not fast enough. This PR is doing a couple things:

  • checks has_transaction flag since indexer only runs when transactions are enabled
  • splits up the projects over 23 hours (same logic as the seer job last hour is used for cleanup)

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 20, 2026
Copy link
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

continue

if project.id % 23 != django_timezone.now().hour:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hour sharding uses modulo 23 but hours range 0-23

Medium Severity

The sharding logic project.id % 23 != django_timezone.now().hour has an off-by-one mismatch. project.id % 23 produces values 0-22 (23 distinct values), but django_timezone.now().hour produces values 0-23 (24 distinct values). When the current hour is 23, no projects will ever match because no project ID modulo 23 can equal 23. This means projects are never indexed during hour 23 (11pm) each day, creating a gap in the indexing schedule.

Fix in Cursor Fix in Web

continue

if project.id % 23 != django_timezone.now().hour:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hour check evaluated repeatedly can cause race condition

Medium Severity

The django_timezone.now().hour is called fresh for each project inside the RangeQuerySetWrapper loop. If the iteration takes time and crosses an hour boundary, projects evaluated before the boundary are checked against hour N, while projects evaluated after are checked against hour N+1. This causes some shard-N projects to be skipped (not processed for another 24 hours) and some shard-N+1 projects to be processed early (then again at the next scheduled run). The hour value needs to be captured once before the loop begins.

Fix in Cursor Fix in Web

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unlikely to happen in prod but probably good practice in case it cases instability in tests.

Comment on lines 40 to 46
Tuple of (project_id, organization_id)
"""
projects = Project.objects.filter(status=ObjectStatus.ACTIVE).select_related("organization")
current_hour = django_timezone.now().hour

for project in RangeQuerySetWrapper(
projects,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The sharding logic project.id % 23 will never equal current_hour when the hour is 23, causing projects for that hour to be skipped.
Severity: MEDIUM

Suggested Fix

To ensure all 24 hours are utilized, change the modulo divisor from 23 to 24. The condition should be updated to project.id % 24 == current_hour.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/tasks/seer_explorer_index.py#L40-L46

Potential issue: The project filtering logic uses the expression `project.id % 23 ==
current_hour` to distribute the indexing workload across 23 hours of the day. However,
when the `current_hour` is 23, this condition can never be met because the result of a
modulo 23 operation is always an integer between 0 and 22, inclusive. As a result, any
projects that would have been scheduled for the 23rd hour will be skipped, and their
data will not be indexed by the Seer service. This is a silent failure that affects a
subset of projects.

Did we get this right? 👍 / 👎 to inform future reviews.

continue

if project.id % 23 != django_timezone.now().hour:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unlikely to happen in prod but probably good practice in case it cases instability in tests.

@shruthilayaj shruthilayaj merged commit e4603b8 into master Jan 20, 2026
66 checks passed
@shruthilayaj shruthilayaj deleted the shruthi/chore/split-explorer-index-load-over-23-h branch January 20, 2026 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants