[RFC] Enable Merge Queues #3606

@Swiddis

Problem Statement
This proposal is inspired by Tyler Cipriani's post and by using merge queues on other projects over the past year. We currently don't have any post-merge protection for our main branch. This exposes us to merge skew: two colliding PRs can each pass CI individually, yet main fails once both are merged.

Merge queue example: without queues, branches A and B may each pass on their own while A+B fails. With queues, B is tested as A+B before it is merged.

Current State
Because we don't have this protection, main regularly breaks due to external factors or collisions between PRs. To be safe, we would effectively have to do by hand what a merge queue does: rerun the tests right before merging. Doing so means spending a lot of time waiting synchronously on CI.

It also means that there's a natural incentive to "just merge," and ignore CI entirely. It's up to maintainer discipline whether the tests always pass or not.

Proposal
We can enable merge queues for our repos (sql and opensearch-spark). Doing this will:

  • Enable us to more confidently merge PRs while protecting the main branch.
  • Speed up merging: queued PRs can be tested in parallel.
  • Guarantee that every change merged to main has passed tests against the current state of the branch.
  • Incentivize fixing flaky tests, as these will have a greater visible impact in the queue.
  • Prevent instances of accidentally merging PRs with failing tests.

The primary drawback is that the queue can get blocked if tests start failing for unrelated reasons (e.g. a Sonatype outage). It's up to the team whether we want to prioritize always-passing tests or avoiding the inconvenience of waiting on patches for upstream breakage. There's a permissions role that lets you override the merge queue in an emergency. (There are other shortcomings of GitHub's implementation, but I generally don't consider them significant enough to block implementing this.)

Implementation Discussion
We submit a request to the Org's .github repo to enable the queue. From there, all we need to do is add a `merge_group:` trigger to the relevant CI workflows (most obviously the tests, possibly other workflows too). Then the queue will be active for every PR merged into the protected branch.
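
To make the workflow change concrete, here is a minimal sketch of what adding the trigger could look like. The file path, workflow name, job layout, and test command are placeholders I've made up for illustration, not our actual CI config; the only part that matters is the extra `merge_group:` entry under `on:`.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/tests.yml
# (name, job, and steps below are illustrative placeholders)
name: Tests

on:
  pull_request:   # assumed existing trigger: run on PR updates
  merge_group:    # new: also run when the merge queue builds a candidate commit

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder command; substitute whatever the workflow already runs.
      - run: ./gradlew test
```

One thing to watch for: any check marked as required in branch protection needs to run on `merge_group` as well. Otherwise the queue waits on a status that never gets reported.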

Labels: RFC (Request For Comments), infrastructure (Changes to infrastructure, testing, CI/CD, pipelines, etc.)
