Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create auto_merge package #38019

Merged
merged 3 commits into from
May 13, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions airbyte-ci/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,4 @@ The installation instructions for the `airbyte-ci` CLI tool cal be found here
| [`connectors_qa`](connectors/connectors_qa/) | A tool to verify connectors have sounds assets and metadata. |
| [`metadata_service`](connectors/metadata_service/) | Tools to generate connector metadata and registry. |
| [`pipelines`](connectors/pipelines/) | Airbyte CI pipelines, including formatting, linting, building, testing connectors, etc. Connector acceptance tests live here. |
| [`auto_merge`](connectors/auto_merge/) | A tool to automatically merge connector pull requests. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding it to the index!

Nit: can you also add the live testing tool to this index in here or separate quick PR?

49 changes: 49 additions & 0 deletions airbyte-ci/connectors/auto_merge/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
# `Auto merge`

## Purpose

This Python package is made to merge pull requests automatically on the Airbyte Repo. It is used in
the [following workflow](.github/workflows/auto_merge.yml).

A pull request is currently considered as auto-mergeable if:

- It has the `auto-merge` Github label
- It only modifies files in connector-related directories
- All the required checks have passed

We want to auto-merge a specific set of connector pull requests to simplify the connector updates in
the following use cases:

- Pull requests updating Python dependencies or the connector base image
- Community contributions when they've been reviewed and approved by our team but CI is still
running: to avoid an extra review iteration just to check CI status.

## Install and usage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prettier -w this.


### Get a Github token

You need to create a Github token with the following permissions:

- Read access to the repository to list open pull requests and their statuses
- Write access to the repository to merge pull requests

### Local install and run

```
poetry install
export GITHUB_TOKEN=<your_github_token>
# By default no merge will be done, you need to set the AUTO_MERGE_PRODUCTION environment variable to true to actually merge the PRs
poetry run auto-merge
```

### In CI

```
export GITHUB_TOKEN=<your_github_token>
export AUTO_MERGE_PRODUCTION=true
poetry install
poetry run auto-merge
```

The execution will set the `GITHUB_STEP_SUMMARY` env var with a markdown summary of the PRs that
have been merged.
754 changes: 754 additions & 0 deletions airbyte-ci/connectors/auto_merge/poetry.lock

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions airbyte-ci/connectors/auto_merge/pyproject.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
[tool.poetry]
name = "auto-merge"
version = "0.1.0"
description = ""
authors = ["Airbyte <contact@airbyte.io>"]
readme = "README.md"
packages = [
{ include = "auto_merge", from = "src" },
]

[tool.poetry.dependencies]
python = "^3.10"
pygithub = "^2.3.0"
anyio = "^4.3.0"


[tool.poetry.group.dev.dependencies]
mypy = "^1.10.0"
ruff = "^0.4.3"
pytest = "^8.2.0"
pyinstrument = "^4.6.2"

[tool.ruff.lint]
select = [
"I" # isort
]

[tool.poetry.scripts]
auto-merge = "auto_merge.main:auto_merge"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.poe.tasks]
test = "pytest tests"
type_check = "mypy src --disallow-untyped-defs"
lint = "ruff check src"

[tool.airbyte_ci]
optional_poetry_groups = ["dev"]
poe_tasks = ["type_check", "lint",]
Empty file.
12 changes: 12 additions & 0 deletions airbyte-ci/connectors/auto_merge/src/auto_merge/consts.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.

from __future__ import annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from __future__ import annotations

I think from __future__ import annotations is not needed in Python 3.10?


AIRBYTE_REPO = "airbytehq/airbyte"
AUTO_MERGE_LABEL = "auto-merge"
BASE_BRANCH = "master"
CONNECTOR_PATH_PREFIXES = {
"airbyte-integrations/connectors",
"docs/integrations/sources",
"docs/integrations/destinations",
}
8 changes: 8 additions & 0 deletions airbyte-ci/connectors/auto_merge/src/auto_merge/env.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.

from __future__ import annotations

import os

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
PRODUCTION = os.environ.get("AUTO_MERGE_PRODUCTION", "false").lower() == "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💎

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required for this PR, but we have various ways that we define truthy in airbyte-ci. At some point we should probably try to make them consistent.

27 changes: 27 additions & 0 deletions airbyte-ci/connectors/auto_merge/src/auto_merge/helpers.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.

from __future__ import annotations

import time
from typing import TYPE_CHECKING

if TYPE_CHECKING:
from github.PullRequest import PullRequest


def generate_job_summary_as_markdown(merged_prs: list[PullRequest]) -> str:
"""Generate a markdown summary of the merged PRs

Args:
merged_prs (list[PullRequest]): The PRs that were merged

Returns:
str: The markdown summary
"""
summary_time = time.strftime("%Y-%m-%d %H:%M:%S")
header = "# Auto-merged PRs"
details = f"Summary generated at {summary_time}"
if not merged_prs:
return f"{header}\n\n{details}\n\n**No PRs were auto-merged**\n"
merged_pr_list = "\n".join([f"- [#{pr.number} - {pr.title}]({pr.html_url})" for pr in merged_prs])
return f"{header}\n\n{details}\n\n{merged_pr_list}\n"
118 changes: 118 additions & 0 deletions airbyte-ci/connectors/auto_merge/src/auto_merge/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.

from __future__ import annotations

import logging
import os
import time
from collections.abc import Iterable, Iterator
from contextlib import contextmanager
from typing import TYPE_CHECKING

from github import Auth, Github

from .consts import AIRBYTE_REPO, AUTO_MERGE_LABEL, BASE_BRANCH, CONNECTOR_PATH_PREFIXES
from .env import GITHUB_TOKEN, PRODUCTION
from .helpers import generate_job_summary_as_markdown
from .pr_validators import ENABLED_VALIDATORS

if TYPE_CHECKING:
from github.Commit import Commit as GithubCommit
from github.PullRequest import PullRequest
from github.Repository import Repository as GithubRepo

logging.basicConfig()
logger = logging.getLogger("auto_merge")
logger.setLevel(logging.INFO)


@contextmanager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💎 Oooh gettting fancy I see

def github_client() -> Iterator[Github]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

client = None
try:
client = Github(auth=Auth.Token(GITHUB_TOKEN), seconds_between_requests=0)
yield client
finally:
if client:
client.close()


def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Ok so here is the most important logic in the system. The business logic.

Its important that its

  1. Readable
  2. Easily extendable
  3. Contained to as few locations (files) as possible

I think youve achieved that so this is suggestion is very optional:

What if we used a validator pattern

def _validate_has_auto_merge_label((head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool, Optional[str]:
    has_auto_merge_label = any(label.name == AUTO_MERGE_LABEL for label in pr.labels)
    if not has_auto_merge_label:
        return False, f"PR {pr.number} does not have the {AUTO_MERGE_LABEL} label"

    return True, None
    
def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool:
    validators = [
        _validate_has_auto_merge_label,
        # ...
    ]

    for pr_validator in validators:
        is_valid, error = pr_validator(head_commit, pr, required_checks)        
        if not is_valid
            logging.info(error)
            return False

(this is an example and could be improved (Named ValidationArgument|Result type, standard error messaging, etc) but the ideas there.)

Why this way?

  1. It lets us conditionally add/remove validations for a connector more easily. You can imagine that critical connectors or third party connectors might want to add or skip certain validations
  2. Validation logic is easier to test
  3. imo easier to skim and understand what validations run when/under what conditions

Anyway again very optional

"""Run all enabled validators and return if they all pass.

Args:
head_commit (GithubCommit): The head commit of the PR
pr (PullRequest): The PR to check
required_checks (set[str]): The set of required passing checks

Returns:
bool: True if the PR is auto-mergeable, False otherwise
"""
for validator in ENABLED_VALIDATORS:
is_valid, error = validator(head_commit, pr, required_checks)
if not is_valid:
if error:
logger.info(f"PR #{pr.number} - {error}")
return False
return True


def process_pr(repo: GithubRepo, pr: PullRequest, required_passing_contexts: set[str], dry_run: bool) -> None | PullRequest:
"""Process a PR to see if it is auto-mergeable and merge it if it is.

Args:
repo (GithubRepo): The repository the PR is in
pr (PullRequest): The PR to process
required_passing_contexts (set[str]): The set of required passing checks
dry_run (bool): Whether to actually merge the PR or not

Returns:
None | PullRequest: The PR if it was merged, None otherwise
"""
logger.info(f"Processing PR #{pr.number}")
head_commit = repo.get_commit(sha=pr.head.sha)
if check_if_pr_is_auto_mergeable(head_commit, pr, required_passing_contexts):
if not dry_run:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💎

pr.merge()
logger.info(f"PR #{pr.number} was auto-merged")
return pr
else:
logger.info(f"PR #{pr.number} is auto-mergeable but dry-run is enabled")
return None


def back_off_if_rate_limited(github_client: Github) -> None:
"""Sleep if the rate limit is reached

Args:
github_client (Github): The Github client to check the rate limit of
"""
remaining_requests, _ = github_client.rate_limiting
if remaining_requests < 100:
logging.warning(f"Rate limit almost reached. Remaining requests: {remaining_requests}")
if remaining_requests == 0:
logging.warning(f"Rate limited. Sleeping for {github_client.rate_limiting_resettime - time.time()} seconds")
time.sleep(github_client.rate_limiting_resettime - time.time())
return None


def auto_merge() -> None:
"""Main function to auto-merge PRs that are candidates for auto-merge.
If the AUTO_MERGE_PRODUCTION environment variable is not set to "true", this will be a dry run.
"""
dry_run = PRODUCTION is False
with github_client() as gh_client:
repo = gh_client.get_repo(AIRBYTE_REPO)
main_branch = repo.get_branch(BASE_BRANCH)
logger.info(f"Fetching required passing contexts for {BASE_BRANCH}")
required_passing_contexts = set(main_branch.get_required_status_checks().contexts)
candidate_issues = gh_client.search_issues(f"repo:{AIRBYTE_REPO} is:pr label:{AUTO_MERGE_LABEL} base:{BASE_BRANCH} state:open")
prs = [issue.as_pull_request() for issue in candidate_issues]
logger.info(f"Found {len(prs)} open PRs targeting {BASE_BRANCH} with the {AUTO_MERGE_LABEL} label")
merged_prs = []
for pr in prs:
back_off_if_rate_limited(gh_client)
if merged_pr := process_pr(repo, pr, required_passing_contexts, dry_run):
merged_prs.append(merged_pr)
if PRODUCTION:
os.environ["GITHUB_STEP_SUMMARY"] = generate_job_summary_as_markdown(merged_prs)
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Copyright (c) 2024 Airbyte, Inc., all rights reserved.

from __future__ import annotations

from typing import TYPE_CHECKING, Optional, Tuple

from .consts import AUTO_MERGE_LABEL, BASE_BRANCH, CONNECTOR_PATH_PREFIXES

if TYPE_CHECKING:
from github.Commit import Commit as GithubCommit
from github.PullRequest import PullRequest


def has_auto_merge_label(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> Tuple[bool, Optional[str]]:
has_auto_merge_label = any(label.name == AUTO_MERGE_LABEL for label in pr.labels)
if not has_auto_merge_label:
return False, f"does not have the {AUTO_MERGE_LABEL} label"
return True, None


def targets_main_branch(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> Tuple[bool, Optional[str]]:
if not pr.base.ref == BASE_BRANCH:
return False, f"does not target {BASE_BRANCH}"
return True, None


def only_modifies_connectors(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> Tuple[bool, Optional[str]]:
modified_files = pr.get_files()
for file in modified_files:
if not any(file.filename.startswith(prefix) for prefix in CONNECTOR_PATH_PREFIXES):
return False, "is not only modifying connectors"
return True, None


def head_commit_passes_all_required_checks(
head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]
) -> Tuple[bool, Optional[str]]:
successful_status_contexts = [commit_status.context for commit_status in head_commit.get_statuses() if commit_status.state == "success"]
successful_check_runs = [check_run.name for check_run in head_commit.get_check_runs() if check_run.conclusion == "success"]
successful_contexts = set(successful_status_contexts + successful_check_runs)
if not required_checks.issubset(successful_contexts):
return False, "not all required checks passed"
return True, None


# A PR is considered auto-mergeable if:
# - it has the AUTO_MERGE_LABEL
# - it targets the BASE_BRANCH
# - it touches only files in CONNECTOR_PATH_PREFIXES
# - the head commit passes all required checks

# PLEASE BE CAREFUL OF THE VALIDATOR ORDERING
# Let's declared faster checks first as the check_if_pr_is_auto_mergeable function fails fast.
ENABLED_VALIDATORS = [has_auto_merge_label, targets_main_branch, only_modifies_connectors, head_commit_passes_all_required_checks]
3 changes: 2 additions & 1 deletion airbyte-ci/connectors/pipelines/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -745,7 +745,8 @@ E.G.: running Poe tasks on the modified internal packages of the current branch:

| Version | PR | Description |
| ------- | ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| 4.13.0 | [#32715](https://github.com/airbytehq/airbyte/pull/32715) | Tag connector metadata with git info |
| 4.13.1 | [#38020](https://github.com/airbytehq/airbyte/pull/38020) | Add `auto_merge` as an internal package to test. |
| 4.13.0 | [#32715](https://github.com/airbytehq/airbyte/pull/32715) | Tag connector metadata with git info |
| 4.12.7 | [#37787](https://github.com/airbytehq/airbyte/pull/37787) | Remove requirements on dockerhub credentials to run QA checks. |
| 4.12.6 | [#36497](https://github.com/airbytehq/airbyte/pull/36497) | Add airbyte-cdk to list of poetry packages for testing |
| 4.12.5 | [#37785](https://github.com/airbytehq/airbyte/pull/37785) | Set the `--yes-auto-update` flag to `True` by default. |
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from pathlib import Path

INTERNAL_POETRY_PACKAGES = [
"airbyte-ci/connectors/auto_merge",
"airbyte-ci/connectors/pipelines",
"airbyte-ci/connectors/base_images",
"airbyte-ci/connectors/common_utils",
Expand Down
2 changes: 1 addition & 1 deletion airbyte-ci/connectors/pipelines/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "pipelines"
version = "4.13.0"
version = "4.13.1"
description = "Packaged maintained by the connector operations team to perform CI for connectors' pipelines"
authors = ["Airbyte <contact@airbyte.io>"]

Expand Down
Loading