Skip to content

Conversation

@jedcunningham
Copy link
Member

The prek command was comparing github.sha with itself, which checked no files. Now uses merge-base to check all files changed in the PR against the base branch.

This ensures static checks are actually ran in PRs like #60168.

The prek command was comparing `github.sha` with itself, which checked
no files. Now uses `merge-base` to check all files changed in the PR
against the base branch.
@boring-cyborg boring-cyborg bot added area:dev-tools backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch labels Jan 6, 2026
@jedcunningham jedcunningham requested a review from Copilot January 6, 2026 23:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a bug in the basic-tests workflow where static checks were comparing the same commit (github.sha) with itself, resulting in no files being checked. The fix introduces a merge-base calculation to properly identify all changed files in a PR by comparing against the base branch.

Key changes:

  • Implements merge-base calculation to find the common ancestor between PR and base branch
  • Updates static check command to compare merge-base commit against HEAD instead of comparing github.sha with itself
  • Increases fetch depth and adds fallback logic to handle cases where merge-base isn't found in shallow clones

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +231 to +245
git fetch origin "$BASE_BRANCH" --depth=50

# Try to find merge-base, deepen if not found
MERGE_BASE=$(git merge-base "origin/$BASE_BRANCH" HEAD 2>/dev/null || echo "")

if [ -z "$MERGE_BASE" ]; then
echo "Merge-base not found with depth=50, fetching more history..."
git fetch --deepen=450
git fetch origin "$BASE_BRANCH" --deepen=450
MERGE_BASE=$(git merge-base "origin/$BASE_BRANCH" HEAD)
fi

echo "sha=${MERGE_BASE}" >> $GITHUB_OUTPUT
env:
BASE_BRANCH: ${{ github.base_ref || 'main' }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
git fetch origin "$BASE_BRANCH" --depth=50
# Try to find merge-base, deepen if not found
MERGE_BASE=$(git merge-base "origin/$BASE_BRANCH" HEAD 2>/dev/null || echo "")
if [ -z "$MERGE_BASE" ]; then
echo "Merge-base not found with depth=50, fetching more history..."
git fetch --deepen=450
git fetch origin "$BASE_BRANCH" --deepen=450
MERGE_BASE=$(git merge-base "origin/$BASE_BRANCH" HEAD)
fi
echo "sha=${MERGE_BASE}" >> $GITHUB_OUTPUT
env:
BASE_BRANCH: ${{ github.base_ref || 'main' }}
git fetch origin "$BASE_BRANCH" --depth=$FETCH_DEPTH
# Try to find merge-base, deepen if not found
MERGE_BASE=$(git merge-base "origin/$BASE_BRANCH" HEAD 2>/dev/null || echo "")
if [ -z "$MERGE_BASE" ]; then
echo "Merge-base not found with depth=$FETCH_DEPTH, fetching more history..."
git fetch --deepen=$DEEPEN
git fetch origin "$BASE_BRANCH" --deepen=$DEEPEN
MERGE_BASE=$(git merge-base "origin/$BASE_BRANCH" HEAD)
fi
echo "sha=${MERGE_BASE}" >> $GITHUB_OUTPUT
env:
BASE_BRANCH: ${{ github.base_ref || 'main' }}
FETCH_DEPTH: 50
DEEPEN: 450

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, maybe we can make the values parameterised for both --depth and --deepen from env :)

Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thats a strange one

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange one .. I wonder when it happened and how it did not work (I am sure it worked before). But it can be done way simpler.

All you need it is to add missing ^ in --from-ref

At the moment you do this check - github.sha already contains merge commit - this is how pull-request workflow works.

https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#pull_request

Note that GITHUB_SHA for this event is the last merge commit of the pull request merge branch. If you want to get the commit ID for the last commit to the head branch of the pull request, use github.event.pull_request.head.sha instead.

@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

Superseded by #60202

@potiuk potiuk closed this Jan 7, 2026
@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

This is precisely why we needed 2 depth - because in pull_request the github.sha is merge and it's parent is the commit that was in the target branch when the merge was created (i.e. last time the PR was rebased).

@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

I merged my fix and rebased #60168 to verify that it works (it should fail)

@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

Nice catch @jedcunningham BTW :)

@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

@jedcunningham
Copy link
Member Author

jedcunningham commented Jan 7, 2026

Nice, I'd considered just adding "^" but somehow I confused myself thinking that would only go 1 commit up - that if we had many commits on the branch it'd only look at the last one. I guess it must squash? Anyways, glad its working.

edit:

github.sha is merge and it's parent is the commit that was in the target branch when the merge was created (i.e. last time the PR was rebased)

Ah, a merge commit. Confusing since it has 2 parents. Apparently the "first" parent is the commit from the base branch, which is why this works. TIL.

@jedcunningham jedcunningham deleted the fix_inthewild_sorting branch January 7, 2026 18:52
@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

Ah, a merge commit. Confusing since it has 2 parents. Apparently the "first" parent is the commit from the base branch, which is why this works. TIL.

Correct. This is actually what we very heavily bank on (this why I knew it by heart).
Basically whole selective checks code is based on the fact that we have "merge" commit passed as github.sha and it's first parent is base branch.

https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/ci_commands.py#L163:

def get_changed_files(commit_ref: str | None) -> tuple[str, ...]:
    if commit_ref is None:
        return ()
    cmd = [
        "git",
        "diff-tree",
        "--no-commit-id",
        "--name-only",
        "-r",
        commit_ref + "^",
        commit_ref,
    ]
    result = run_command(cmd, check=False, capture_output=True, text=True)
    if result.returncode != 0:
        get_console().print(
            f"[warning] Error when running diff-tree command [/]\n{result.stdout}\n{result.stderr}"
        )
        return ()
    changed_files = tuple(result.stdout.splitlines()) if result.stdout else ()
    get_console().print("\n[info]Changed files:[/]\n")
    get_console().print(changed_files)
    get_console().print()
    return changed_files

@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

BTW. There is also another interesting thing I learned with prek

There is a very nice shortcut to not have to find merge-base.

  • git diff main..your_pr_branch
  • git diff main...your_pr_branch

Spot the difference .... there are THREE dots in second case. And the ... is actually showing the changes in your branch from the merge-base

And ... (pun intended) there are few other gotchas: https://darekkay.com/blog/git-commit-ranges/#git-diff

For example git diff main...your-branch is not reverse of git diff your-branch...diff as you would suspect. The first one shows your changes since merge-base, the second shows SURPRISE -> all the changes in main (!) since merge-base 😱 😱 😱 😱 😱 😱 😱 😱 😱

Git is wild.

@potiuk
Copy link
Member

potiuk commented Jan 7, 2026

So basically if you want to have prek to execute on "only your changes" from a branch you have PR to main with - do this (without running any of git merge-base thingies):

prek --from-ref maim

(we have it in our docs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants