Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement invalidate_on_push without pushedDate field #602

Merged
merged 11 commits into from
Jul 24, 2023

Conversation

bluekeyes
Copy link
Member

@bluekeyes bluekeyes commented Jul 10, 2023

This is a draft PR to explore how to implement the invalidate_on_push option without using the pushedDate field that has been removed from the GraphQL API. The basic idea is to use the timestamps of past policy-bot statuses to figure out which comments/reviews/etc. are relevant for a particular evaluation:

  1. If no commits have statuses, it's either the first evaluation or a force push replaced all commits, so ignore anything older than the current time (which needs to be the same for all rule evaluations in a policy)
  2. If some commits have statuses, it's a new push to an existing PR, so ignore anything older than the most recent status timestamp
  3. If all commits have statuses:
    1. If there's only one timestamp, it's a re-evaluation of a PR with a single push, so consider everything
    2. If there's more than one timestamp, it's a re-evaluation of a PR with multiple pushes, so ignore anything older than the second most recent status timestamp

Right now, I've only changed the pull.Context and the logic described above, so it doesn't compile. I still need to finish connecting all the parts and then make sure everything works.

Closes #598.

First step in removing our use of the deleted pushedDate field to
implement invalidate_on_push. Load the last policy-bot status (if it
exists) to approximate the evaluation time instead.
@bluekeyes
Copy link
Member Author

bluekeyes commented Jul 12, 2023

It looks like this works in most situations, but one problem case is if a force-push replaces only some of the commits in the PR. Right now, that moves the evaluation cutoff to the time of the last remaining commit, which can lead to incorrect approval. I'm not sure how to solve this yet, since the force-push removed the state that we'd otherwise use to detect the push.

It looks like force-pushes are recorded in the pull request timeline, but we had some historic issues with that API not returning information fast enough and leading to incorrect results (#47, #54), so I'm a bit hesitant to use it again.

@bluekeyes
Copy link
Member Author

After running in to various edge cases that didn't work in the previous approach, I switched to listing the statuses on the (filtered) head commit and using the creation time of the oldest status as the push time. For the most part this is simpler although it does require new API calls. The new calls can be cached and I think are balanced by the removal of all the pushedDate fallback calls.

This method leaves a bit of a gap during webhook delivery and processing where an approval could be incorrectly ignored, but I think this is acceptable and is better than incorrectly approving a PR that should require a new approval. This should only be an issue if either processing is very delayed or a user leaves an approval within seconds of a new push.

I need to test the interactions with ignored commits a bit more, but otherwise, I think this is working.

@bluekeyes bluekeyes marked this pull request as ready for review July 14, 2023 03:23
@@ -301,10 +301,6 @@ func (r *Rule) filterEditedCandidates(ctx context.Context, prctx pull.Context, c
func (r *Rule) filterInvalidCandidates(ctx context.Context, prctx pull.Context, candidates []*common.Candidate) ([]*common.Candidate, []*common.Dismissal, error) {
log := zerolog.Ctx(ctx)

if !r.Options.InvalidateOnPush {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already checked by the caller

@@ -31,7 +31,6 @@ UI to view the detailed approval status of any pull request.
- [`or`, `and`, and `if` (Rule Predicates)](#or-and-and-if-rule-predicates)
- [Cross-organization Membership Tests](#cross-organization-membership-tests)
- [Update Merges](#update-merges)
- [Private Repositories](#private-repositories)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section was removed a long time ago, but we didn't update the ToC

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not still necessary to maintain a ToC by hand, at least for the github web UI: https://github.blog/changelog/2021-04-13-table-of-contents-support-in-markdown-files/


// sortCommits orders commits in history order starting from head. It must be
// called on the unfiltered set of commits.
func sortCommits(commits []*pull.Commit, head string) []*pull.Commit {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unnecessary because GitHub already returns the commits in this order, but I'd rather be explicit than have this break strangely if the order changes for some reason - this ordering is important for push detection to work in all cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding some small test cases for this ordering function?

@bluekeyes bluekeyes requested a review from asvoboda July 14, 2023 03:27
membership map[string]bool
statuses map[string]string
labels []string
pushedAt map[string]time.Time
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could actually be cached globally: once we know the push time for a specific commit in a specific repository, it shouldn't change. I think for now it's easiest to re-compute it on each request, but if the extra requests become a problem, this might be a good optimization.

@@ -84,12 +83,3 @@ func setStringFromEnv(key, prefix string, value *string) bool {
}
return false
}

func setBoolFromEnv(key, prefix string, value *bool) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't removing the support for global env also not a breaking change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, we never attempt to load the pushedDate field, so the do_not_load_pushed_date option has no effect. In the YAML configuration, the parser errors if the file contains unknown fields. But for environment variables, there's no issue with setting values that Policy Bot doesn't know about, so I think this should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for explaining @bluekeyes

Copy link
Member

@asvoboda asvoboda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea to use past statuses on commits seems like a really good workaround for the missing data problem. Its a bit annoying that we've lost data from GH, but I think this should work in most cases. I'm a bit worried about some race between the few seconds after a push and a new status showing up, but as you mentioned in practice I think this probably won't be an issue.


// sortCommits orders commits in history order starting from head. It must be
// called on the unfiltered set of commits.
func sortCommits(commits []*pull.Commit, head string) []*pull.Commit {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding some small test cases for this ordering function?

@asvoboda
Copy link
Member

👍 LGTM

@bluekeyes
Copy link
Member Author

#612 adds a global cache for this data, since I'm worried about the cost of all the API calls to list statuses, particularly as a commit accumulates more statuses over time. I'll merge that separately, since this PR is already making enough changes.

@bluekeyes bluekeyes merged commit efde8be into develop Jul 24, 2023
@bluekeyes bluekeyes deleted the bkeyes/remove-pushed-date branch July 24, 2023 21:42
klutchell added a commit to product-os/policies that referenced this pull request Jul 31, 2023
See: palantir/policy-bot#602
See: product-os/policy-bot#4
See:

Change-type: patch
Signed-off-by: Kyle Harding <kyle@balena.io>
klutchell added a commit to product-os/policies that referenced this pull request Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Policy bot stuck on Commit hash does not have a pushed date
4 participants