Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal/pull checks #5365

Merged
merged 9 commits into from
May 1, 2023
Merged

Proposal/pull checks #5365

merged 9 commits into from
May 1, 2023

Conversation

N-o-Z
Copy link
Member

@N-o-Z N-o-Z commented Feb 28, 2023

No description provided.

@N-o-Z N-o-Z added team/versioning-engine Team versioning engine minor-change Used for PRs that don't require issue attached labels Feb 28, 2023
@N-o-Z N-o-Z requested review from ozkatz, arielshaqed and a team February 28, 2023 19:31
@N-o-Z N-o-Z self-assigned this Feb 28, 2023
@N-o-Z
Copy link
Member Author

N-o-Z commented Feb 28, 2023

Added all reviewers from the pull request and changes PRs

@N-o-Z N-o-Z added the exclude-changelog PR description should not be included in next release changelog label Feb 28, 2023
@@ -0,0 +1,108 @@
# $PULLS Proposal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename the name to be dolar_pulls.md

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, great to see how we're advancing on this!

I would really appreciate a feature comparison between $PULL$ and GitHub PRs; maybe also some additional change request mechanisms. As it stands it seems that there are some subtle differences, and I am not sure why.

Examples:

  • Reviews 🥳
  • What happens when I delete a branch and create it elsewhere?
  • Non-required actions (it is fine not to implement them immediately; I would however like to understand how they will fit in later on. For instance running a non-required action is a good way to gain confidence in an action.)
  • In GH "pull PR" and "merge branch" are related but distinct operations. Here the only way to pull is to merge. Given that some merges succeed without happening, maybe we want to keep the distinction?

Anyway, N-E-A-T-!-:---)

* `Source Branch`
* `Source Ref` - The source branch head at the time of creation
* `Destination Branch`
* `Run ID` - The actions associated run id, may be empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a list of run IDs? More generally, what "run IDs" are "associated" with a PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only one runID per pull? if we allow list of runs we can support re-runs with the current head

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actions run ID - the ID of the actions run triggered by the on-pull hook when creating the $PULLS instance.
Similarly to pull requests in GitHub - we don't really need to save the history of the previous failed runs - the only relevant one is the current.

1. Check for conflicts between source and destination branch - if conflict is found, do not run actions and display an appropriate message
2. Run actions - and associate RunID with instance

This will return the $PULLS instance ID, that can be used later for querying and merging etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a PULL$ is associated with exactly one <src, dst> pair, why do I need another identifier? Is the intent to identify CLOSED PULL$?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't explicitly delete closed and merged $PULLS, we require an additional identifier


GET `/api/v1/repositories/{repository}/branches/{sourceBranch}/$PULLS/{destinationBranch}/{id}`

Will return the status, corresponding actions RunID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure having to specify branches helps if I anyway need to specify the ID.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, can we follow Github's lead here? For example, getting a PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API conforms to the structure we currently use in many other APIs. Specifically it is similar the merge API.
It also enables us to be more efficient when listing from the ref-store

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also think that the suggested API will have a problem supply repository level information on current $PULLS.
If a pull lives under source branch it means I need to query all branches to understand the list of current pulls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nopcoder I'm not sure it make sense to list all PRs in a repository (in contrary to GitHub) I think the more common use case is to list all PRs for a given branch - and in that sense this structure is more effecient

Add an optional parameter to the Merge API for the $PULLS id.
When performing a merge with a $PULLS id, perform the following preliminary checks:
1. Verify source reference id with source branch head
2. Verify actions run successful
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this means that a run of an "action" needs to be associated with a particular commit digest. (This is always a good idea, of course!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually a PR in this case. I can have a pre-commit & post-commit actions that are different than my on-pull triggering. The first 2 are associated with a commit, the later is associated to a PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are both correct :)


### Commit

On commit, we will need to go over all the $PULLS where the given branch is the source branch and update them accordingly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also after a merge, I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need Delete and Merged? on a new commit you will go over all pulls of the branch, but what if the commit is merged? you will still go over it? who should be responsible for the deletion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was leftover from a previous design which had a more complicated state-machine - removed.

@eden-ohana not sure what the comment refers to


On creation the following occurs:

1. Check for conflicts between source and destination branch - if conflict is found, do not run actions and display an appropriate message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How? using diff?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff is a long operation, not sure we want to wait for it to create a pull

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all other operations are not time consuming, we can run Diff. We have a synchronous API for that.
We also don't have to run the full Diff - we can stop at the first conflict.
Another option is to run it as an action, i.e. don't hold the PR creation request hanging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't go into the implementation details, but as a rule - it will probably be more time consuming to run the actions and then realize there's a conflict than checking for conflicts and run the actions only after verifying there are none

* `Source Branch`
* `Source Ref` - The source branch head at the time of creation
* `Destination Branch`
* `Run ID` - The actions associated run id, may be empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only one runID per pull? if we allow list of runs we can support re-runs with the current head


### Commit

On commit, we will need to go over all the $PULLS where the given branch is the source branch and update them accordingly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need Delete and Merged? on a new commit you will go over all pulls of the branch, but what if the commit is merged? you will still go over it? who should be responsible for the deletion?

### $PULLS Object

$PULLS will be saved in the ref-store under the key:
`$PULLS/<src-branch>/<dest-branch>/<$PULLS-id>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a pull request is merged we can create another one? it will still be in the ref store, maybe we need to add to the required changes to manage the pull state on the ref store

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once a $PULLS instance is merged it has reached the end of its life.
Creating a new $PULLS instance using the same source and destination, will trigger the actions on the current source head. There no connection between the two instances and both are saved in the ref-store with their state and ids.

On creation the following occurs:

1. Check for conflicts between source and destination branch - if conflict is found, do not run actions and display an appropriate message
2. Run actions - and associate RunID with instance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these the same actions we run when wanting to merge pull?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - merge actions run on the pre-merge and post-merge hooks .
These actions run on on-pull hook which is triggered when creating a new $PULLS

Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting closer 💪

## How it will work

Introduce a new entity called $PULLS at the repository level. The $PULLS entity is identified by a source branch and destination branch.
Each source-destination pair can have only a single `OPEN` $PULLS instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the harm in starting without this limitation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose was to save on resources, and prevent redundant execution of a possibly very long process (action runs)
I don't think implementing it is a big issue


### Creating a new $PULLS instance

POST `/api/v1/repositories/{repository}/branches/{sourceBranch}/$PULLS/{destinationBranch}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the source be a commit?

Suggested change
POST `/api/v1/repositories/{repository}/branches/{sourceBranch}/$PULLS/{destinationBranch}`
POST `/api/v1/repositories/{repository}/branches/{sourceRef}/$PULLS/{destinationBranch}`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - I think we discussed it previously and agreed that similarly to pull request the operation is done on a source branch and not a specific reference


On creation the following occurs:

1. Check for conflicts between source and destination branch - if conflict is found, do not run actions and display an appropriate message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all other operations are not time consuming, we can run Diff. We have a synchronous API for that.
We also don't have to run the full Diff - we can stop at the first conflict.
Another option is to run it as an action, i.e. don't hold the PR creation request hanging.


GET `/api/v1/repositories/{repository}/branches/{sourceBranch}/$PULLS/{destinationBranch}/{id}`

Will return the status, corresponding actions RunID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, can we follow Github's lead here? For example, getting a PR


### Re-run Actions

In the event of failed actions run, it will be possible to re-run the job via the Actions view
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"From the Actions view" suggests that all actions are now retryable. Is that what we want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is part of the required changes for actions - unless we implement "re-run" in the $PULLS context


In the event of failed actions run, it will be possible to re-run the job via the Actions view
TBD: How to update runID of $PULLS instance on re-run
TBD: Re-run of `on-pull` should be in the context of the branch and not of the specific ref. This is in contrast to how we would like to re-run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the conflict checking? Do we run that again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict checking is performed in 2 places:

  1. The first time you create a $PULLS instance - to save redundant execution of actions
  2. As part of the merge process - exactly how it works today

In case of merge to source - rebase - conflict resolution the source head will change and the user will be required to re-run the actions

TBD: How to update runID of $PULLS instance on re-run
TBD: Re-run of `on-pull` should be in the context of the branch and not of the specific ref. This is in contrast to how we would like to re-run
all other types of hooks which should run on the same reference.
**Optional solution:** Do not implement re-run in actions context, instead implement rerun on $PULLS which will in fact,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I haven't change my HEAD and I still want to retry? We don't have to support that at first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a real use-case of re-running tests if they succeed.
If we want to initially support re-runs only at the $PULLS context, then we will be decide whatever policy we want to create a new actions run from the $PULLS instance


### Commit

On commit, we will need to go over all the $PULLS where the given branch is the source branch and update them accordingly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftovers from previous state machine - removing

@@ -0,0 +1,1377 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the state machine?

Co-authored-by: itaiad200 <itaiad200@gmail.com>
@N-o-Z
Copy link
Member Author

N-o-Z commented Mar 6, 2023

Thanks, great to see how we're advancing on this!

I would really appreciate a feature comparison between PULL and GitHub PRs; maybe also some additional change request mechanisms. As it stands it seems that there are some subtle differences, and I am not sure why.

Examples:

  • Reviews partying_face
  • What happens when I delete a branch and create it elsewhere?
  • Non-required actions (it is fine not to implement them immediately; I would however like to understand how they will fit in later on. For instance running a non-required action is a good way to gain confidence in an action.)
  • In GH "pull PR" and "merge branch" are related but distinct operations. Here the only way to pull is to merge. Given that some merges succeed without happening, maybe we want to keep the distinction?

Anyway, N-E-A-T-!-:---)

@arielshaqed Thanks for the valuable feedback!
What we were trying to do was to create the same CI/CD and branch protection functionality while deferring the collaboration features.
To answer some of the questions (and update the PR accordingly):

  1. Will add a section of deferred / non-goals describing the collaboration features.
  2. Deleting/recreating a branch - can go either way. We delete the $PULLS in the background when deleting a branch, or we can keep them. Keeping them is fairly safe, as they will all become invalid once the new branch with a new head is created. So the only possible operations will be to either close the open instance or re-run it
  3. Will add to the deferred section
  4. Not quite sure what you meant - but basically I did try to separate these operations, while trying to not introduce a new API for "pull". If you think this should be completely separated in a manner where the merge hooks won't even run, I think its worth a discussion

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I really like suggestion 2, it seems much leaner. I think we might make it even leaner by separating concerns.

Once defined in a lakeFS action, it will be possible to trigger the action
manually with a mandatory reference (branch/commit/tag).The lakeFS action
will run on the provided reference and the results will be available in the
just like any other action run.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like manually-run actions. They are really useful for many things, such as manually exporting data or just re-running some test that is known to be flaky. But isn't this orthogonal to the branch protection rule?

In detail: Not sure I understand why I need to trigger an action run. Why can't it be run as a postcommit hook or whatever? The only difference I can see is that such an action is non-blocking for commits -- so we need to implement a nonblocking / "start-only-but-don't-wait" action type. I think we would want it in any case, and the implementation for manual-trigger will in any case need to include this. Can we separate the two and make manual-run a separate topic?

And then we don't even need to implement manual-run actions at first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right about post-commit hooks being enough for this task. In fact, today we don't wait for post hooks to complete, i.e. they are asynchronous. BUT - I think that some of the context is missing from this page. You can find more context here. TLDR: The requirements are that users don't want long running processes to execute with every commit. Potentially these executions are costly and should start explicitly by the user:
image

@itaiad200 itaiad200 merged commit 726c2ab into master May 1, 2023
@itaiad200 itaiad200 deleted the proposal/pull-checks branch May 1, 2023 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog minor-change Used for PRs that don't require issue attached team/versioning-engine Team versioning engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants