
Track PR merge times #56304

Open
keith-zephyr opened this issue Mar 28, 2023 · 22 comments
Labels
Process Tracked by the process WG

Comments

@keith-zephyr
Contributor

Proposal
With the recent publication of Zephyr's Contributor and Reviewer Expectations, it would be valuable to measure and track PR merge times. This data could inform whether policy changes are helping or hindering the velocity of the project.

LFX Insights provides some of this data, but I don't know the methodology used, and the numbers don't look correct: the times seem far too low, and the most recent data might be incomplete.

[Chart: Time To Merge]

[Chart: Time in Review]

GitHub's Insights view only shows the total number of commits per week, without any data for merge and review times.

Tracking the median time for PRs to get the first review and to merge would be a good place to start. This tracking could later be enhanced to distinguish between bug fixes and new features. The expectation is that bug fixes will merge fairly quickly, while new features go through additional review.
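
As a starting point, here's a minimal sketch of how those medians could be computed against the GitHub REST API (the endpoints are real; the single-page fetch, missing authentication, and hour-based units are simplifications):

```python
import statistics
from datetime import datetime

import requests

API = "https://api.github.com/repos/zephyrproject-rtos/zephyr"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real use


def merged_prs(page=1):
    """Fetch one page of closed PRs and keep only the merged ones."""
    r = requests.get(f"{API}/pulls", headers=HEADERS,
                     params={"state": "closed", "per_page": 100, "page": page})
    r.raise_for_status()
    return [pr for pr in r.json() if pr["merged_at"]]


def hours_between(start, end):
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600


def first_review_at(pr):
    """Timestamp of the earliest submitted review on a PR, or None."""
    r = requests.get(pr["url"] + "/reviews", headers=HEADERS)
    r.raise_for_status()
    times = sorted(rv["submitted_at"] for rv in r.json() if rv.get("submitted_at"))
    return times[0] if times else None


prs = merged_prs()  # a real run would paginate through the full history
merge_hours = [hours_between(pr["created_at"], pr["merged_at"]) for pr in prs]
review_hours = [hours_between(pr["created_at"], t)
                for t in map(first_review_at, prs) if t]

print(f"median time to merge:        {statistics.median(merge_hours):.1f} h")
print(f"median time to first review: {statistics.median(review_hours):.1f} h")
```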

keith-zephyr added the Process (Tracked by the process WG) label on Mar 28, 2023
@nordicjm
Collaborator

I imagine you'd probably need to have a bot scrape through the PRs to get this information. @kartben

@kartben
Collaborator

kartben commented Mar 30, 2023

It does indeed look like the LFX data might not be accurate. I have it on my agenda for next week to discuss with the LFX team how to fix the discrepancies we're seeing there.

In the meantime, I've pulled the entire PR history and played with the data a bit:

[Chart: days to close]

Note that this probably needs to be looked at together with the number of outstanding PRs:

[Chart: outstanding PRs]

@kartben
Collaborator

kartben commented Mar 30, 2023

Just realized I looked at all closed PRs, not those that were closed by merging.

@kartben
Collaborator

kartben commented Mar 30, 2023

Closed+Merged PRs only:

[Chart]

This doesn't look too far off from the LFX data now.
My guess is they might be presenting things in a counter-intuitive way, i.e. does "The average time in days required for pull request/changeset to be merged over the last 90 days" only look at PRs created in that period, or does it also include PRs closed during the period that might be much older?
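
To make the two interpretations concrete, here's a tiny pandas sketch (the data and column names are made up, not the LFX schema):

```python
import pandas as pd

# Hypothetical frame of merged PRs.
prs = pd.DataFrame({
    "created": pd.to_datetime(["2022-11-01", "2023-01-15", "2023-03-05"]),
    "merged":  pd.to_datetime(["2023-03-10", "2023-03-20", "2023-03-25"]),
})
prs["days_to_merge"] = (prs["merged"] - prs["created"]).dt.days

window_start, window_end = pd.Timestamp("2023-01-01"), pd.Timestamp("2023-03-31")

# Interpretation 1: only PRs *created* inside the 90-day window.
created_in_window = prs[prs["created"].between(window_start, window_end)]

# Interpretation 2: PRs *merged* inside the window, however old they are.
merged_in_window = prs[prs["merged"].between(window_start, window_end)]

print(created_in_window["days_to_merge"].mean())  # excludes the old November PR
print(merged_in_window["days_to_merge"].mean())   # the 129-day-old PR pulls this up
```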

@nordicjm
Collaborator

The list of PRs used for this would need to be set somehow, because things like hotfixes should be excluded as they will completely skew the results

@nordicjm
Collaborator

And maybe remove doc only changes too, since those also aren't relevant and have very short merge times, especially around release times

@kartben
Collaborator

kartben commented Mar 30, 2023

The list of PRs used for this would need to be set somehow, because things like hotfixes should be excluded as they will completely skew the results

Could you clarify what you mean by hotfixes? And what would be the criteria to flag them?

And maybe remove doc only changes too, since those also aren't relevant and have very short merge times, especially around release times

Mmmh ya, maybe, but on the other hand I would argue that looking at the median/high-percentile/low-percentile might be enough to automagically avoid putting too much emphasis on some of the outliers, while still allowing us to identify long-term trends.
At the end of the day it's not really the actual "number of days to close" that matters, IMO, but rather knowing whether things are getting better or worse. So while those fixes with short merge times might skew the absolute numbers (which, again, I don't think matter all that much -- do we know what a "good" average time to merge would be? a median one?), they probably don't matter as much when looking at the big picture.
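
For illustration, the kind of outlier-resistant summary being suggested here (with a made-up sample):

```python
import pandas as pd

days_to_merge = pd.Series([1, 2, 2, 3, 5, 8, 13, 40, 90, 300])  # made-up sample

# The 300-day outlier barely moves the median but dominates the mean.
print(days_to_merge.quantile([0.10, 0.50, 0.90]))  # 10th/50th/90th percentiles
print("mean:", days_to_merge.mean())               # 46.4 -- skewed by the outlier
```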

@nordicjm
Collaborator

Could you clarify what you mean by hotfixes? And what would be the criteria to flag them?

Anything with the hotfix label: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+label%3AHotfix+is%3Aclosed

Mmmh ya, maybe, but on the other hand I would argue that looking at the median/high-percentile/low-percentile might be enough to automagically avoid putting too much emphasis on some of the outliers, while still allowing us to identify long-term trends.

From my view, it's more important to see merge times for features or additions, not minor changes. Anyone can review documentation; not everyone can review code, or code for a specific area/driver/system. Maybe even group things, e.g. the average time for a board to get merged, a driver, etc.

@keith-zephyr
Contributor Author

Could you clarify what you mean by hotfixes? And what would be the criteria to flag them?

Anything with the hotfix label: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+label%3AHotfix+is%3Aclosed

It could be useful to break out PRs with the "Trivial" label as well: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+is%3Amerged+label%3Atrivial+

Mmmh ya, maybe, but on the other hand I would argue that looking at the median/high-percentile/low-percentile might be enough to automagically avoid putting too much emphasis on some of the outliers, while still allowing us to identify long-term trends.

From my view, it's more important to see merge times for features or additions, not minor changes. Anyone can review documentation; not everyone can review code, or code for a specific area/driver/system. Maybe even group things, e.g. the average time for a board to get merged, a driver, etc.

+1: tracking PR velocity for new features is the main reason I opened this issue.
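
A sketch of what those label-based exclusions could look like on the raw API data (label names taken from the queries above; the `prs` list is assumed to come from something like the first sketch):

```python
EXCLUDED_LABELS = {"hotfix", "trivial"}

def is_excluded(pr):
    # The REST API returns labels as a list of dicts with a "name" key;
    # compare case-insensitively since label casing varies across queries.
    return any(label["name"].lower() in EXCLUDED_LABELS for label in pr["labels"])

substantive_prs = [pr for pr in prs if not is_excluded(pr)]
```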

@keith-zephyr
Contributor Author

@kartben - Your chart in #56304 (comment) looks like a great start on this effort. I'm curious how much the "days to close" increases once trivial and hotfixes are excluded.

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

@nordicjm
Collaborator

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

That's this Wednesday at 9:00 (UTC-7)?

@keith-zephyr
Contributor Author

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

That's this Wednesday at 9:00 (UTC-7)?

The meeting is at 10:00 (UTC-7). Here's a link to the invite (requires a login): https://lists.zephyrproject.org/g/tsc/viewevent?repeatid=17893&eventid=1726025&calstart=2023-04-05

@kartben
Collaborator

kartben commented Mar 31, 2023

@kartben - Your chart in #56304 (comment) looks like a great start on this effort. I'm curious how much the "days to close" increases once trivial and hotfixes are excluded.

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

Yes, I plan on attending.

@keith-zephyr
Contributor Author

@kartben I just saw your post on the tsc channel. Did you use https://github.com/kartben/zephyr-repo-metrics to generate this data?

@kartben
Collaborator

kartben commented Apr 3, 2023

Hey @keith-zephyr -- I actually put the scripts for the PR "dump" here: https://github.com/kartben/gh-pr-stats
As it's probably never gonna be Zephyr or "open source RTOS" specific, I thought it could live in a different repo.
I've committed the CSV file if you want to play with it directly.

Edit: there's also Excel and PowerBI files that can be used as a starting point

@mbolivar-nordic
Contributor

Process WG:

  • @nashif and @kartben to get an initial data set into elasticsearch + kibana
  • we will reconvene and look at data when it is available, then start refining our questions and data collection

  • @keith-zephyr I want to brainstorm what metrics we should be tracking
  • @kartben I would add that we want to track things, but also have things that are actionable. I'm talking to the LFX dashboard people tomorrow to discuss the data. We have some interesting dashboards already monitoring open issues, tracking footprint, etc., but do we look at these? Do we have quarter-over-quarter improvement tracking?
  • @dleach02 additional aspects to this: 1. getting the reviewer to engage, 2. getting the submitter to respond, 3. reviewers might not engage due to red CI results. These can all cause delays
  • @nashif I gave up on the LFX dashboard. It used to be useful, but it's very difficult to get accurate data out of it now. Difficult to know where they're getting their information. In my opinion we should track this on our own, with our own data, using elasticsearch or something like that.
  • @nashif three things to track: 1. the 90th percentile (some PRs are always going to take a long time), 2. the 10th percentile (need to make sure people have time to review), 3. the median.
  • @kartben Curious if we think our overall times are bad or good. They've been relatively flat. Curious if we want to split things by area and surface that to the community. Another thing it's not showing is what happens to really old PRs when they get closed, which will pull the averages up
  • @galak what is the problem we're trying to solve? We get this sort of data every so often. I think we should start there and then figure out the data we need
  • @keith-zephyr What I see is that new features take a really long time to merge. This is what prompted our discussion about how merging large features can be made more efficient, which led to our new contributor and reviewer expectations documentation. Anecdotally I think things have gotten better, so I wanted to know if that's actually true. At a high level I want to make sure the project health is good, but I think that getting some data by module/maintenance area would be helpful in seeing if we have outliers and therefore under-maintained areas.
  • @nashif I think we need to start finding out where maintainers are overloaded and either splitting things up or adding more maintainers. The first problem statement is having this data available regularly so we can monitor it, so we can then see if process changes result in changes over time.
  • @galak to @keith-zephyr's point about features taking a long time to merge, I don't think this data is useful for deciding that. Would be curious to get a definition of what we mean by 'new feature' -- e.g. new subsystems like system devicetree, or reworks like sysbuild.
  • @nashif let's take zbus as an example. It's difficult to do this automatically, since existing scripts don't know what new features are. Maybe we should label things as 'new features'.
  • @galak I think finding specific questions we want answers to, like overloaded maintainers or unmaintained areas, would be important.
  • @nashif I think if we start with pull request merge times, we can start looking at data, and interesting trends will pop out that we can refine our searches on. I think we will want to start getting information on a per-maintainer basis. Our tagging and scripts are not always good.
  • @keith-zephyr next steps? 1. tracking per subsystem, 2. coming up with a metric for features (a 'feature' label), maybe classifying historical data by crude heuristics like LoC-added thresholds, or potentially scraping PR comments for references to RFCs
  • @nashif I think step 1 is to create a dashboard, put it in elasticsearch + kibana, and start looking at it (see the sketch after these notes).
  • @kartben I agree. I can upload the CSV dump. The more specific we get about what we look at, the more we're going to just be looking at a few samples. I think it would be interesting to track the directories and files that haven't been touched in a long time and try to track technical debt that way -- surfacing this information is what is needed.
  • @nashif I can work with you on getting this set up. We own this infrastructure, so we can do whatever we want. If we come up with something we can recommend to LF or use as a reference, that will be good, but I think maintaining this in parallel will be better
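
For the elasticsearch + kibana step, a minimal ingestion sketch using the official Python client (the host, index name, and document shape are all assumptions):

```python
from elasticsearch import Elasticsearch

# Assumes a local Elasticsearch 8.x instance; adjust host/auth as needed.
es = Elasticsearch("http://localhost:9200")

# One document per merged PR; the fields mirror the kind of columns in the CSV dump.
doc = {
    "number": 12345,                      # hypothetical PR
    "created_at": "2023-03-01T12:00:00Z",
    "merged_at": "2023-03-08T09:30:00Z",
    "days_to_merge": 6.9,
    "labels": ["area: Documentation"],
    "first_time_contributor": False,
}
es.index(index="zephyr-prs", id=doc["number"], document=doc)
```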

@kartben
Collaborator

kartben commented May 19, 2023

I updated and ran some of my scripts to extract more info re: first-time contributors, and thought I would share an update -- both in charts and in words (since I feel like my dataviz fu could be improved :))

What you're looking at is (still) the age of a merged PR at the time it is merged. The repetition of the word "merge" is intentional: it's important to keep in mind that these charts don't give any insight into "rotting" PRs (i.e. still open) or PRs that were closed without being merged, and some work would probably be needed to extract insights about those categories of PRs.

Ways to read the data:

All PRs:

[Chart]

PRs for first-time contributors:

[Chart]

/cc @carlescufi
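
A sketch of how these per-month numbers could be derived from the gh-pr-stats CSV dump (the column names are assumptions about the dump's schema):

```python
import pandas as pd

# Hypothetical column names for the gh-pr-stats CSV dump.
prs = pd.read_csv("prs.csv", parse_dates=["created_at", "merged_at"])
merged = prs.dropna(subset=["merged_at"]).copy()
merged["age_days"] = (merged["merged_at"] - merged["created_at"]).dt.days

# Treat a PR as a "first PR" if it is the author's earliest in the data set.
first_created = merged.groupby("author")["created_at"].transform("min")
merged["first_time"] = merged["created_at"] == first_created

# Median and 90th-percentile age at merge, per merge month, split by cohort.
by_month = merged.groupby([merged["merged_at"].dt.to_period("M"), "first_time"])
print(by_month["age_days"].agg(["median", lambda s: s.quantile(0.9), "count"]))
```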

@kartben
Collaborator

kartben commented Aug 11, 2023

I thought folks would be interested in seeing the recent trend in PR merge time -- looking at 12 full months, from Aug '22 to Jul '23.

All PRs:

[Chart]

PRs for first-time contributors:

[Chart]

In words:

  • While the number of merged PRs from first-time contributors keeps increasing (except for June -- release month -- and July -- summer vacation), the average time to merge seems to be going down: 14 days in July vs. 40 days in April -- that's a significant improvement! (Again, the number of PRs from first-time contributors has not gone down over the same period, quite the opposite.)
  • The 90th percentile is also following a good trend, for both first PRs and all PRs -- i.e. the 10% of PRs that take a really long time are now apparently taking less time to get merged.
    • Not surprisingly, it looks like summer may have an impact on normal / non-trivial PRs (unlike the ones that would typically come from first-time contributors), as the first graph seems to show some "debt" accumulating in July, whereas PRs from first-time contributors in the same period were actually tackled quickly.

Let me know if you have comments/questions - I am looking forward to seeing the evolution of the trends after the summer break is over.

@yperess
Collaborator

yperess commented Sep 21, 2023

Hey @keith-zephyr -- I actually put the scripts for the PR "dump" here: https://github.com/kartben/gh-pr-stats As it's probably never gonna be Zephyr or "open source RTOS" specific, I thought it could live in a different repo. I've committed the CSV file if you want to play with it directly.

Edit: there's also Excel and PowerBI files that can be used as a starting point

How hard would it be to add 3 more columns to this data? I would love to see:

  • Number of force pushes
  • Number of comments
  • Absolute value of the line diff (lines removed + added)

This would also allow us to start looking at other factors in PR lifespans, such as churn. I know I dread getting a comment like "you missed a period in the documentation" after I've already got 1+ approvals, and having the approvals thrown away over a typo.

@kartben
Collaborator

kartben commented Sep 21, 2023

Hey @keith-zephyr -- I actually put the scripts for the PR "dump" here: https://github.com/kartben/gh-pr-stats As it's probably never gonna be Zephyr or "open source RTOS" specific, I thought it could live in a different repo. I've committed the CSV file if you want to play with it directly.

Edit: there's also Excel and PowerBI files that can be used as a starting point

How hard would it be to add 3 more columns to this data? I would love to see:

  • Number of force pushes
  • Number of comments
  • Absolute value of the line diff (lines removed + added)

This would also allow us to start looking at other factors in PR lifespans, such as churn. I know I dread getting a comment like "you missed a period in the documentation" after I've already got 1+ approvals, and having the approvals thrown away over a typo.

The last two are pretty easy, but I'm not sure there is even a way to get the first one; I'll look into it.

Edit: the number of force pushes is probably available through https://docs.github.com/en/rest/issues/timeline?apiVersion=2022-11-28

FWIW, if we do think about trying to make something useful out of the number of force pushes, we need to find a way to exclude the ones made while a PR might have been in draft state.
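
A sketch against the timeline API linked above (`head_ref_force_pushed`, `convert_to_draft`, and `ready_for_review` are real timeline event types; the draft-exclusion heuristic and the PR number are assumptions):

```python
import requests

HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real use

def count_force_pushes(owner, repo, pr_number):
    """Count head_ref_force_pushed events, skipping any made while in draft."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/timeline"
    events = requests.get(url, headers=HEADERS, params={"per_page": 100}).json()

    in_draft = False  # naive: assumes the PR was opened ready for review
    count = 0
    for ev in events:
        if ev.get("event") == "convert_to_draft":
            in_draft = True
        elif ev.get("event") == "ready_for_review":
            in_draft = False
        elif ev.get("event") == "head_ref_force_pushed" and not in_draft:
            count += 1
    return count

print(count_force_pushes("zephyrproject-rtos", "zephyr", 12345))  # hypothetical PR
```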

@keith-zephyr
Contributor Author

@kartben - can you provide some updated charts on this issue? It would be helpful to revisit this data, determine whether there are any conclusions we can draw from it, and whether there are action items to fix problems.

@keith-zephyr
Contributor Author

@keith-zephyr - suggests running this once a quarter
@kartben - it would be good to integrate this with the PR stats work Anas has done
@nashif - slow PR merge times may not be an issue overall
@kartben - knowing the PR merge times for the worst 10% and for first-time contributors is still useful to track
@dleach02 - What's the velocity of the project? Internally, he has created a doc to help his team be more effective at getting PRs submitted.
@nashif - last month we merged 1000+ PRs, with good distribution
@nashif - 20-30% of PRs in the queue are stale
@keith-zephyr - the ask is to track this data to monitor the health of the project
@nashif - need to be mindful not to go too deep with the analysis
@dleach02 - the dashboards created by @nashif and @fabiobaltieri are really helpful
@keith-zephyr - growth of the project improves the signal we can get by monitoring these merge times
@nashif - agrees, but we should also make sure the project continues to scale (slow CI, etc.)
