
Track PR merge times #56304

Open
keith-zephyr opened this issue Mar 28, 2023 · 22 comments
Labels
Process Tracked by the process WG

Comments

@keith-zephyr
Contributor

Proposal
With the recent publication of Zephyr's Contributor and Reviewer Expectations, it would be valuable to measure and track PR merge times. This data could inform whether policy changes are helping or hindering the velocity of the project.

LFX Insights provides some of this data, but I don't know the methodology used, and the numbers don't look correct: the times seem far too low, and the most recent data might be incomplete.

[Chart: Time To Merge]

[Chart: Time in Review]

GitHub's Insights view only shows the total number of commits per week, without any data for merge and review times.

Tracking the median time for PRs to get the first review and to merge would be a good place to start. This tracking could later be enhanced to distinguish between bug fixes and new features. The expectation is that bug fixes will merge fairly quickly, while new features go through additional review.
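
As a starting point, here's a minimal sketch of how those medians could be computed against the GitHub REST API (the endpoints are real; the single-page fetch, missing authentication, and hour-based units are simplifications):

```python
import statistics
from datetime import datetime

import requests

API = "https://api.github.com/repos/zephyrproject-rtos/zephyr"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real use


def merged_prs(page=1):
    """Fetch one page of closed PRs and keep only the merged ones."""
    r = requests.get(f"{API}/pulls", headers=HEADERS,
                     params={"state": "closed", "per_page": 100, "page": page})
    r.raise_for_status()
    return [pr for pr in r.json() if pr["merged_at"]]


def hours_between(start, end):
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 3600


def first_review_at(pr):
    """Timestamp of the earliest submitted review on a PR, or None."""
    r = requests.get(pr["url"] + "/reviews", headers=HEADERS)
    r.raise_for_status()
    times = sorted(rv["submitted_at"] for rv in r.json() if rv.get("submitted_at"))
    return times[0] if times else None


prs = merged_prs()  # a real run would paginate through the full history
merge_hours = [hours_between(pr["created_at"], pr["merged_at"]) for pr in prs]
review_hours = [hours_between(pr["created_at"], t)
                for t in map(first_review_at, prs) if t]

print(f"median time to merge:        {statistics.median(merge_hours):.1f} h")
print(f"median time to first review: {statistics.median(review_hours):.1f} h")
```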

keith-zephyr added the Process (Tracked by the process WG) label on Mar 28, 2023
@nordicjm
Collaborator

I imagine you'd probably need to have a bot scrape through the PRs to get this information. @kartben

@kartben
Collaborator

kartben commented Mar 30, 2023

It does indeed look like the LFX data might not be accurate. I have it on my agenda for next week to discuss with the LFX team how to fix the discrepancies we're seeing there.

In the meantime, I've pulled the entire PR history and played with the data a bit:

[Chart: days to close]

Note that this probably needs to be looked at together with the number of outstanding PRs:

[Chart: outstanding PRs]

@kartben
Collaborator

kartben commented Mar 30, 2023

Just realized I looked at all closed PRs, not those that were closed by merging.

@kartben
Collaborator

kartben commented Mar 30, 2023

Closed+Merged PRs only:

[Chart]

This doesn't look too far off from the LFX data now.
My guess is they might be presenting things in a counter-intuitive way, i.e. does "The average time in days required for pull request/changeset to be merged over the last 90 days" only look at PRs created in that period, or does it also include PRs closed during the period that might be much older?
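
To make the two interpretations concrete, here's a tiny pandas sketch (the data and column names are made up, not the LFX schema):

```python
import pandas as pd

# Hypothetical frame of merged PRs.
prs = pd.DataFrame({
    "created": pd.to_datetime(["2022-11-01", "2023-01-15", "2023-03-05"]),
    "merged":  pd.to_datetime(["2023-03-10", "2023-03-20", "2023-03-25"]),
})
prs["days_to_merge"] = (prs["merged"] - prs["created"]).dt.days

window_start, window_end = pd.Timestamp("2023-01-01"), pd.Timestamp("2023-03-31")

# Interpretation 1: only PRs *created* inside the 90-day window.
created_in_window = prs[prs["created"].between(window_start, window_end)]

# Interpretation 2: PRs *merged* inside the window, however old they are.
merged_in_window = prs[prs["merged"].between(window_start, window_end)]

print(created_in_window["days_to_merge"].mean())  # excludes the old November PR
print(merged_in_window["days_to_merge"].mean())   # the 129-day-old PR pulls this up
```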

@nordicjm
Collaborator

The list of PRs used for this would need to be set somehow, because things like hotfixes should be excluded as they will completely skew the results

@nordicjm
Collaborator

And maybe remove doc only changes too, since those also aren't relevant and have very short merge times, especially around release times

@kartben
Collaborator

kartben commented Mar 30, 2023

The list of PRs used for this would need to be set somehow, because things like hotfixes should be excluded as they will completely skew the results

Could you clarify what you mean by hotfixes? And what would be the criteria to flag them?

And maybe remove doc only changes too, since those also aren't relevant and have very short merge times, especially around release times

Mmmh ya, maybe, but on the other hand I would argue that looking at the median/high-percentile/low-percentile might be enough to automagically avoid putting too much emphasis on some of the outliers, while still allowing us to identify long-term trends.
At the end of the day it's not really the actual "number of days to close" that matters, IMO, but rather knowing whether things are getting better or worse. So while those fixes with short merge times might skew the absolute numbers (which, again, I don't think matter all that much -- do we know what a "good" average time to merge would be? a median one?), they probably don't matter as much when looking at the big picture.
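
For illustration, the kind of outlier-resistant summary being suggested here (with a made-up sample):

```python
import pandas as pd

days_to_merge = pd.Series([1, 2, 2, 3, 5, 8, 13, 40, 90, 300])  # made-up sample

# The 300-day outlier barely moves the median but dominates the mean.
print(days_to_merge.quantile([0.10, 0.50, 0.90]))  # 10th/50th/90th percentiles
print("mean:", days_to_merge.mean())               # 46.4 -- skewed by the outlier
```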

@nordicjm
Collaborator

Could you clarify what you mean by hotfixes? And what would be the criteria to flag them?

Anything with the hotfix label: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+label%3AHotfix+is%3Aclosed

Mmmh ya, maybe, but on the other hand I would argue that looking at the median/high-percentile/low-percentile might be enough to automagically avoid putting too much emphasis on some of the outliers, while still allowing us to identify long-term trends.

From my view, it's more important to see merge times for features or additions, not minor changes. Anyone can review documentation; not everyone can review code, or code for a specific area/driver/system. Maybe even group things, e.g. the average time for a board to get merged, a driver, etc.

@keith-zephyr
Contributor Author

Could you clarify what you mean by hotfixes? And what would be the criteria to flag them?

Anything with the hotfix label: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+label%3AHotfix+is%3Aclosed

It could be useful to break out PRs with the "Trivial" label as well: https://github.com/zephyrproject-rtos/zephyr/pulls?q=is%3Apr+is%3Amerged+label%3Atrivial+

Mmmh ya, maybe, but on the other hand I would argue that looking at the median/high-percentile/low-percentile might be enough to automagically avoid putting too much emphasis on some of the outliers, while still allowing us to identify long-term trends.

From my view, it's more important to see merge times for features or additions, not minor changes. Anyone can review documentation; not everyone can review code, or code for a specific area/driver/system. Maybe even group things, e.g. the average time for a board to get merged, a driver, etc.

+1: tracking PR velocity for new features is the main reason I opened this issue.
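
A sketch of what those label-based exclusions could look like on the raw API data (label names taken from the queries above; the `prs` list is assumed to come from something like the first sketch):

```python
EXCLUDED_LABELS = {"hotfix", "trivial"}

def is_excluded(pr):
    # The REST API returns labels as a list of dicts with a "name" key;
    # compare case-insensitively since label casing varies across queries.
    return any(label["name"].lower() in EXCLUDED_LABELS for label in pr["labels"])

substantive_prs = [pr for pr in prs if not is_excluded(pr)]
```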

@keith-zephyr
Contributor Author

@kartben - Your chart in #56304 (comment) looks like a great start on this effort. I'm curious how much the "days to close" increases once trivial and hotfixes are excluded.

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

@nordicjm
Collaborator

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

That's this Wednesday at 9:00 (UTC-7)?

@keith-zephyr
Contributor Author

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

That's this Wednesday at 9:00 (UTC-7)?

The meeting is at 10:00 (UTC-7). Here's a link to the invite (requires a login): https://lists.zephyrproject.org/g/tsc/viewevent?repeatid=17893&eventid=1726025&calstart=2023-04-05

@kartben
Collaborator

kartben commented Mar 31, 2023

@kartben - Your chart in #56304 (comment) looks like a great start on this effort. I'm curious how much the "days to close" increases once trivial and hotfixes are excluded.

This issue has been added to the next Process Working Group agenda. @kartben and @nordicjm, do you think you'll be able to attend that meeting?

Yes, I plan on attending.

@keith-zephyr
Contributor Author

@kartben I just saw your post on the tsc channel. Did you use https://github.com/kartben/zephyr-repo-metrics to generate this data?

@kartben
Collaborator

kartben commented Apr 3, 2023

Hey @keith-zephyr -- I actually put the scripts for the PR "dump" here: https://github.com/kartben/gh-pr-stats
As it's probably never gonna be Zephyr or "open source RTOS" specific, I thought it could live in a different repo.
I've committed the CSV file if you want to play with it directly.

Edit: there's also Excel and PowerBI files that can be used as a starting point

@mbolivar-nordic
Contributor

Process WG:

  • @nashif and @kartben to get an initial data set into elasticsearch + kibana
  • we will reconvene and look at data when it is available, then start refining our questions and data collection

  • @keith-zephyr I want to brainstorm what metrics we should be tracking
  • @kartben I would add that we want to track things, but also have things that are actionable. I'm talking to the LFX dashboard people tomorrow to discuss the data. We have some interesting dashboards already monitoring open issues, tracking footprint, etc., but do we look at these? Do we have quarter-over-quarter improvement tracking?
  • @dleach02 additional aspects to this: 1. getting the reviewer to engage, 2. getting the submitter to respond, 3. reviewers might not engage due to red CI results. These can all cause delays
  • @nashif I gave up on the LFX dashboard. It used to be useful, but it's very difficult to get accurate data out of it now. Difficult to know where they're getting their information. In my opinion we should track this on our own, with our own data, using elasticsearch or something like that.
  • @nashif three things to track: 1. the 90th percentile (some PRs are always going to take a long time), 2. the 10th percentile (need to make sure people have time to review), 3. the median.
  • @kartben Curious if we think our overall times are bad or good. They've been relatively flat. Curious if we want to split things by area and surface that to the community. Another thing it's not showing is what happens to really old PRs when they get closed, which will pull the averages up
  • @galak what is the problem we're trying to solve? We get this sort of data every so often. I think we should start there and then figure out the data we need
  • @keith-zephyr What I see is that new features take a really long time to merge. This is what prompted our discussion about how merging large features can be made more efficient, which led to our new contributor and reviewer expectations documentation. Anecdotally I think things have gotten better, so I wanted to know if that's actually true. At a high level I want to make sure the project health is good, but I think that getting some data by module/maintenance area would be helpful in seeing if we have outliers and therefore under-maintained areas.
  • @nashif I think we need to start finding out where maintainers are overloaded and either splitting things up or adding more maintainers. The first problem statement is having this data available regularly so we can monitor it, so we can then see if process changes result in changes over time.
  • @galak to @keith-zephyr's point about features taking a long time to merge, I don't think this data is useful for deciding that. Would be curious to get a definition of what we mean by 'new feature' -- e.g. new subsystems like system devicetree, or reworks like sysbuild.
  • @nashif let's take zbus as an example. It's difficult to do this automatically, since existing scripts don't know what new features are. Maybe we should label things as 'new features'.
  • @galak I think finding specific questions we want answers to, like overloaded maintainers or unmaintained areas, would be important.
  • @nashif I think if we start with pull request merge times, we can start looking at data, and interesting trends will pop out that we can refine our searches on. I think we will want to start getting information on a per-maintainer basis. Our tagging and scripts are not always good.
  • @keith-zephyr next steps? 1. tracking per subsystem, 2. coming up with a metric for features (a 'feature' label), maybe classifying historical data by crude heuristics like LoC-added thresholds, or potentially scraping PR comments for references to RFCs
  • @nashif I think step 1 is to create a dashboard, put it in elasticsearch + kibana, and start looking at it (see the sketch after these notes).
  • @kartben I agree. I can upload the CSV dump. The more specific we get about what we look at, the more we're going to just be looking at a few samples. I think it would be interesting to track the directories and files that haven't been touched in a long time and try to track technical debt that way -- surfacing this information is what is needed.
  • @nashif I can work with you on getting this set up. We own this infrastructure, so we can do whatever we want. If we come up with something we can recommend to LF or use as a reference, that will be good, but I think maintaining this in parallel will be better
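
For the elasticsearch + kibana step, a minimal ingestion sketch using the official Python client (the host, index name, and document shape are all assumptions):

```python
from elasticsearch import Elasticsearch

# Assumes a local Elasticsearch 8.x instance; adjust host/auth as needed.
es = Elasticsearch("http://localhost:9200")

# One document per merged PR; the fields mirror the kind of columns in the CSV dump.
doc = {
    "number": 12345,                      # hypothetical PR
    "created_at": "2023-03-01T12:00:00Z",
    "merged_at": "2023-03-08T09:30:00Z",
    "days_to_merge": 6.9,
    "labels": ["area: Documentation"],
    "first_time_contributor": False,
}
es.index(index="zephyr-prs", id=doc["number"], document=doc)
```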

@kartben
Collaborator

kartben commented May 19, 2023

I updated and ran some of my scripts to extract more info re: first-time contributors, and thought I would share an update -- both in charts and in words (since I feel like my dataviz fu could be improved :))

What you're looking at is (still) the age of a merged PR at the time it is merged. The repetition of the word "merge" is intentional: it's important to keep in mind that these charts don't give any insight into "rotting" PRs (i.e. still open) or PRs that were closed without being merged, and some work would probably be needed to extract insights about those categories of PRs.

Ways to read the data:

All PRs:

[Chart]

PRs for first-time contributors:

[Chart]

/cc @carlescufi
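
A sketch of how these per-month numbers could be derived from the gh-pr-stats CSV dump (the column names are assumptions about the dump's schema):

```python
import pandas as pd

# Hypothetical column names for the gh-pr-stats CSV dump.
prs = pd.read_csv("prs.csv", parse_dates=["created_at", "merged_at"])
merged = prs.dropna(subset=["merged_at"]).copy()
merged["age_days"] = (merged["merged_at"] - merged["created_at"]).dt.days

# Treat a PR as a "first PR" if it is the author's earliest in the data set.
first_created = merged.groupby("author")["created_at"].transform("min")
merged["first_time"] = merged["created_at"] == first_created

# Median and 90th-percentile age at merge, per merge month, split by cohort.
by_month = merged.groupby([merged["merged_at"].dt.to_period("M"), "first_time"])
print(by_month["age_days"].agg(["median", lambda s: s.quantile(0.9), "count"]))
```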

@kartben
Collaborator

kartben commented Aug 11, 2023

I thought folks would be interested in seeing the recent trend in PR merge time -- looking at 12 full months, from Aug '22 to Jul '23.

All PRs:

[Chart]

PRs for first-time contributors:

[Chart]

In words:

  • While the number of merged PRs from first-time contributors keeps increasing (except for June -- release month -- and July -- summer vacation), the average time to merge seems to be going down: 14 days in July vs. 40 days in April -- that's a significant improvement! (Again, the number of PRs from first-time contributors has not gone down over the same period, quite the opposite.)
  • The 90th percentile is also following a good trend, for both first PRs and all PRs -- i.e. the 10% of PRs that take a really long time are now apparently taking less time to get merged.
    • Not surprisingly, it looks like summer may have an impact on normal / non-trivial PRs (unlike the ones that would typically come from first-time contributors), as the first graph seems to show some "debt" accumulating in July, whereas PRs from first-time contributors in the same period were actually tackled quickly.

Let me know if you have comments/questions - I am looking forward to seeing the evolution of the trends after the summer break is over.

@yperess
Collaborator

yperess commented Sep 21, 2023

Hey @keith-zephyr -- I actually put the scripts for the PR "dump" here: https://github.com/kartben/gh-pr-stats As it's probably never gonna be Zephyr or "open source RTOS" specific, I thought it could live in a different repo. I've committed the CSV file if you want to play with it directly.

Edit: there's also Excel and PowerBI files that can be used as a starting point

How hard would it be to add 3 more columns to this data? I would love to see:

  • Number of force pushes
  • Number of comments
  • Absolute value of the line diff (lines removed + added)

This would also allow us to start looking at other factors in PR lifespans, such as churn. I know I dread getting a comment like "you missed a period in the documentation" after I've already got 1+ approvals, and having the approvals thrown away over a typo.

@kartben
Collaborator

kartben commented Sep 21, 2023

Hey @keith-zephyr -- I actually put the scripts for the PR "dump" here: https://github.com/kartben/gh-pr-stats As it's probably never gonna be Zephyr or "open source RTOS" specific, I thought it could live in a different repo. I've committed the CSV file if you want to play with it directly.

Edit: there's also Excel and PowerBI files that can be used as a starting point

How hard would it be to add 3 more columns to this data? I would love to see:

  • Number of force pushes
  • Number of comments
  • Absolute value of the line diff (lines removed + added)

This would also allow us to start looking at other factors in PR lifespans, such as churn. I know I dread getting a comment like "you missed a period in the documentation" after I've already got 1+ approvals, and having the approvals thrown away over a typo.

The last two are pretty easy, but I'm not sure there is even a way to get the first one; I'll look into it.

Edit: the number of force pushes is probably available through https://docs.github.com/en/rest/issues/timeline?apiVersion=2022-11-28

FWIW, if we do think about trying to make something useful out of the number of force pushes, we need to find a way to exclude the ones made while a PR might have been in draft state.
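
A sketch against the timeline API linked above (`head_ref_force_pushed`, `convert_to_draft`, and `ready_for_review` are real timeline event types; the draft-exclusion heuristic and the PR number are assumptions):

```python
import requests

HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real use

def count_force_pushes(owner, repo, pr_number):
    """Count head_ref_force_pushed events, skipping any made while in draft."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/timeline"
    events = requests.get(url, headers=HEADERS, params={"per_page": 100}).json()

    in_draft = False  # naive: assumes the PR was opened ready for review
    count = 0
    for ev in events:
        if ev.get("event") == "convert_to_draft":
            in_draft = True
        elif ev.get("event") == "ready_for_review":
            in_draft = False
        elif ev.get("event") == "head_ref_force_pushed" and not in_draft:
            count += 1
    return count

print(count_force_pushes("zephyrproject-rtos", "zephyr", 12345))  # hypothetical PR
```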

@keith-zephyr
Contributor Author

@kartben - can you provide some updated charts on this issue? It would be helpful to revisit this data, determine whether there are any conclusions we can draw from it, and whether there are action items to fix problems.

@keith-zephyr
Contributor Author

@keith-zephyr - suggests running this once a quarter
@kartben - it would be good to integrate this with the PR stats work Anas has done
@nashif - slow PR merge times may not be an issue overall
@kartben - knowing the PR merge times for the worst 10% and for first-time contributors is still useful to track
@dleach02 - What's the velocity of the project? Internally, he has created a doc to help his team be more effective at getting PRs submitted.
@nashif - last month we merged 1000+ PRs, with good distribution
@nashif - 20-30% of PRs in the queue are stale
@keith-zephyr - the ask is to track this data to monitor the health of the project
@nashif - need to be mindful not to go too deep with the analysis
@dleach02 - the dashboards created by @nashif and @fabiobaltieri are really helpful
@keith-zephyr - growth of the project improves the signal we can get by monitoring these merge times
@nashif - agrees, but we should also make sure the project continues to scale (slow CI, etc.)
