Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Author CHANGELOG in each PR instead of collecting them in the last days before a release #1868

Closed
dblock opened this issue Jan 7, 2022 · 80 comments
Assignees
Labels
discuss Issues intended to help drive brainstorming and decision making feedback needed Issue or PR needs feedback

Comments

@dblock
Copy link
Member

dblock commented Jan 7, 2022

What kind of business use case are you trying to solve? What are your requirements?

Release notes are of varying quality.

For example, #2489 is completely unreadable.

What is the problem? What is preventing you from meeting the requirements?

  • Release notes aren't always super useful and may lack context.
  • Release notes are only available at release time, and not during development, so it's unclear what has changed.
  • Each plugin team is responsible for release notes, and the person assembling the release notes may not be the best one to write about a specific change, so YMMV.

What are you proposing? What do you suggest we do to solve the problem or improve the existing situation?

  1. Replace authoring release notes at the end before a release with authoring release notes in each PR.
  2. Require that release notes be updated via something like danger-changelog with every change.
  3. Automate collection of release notes across all projects.
    Add release notes collection to build workflow assembly opensearch-build#438
    [Workflow] Automation validating release notes opensearch-build#698
@dblock
Copy link
Member Author

dblock commented Jan 7, 2022

cc: @elfisher

@dblock dblock transferred this issue from opensearch-project/project-meta Jan 7, 2022
@dblock dblock added discuss Issues intended to help drive brainstorming and decision making feedback needed Issue or PR needs feedback labels Jan 7, 2022
@stockholmux
Copy link
Member

I see how this addresses Release notes are only available at release time, which I like, but I'm failing to understand how it address the other two points.

It seems like the output of danger-changelog is not too dissimilar to the existing release notes (last good example - subsequent 1.2.x are all log4j fixes), except without the context provided by the categories.

I will add that the release notes for a release do go through a human spot check. Often times that spot check will catch things like incorrect terminology, typos, or inconsistencies. Does the script driven approach allow for this?

@dblock
Copy link
Member Author

dblock commented Jan 7, 2022

It seems like the output of danger-changelog is not too dissimilar to the existing release notes (last good example - subsequent 1.2.x are all log4j fixes), except without the context provided by the categories.

danger-changelog just checks that the PR includes a well-formatted entry for the changelog, it doesn't actually write it. The change I am proposing is that humans are required to author the changelog line and it gets reviewed as part of the PR

I will add that the release notes for a release do go through a human spot check. Often times that spot check will catch things like incorrect terminology, typos, or inconsistencies. Does the script driven approach allow for this?

It doens't. I am proposing to remove any script-driven approach completely and move the authoring of release notes/changelog to the PRs, sorry if this wasn't clear. I've edited the issue and removed the CHANGELOG part, it just muddies the waters

@dblock dblock changed the title [PROPOSAL] Replace release notes with a CHANGELOG [PROPOSAL] Author replace release in each PR instead of collecting them in the last days before a release Jan 7, 2022
@elfisher
Copy link

elfisher commented Jan 7, 2022

Would there be an expectation as part of the code review to review the release notes in the PR as well? That might help with quality.

@dblock
Copy link
Member Author

dblock commented Jan 10, 2022

Would there be an expectation as part of the code review to review the release notes in the PR as well? That might help with quality.

Yes.

@dblock dblock changed the title [PROPOSAL] Author replace release in each PR instead of collecting them in the last days before a release [PROPOSAL] Author release notes in each PR instead of collecting them in the last days before a release Jan 11, 2022
@dblock dblock changed the title [PROPOSAL] Author release notes in each PR instead of collecting them in the last days before a release [PROPOSAL] Author CHANGELOG in each PR instead of collecting them in the last days before a release Apr 22, 2022
@peternied
Copy link
Member

How about a format of the CHANGELOG that allows for cross checked against commits since last release? This allows maintainers or contributors to add entries as needed or independently clean up the notes. This avoids adding a barrier to contributions by adding another criteria on the PR to merge.

As part of health of the repo a tool can verify how many 'undocumented' items are not in the CHANGELOG. This can be integrated into the release process during the post-code-freeze - prerelease time it can be required that maintainers drive that number to zero. This allows for time to group similar changes and have dialogs about terminology/presentation.

@dblock
Copy link
Member Author

dblock commented Apr 22, 2022

@peternied Why would we need that if every PR must have a CHANGELOG entry?

@peternied
Copy link
Member

peternied commented Apr 22, 2022

@dblock You are right, I am not thinking of chronological account of the changes in a codebase.

I want a reference of Most important -> Least important changes since last release. Here is a contrived example of what I'd prefer vs changelog.


Release Clatu Verata Nictu

Breaking

  • Removed indexing operators because they are not thread safe (#1337)

Bug Fixes

  • Thread.current() doesn't crash when called twice in a row (#4567)

Features

  • Added new lambda expressions, map(...), reduce(...), sum(...). (#44, #45, #46, #4834)

vs

Release Clatu Verata Nictu

  • Refactor sum to use reduce instead of its own implementation (#4834)
  • Thread.current() doesn't crash when called twice in a row (#4567)
  • Removed indexing operators because they are not thread safe (#1337)
  • Added new lambda expression, reduce(...), (#46)
  • Added lambda expressions, sum(...). (#45)
  • Added new lambda expressions, map(...), (#44)

@dblock
Copy link
Member Author

dblock commented Apr 22, 2022

@peternied We would adopt a standard, such as https://keepachangelog.com/en/1.0.0/

@peternied
Copy link
Member

peternied commented Apr 22, 2022

@peternied We would adopt a standard, such as https://keepachangelog.com/en/1.0.0/

That standard format looks great, I am sold.

Types of changes

  • Added for new features.
  • Changed for changes in existing functionality.
  • Deprecated for soon-to-be removed features.
  • Removed for now removed features.
  • Fixed for any bug fixes.
  • Security in case of vulnerabilities.

@getsaurabh02
Copy link
Member

Thanks @dblock for the proposal. Another key area would be to standardize the various buckets or groups created for the release notes entries such as Enhancements, Bug fixes, Maintenance, Refactoring, Infrastructure etc. These are pretty insightful for readers to extract focussed information. Some of the recent release notes have these :

Do we expect the CHANGELOG to have these groups pre-defined, that will allow authors/reviewers to add/review the CHANGELOG entry, added under the right group, along with every PR?

@dblock
Copy link
Member Author

dblock commented Apr 26, 2022

Do we expect the CHANGELOG to have these groups pre-defined, that will allow authors/reviewers to add/review the CHANGELOG entry, added under the right group, along with every PR?

Yes, I expect those to be exactly from the list in https://keepachangelog.com/en/1.0.0/, at least to start.

@kartg
Copy link
Member

kartg commented Apr 26, 2022

Require that release notes be updated via something like danger-changelog with every change.

I'm guessing a "change" here could consist of multiple commits, as described by keepachangelog.org:

The purpose of a commit is to document a step in the evolution of the source code. Some projects clean up commits, some don't.
The purpose of a changelog entry is to document the noteworthy difference, often across multiple commits, to communicate them clearly to end users.

In the case of multiple commits that gradually build up to a full feature, @dblock is there a good mechanism we can put in place to make sure the changelog is reliably updated?

Alternatively, we could implements Conventional Commits as a precommit hook. That would give us a mechanism, but it still requires a changelog to be built and curated at release time. Thoughts?

@dblock
Copy link
Member Author

dblock commented Apr 26, 2022

@kartg I think we do simple: every PR requires a CHANGELOG update. Humans ensure quality in code reviews. Multiple commits can be merged in CHANGELOG as we make progress.

@kartg
Copy link
Member

kartg commented May 3, 2022

Works for me. In that case, we should define what makes a good Changelog entry - it shouldn't end up being just a repeat of the commit message.

peternied added a commit to peternied/security that referenced this issue May 3, 2022
Added new human readable changelog for 2.0.0-rc1 and including
unreleased commits.

The release process we've been using for release-notes has not been
producing useful results, there has been discussion in
opensearch-project/OpenSearch#1868 to come up
with a practice.  This is an attempt to start following the suggestion
and come up with feedback for the OpenSearch organization.

Signed-off-by: Peter Nied <petern@amazon.com>
@peternied
Copy link
Member

@kartg @Poojita-Raj @dblock @getsaurabh02 @elfisher @stockholmux I've opened opensearch-project/security#1821 - would love your thoughts on how this looks

@rursprung
Copy link
Contributor

@kartg I think we do simple: every PR requires a CHANGELOG update. Humans ensure quality in code reviews. Multiple commits can be merged in CHANGELOG as we make progress.

from personal experience (we use this format in-house) i can tell you that, while this is a nice idea, it does result in a lot of merge conflicts (as they're all congesting on the same lines in CHANGELOG.md). i don't have a good solution for this though.

i do however 100% vote to follow Keep a Changelog for the actual changelog! i've added some comments to the draft

@kotwanikunal
Copy link
Member

@kotwanikunal I am seeing that we have both Unreleased and 2.x sections in main and 2.x. Was that intentional? Should everything be unreleased? where do backports go?

It was supposed to be Unreleased for main, 2.x for 2.x. Backporting has led to a mess and inconsistency, with folks not adding the change to 2.x.
Currently Unreleased is the only one which has all the correct changes, and I was hoping to get main + backporting in line since that will bring the other branches to a correct state automatically with the helper.

@seraphjiang
Copy link
Member

@dblock this is how it looks without adding labels like feature/bug/enhancement/break

https://github.com/opensearch-project/dashboards-anywhere/releases/tag/v0.8.0

here is release.yml
https://github.com/opensearch-project/dashboards-anywhere/blob/main/.github/release.yml

@seraphjiang
Copy link
Member

seraphjiang commented Oct 25, 2022

@seraphjiang
Copy link
Member

this is github's public rest api to generate release note, which we could integrate with our release automation

https://docs.github.com/en/rest/releases/releases#generate-release-notes-content-for-a-release

@rursprung
Copy link
Contributor

@kotwanikunal I am seeing that we have both Unreleased and 2.x sections in main and 2.x. Was that intentional? Should everything be unreleased? where do backports go?

It was supposed to be Unreleased for main, 2.x for 2.x. Backporting has led to a mess and inconsistency, with folks not adding the change to 2.x. Currently Unreleased is the only one which has all the correct changes, and I was hoping to get main + backporting in line since that will bring the other branches to a correct state automatically with the helper.

tbh, that sounds wrong. the changelog for 2.x should only live on the 2.x branch - the main branch should only have the changelog for main (i.e. the moment 2.x is branched away the two changelogs start to diverge). otherwise you have a nightmare in maintaining the changelog (as you have now).

i know that i was the one who advocated for the currently used changelog and still think that the format is correct. i however don't think that the way it's currently being done is good (esp. after i've now hassled with it a few times when creating PRs):

  • i think it's wrong to require an entry for every commit - as i've said from the beginning, the changelog should be for the users of the software; they don't care about reformatting the code, changing test coverage, etc.
  • creating the entries with every PR leads to a ton of merge conflicts (in smaller projects this generally works fine as they have less PRs landing all the time, here it's just a merging/rebasing nightmare) - though i think if only the relevant things are added then this will also be less of an issue as there'd be much fewer entries

right now i've nearly never managed to get in a PR directly: either it failed with a false reject (lots of flaky tests around; not the subject of this issue here) and/or it ran into a merge conflict due to the changelog by the time it got an approval (or the tests were re-run). since i don't work every day (and sign all my commits with a PGP key which is of course different for work & private) the PR then has to sit around for a while (usually a week) until i get around to rebasing it again, which is a huge loss of time and might mean that it doesn't land in a specific release in the worst case.

@andrross
Copy link
Member

i think it's wrong to require an entry for every commit

I agree with this. Assuming the changelog is targeted towards users/operators of the software, listing every commit seems like a lot of noise. Even beyond reformating/test changes, if a release introduces a new feature, that feature may consist of dozens of commits but the relevant information for the changelog is probably just a single entry. As a developer, every commit is important but I'll use the tooling from git to navigate the commits as opposed to a changelog (and I probably wouldn't trust a hand-edited changelog anyway).

@rursprung
Copy link
Contributor

just as a minor clarification: when i wrote "for every commit" i actually meant "for every PR". but that doesn't change a lot as larger features still consist of multiple PRs (leading to multiple entries for one new feature instead of a single entry) and there are lots of PRs which are not relevant for users/operators.

@dblock
Copy link
Member Author

dblock commented Oct 25, 2022

@rursprung @andrross I have a skip-changelog label in https://github.com/dblock/create-a-github-issue that could be implemented here for dependabot and backports, see https://github.com/dblock/create-a-github-issue/pull/16/files. WDYT? Anyone wants to make that change here and cleanup the CHANGELOG in main and 2.x?

@dblock
Copy link
Member Author

dblock commented Oct 25, 2022

@seraphjiang Want to try to PR what you're suggesting into OpenSearch main, including removing existing functionality? Let's discuss a concrete change proposal with @rursprung @kotwanikunal and @andrross? I still think we would absolutely need a way to force some standards on PR titles though like conventional commits, too.

@andrross
Copy link
Member

andrross commented Nov 1, 2022

From keepachangelog.com:

Can changelogs be bad?

Yes. Here are a few ways they can be less than useful.
Commit log diffs
Using commit log diffs as changelogs is a bad idea: they're full of noise. Things like merge commits, commits with obscure titles, documentation changes, etc.

I think our 1:1 requirement that all PRs have a changelog entry results in us implementing this bad practice. The changelog ends up being slightly more readable than a commit log (entries are categorized) but we have no way to filter out the commits that are not useful to users. Ultimately "useful to users" is a judgment call, so I think we need a process here that allows for the developer and maintainer to make a judgment about whether to include a changelog entry for a given PR.

@ashwin-pc
Copy link
Member

Having read through as much of this thread as possible, I dont think i've seen anyone suggest this, so apologies if this was already discussed. Cant we decouple the change-log edits from the PR's responsible for the change (with some automation to make sure that tracking the two becomes easier).

  1. We add an automated label changelog to every PR that needs a changelog entry (new entry or update). It automatically creates a new issue in the repo thats assigned to the author of the PR.
  2. The new Issue also has a link to the PR, A summary of the change and other goodies to make creating a Changelog change as easy and traceable as possible.
  3. If the PR has any backport tags, the changelog issue will also have those tags called out for tracking

Pros:

  • Decouples fixes and changes from changelog (Less chances for merge conflicts)
  • Changelog is more intentional
  • Goes better with the dev flow of raising a PR and adding a changelog afterwards for the change (No need to force push a change to update just the changelog)
  • Changelog PR's are quick followups since all the tests need not run for those PR's (Markdown only PR's are wayyy quicker than other PR's)

Cons:

  • Changes and Changelog entries are in 2 separate PR's. This is mitigated by having references in both the Original PR and Changelog issue to each other.

@ashwin-pc
Copy link
Member

Another suggestion i want to drop in here is GitLabs approach that they documented in this blog post: https://about.gitlab.com/blog/2018/07/03/solving-gitlabs-changelog-conflict-crisis/

@dblock
Copy link
Member Author

dblock commented Nov 2, 2022

  1. We add an automated label changelog to every PR that needs a changelog entry (new entry or update). It automatically creates a new issue in the repo thats assigned to the author of the PR.

I think you can address the cons by adding a skip-changelog label (easily implemented in dblock/create-a-github-issue#16 for example) to label those PRs that don't need an entry.

@andrross
Copy link
Member

andrross commented Nov 4, 2022

Here's the PR (#5067) for creating the release notes for 2.4 in OpenSearch. I manually merged entries into a single "feature" entry where appropriate, which is desirable in my opinion (breaking the 1-entry-for-every-PR constraint currently enforced). We had some duplicate/incorrect entries split between the [Unreleased] and [2.x] section of the file. Overall it wasn't too bad to clean up.

That PR was on the 2.4 branch. I believe the CHANGELOG on the 2.x branch can be emptied out now (all changes from 2.x were released into 2.4). The main branch is a mess though. It contains all changes that have been released, interleaved with changes that have not been backported and will remain unreleased until 3.0. This process is not working well with our branching strategy, specifically the fact that we're committing next-major-version changes alongside next-minor-version changes that get backported. All next-minor-version changes should not have an entry on the main branch because they will never be "unreleased" from the perspective of the next-major-version.

@dblock
Copy link
Member Author

dblock commented Nov 7, 2022

@andrross Yes, I agree it's not working :( Let's keep looking at @kotwanikunal to figure out best path forward until he tells us otherwise?

@andrross
Copy link
Member

andrross commented Nov 7, 2022

@dblock @kotwanikunal

I've written up an FAQ to be included in the CHANGELOG.md file in #5092

This defines a slight change in the process. First, it specifies not every PR should include a changelog entry (and assumes we'll have a skip-changelog label. Second, it defines how the changelog should work with our branching strategy. Please take a look.

@andrross
Copy link
Member

Just an update after the 2.5 release...unfortunately the latest process is also not working great. The problems seemed to mostly arise around where should I put my CHANGELOG entry. In an ideal world, the [Unreleased 2.x] section on the main branch should be identical to the [Unreleased 2.x] section on the 2.x branch. The process to create release notes should be: move the [Unreleased 2.x] section on main to a release-notes-2.5.0.md file and then backport to the 2.x and 2.5 branches. In practice that did not work because some changes on main were put into the [Unreleased 3.0] section but then backported and put into the [Unreleased 2.x] section on the 2.x branch (this is easy for reviewers to miss because it is unintuitive you have to expand out the CHANGELOG file to even see what section the entry is in). @kotwanikunal ultimately ended up scanning through commits and manually auditing to create the release notes file.

@nknize
Copy link
Collaborator

nknize commented Jan 25, 2023

I mentioned this in a PR comment and will post it here since I haven't followed this moving target closely.

I suggest we switch the changelog to adopt a more standard approach like other projects and use the issue number instead of the PR. It would enforce having an issue for each PR, and stop this burden of pushing a changelog commit that updates the url in the changelog for each PR.

@dblock
Copy link
Member Author

dblock commented Jan 26, 2023

@nknize Not everything has an issue, but the additional commit is actually a relatively small problem (I personally don't mind doing a git commit --amend and a force push) compared to the fact that CHANGELOG in every PR creates a ton of additional work and backport merge conflicts, even if done right. It solved one problem, but created a different one.

Another idea is to adopt a different mechanism for PR titles, and settle on something that generates CHANGELOGs in every branch automatically from PR titles that can be changed after merge. Check out https://github.com/aws/aws-cdk/pulls for what that could look like.

We also need to do this times N plugins, because the distribution release notes are all over the place in terms of quality.

@nknize
Copy link
Collaborator

nknize commented Jan 26, 2023

Not everything has an issue

If it doesn't have an issue it doesn't need to be in the changelog.

@andrross
Copy link
Member

Not everything has an issue

If it doesn't have an issue it doesn't need to be in the changelog.

I'm in favor of linking issues instead of PRs, particularly since we updated that guidance that not all changes require a CHANGELOG entry and added the skip-changelog label.

It solved one problem, but created a different one.

Honestly it didn't really even solve a problem (at least for the OpenSearch repo) for this latest release as the CHANGELOG was a bit of a mess between the two branches and required a lot of manual reconciliation to create the release notes. I've been nagging folks to make sure entries go in the right place since that release but I don't think we've arrived at a sustainable solution.

@joshuarrrr
Copy link
Member

Although this proposal is already rolled out across the org, due to some of the problems it creates, we've decided to move forward with the alternative proposal opensearch-project/.github#156 instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues intended to help drive brainstorming and decision making feedback needed Issue or PR needs feedback
Projects
None yet
Development

No branches or pull requests