Summary
This issue concerns implementing a new process and tooling for generating the release notes for an Elasticsearch release. Instead of pulling data from GitHub directly, each PR will add a file to the repository that contains all the required data.
- Document in the team repo:
  - The new process for generating the notes
  - Changes to the version bump / release day processes
  - The changelog file format
- Build new tooling to verify the changelog files and generate the outputs
Background
Our current process for generating the release notes / changelog for Elasticsearch relies on pulling information directly from GitHub. There are issues with this approach:
- If a change is backed out of a release branch but we forget to remove a label from the original PR, then the release notes can include a change that was actually removed
- There's no opportunity to review the associated release notes, release highlights or breaking changes description while reviewing the code changes in a PR
- PRs can be missed from the release notes if they are merged after the release notes are generated and we don't go back to add them
Some years ago, we experimented with including a changelog file in the ES repository, which was updated with each PR. This approach was quickly abandoned due to the number of merge conflicts that it generated.
Instead, we now propose that each release branch has a dedicated directory (for example `changelog`, but it doesn't have to be that) which is populated with one file per PR, each containing all the information necessary to generate the release notes. These files need to use a structured format so that they can be easily processed with tools.
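As a minimal illustration of the one-file-per-PR layout, a tool might collect the files as below. This sketch uses Java because the tooling is meant to become a Java Gradle task. The `.yaml` extension, the `<PR number>.yaml` file naming and the `ChangelogFiles` class are assumptions made for the example; the issue does not settle any of them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ChangelogFiles {
    // Hypothetical convention: one "<PR number>.yaml" file per PR in a flat directory.
    static List<Path> collect(Path changelogDir) throws IOException {
        try (Stream<Path> files = Files.list(changelogDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".yaml"))
                .sorted() // deterministic order, so generated output is stable
                .collect(Collectors.toList());
        }
    }

    // Derives the PR number from the assumed file name, e.g. "63899.yaml" -> 63899.
    static int prNumberFromFileName(Path file) {
        String name = file.getFileName().toString();
        return Integer.parseInt(name.substring(0, name.length() - ".yaml".length()));
    }
}
```

Because each PR touches only its own file, two PRs never edit the same lines, which is what avoids the merge conflicts of a single shared changelog file.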
File format
Each changelog file must contain all the information required to generate the notes. This probably includes, but is not limited to, the following.
- PR number
- Associated issue numbers
- Type of change, e.g. `enhancement`, `feature`, `bugfix`, etc.
- The change area, e.g. `Core/Features` or `Search/Mapping`
- Whether it is a breaking change
- One-line summary of the change
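One possible in-memory model of a changelog entry, sketched in Java on the assumption that the generator becomes a Java Gradle task. The record name `ChangelogEntry` and the checks in its constructor are hypothetical; `type` and `area` are kept as free-form strings because the set of allowed values is still open.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of one changelog file; components mirror the list of fields above.
record ChangelogEntry(
        int pr,
        List<Integer> issues,
        String type,
        String area,
        boolean breaking,
        String summary) {

    ChangelogEntry {
        if (pr <= 0) {
            throw new IllegalArgumentException("pr must be a positive PR number");
        }
        // Defensive, immutable copy; also rejects a null list or null elements.
        issues = List.copyOf(Objects.requireNonNull(issues, "issues"));
        Objects.requireNonNull(type, "type");
        Objects.requireNonNull(area, "area");
        Objects.requireNonNull(summary, "summary");
        // The summary is meant to be a single line of text.
        if (summary.isBlank() || summary.contains("\n")) {
            throw new IllegalArgumentException("summary must be a non-empty single line");
        }
    }
}
```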
We could simply have a `labels` field that mirrors the PR's GitHub labels. However, the current process uses the GitHub labels directly, which means the ES release point person often has to decide which area and change type to select for PRs that are labelled with multiple areas and change types. It would be better to move this burden to PR authors, who are better placed to make these decisions.
The obvious file formats are JSON or YAML. YAML has the advantage of being easier for humans to read and edit.
We should enforce the existence and validity of changelog entries for PRs whose labels do not exempt them from the changelog (e.g. test fixes, build-related work).
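A minimal Java sketch of the enforcement check. The exempt label names are illustrative placeholders rather than an agreed list, `ChangelogRequirement` is a hypothetical class, and how the check obtains a PR's labels (for example from CI) is left out.

```java
import java.util.Set;

class ChangelogRequirement {
    // Hypothetical exempt labels; the real list would be agreed by the team.
    static final Set<String> EXEMPT_LABELS = Set.of(">test", ">non-issue", ">docs", ":Delivery/Build");

    // A single exempt label is enough to skip the changelog requirement.
    static boolean requiresChangelog(Set<String> prLabels) {
        return prLabels.stream().noneMatch(EXEMPT_LABELS::contains);
    }

    // Fails when a non-exempt PR has no changelog file.
    static void check(Set<String> prLabels, boolean hasChangelogFile) {
        if (requiresChangelog(prLabels) && !hasChangelogFile) {
            throw new IllegalStateException(
                "This PR needs a changelog file, or a label that exempts it");
        }
    }
}
```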
A prototype generator exists, written in Python. We should rewrite it in Java as a Gradle task, so that any team member can update it. This task must carefully verify the input files to ensure that all required fields are present and there are no typos in the key names.
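A Java sketch of the key-level verification: unknown keys are reported as likely typos and missing required keys as errors. It assumes the file has already been parsed into a `Map` by some YAML library (not shown). `ChangelogValidator` and the `REQUIRED` and `ALLOWED` sets are hypothetical, based only on the example file in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ChangelogValidator {
    // Assumed schema, taken from the example file in this issue.
    static final Set<String> REQUIRED = Set.of("pr", "area", "type", "summary");
    static final Set<String> ALLOWED = Set.of(
        "pr", "issues", "area", "type", "summary", "version", "highlight", "breaking");

    // `data` is the top-level mapping produced by whatever YAML parser the task uses.
    static List<String> validate(Map<String, Object> data) {
        List<String> errors = new ArrayList<>();
        // Unknown keys are most likely typos, e.g. "sumary" instead of "summary".
        for (String key : new TreeSet<>(data.keySet())) {
            if (!ALLOWED.contains(key)) {
                errors.add("unknown key [" + key + "]");
            }
        }
        // Sorted iteration keeps the error order stable between runs.
        for (String key : new TreeSet<>(REQUIRED)) {
            if (data.get(key) == null) {
                errors.add("missing required key [" + key + "]");
            }
        }
        return errors;
    }
}
```

Collecting every problem in one pass, rather than failing on the first, lets an author fix a file in a single round trip.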
Open questions
- How should we handle backports? Should the changelog entries go only to the earliest release branch? Or should each changelog file contain the list of releases at which the PR was targeted, so that the generation process can omit changelog entries that have already been released?
- When should the process for generating the asciidoc documents from the changelog files be run?
  - We could, for instance, check in the `precommit` task that running the generation step results in no changes in the current checkout. This would require whoever adds a changelog file to also run the generation step and commit the result, but this would cause merge conflicts in the generated outputs, leaving us in the same position as having a single changelog file.
  - We could run the generation step as part of the unified build. This would mean that the build commits to the repo, and therefore changes the commit hash that is released.
  - The unified build could regenerate the release notes and check that they are unmodified. If they are not, the build would fail and someone would need to regenerate and commit the files.
- Note that the prototype generator appears to delete changelog files after generating the release notes. This would make it impossible to re-run the generator, e.g. on successive build candidates. We should instead remove all changelog files from a branch once further commits for a given version are impossible. This applies to the development branch(es) after a release branch is taken, and to release branches once a release has happened and the git tag is pushed.
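For the backport question, one option is to list target versions in each file (as the `version` field in the example below does) and have the generator skip entries that already shipped in an earlier release. A minimal Java sketch of that filter; `Version` and `BackportFilter` are hypothetical, and `Version` only understands plain `major.minor.patch` strings.

```java
import java.util.List;
import java.util.Set;

// Minimal major.minor.patch version; qualifiers such as "-beta1" are not handled.
record Version(int major, int minor, int patch) implements Comparable<Version> {
    static Version parse(String s) {
        String[] parts = s.split("\\.");
        return new Version(
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(Version o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(patch, o.patch);
    }
}

class BackportFilter {
    // Include an entry in the notes for `target` only if the PR was targeted at `target`
    // and has not already shipped in an earlier, released version.
    static boolean include(List<Version> entryVersions, Version target, Set<Version> released) {
        if (!entryVersions.contains(target)) {
            return false;
        }
        return entryVersions.stream()
            .noneMatch(v -> v.compareTo(target) < 0 && released.contains(v));
    }
}
```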
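The option of having the unified build regenerate the notes and fail on any difference could look roughly like this in Java. `GeneratedFileCheck` and `verifyUpToDate` are hypothetical names, and the generator's output is passed in as a string rather than produced here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class GeneratedFileCheck {
    // Fails if the committed file differs from what the generator produces now.
    // A missing file is treated as empty, so it always fails unless nothing was generated.
    static void verifyUpToDate(Path committedFile, String regenerated) throws IOException {
        String committed = Files.exists(committedFile)
            ? Files.readString(committedFile, StandardCharsets.UTF_8)
            : "";
        if (!committed.equals(regenerated)) {
            throw new IllegalStateException(committedFile
                + " is out of date; re-run the release notes generator and commit the result");
        }
    }
}
```

This keeps the build read-only with respect to the repository, so the released commit hash is the one that was tested.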
Possible file example
```yaml
pr: 63899
issues: [63055]
area: Machine Learning
type: enhancement
summary: Add new flag `exclude_generated` that removes generated fields in GET config APIs
version: [7.11.0, 8.0.0]
highlight:
  notable: true
  title: New API flag `exclude_generated` when fetching ML configs
  body: |
    When exporting and cloning ML configurations in a cluster it can be
    frustrating to remove all the fields that were generated by
    the plugin. Especially as the number of these fields change
    from version to version.
    This flag, `exclude_generated`, allows the GET config APIs to return
    configurations with these generated fields removed.
breaking:
  area: ML API changes
  # Quoted because a YAML plain scalar cannot start with a backtick.
  title: "`for_export` parameter changed to `exclude_generated`"
  notable: true
  anchor: get_config_param_exclude_generated
  body: |
    Some descriptive text about the change of parameter name. Blah blah
    blah, you know, for release notes.
```
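A hedged sketch of the output side: rendering entries like the one above into asciidoc, grouped by area. The line format and the `{es-pull}` / `{es-issue}` attributes are assumed to resemble the existing Elasticsearch release notes, and `NoteEntry` and `ReleaseNotesRenderer` are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical subset of the fields in the example file.
record NoteEntry(int pr, List<Integer> issues, String area, String summary) {}

class ReleaseNotesRenderer {
    // Renders one bullet; {es-pull} and {es-issue} are assumed asciidoc attributes.
    static String renderLine(NoteEntry e) {
        StringBuilder sb = new StringBuilder("* ")
            .append(e.summary())
            .append(" {es-pull}").append(e.pr()).append("[#").append(e.pr()).append("]");
        if (!e.issues().isEmpty()) {
            String links = e.issues().stream()
                .map(i -> "{es-issue}" + i + "[#" + i + "]")
                .collect(Collectors.joining(", "));
            sb.append(e.issues().size() == 1 ? " (issue: " : " (issues: ").append(links).append(")");
        }
        return sb.toString();
    }

    // Groups entries by area (sorted by name) and renders each group as an asciidoc labelled list.
    static String render(List<NoteEntry> entries) {
        Map<String, List<NoteEntry>> byArea = entries.stream()
            .collect(Collectors.groupingBy(NoteEntry::area, TreeMap::new, Collectors.toList()));
        StringBuilder out = new StringBuilder();
        byArea.forEach((area, group) -> {
            out.append(area).append("::\n");
            group.stream()
                .sorted(Comparator.comparingInt(NoteEntry::pr))
                .forEach(e -> out.append(renderLine(e)).append('\n'));
            out.append('\n');
        });
        return out.toString();
    }
}
```

Sorting by area and PR number makes the output deterministic, which matters if the build compares regenerated notes against committed ones.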