Add serverless emergency release quality gate pipeline #186833
This LGTM. Since the QA pipeline skips manual verification for an emergency release, the added time of running QA before staging should be minimal, as the RM won't be a bottleneck. If CP e2e test times ever start to creep up, we can reconsider.
My one question is whether we definitely want to include a 24h bake period by default for an emergency release. If it is a true emergency, I would expect us to never wait a full 24h before releasing to noncanary. But I suppose the consistency with the normal release process makes this easier to understand, and gives the RM more latitude to determine how long we want changes to sit in canary. Overall this is a question I wanted to raise -- not a concern -- so I'm fine with proceeding as-is.
Thanks for your feedback @lukeelmers!
I was thinking about the same. And I agree we will probably never use the full 24 hours in an emergency release.
This was my main reason for keeping a long bake time here. Elasticsearch runs with a 1 hour bake time in production-canary for emergency releases. This allows them to be more hands-off and just let the release roll out if automated checks are passing. However, if some investigation in canary takes more time (which, to be fair, hasn't happened so far), it would require stopping the release and kicking the last stage off again. So I thought we'd start a bit more defensively here, even though this will require one more manual RM step (cancelling the bake time early).
I agree with your logic of being more defensive and starting with a longer bake time, with the understanding that RMs will need to be (as they already have been) very involved in any emergency release process. We can document short-circuiting the bake time as the expected course of action, with the exact bake period determined by the RM based on their judgment of the situation. We can revisit later and shorten the bake period once we get more comfortable with the new process.
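To make the trade-off discussed above a bit more concrete, here is a minimal Python sketch that compares the two bake strategies. It is only a toy model, not how the release tooling works: the names `BakeOutcome`, `long_bake` and `short_bake` are hypothetical, and the assumption that a restarted stage bakes for a full period again is mine, not something stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class BakeOutcome:
    released_after_hours: float  # time in canary before promotion to non-canary
    manual_steps: int            # RM interventions needed during the bake
    restarts: int                # times the last stage had to be kicked off again


def long_bake(needed_hours: float, bake_hours: float = 24.0) -> BakeOutcome:
    """Long default bake: the RM cancels the timer early once satisfied."""
    if needed_hours < bake_hours:
        # One manual step: cancel the remaining bake time.
        return BakeOutcome(needed_hours, manual_steps=1, restarts=0)
    # The timer simply runs out; nothing to cancel.
    return BakeOutcome(bake_hours, manual_steps=0, restarts=0)


def short_bake(needed_hours: float, bake_hours: float = 1.0) -> BakeOutcome:
    """Short bake: hands-off if checks pass in time, otherwise stop and restart."""
    if needed_hours <= bake_hours:
        return BakeOutcome(bake_hours, manual_steps=0, restarts=0)
    # Assumption: the investigation outlasts the timer, so the release is
    # stopped and the last stage is re-run with a fresh bake period.
    return BakeOutcome(needed_hours + bake_hours, manual_steps=2, restarts=1)
```

Under these assumptions, a 3 hour investigation costs one manual step with the long bake, while the short bake needs a stop, a restart and another bake period. For a release that needs no investigation, the short bake is the hands-off option.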
@elasticmachine merge upstream
💛 Build succeeded, but was flaky
Failed CI Steps
Metrics [docs]
cc @pheyos
…)" This reverts commit cbedb5f.
## Summary

This PR adds separate quality gate pipelines for the emergency release process.

More details in the original PR #186833, which is split into the creation of the new pipeline (this PR) and moving existing pipelines from `catalog-info.yaml` to `.buildkite/pipeline-resource-definitions` (#187253).
## Summary

This PR moves the definitions of the following pipelines from `catalog-info.yaml` to `.buildkite/pipeline-resource-definitions`:

- `buildkite-pipeline-kibana-emergency-release` -> `.buildkite/pipeline-resource-definitions/kibana-serverless-emergency-release.yml`
- `kibana-tests-pipeline` -> `.buildkite/pipeline-resource-definitions/kibana-serverless-quality-gates.yml`

More details in the original PR #186833, which is split into the creation of the new pipeline (#187251) and moving existing pipelines from `catalog-info.yaml` to `.buildkite/pipeline-resource-definitions` (this PR).
## Summary

This PR adds separate quality gate pipelines for the emergency release process.
This gives us the opportunity to run a different set of checks during an emergency release compared to a regular release.

### Details

- New pipeline files are added under `.buildkite/pipelines/quality-gates/emergency`. These are copies of the regular quality gates pipeline files with the following adjustments:
  - `kibana-serverless-quality-gates-emergency.yml` has an adjusted `QG_PIPELINE_LOCATION` and comment
  - `pipeline.tests-qa.yaml` is reduced to just the CP e2e tests
- `.buildkite/pipeline-resource-definitions/kibana-serverless-quality-gates-emergency.yml` is added that will trigger the emergency version of the quality gates.

### Other changes

To keep things around the serverless quality gates and the emergency release consistent, I've taken the opportunity to move the definitions of the following pipelines from `catalog-info.yaml` to `.buildkite/pipeline-resource-definitions`:

- `buildkite-pipeline-kibana-emergency-release` -> `.buildkite/pipeline-resource-definitions/kibana-serverless-emergency-release.yml`
- `kibana-tests-pipeline` -> `.buildkite/pipeline-resource-definitions/kibana-serverless-quality-gates.yml`
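
As a rough illustration of the adjusted `QG_PIPELINE_LOCATION` mentioned under Details, the following Python sketch shows how a location value could select between the regular and the emergency copies of a stage file. This is an assumption-laden sketch: how `QG_PIPELINE_LOCATION` is actually consumed is not described in this PR, and `resolve_stage_pipeline` and `DEFAULT_LOCATION` (including its value) are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Mapping

# Assumption: the regular quality gate files sit one level above the
# emergency directory named in the PR description.
DEFAULT_LOCATION = ".buildkite/pipelines/quality-gates"


def resolve_stage_pipeline(env: Mapping[str, str], stage_file: str) -> str:
    """Return the path of a stage pipeline file for the configured location."""
    # Hypothetical lookup: fall back to the regular location when the
    # variable is not set.
    location = env.get("QG_PIPELINE_LOCATION", DEFAULT_LOCATION)
    return str(PurePosixPath(location) / stage_file)


regular = resolve_stage_pipeline({}, "pipeline.tests-qa.yaml")
emergency = resolve_stage_pipeline(
    {"QG_PIPELINE_LOCATION": ".buildkite/pipelines/quality-gates/emergency"},
    "pipeline.tests-qa.yaml",
)
```

With this kind of indirection, the emergency pipeline can reuse the same stage file names while pointing at its own reduced copies.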