Contributor

@swiatekm swiatekm commented Oct 15, 2025

What does this PR do?

Conditionally remove the upgrade marker when an elastic-agent upgrade is rolled back to a version that does not contain #8407.
By default, the upgrade marker is deleted, in order to maintain backward compatibility.
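
A minimal sketch of the version gate described above, assuming the cutoff is 9.2.0 (taken from the PR title). The helper names `parseVersion` and `shouldRemoveMarker` are hypothetical and are not the functions used in the PR; ignoring pre-release suffixes and falling back to deletion for unparseable versions are simplifying assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion extracts major, minor and patch from strings such as "9.1.3"
// or "9.2.0-SNAPSHOT". Any pre-release or build suffix is ignored, which is
// a simplification made for this sketch.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	core := v
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// shouldRemoveMarker (hypothetical name) reports whether the upgrade marker
// should be deleted when rolling back to the given target version.
// Assumption: versions older than 9.2.0 need the marker removed, and an
// unparseable version falls back to the default of deleting it.
func shouldRemoveMarker(target string) bool {
	v, err := parseVersion(target)
	if err != nil {
		return true // default: delete, for backward compatibility
	}
	threshold := [3]int{9, 2, 0}
	for i := range v {
		if v[i] != threshold[i] {
			return v[i] < threshold[i]
		}
	}
	return false // exactly 9.2.0: keep the marker
}

func main() {
	for _, target := range []string{"8.19.0", "9.1.3", "9.2.0", "9.3.1"} {
		fmt.Printf("rollback to %s: remove marker = %t\n", target, shouldRemoveMarker(target))
	}
}
```

Defaulting to deletion keeps the behaviour safe for any target whose version cannot be determined, which matches the backward-compatibility intent stated above.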

Why is it important?

There is a race condition when the rolled-back agent starts slowly: the old (i.e. rolled-back) watcher may pick up the .upgrade-marker file after the watcher that triggered the rollback has already terminated.
The old watcher code interprets the presence of the upgrade marker as an ongoing upgrade, for which it has to watch the agent for the grace period, without checking the agent version or upgrade state.

This may lead to elastic-agent trying to delete itself after the rollback has already been performed.
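
The failure mode can be illustrated with a small, self-contained simulation. This is only a sketch: `legacyWatcherWouldWatch`, `finishRollback` and `simulate` are hypothetical names, the marker is written to a temporary directory rather than the real agent data path, and the old watcher is modelled solely by the behaviour described above (marker present means "upgrade in progress").

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// markerName is the file name mentioned in the PR description.
const markerName = ".upgrade-marker"

// legacyWatcherWouldWatch models the old watcher as described above: the mere
// presence of the marker is treated as an ongoing upgrade, with no check of
// the agent version or upgrade state.
func legacyWatcherWouldWatch(dir string) (bool, error) {
	_, err := os.Stat(filepath.Join(dir, markerName))
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, os.ErrNotExist):
		return false, nil
	default:
		return false, err
	}
}

// finishRollback models the last step of a rollback. When removeMarker is
// true, the marker is deleted so a late-starting old watcher finds nothing.
func finishRollback(dir string, removeMarker bool) error {
	if !removeMarker {
		return nil
	}
	err := os.Remove(filepath.Join(dir, markerName))
	if err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	return nil
}

// simulate writes a marker, completes the rollback, and then starts the
// (slow) old watcher. It reports whether that watcher would begin watching.
func simulate(removeMarker bool) (bool, error) {
	dir, err := os.MkdirTemp("", "marker-demo-")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, markerName), []byte("demo"), 0o600); err != nil {
		return false, err
	}
	if err := finishRollback(dir, removeMarker); err != nil {
		return false, err
	}
	return legacyWatcherWouldWatch(dir)
}

func main() {
	for _, remove := range []bool{false, true} {
		watching, err := simulate(remove)
		if err != nil {
			fmt.Fprintln(os.Stderr, "simulation failed:", err)
			os.Exit(1)
		}
		fmt.Printf("marker removed on rollback = %t -> old watcher starts watching = %t\n", remove, watching)
	}
}
```

When the marker is left behind, the modelled old watcher starts a watch cycle for an upgrade that is no longer in progress; removing it during the rollback closes that window.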

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

See the additional unit tests in watch_test.go and rollback_test.go.
Manual testing requires an agent that is slow to restart after a rollback, which may need some custom code modifications.

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

Contributor

mergify bot commented Oct 15, 2025

This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@swiatekm swiatekm added the backport-9.2 (Automated backport to the 9.2 branch) and skip-changelog labels Oct 15, 2025
@pchila pchila changed the title from "Try to fix the upgrade bug" to "Remove upgrade marker if rolling back to versions older than 9.2.0" Oct 15, 2025
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

cc @swiatekm

@swiatekm swiatekm requested a review from pchila October 15, 2025 14:27
@pchila pchila self-assigned this Oct 15, 2025
@swiatekm swiatekm marked this pull request as ready for review October 15, 2025 14:31
@swiatekm swiatekm requested a review from a team as a code owner October 15, 2025 14:31
@pchila pchila removed their request for review October 15, 2025 14:39
@swiatekm swiatekm added the Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) label Oct 15, 2025
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm requested a review from blakerouse October 15, 2025 14:59
@pchila pchila merged commit 8dc3e0e into main Oct 15, 2025
23 checks passed
@pchila pchila deleted the fix/rollback-to-older-version branch October 15, 2025 15:34
mergify bot pushed a commit that referenced this pull request Oct 15, 2025
…10579)

* Add WithRemoveMarker rollback option

* Add test for watchCmd removing upgrade marker for rollbacks < 9.2.0

---------

Co-authored-by: Paolo Chila <paolo.chila@elastic.co>
(cherry picked from commit 8dc3e0e)
cmacknz pushed a commit that referenced this pull request Oct 15, 2025
…10579) (#10595)

* Add WithRemoveMarker rollback option

* Add test for watchCmd removing upgrade marker for rollbacks < 9.2.0

---------


(cherry picked from commit 8dc3e0e)

Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
Co-authored-by: Paolo Chila <paolo.chila@elastic.co>
@pchila pchila mentioned this pull request Oct 20, 2025
8 tasks