Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/4890 detect fail early upgrade #5864

Merged

Conversation

kaanyalti
Copy link
Contributor

@kaanyalti kaanyalti commented Oct 26, 2024

  • Enhancement

What does this PR do?

This PR restricts using the cli to upgrade fleet managed elastic agents. With this implementation, the only way to upgrade a fleet managed agent using the cli is only if the agent is running in privileged mode and the user provides a force flag.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

If users are upgrading fleet-managed agents using the cli for some reason, they should be aware that the upgrade command will stop working for agents running in unprivileged mode. For any fleet managed agents running in privileged mode, users will need to go through an interactive step and use the --force flag to be able to upgrade an agent.

How to test this PR locally

Setup

  • Create an ESS deployment
  • Go to fleet and click add agent
  • Create a new policy if there isn't already one
  • Build the agent locally by running DEV=true SNAPSHOT=true mage build
  • Uncompress the tar
  • If there isn't an agent on your machine already just execute the provided command on fleet ui
  • If there is already an agent on your machine add --develop flag to your installation command

Test cases

Unprivileged

  • Add --unprivileged flag to your installation command
  • Ensure that the agent is enrolled
  • Run sudo -u elastic-agent-user elastic-development-agent upgrade <VERSION>. The command should not execute.
  • Make sure that --force flag does not change the behavior

Privileged

  • Install the agent and make sure that it is enrolled
  • Run sudo elastic-development-agent upgrade <VERSION>. The command should not execute
  • Run the same command with --force flag sudo elastcit-development-agent upgrade <VERSION> --force.
  • Confirm the prompted message. Agent upgrade should start

Related issues

@kaanyalti kaanyalti requested a review from a team as a code owner October 26, 2024 19:31
Copy link
Contributor

mergify bot commented Oct 26, 2024

This pull request does not have a backport label. Could you fix it @kaanyalti? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Oct 26, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Oct 26, 2024
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Oct 28, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@kaanyalti kaanyalti force-pushed the feature/4890_detect_fail_early_upgrade branch from b8ea9bb to b3c85b0 Compare October 28, 2024 08:54
internal/pkg/agent/cmd/upgrade.go Outdated Show resolved Hide resolved
internal/pkg/agent/cmd/upgrade.go Outdated Show resolved Hide resolved
internal/pkg/agent/cmd/upgrade_test.go Outdated Show resolved Hide resolved
internal/pkg/agent/cmd/upgrade.go Outdated Show resolved Hide resolved
@kaanyalti kaanyalti force-pushed the feature/4890_detect_fail_early_upgrade branch from bcabfba to 022c342 Compare October 28, 2024 13:46
@kaanyalti kaanyalti force-pushed the feature/4890_detect_fail_early_upgrade branch from 69a78a0 to f6ed834 Compare October 29, 2024 10:41
@kaanyalti kaanyalti requested a review from cmacknz October 29, 2024 10:43
internal/pkg/agent/cmd/upgrade.go Outdated Show resolved Hide resolved
internal/pkg/agent/cmd/upgrade.go Outdated Show resolved Hide resolved
@kaanyalti kaanyalti force-pushed the feature/4890_detect_fail_early_upgrade branch from 8a76882 to 621c3fb Compare October 31, 2024 10:10
@kaanyalti kaanyalti requested a review from pchila October 31, 2024 15:06
@cmacknz
Copy link
Member

cmacknz commented Nov 1, 2024

I tried to manually test that upgrading as a non-root user fails, I did the following:

  1. AGENT_PACKAGE_VERSION=1.0.0 EXTERNAL=true SNAPSHOT=true PACKAGES=targz PLATFORMS=darwin/arm64 mage package to force the agent version to 1.0.0 since in Fleet if you are 9.0 you can't upgrade again to 9.0 or a lower version.
  2. Install a development, unprivileged agent:
sudo ./elastic-agent install -f --unprivileged --develop
Unprivileged installation mode enabled; this feature is currently in beta.
Installing into development namespace; this is an experimental and currently unsupported feature.
[=== ] Service Started  [6s] Elastic Agent successfully installed, starting enrollment.
[=== ] Done  [6s]
Elastic Agent has been successfully installed.
  1. Check the version:
sudo -u elastic-agent-user elastic-development-agent version
Binary: 1.0.0-SNAPSHOT (build: 269031487ee30d3ec60058c1743e01eedd21dd2b at 2024-11-01 18:23:57 +0000 UTC)
Daemon: 1.0.0-SNAPSHOT (build: 269031487ee30d3ec60058c1743e01eedd21dd2b at 2024-11-01 18:23:57 +0000 UTC)
  1. Upgrade to 8.15.3 and it lets me do it:
sudo -u elastic-agent-user elastic-development-agent upgrade 8.15.3
Upgrade triggered to version 8.15.3, Elastic Agent is currently restarting

sudo -u elastic-agent-user elastic-development-agent version
Binary: 8.15.3 (build: 61975895b1409449db21ddca0405e7b71bfc1c46 at 2024-10-08 14:25:20 +0000 UTC)
Daemon: 8.15.3 (build: 61975895b1409449db21ddca0405e7b71bfc1c46 at 2024-10-08 14:25:20 +0000 UTC)

I was expecting to be stopped, but maybe this is a bug in how we detect we are root when doing sudo -u elastic-agent-user which should run the command as a non-zero UID.

@cmacknz
Copy link
Member

cmacknz commented Nov 1, 2024

OK, if I enroll in Fleet it works I see what has happened:

❯ sudo -u elastic-agent-user elastic-development-agent upgrade 8.15.3
Error: aborting upgrade: upgrade command needs to be executed as root for fleet managed agents
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/9.0/fleet-troubleshooting.html

@cmacknz
Copy link
Member

cmacknz commented Nov 1, 2024

We only require being root to use the CLI with --force if enrolled with Fleet, the standalone check comes first.

We could require being root to upgrade an unprivileged agent, it makes sense to me as a system management action, but it would be a breaking change for it today.

It does appear that the escape hatch for upgrades doesn't work on a Mac though for unprivileged agents:

❯ sudo -u elastic-agent-user elastic-development-agent upgrade 8.15.3 --force
Error: aborting upgrade: upgrade command needs to be executed as root for fleet managed agents
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/9.0/fleet-troubleshooting.html
~/Downloads/elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64 ·································· 02:58:47 PM
❯ sudo elastic-development-agent upgrade 8.15.3 --force
Error: failed to retrieve agent info while tring to upgrade the agent: could not get agent info from store: fail to read configuration /Library/Elastic/Agent-Development/fleet.enc for the agent: fail to decode bytes: cipher: message authentication failed
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/9.0/fleet-troubleshooting.html

I can't upgrade as root or as elastic-agent-user. I can't upgrade at all through the CLI when installed as a unprivileged agent on a Mac and enrolled in Fleet.

Edit: I'm using 9.0.0 know but it's the same commit from this PR being tested:

sudo elastic-development-agent version
Binary: 9.0.0-SNAPSHOT (build: 269031487ee30d3ec60058c1743e01eedd21dd2b at 2024-11-01 18:11:36 +0000 UTC)
Daemon: 9.0.0-SNAPSHOT (build: 269031487ee30d3ec60058c1743e01eedd21dd2b at 2024-11-01 18:11:36 +0000 UTC)

@kaanyalti
Copy link
Contributor Author

I can't upgrade as root or as elastic-agent-user. I can't upgrade at all through the CLI when installed as a unprivileged agent on a Mac and enrolled in Fleet.

Looking into this right away

kaanyalti and others added 22 commits November 7, 2024 17:23
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
- updated agentinfo proto
- updated relevant generated code
- implemented state call in the upgrade cmd
Co-authored-by: Andrzej Stencel <andrzej.stencel@elastic.co>
@kaanyalti kaanyalti force-pushed the feature/4890_detect_fail_early_upgrade branch from 33f9e17 to 3de964e Compare November 7, 2024 14:23
@kaanyalti kaanyalti merged commit 8579474 into elastic:main Nov 8, 2024
14 checks passed
mergify bot pushed a commit that referenced this pull request Nov 8, 2024
* feature(4890): added shouldUpgrade function in the upgrade cli file

* feature(4890): added shouldUpgrade check into the upgrade command

* feature(4890): ran gofmt

* feature(4890): added a "force" flag, marked it as hidden

* feature(4890): removed dpkg, rpm and container logic

* feature(4890): ran gofmt

* feature(4890): updated the function signature of the upgrade command, updated tests, added new tests

* feature(4890): update comments

* feature(4890): added changelog fragment

* feature(4890): added fatal log in case there is an error while marking force flag as hidden

* feature(4890): added error checks in tests

* feature(4890): updated the summary in the changelog fragment

* feature(4890): removed the shorthand flag for the force flag

* feature(4890): updated synchronization in the tests

* Update internal/pkg/agent/cmd/upgrade_test.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* feature(4890): using streams err output instead of defaulting to stderr

* feature(4890): use EXPECT instead of On

* feature(4890): moved unconfirmed upgrade error to a package var

* feature(4890): removed confirmation from upgrade check for when force flag is set

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* feature(4890): fix errors

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>

* feature(4890): update test

* fearure(4890): replace ageninfo with state call

- updated agentinfo proto
- updated relevant generated code
- implemented state call in the upgrade cmd

* feature(4890): updated proto, client and server implementation

* feature(4890): fix struct tag

* feature(4890): added skip-verify checks

* feature(4890): ran addLicenseHeaders

* feature(4890): ran mage clean

* feature(4890): fix typo

* feature(4890): added timeout to connection

* feature(4890): changed condition check order

* feature(4890): fix unit tests

* feature(4890): refactored tests, using mock client

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Andrzej Stencel <andrzej.stencel@elastic.co>

* feature(4890): use lower case "f" in error messages to be more consistent

* feature(4890): remove duplicate line

* feature(4890): ran mage controlProto with correct protoc version

---------

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Co-authored-by: Andrzej Stencel <andrzej.stencel@elastic.co>
(cherry picked from commit 8579474)
kaanyalti added a commit that referenced this pull request Nov 13, 2024
* feature(4890): added shouldUpgrade function in the upgrade cli file

* feature(4890): added shouldUpgrade check into the upgrade command

* feature(4890): ran gofmt

* feature(4890): added a "force" flag, marked it as hidden

* feature(4890): removed dpkg, rpm and container logic

* feature(4890): ran gofmt

* feature(4890): updated the function signature of the upgrade command, updated tests, added new tests

* feature(4890): update comments

* feature(4890): added changelog fragment

* feature(4890): added fatal log in case there is an error while marking force flag as hidden

* feature(4890): added error checks in tests

* feature(4890): updated the summary in the changelog fragment

* feature(4890): removed the shorthand flag for the force flag

* feature(4890): updated synchronization in the tests

* Update internal/pkg/agent/cmd/upgrade_test.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* feature(4890): using streams err output instead of defaulting to stderr

* feature(4890): use EXPECT instead of On

* feature(4890): moved unconfirmed upgrade error to a package var

* feature(4890): removed confirmation from upgrade check for when force flag is set

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>

* feature(4890): fix errors

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>

* feature(4890): update test

* fearure(4890): replace ageninfo with state call

- updated agentinfo proto
- updated relevant generated code
- implemented state call in the upgrade cmd

* feature(4890): updated proto, client and server implementation

* feature(4890): fix struct tag

* feature(4890): added skip-verify checks

* feature(4890): ran addLicenseHeaders

* feature(4890): ran mage clean

* feature(4890): fix typo

* feature(4890): added timeout to connection

* feature(4890): changed condition check order

* feature(4890): fix unit tests

* feature(4890): refactored tests, using mock client

* Update internal/pkg/agent/cmd/upgrade.go

Co-authored-by: Andrzej Stencel <andrzej.stencel@elastic.co>

* feature(4890): use lower case "f" in error messages to be more consistent

* feature(4890): remove duplicate line

* feature(4890): ran mage controlProto with correct protoc version

---------

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Co-authored-by: Andrzej Stencel <andrzej.stencel@elastic.co>
(cherry picked from commit 8579474)

Co-authored-by: Kaan Yalti <kaan.yalti@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect and fail early if user attempts to upgrade Agent using the CLI in unsupported scenarios
7 participants