Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added skip audit/unenroll flag to uninstall command #6206

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

Rohit-code14
Copy link

@Rohit-code14 Rohit-code14 commented Dec 4, 2024

  • Enhancement

What does this PR do?

Added a flag to skip fleet audit/unenroll while uninstalling elastic-agent.

Why is it important?

While uninstalling elastic-agent it tries to notify fleet server about the uninstall. But in somecases users might know that the fleet server is unreachable and this notification logs multiple failures continuously. Therefore having an option to skip this would enhance the end user experience.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

No disruptive user impact

How to test this PR locally

  • Setup ES and Kibana.
  • Proceed with Fleetserver installation.
  • Build the agent locally by running SNAPSHOT=true PLATFORMS=linux/arm64 PACKAGES=tar.gz mage -v package
  • Extract the agent tar.
  • Go to Kibana -> Fleet page -> Add Agent gives installation command. Use the same to install elastic-agent.
  • After successful installation, Check in fleet page for the installed agent.
  • Kill the fleet servers process.
  • Try uninstalling elastic-agent using sudo elastic-agent uninstall
  • Now as the fleet server is unreachable, the notify calls will fail. You will get the error logs like below
    [== ] notify Fleet: network error: fail to notify audit/unenroll on fleet-server: all hosts failed: requester 0/1 to host https://192.168.1.10:8220/ errored: Post "https://192.168.1.10:8220/api/fleet/agents/a4888371-ef7b-4cc6-8df7-cc92b6c25990/audit/unenroll?": dial tcp 192.168.1.10:8220: connect:
  • Now restart the fleet server and install the elastic agent again.
  • After successful installation, kill the fleet server.
  • Now try uninstalling elastic-agent using sudo elastic-agent uninstall --skip-fleet-audit. The audit/unenroll call will be skipped and agent uninstall will happen without any errors.

Related issues

Copy link
Contributor

mergify bot commented Dec 4, 2024

This pull request does not have a backport label. Could you fix it @Rohit-code14? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Dec 4, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Dec 4, 2024
@Rohit-code14 Rohit-code14 marked this pull request as ready for review December 4, 2024 03:59
@Rohit-code14 Rohit-code14 requested a review from a team as a code owner December 4, 2024 03:59
@pkoutsovasilis
Copy link
Contributor

/test

Copy link
Contributor

@kaanyalti kaanyalti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rohit-code14 Could you please add tests relevant to what you are doing here? One suggestion is to inject notifyFleetAuditUninstall function. You can create a function type and create a mock with it using mockery which you can use in your test.

@Rohit-code14
Copy link
Author

@Rohit-code14 Could you please add tests relevant to what you are doing here? One suggestion is to inject notifyFleetAuditUninstall function. You can create a function type and create a mock with it using mockery which you can use in your test.

Hi @kaanyalti , Even if we inject notifyFleetAuditUninstall into Uninstall and try mocking the same to test it. To test it we need to directly invoke Uninstall function by passing the mock notify function during the test run. Which is not hard to do. I think we can refactor the pre-validations that are done to before invoking notifyFleetAuditUninstall function into a separate function and have a test case that mocks the new refactored function.
Something like this

func notifyFleetIfNeeded(ctx context.Context, log *logp.Logger, pt *progressbar.ProgressBar, cfg *configuration.Configuration, ai *info.AgentInfo, skipFleetAudit bool, notifyFleetAuditUninstall NotifyFleetAuditUninstall) {
// Once the root-cause is identified then this can be re-enabled on Windows.
if ai != nil && cfg != nil && !skipFleetAudit && runtime.GOOS != "windows" {
notifyFleetAuditUninstall(ctx, log, pt, cfg, ai) //nolint:errcheck // ignore the error as we can't act on it)
}
}

Please share your opinion on this

@kaanyalti
Copy link
Contributor

@Rohit-code14 I am having a hard time visualizing your suggestion based on the example you provided. Can you either implement your suggestion or provide a more concrete example so it is clearer?

@Rohit-code14
Copy link
Author

@Rohit-code14 I am having a hard time visualizing your suggestion based on the example you provided. Can you either implement your suggestion or provide a more concrete example so it is clearer?

@kaanyalti I have added changes and testcase, kindly review the same.

Copy link
Contributor

@kaanyalti kaanyalti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this on mac and linux, works well. Changes look good to me.

@pkoutsovasilis
Copy link
Contributor

/test

@Rohit-code14
Copy link
Author

Hi @kaanyalti , I cannot see the reasons for these failures, Can you please help me with this?

@pkoutsovasilis
Copy link
Contributor

Hi @kaanyalti , I cannot see the reasons for these failures, Can you please help me with this?

Hey @Rohit-code14 this is the failure 🙂

log_level_test.go:140:
--
  | Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-gcp-1734443215713220624/elastic/elastic-agent-extended-testing-bk/testing/integration/log_level_test.go:140
  | /opt/buildkite-agent/builds/bk-agent-prod-gcp-1734443215713220624/elastic/elastic-agent-extended-testing-bk/testing/integration/log_level_test.go:76
  | Error:      	Condition never satisfied
  | Test:       	TestSetLogLevelFleetManaged
  | Messages:   	agent never communicated agent-specific log level "debug" to Fleet

@Rohit-code14
Copy link
Author

Hi @kaanyalti , I cannot see the reasons for these failures, Can you please help me with this?

Hey @Rohit-code14 this is the failure 🙂

log_level_test.go:140:
--
  | Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-gcp-1734443215713220624/elastic/elastic-agent-extended-testing-bk/testing/integration/log_level_test.go:140
  | /opt/buildkite-agent/builds/bk-agent-prod-gcp-1734443215713220624/elastic/elastic-agent-extended-testing-bk/testing/integration/log_level_test.go:76
  | Error:      	Condition never satisfied
  | Test:       	TestSetLogLevelFleetManaged
  | Messages:   	agent never communicated agent-specific log level "debug" to Fleet

Hey, I don't think its due to my changes. What should i do now🤔

@pkoutsovasilis
Copy link
Contributor

Hey, I don't think its due to my changes. What should i do now🤔

I don’t believe this is caused by your changes either, but let me double-check with the team and get back to you. 🙂

@Rohit-code14
Copy link
Author

Hey, I don't think its due to my changes. What should i do now🤔

I don’t believe this is caused by your changes either, but let me double-check with the team and get back to you. 🙂

Thanks!
One small query..we should / shouldn't pull changes from main branch after raising PR. As its triggering the workflow actions everytime i pull changes. I would usually pull main branch changes but not sure on what is followed here. Sorry it may be a dumb question, new to opensource contribution.

@ycombinator
Copy link
Contributor

One small query..we should / shouldn't pull changes from main branch after raising PR.

It's acceptable to merge / rebase main into PRs, particularly when CI is failing due to reasons unrelated to the changes in your PR. There's a chance the failure is also happening on main and, if it has been fixed there, it would make sense to pull that fix into your PR's branch.

@kaanyalti
Copy link
Contributor

@Rohit-code14 Could you please resolve the merge conflicts?

@Rohit-code14
Copy link
Author

@Rohit-code14 Could you please resolve the merge conflicts?

Done!

@pkoutsovasilis
Copy link
Contributor

/test

@Rohit-code14
Copy link
Author

Quality Gate failed Quality Gate failed

Failed conditions 12.5% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

Should anything be done from my side for this?

@kaanyalti
Copy link
Contributor

@Rohit-code14 I'm going to look into why sonarqube is complaining, you may have to increase code coverage, although right now what you have implemented looks good to me.

@Rohit-code14
Copy link
Author

@Rohit-code14 I'm going to look into why sonarqube is complaining, you may have to increase code coverage, although right now what you have implemented looks good to me.

Okay let me know once you have checked the it.

@Rohit-code14
Copy link
Author

Rohit-code14 commented Dec 23, 2024

@kaanyalti @pkoutsovasilis

Please trigger tests for this.

@pkoutsovasilis
Copy link
Contributor

/test

@Rohit-code14
Copy link
Author

@pkoutsovasilis sorry to ping you again. One of the buildkite testcase failed. I have pulled changes from main. So please retrigger the tests once. And also the sonarqube coverage test failed once again how can we proceed with it?

@pkoutsovasilis
Copy link
Contributor

@pkoutsovasilis sorry to ping you again. One of the buildkite testcase failed. I have pulled changes from main. So please retrigger the tests once. And also the sonarqube coverage test failed once again how can we proceed with it?

No worries at all, and sorry for this getting stuck. We are currently discussing with the team to make the SonarQube step a non-required one, so hopefully, once that change is in place, your PR will be merged smoothly. I’ll retrigger the tests now. 🙂

@pkoutsovasilis
Copy link
Contributor

/test

Copy link

Quality Gate failed Quality Gate failed

Failed conditions
12.5% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@Rohit-code14
Copy link
Author

Rohit-code14 commented Dec 24, 2024

@pkoutsovasilis sorry to ping you again. One of the buildkite testcase failed. I have pulled changes from main. So please retrigger the tests once. And also the sonarqube coverage test failed once again how can we proceed with it?

No worries at all, and sorry for this getting stuck. We are currently discussing with the team to make the SonarQube step a non-required one, so hopefully, once that change is in place, your PR will be merged smoothly. I’ll retrigger the tests now. 🙂

Thanks @pkoutsovasilis . I am very much excited as it's my first PR. Looking forward on merging this.

Copy link
Contributor

@michalpristas michalpristas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for addressing changes in test code, looks much better now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add skip audit/unenroll flag to uninstall command
5 participants