Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call fleet-server audit/unenroll endpoint on uninstall #5302

Merged
merged 23 commits into from
Oct 3, 2024

Conversation

michel-laterman
Copy link
Contributor

@michel-laterman michel-laterman commented Aug 14, 2024

What does this PR do?

Uninstalling a fleet-managed elastic-agent instance will now do a best-effort attempt to notify fleet-server of the agent removal so the agent may not appear as offiline in the UI.

Requires fleet-server PR elastic/fleet-server#3818 to be merged in order for integration tests to succeed.

Why is it important?

Uninstalling the agent leaves inactive (offline) entries in the UI that clutter up the list.
This is an attempt to treat these agents similarly to agents that are unenrolled.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

No disruptive impact, the change is a best-effort api call.

How to test this PR locally

TODO

Related issues

@michel-laterman michel-laterman added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Aug 14, 2024
Copy link
Contributor

mergify bot commented Aug 14, 2024

This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

Uninstalling a fleet-managed elastic-agent instance will now do a
best-effort attempt to notify fleet-server of the agent removal so the
agent may not appear as offiline.
Copy link
Contributor

mergify bot commented Aug 27, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b uninstall-notify-fleet upstream/uninstall-notify-fleet
git merge upstream/main
git push upstream uninstall-notify-fleet

@ycombinator
Copy link
Contributor

@michel-laterman now that elastic/fleet-server#3818 has been merged, are you able to make progress on this PR here?

@michel-laterman michel-laterman marked this pull request as ready for review August 28, 2024 22:39
@michel-laterman michel-laterman requested a review from a team as a code owner August 28, 2024 22:39
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
Comment on lines +47 to +57
func (e *AuditUnenrollRequest) Validate() error {
if e.Timestamp.IsZero() {
return &ReqError{fmt.Errorf("request timestamp not set")}
}
switch e.Reason {
case ReasonUninstall:
default:
return &ReqError{fmt.Errorf("unsupported reason: %s", e.Reason)}
}
return nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this validation be performed on the server side ? What happens if different version of agent and fleet have different validations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also request validation done server side; currently the agent's is more restrictive (only allowing one reason)

internal/pkg/fleetapi/audit_unenroll_cmd.go Outdated Show resolved Hide resolved
mux := http.NewServeMux()
path := fmt.Sprintf(auditUnenrollPath, agentInfo.AgentID())
mux.HandleFunc(path, authHandler(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we write a 200 OK header before validating the request body with the requires ?
maybe this handler can verify the request with asserts and return a 200 if everything is ok and something else (maybe a 400) if something is not what we expect ?

Copy link
Contributor Author

@michel-laterman michel-laterman Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a dummy handler; i don't expect that that the fleetapi level we care about the response (as the uninstall function controls retries), i'll move the write header to the last step

Comment on lines +342 to +352
response, err := info.ESClient.Get(".fleet-agents", agentID, info.ESClient.Get.WithContext(ctx))
require.NoError(t, err)
defer response.Body.Close()
p, err := io.ReadAll(response.Body)
require.NoError(t, err)
require.Equalf(t, http.StatusOK, response.StatusCode, "ES status code expected 200, body: %s", p)
var res struct {
Source struct {
AuditUnenrolledReason string `json:"audit_unenrolled_reason"`
} `json:"_source"`
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there a fleet endpoint to check the audit reason for the agent? I would prefer not to query directly a fleet index from agent...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an earlier attempt I tried using info.KibanaClient.SendWithContext(ctx, http.MethodGet, "/api/fleet/agents/"+agentID, nil, nil, nil) to get the agent, however the fleet api's return information does not include the audit_unenrolled_reason attribute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @pchila that we should avoid directly querying ES indices from Agent. I've created elastic/kibana#194884 for GET /api/fleet/agents/:id to return audit_unenrolled_reason so we can update this test in the future to use that API when it's ready.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also created #5694 to add a TODO comment here to make the switch when the Fleet API is ready.

internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
case http.StatusOK:
pt.Describe("notify fleet-server success")
return
case http.StatusBadRequest, http.StatusUnauthorized, http.StatusConflict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not retry on StatusConflict?

Added comment on why each of these are not retryable.

internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
internal/pkg/agent/install/uninstall.go Outdated Show resolved Hide resolved
michel-laterman and others added 4 commits September 3, 2024 10:00
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Copy link
Contributor

mergify bot commented Sep 10, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.

Copy link
Contributor

mergify bot commented Sep 11, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 11, 2024
@v1v v1v removed the backport-v8.x label Sep 11, 2024
}

if cfg != nil && !configuration.IsStandalone(cfg.Fleet) {
ai, err = info.NewAgentInfo(ctx, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context here is set as ctx := context.Background(), the context should probably be an input into the Uninstall and tied to the lifetime of the uninstall command.

If you are 30s into a retry loop for the uninstall endpoint does CTRL-C correctly cause it to terminate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 , now passing cmd.Context() to Uninstall

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like all the review comments have been resolved, you have added the cmd.Context() in the places required to allow the ctrl-c to work.

Looks good!

@michel-laterman michel-laterman merged commit 07c2a92 into elastic:main Oct 3, 2024
14 checks passed
@michel-laterman michel-laterman deleted the uninstall-notify-fleet branch October 3, 2024 17:02
mergify bot pushed a commit that referenced this pull request Oct 3, 2024
* Call fleet-server audit/unenroll endpoint on uninstall

Uninstalling a fleet-managed elastic-agent instance will now do a
best-effort attempt to notify fleet-server of the agent removal so the
agent may not appear as offiline.

---------

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
(cherry picked from commit 07c2a92)
jlind23 pushed a commit that referenced this pull request Oct 4, 2024
* Call fleet-server audit/unenroll endpoint on uninstall

Uninstalling a fleet-managed elastic-agent instance will now do a
best-effort attempt to notify fleet-server of the agent removal so the
agent may not appear as offiline.

---------

Co-authored-by: Paolo Chilà <paolo.chila@elastic.co>
Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
(cherry picked from commit 07c2a92)

Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-skip enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants