-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Call fleet-server audit/unenroll endpoint on uninstall #5302
Call fleet-server audit/unenroll endpoint on uninstall #5302
Conversation
This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
NOTE: |
Uninstalling a fleet-managed elastic-agent instance will now do a best-effort attempt to notify fleet-server of the agent removal so the agent may not appear as offiline.
34c721e
to
bb4d789
Compare
This pull request is now in conflicts. Could you fix it? 🙏
|
@michel-laterman now that elastic/fleet-server#3818 has been merged, are you able to make progress on this PR here? |
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
changelog/fragments/1723675050-Call-fleet-server-audit-unenroll-endpoint-on-uninstall.yaml
Outdated
Show resolved
Hide resolved
func (e *AuditUnenrollRequest) Validate() error { | ||
if e.Timestamp.IsZero() { | ||
return &ReqError{fmt.Errorf("request timestamp not set")} | ||
} | ||
switch e.Reason { | ||
case ReasonUninstall: | ||
default: | ||
return &ReqError{fmt.Errorf("unsupported reason: %s", e.Reason)} | ||
} | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this validation be performed on the server side ? What happens if different version of agent and fleet have different validations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also request validation done server side; currently the agent's is more restrictive (only allowing one reason)
mux := http.NewServeMux() | ||
path := fmt.Sprintf(auditUnenrollPath, agentInfo.AgentID()) | ||
mux.HandleFunc(path, authHandler(func(w http.ResponseWriter, r *http.Request) { | ||
w.WriteHeader(http.StatusOK) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we write a 200 OK header before validating the request body with the require
s ?
maybe this handler can verify the request with assert
s and return a 200 if everything is ok and something else (maybe a 400) if something is not what we expect ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a dummy handler; i don't expect that that the fleetapi
level we care about the response (as the uninstall function controls retries), i'll move the write header to the last step
response, err := info.ESClient.Get(".fleet-agents", agentID, info.ESClient.Get.WithContext(ctx)) | ||
require.NoError(t, err) | ||
defer response.Body.Close() | ||
p, err := io.ReadAll(response.Body) | ||
require.NoError(t, err) | ||
require.Equalf(t, http.StatusOK, response.StatusCode, "ES status code expected 200, body: %s", p) | ||
var res struct { | ||
Source struct { | ||
AuditUnenrolledReason string `json:"audit_unenrolled_reason"` | ||
} `json:"_source"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a fleet endpoint to check the audit reason for the agent? I would prefer not to query directly a fleet index from agent...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an earlier attempt I tried using info.KibanaClient.SendWithContext(ctx, http.MethodGet, "/api/fleet/agents/"+agentID, nil, nil, nil)
to get the agent, however the fleet api's return information does not include the audit_unenrolled_reason
attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @pchila that we should avoid directly querying ES indices from Agent. I've created elastic/kibana#194884 for GET /api/fleet/agents/:id
to return audit_unenrolled_reason
so we can update this test in the future to use that API when it's ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also created #5694 to add a TODO comment here to make the switch when the Fleet API is ready.
case http.StatusOK: | ||
pt.Describe("notify fleet-server success") | ||
return | ||
case http.StatusBadRequest, http.StatusUnauthorized, http.StatusConflict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not retry on StatusConflict
?
Added comment on why each of these are not retryable.
Co-authored-by: Paolo Chilà <paolo.chila@elastic.co> Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
|
|
} | ||
|
||
if cfg != nil && !configuration.IsStandalone(cfg.Fleet) { | ||
ai, err = info.NewAgentInfo(ctx, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context here is set as ctx := context.Background()
, the context should probably be an input into the Uninstall and tied to the lifetime of the uninstall command.
If you are 30s into a retry loop for the uninstall endpoint does CTRL-C correctly cause it to terminate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 , now passing cmd.Context()
to Uninstall
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like all the review comments have been resolved, you have added the cmd.Context()
in the places required to allow the ctrl-c to work.
Looks good!
* Call fleet-server audit/unenroll endpoint on uninstall Uninstalling a fleet-managed elastic-agent instance will now do a best-effort attempt to notify fleet-server of the agent removal so the agent may not appear as offiline. --------- Co-authored-by: Paolo Chilà <paolo.chila@elastic.co> Co-authored-by: Blake Rouse <blake.rouse@elastic.co> (cherry picked from commit 07c2a92)
* Call fleet-server audit/unenroll endpoint on uninstall Uninstalling a fleet-managed elastic-agent instance will now do a best-effort attempt to notify fleet-server of the agent removal so the agent may not appear as offiline. --------- Co-authored-by: Paolo Chilà <paolo.chila@elastic.co> Co-authored-by: Blake Rouse <blake.rouse@elastic.co> (cherry picked from commit 07c2a92) Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
What does this PR do?
Uninstalling a fleet-managed elastic-agent instance will now do a best-effort attempt to notify fleet-server of the agent removal so the agent may not appear as offiline in the UI.
Requires fleet-server PR elastic/fleet-server#3818 to be merged in order for integration tests to succeed.
Why is it important?
Uninstalling the agent leaves inactive (offline) entries in the UI that clutter up the list.
This is an attempt to treat these agents similarly to agents that are unenrolled.
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration files./changelog/fragments
using the changelog toolDisruptive User Impact
No disruptive impact, the change is a best-effort api call.
How to test this PR locally
TODO
Related issues