[RC][Fleet] Change Updater to use newly implemented RC CDN client #29219
Conversation
Regression Detector Results

Run ID: 4fb1a34a-aaf9-43c1-98fd-0ee0b9f81ac5
Metrics dashboard · Target profiles
Baseline: 60d4126
Performance changes are noted in the perf column of each table:

No significant changes in experiment optimization goals

Confidence level: 90.00%. There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | pycheck_lots_of_tags | % cpu utilization | +1.02 | [-1.58, +3.62] | 1 | Logs |
| ➖ | basic_py_check | % cpu utilization | +0.56 | [-2.27, +3.40] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | +0.19 | [-0.57, +0.95] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.01, +0.01] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.02, +0.02] | 1 | Logs |
| ➖ | idle | memory utilization | -0.13 | [-0.17, -0.09] | 1 | Logs |
| ➖ | otel_to_otel_logs | ingress throughput | -0.18 | [-1.01, +0.65] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.54 | [-0.60, -0.47] | 1 | Logs |
| ➖ | file_tree | memory utilization | -0.79 | [-0.91, -0.68] | 1 | Logs |
Bounds Checks
| perf | experiment | bounds_check_name | replicates_passed |
|---|---|---|---|
| ✅ | idle | memory_usage | 10/10 |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
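The three criteria above can be expressed as a small predicate. This is an illustrative sketch only, not the Regression Detector's actual implementation; the 5.00% threshold and the 90.00% confidence interval come from the explanation above, and `isRegression` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// isRegression reports whether an experiment meets all three criteria
// described above: the estimated effect size is large enough, the
// confidence interval excludes zero, and the experiment is not marked
// erratic. Hypothetical helper, not the detector's real code.
func isRegression(deltaMeanPct, ciLow, ciHigh float64, erratic bool) bool {
	const threshold = 5.00 // |Δ mean %| must be at least this large
	bigEnough := math.Abs(deltaMeanPct) >= threshold
	ciExcludesZero := ciLow > 0 || ciHigh < 0
	return bigEnough && ciExcludesZero && !erratic
}

func main() {
	// pycheck_lots_of_tags from the table above: +1.02 [-1.58, +3.62].
	// The CI contains zero, so it is not flagged.
	fmt.Println(isRegression(1.02, -1.58, 3.62, false))
	// A hypothetical +6.5 [+2.0, +11.0] change would be flagged.
	fmt.Println(isRegression(6.5, 2.0, 11.0, false))
}
```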
Force-pushed from accd42b to 0afc917.
Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=45052760 --os-family=ubuntu

Note: This applies to commit dbaae27
Force-pushed from 33f012a to 286e008.
rcType: rcType,
uptane: uptaneClient,
db:     db,
},
Something still feels a little awkward here even with the revert of that one refactor. For both the CoreAgentService and the HTTPClient the uptaneClient
in the Service is the same as the client used in the containing struct. I think I see the abstraction layer we are trying to draw, but the fact that the identical reference to the client is in Service -and- CoreAgentService feels like maybe the boundaries aren't quite right. Maybe the methods of Service could accept an uptaneClient as a parameter?
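One way to read that suggestion, as a hypothetical sketch (the real types live in pkg/config/remote/service/service.go and differ from this): instead of the embedded Service duplicating the client reference held by the containing struct, Service methods could receive the uptane client as a parameter, so only one reference exists.

```go
package main

import "fmt"

// uptaneClient is a stand-in for the real uptane client interface.
type uptaneClient interface {
	TargetsMeta() ([]byte, error)
}

// Service holds only the state common to both wrappers; it no longer
// stores its own uptane client reference.
type Service struct {
	rcType string
}

// update takes the client as a parameter, so the single reference
// lives in the containing struct (CoreAgentService or HTTPClient).
func (s *Service) update(c uptaneClient) error {
	meta, err := c.TargetsMeta()
	if err != nil {
		return err
	}
	fmt.Printf("%s updated with %d bytes of targets metadata\n", s.rcType, len(meta))
	return nil
}

// HTTPClient owns the uptane client and passes it down on each call.
type HTTPClient struct {
	Service
	uptane uptaneClient
}

func (h *HTTPClient) Update() error { return h.update(h.uptane) }

// fakeClient is a trivial stub used only for this demonstration.
type fakeClient struct{}

func (fakeClient) TargetsMeta() ([]byte, error) { return []byte("{}"), nil }

func main() {
	h := &HTTPClient{Service: Service{rcType: "CDN"}, uptane: fakeClient{}}
	if err := h.Update(); err != nil {
		fmt.Println("update failed:", err)
	}
}
```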
pkg/config/remote/service/service.go (outdated)
return &HTTPClient{
	Service: Service{
		rcType: "CDN",
		uptane: uptaneHTTPClient,
Same thing here as with the Core Agent Client: both `uptane` references are the same.
return c.transactionalStore.commit()
}

// update updates the uptane client
Thought: The fact that this is effectively identical makes me wonder if we can slightly restructure the abstraction so that we don't have to have two distinct types. The uptane check process should be identical in both cases. The only thing different is the remote store / local store mechanisms. I wonder if we can take how we populate the remote store for the RC backend and pull it out of our current abstraction so we don't have to make two types just to handle that specific case.
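A hypothetical way to restructure along those lines (all names here are invented for illustration): keep a single client type and move the only differing piece, how the remote store is populated, behind a small interface.

```go
package main

import "fmt"

// remoteStore abstracts the only part that differs between the two
// paths: where update payloads come from. Hypothetical interface.
type remoteStore interface {
	fetch() ([]byte, error)
}

// backendStore stands in for the path populated from the RC backend.
type backendStore struct{}

func (backendStore) fetch() ([]byte, error) { return []byte("backend-payload"), nil }

// cdnStore stands in for the path populated over HTTPS from the RC CDN.
type cdnStore struct{}

func (cdnStore) fetch() ([]byte, error) { return []byte("cdn-payload"), nil }

// client is the single shared type; the uptane check process is
// identical regardless of which remoteStore it is given.
type client struct {
	store remoteStore
}

func (c *client) update() error {
	payload, err := c.store.fetch()
	if err != nil {
		return err
	}
	// ... verify TUF metadata and commit to the local store (elided) ...
	fmt.Printf("verified %d bytes\n", len(payload))
	return nil
}

func main() {
	// The same update logic runs against either store implementation.
	for _, c := range []*client{{store: backendStore{}}, {store: cdnStore{}}} {
		if err := c.update(); err != nil {
			fmt.Println("update failed:", err)
		}
	}
}
```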
Can't wait to try it out end to end! Maybe we should wait for the test before merging that though. We can do it with a dev version.
Go Package Import Differences

Baseline: 60d4126
Force-pushed from a97fd26 to 2313bdb.
err = c.update(ctx)
if err != nil {
	span.SetTag("cache_update_error", true)
	_ = log.Warn(fmt.Sprintf("Error updating CDN config repo: %v", err))
Let's error out here, if it's a real TUF issue (like expired timestamp or whatnot) we'll have no config and evict everything
Talked offline. Takeaways:
- This will stay the same and continue to mimic the core agent's cache update behavior (i.e. "roll back" to the most recent stable cache state in the event of an update failure).
- There is a separate issue, to be addressed outside this PR: the installer daemon and cmd each have their own cache, so if one updates successfully and the next fails, an operation could behave unexpectedly in ways it wouldn't if a single client performed all the operations.
What does this PR do?
Adds an implementation for an RC CDN HTTPS client that can fetch configs from the RC CDN, and hooks up the updater to it when fetching config layers from the AGENT_CONFIG RC product.
Motivation
Improve installer resiliency by fetching RC data from the CDN.
Additional Notes
For reviewers: there are two repos, the `config` repo and the `director` repo. Both are updated as part of an overarching update process.

Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
There should be no functional changes. To test, ensure that the installer is able to run and fetch AGENT_CONFIG configs from the CDN instead of the RC backend.