-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CONTP-356] feat(admission server): implement ValidatingAdmissionWebhook #28512
base: main
Are you sure you want to change the base?
[CONTP-356] feat(admission server): implement ValidatingAdmissionWebhook #28512
Conversation
82bb957
to
b88e005
Compare
Regression DetectorRegression Detector ResultsRun ID: cf236f87-6a57-43d4-945f-dd7b808ba63b Metrics dashboard Target profiles Baseline: 456feb4 Performance changes are noted in the perf column of each table:
No significant changes in experiment optimization goalsConfidence level: 90.00% There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
|
perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
---|---|---|---|---|---|---|
➖ | pycheck_lots_of_tags | % cpu utilization | +2.12 | [-0.40, +4.65] | 1 | Logs |
➖ | uds_dogstatsd_to_api_cpu | % cpu utilization | +1.63 | [+0.88, +2.39] | 1 | Logs |
➖ | otel_to_otel_logs | ingress throughput | +1.12 | [+0.31, +1.93] | 1 | Logs |
➖ | tcp_dd_logs_filter_exclude | ingress throughput | +0.00 | [-0.01, +0.01] | 1 | Logs |
➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.09, +0.08] | 1 | Logs |
➖ | file_tree | memory utilization | -0.18 | [-0.26, -0.09] | 1 | Logs |
➖ | basic_py_check | % cpu utilization | -0.28 | [-3.04, +2.48] | 1 | Logs |
➖ | tcp_syslog_to_blackhole | ingress throughput | -0.47 | [-0.53, -0.42] | 1 | Logs |
➖ | idle | memory utilization | -0.70 | [-0.74, -0.65] | 1 | Logs |
Bounds Checks
perf | experiment | bounds_check_name | replicates_passed |
---|---|---|---|
✅ | idle | memory_usage | 10/10 |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
-
Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
-
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
-
Its configuration does not mark it "erratic".
b88e005
to
7572022
Compare
29a988a
to
c8ef9f9
Compare
Test changes on VMUse this command from test-infra-definitions to manually test this PR changes on a VM: inv create-vm --pipeline-id=45110908 --os-family=ubuntu Note: This applies to commit c7d9777 |
97d0dec
to
2972b9b
Compare
pkg/clusteragent/admission/controllers/webhook/controller_base.go
Outdated
Show resolved
Hide resolved
|
||
return err | ||
mutatingWebhook, err := c.mutatingWebhooksLister.Get(c.config.getWebhookName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This err
overrides the one for the validating webhooks lister. So, if there was an error for those, we won't return it. Is this intentional?
baf7a28
to
15d0cdb
Compare
dd660ed
to
3aa0e8c
Compare
@@ -42,7 +52,7 @@ func SyncInformers(informers map[InformerName]cache.SharedInformer, extraWait ti | |||
end := time.Now() | |||
cacheSyncTimeouts.Inc() | |||
log.Warnf("couldn't sync informer %s in %v (kube_cache_sync_timeout_seconds: %v)", name, end.Sub(start), timeoutConfig) | |||
return fmt.Errorf("couldn't sync informer %s in %v", name, end.Sub(start)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to drop the time in the error, because it's in the log above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's to allow us to handle the error more easily here:
https://github.com/DataDog/datadog-agent/pull/28512/files#diff-7a585acdf88c85cf7bf7b21828503aa376757b0927f5cd6394f8b44bbd65bebeR474-R481
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When catching the error, it should be possible to catch any SyncInformersError
(and only those types of error) what ever the time
field is.
Look at this example: https://play.golang.com/p/Zuro7uQFNBH
We’re able to catch ErrorB
without having to assume anything about the value of its fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I actually want to only catch SyncInformersError
that are specific to the ValidatingWebhooksInformer
while at the same time not having to define another error type like SyncValidatingInformersError
, as this would also complexify the code of the SyncInformers
method.
2997203
to
d3ab880
Compare
bb8edb5
to
2a2575e
Compare
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Co-authored-by: Lénaïc Huard <L3n41c@users.noreply.github.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
… slice Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
Signed-off-by: Wassim DHIF <wassim.dhif@datadoghq.com>
2a2575e
to
c7d9777
Compare
What does this PR do?
This PR implements ValidatingAdmissionWebhook in the Cluster Agent.
Motivation
Support for
ValidatingAdmissionWebhook
is needed for future Cluster Agent features.Additional Notes
This PR also refactors heavily the Admission Controller to avoid code duplication between
ValidatingAdmissionWebhook
andMutatingAdmissionWebhook
.Describe how to test/QA your changes
QA was done by using a custom version of this PR that contains a basic Validation Webhook called
alwaysadmit
that works the same way as this default webhook. Note that I've sometime modified this webhook to actually also be analwaysdeny
webhook for testing purposes. It will not be a part of the Datadog Agent.I also used this custom branch of the Operator that give the Cluster Agent the correct RBACs to be able to create, edit and delete the Validation Admission Webhooks. This will be merged and done for the Datadog Helm Charts too.
Without the correct RBACs for
ValidatingWebhookConfigurations
The MutatingAdmissionWebhook still works as expected but the ValidatingAdmissionWebhook are disabled.
With the correct RBACs for
ValidatingWebhookConfigurations
With both ValidatingAdmissionWebhook and MutatingAdmissionWebhook disabled.
By default or with it enabled:
When
alwaysadmit
returnstrue
and log a specific message:When modifying the Cluster Agent with an
alwaysadmit
webhook that returnsfalse
and log a specific message, and re-deploying the Agent:When keeping the same behaviour but changing the endpoint of the
alwaysadmit
webhook to validate webhook updates.Before
After
Webhook still works as expected, in this case, by always denying.