-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[target-allocator] Introduce "per node" allocation strategy to target allocator #2430
Conversation
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
I also noticed we are reusing a lot of code between allocation strategies. I was thinking about adjusting the abstraction (perhaps allocator does not need to be an interface, we could maybe have more specific interface that would implement only methods that differ across strategies). But I did not want to introduce too many changes at once. |
Signed-off-by: Matej Gera <matejgera@gmail.com>
Yeah, we can do that afterwards in a separate PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good at first glance. Can you add a E2E test for it? I think it'd be easiest to check by collecting kubelet metrics via the following scrape config:
- job_name: kubelet
scheme: https
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
honor_labels: true
kubernetes_sd_configs:
- role: node
metric_relabel_configs:
- action: keep
regex: "kubelet_running_pods"
source_labels: [__name__]
Have a look at the existing E2E test for Prometheus CR for reference.
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Thanks for the pointer @swiatekm-sumo, I have added an E2E test and also reorganized the tests a bit so that TA E2E test can be run separately under separate target (hopefully you don't mind 🙂). Also regarding logging unassignable jobs / targets (#1828 (comment)), I tried to make it in a way that will not lead to overly noisy logs, i.e. avoid reporting on each targets loading and only do it on diff. |
Signed-off-by: Matej Gera <matejgera@gmail.com>
01e3c23
to
acf7644
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I really like the improvement here. I have one suggestion which may help with making it more efficient, interested in hearing your thoughts. Thank you!
Is there a reason to do this? If you only want the target allocator tests, you can use kuttl's And if you really want to, you should make sure they're run in the CI as well. |
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
I think it makes sense with respect to how other test cases are organized, since we have usually different test directories for different components (OpAMP bridge, auto-instrumentation). But you're right, it was missing from the list in the action config, I fixed that now. |
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
@@ -190,12 +190,8 @@ func (allocator *perNodeAllocator) handleTargets(diff diff.Changes[*target.Item] | |||
|
|||
// Check for unassigned targets | |||
if len(unassignedTargetsForJobs) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can just become an integer that's incremented on line 186, i don't think there's a need for a map, unless we care about deduping (which should happen already in the filter strategy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if there can be a scenario where we have multiple targets per job, that cannot be assigned. But I see the point, I simplified this to just count the number of targets, that should hopefully give enough info when debugging.
Signed-off-by: Matej Gera <matejgera@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. I've left some comments around trying to simplify the bookeeping code in the strategy implementation, but I don't think any of it is necessarily blocking - if you'd like to clean it up in a following PR, I'm ok with that as well.
Hey @swiatekm-sumo, thanks for the detailed review. I'm planning to address all of it, not sure how fast I will get to it. I'm happy to do it in this PR or the next one, whatever will be easier for maintainers to review. I'm also in no rush to merge this (I'm using a fork for my needs), but I'm not sure if there are other users waiting for this feature. |
Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
Hey folks,
Hopefully this is ready for the final review, thank you @swiatekm-sumo @jaronoff97 🙇 |
Signed-off-by: Matej Gera <matejgera@gmail.com>
@matej-g thank you! what was the reason for changing the metric type? that may break anyone who has existing dashboards for the TA, we should just rename it too if we do want it to be a gauge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one concern around the metric type change, otherwise this LGTM.
Hey @jaronoff97, this was based off of the suggestion in #2430 (comment) - this is a new metric we're introducing with this change, so it should not be breaking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's right. Looks good to me! I'll wait on @swiatekm-sumo 's review to merge. Thank you so much!
@matej-g thanks for this awesome work!! I really appreciate it. |
👏 I think it's now possible to carry out nearly all data collection in K8s from a DaemonSet, which is pretty significant. |
My pleasure, thanks you both for helping me move it over the finish line! |
Description:
Resolves #1828.
Introduces new allocation strategy "per node" that is intended to distribute targets to collectors on the basis of the node on which the targets reside, i.e. each collector will be instructed to scrape targers running on the collector's node.
This strategy is intended to be only used with a collector in
deamonset
mode (running as agent on each node). It's not suitable for other modes.Link to tracking Issue: #1828
Testing:
Documentation: Updated API documentation.