Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Deletes scan jobs if their pods have been deleted and logs can't be c… #956

Merged

Conversation

jw-s
Copy link
Contributor

@jw-s jw-s commented Feb 9, 2022

Fixes #873 and for other scan types which are not mentioned in this issue

The problem which is fixed in this PR is the following:

When a scan job creates the pod which handles scans, if this pod has been deleted before starboard parses the pod's logs, these jobs are never deleted which means when the max concurrent job setting is hit, part of this threshold is eaten up by jobs which never get deleted.

Signed-off-by: Joel Whittaker-Smith jdws.dev@gmail.com

@CLAassistant
Copy link

CLAassistant commented Feb 9, 2022

CLA assistant check
All committers have signed the CLA.

@jw-s jw-s force-pushed the bugfix/delete-job-when-assoc-pod-not-found branch from 84ee6b9 to 138de88 Compare February 9, 2022 10:47
@codecov
Copy link

codecov bot commented Feb 9, 2022

Codecov Report

Merging #956 (d5ffbe8) into main (42b2152) will decrease coverage by 0.46%.
The diff coverage is 7.84%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #956      +/-   ##
==========================================
- Coverage   64.73%   64.27%   -0.47%     
==========================================
  Files          59       59              
  Lines        6970     7014      +44     
==========================================
- Hits         4512     4508       -4     
- Misses       1999     2046      +47     
- Partials      459      460       +1     
Impacted Files Coverage Δ
pkg/kube/logs.go 34.54% <0.00%> (-1.31%) ⬇️
pkg/operator/controller/ciskubebenchreport.go 50.99% <0.00%> (-4.71%) ⬇️
pkg/operator/controller/configauditreport.go 62.95% <0.00%> (-4.01%) ⬇️
pkg/operator/controller/vulnerabilityreport.go 58.66% <25.00%> (-1.02%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42b2152...d5ffbe8. Read the comment docs.

@jw-s jw-s force-pushed the bugfix/delete-job-when-assoc-pod-not-found branch 2 times, most recently from 75ad698 to ad246fc Compare February 9, 2022 18:14
@jw-s jw-s requested a review from chen-keinan February 11, 2022 09:49
@danielpacak danielpacak self-requested a review February 14, 2022 11:30
Copy link
Contributor

@danielpacak danielpacak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this enhancement @jw-s Overall the controllers code looks good to me.

The only thing I'm not comfortable with are integration tests that you wrote for "relatively rare" corner case when the operator's cache cannot catch up with K8s API Server state and parse logs.

What's more, I'm afraid that the integration tests might became flaky because we cannot reliably control Pods managed by JobController, which is part of K8s controller-manager. (See failed PR checks.)

To sum up, I'd just add unit tests whenever possible to keep integration (~ e2e) test nice and lean so we can afford running them on each PR. See this nice and detailed explanation why I think we should not validate everything at the integration tests phase https://martinfowler.com/articles/practical-test-pyramid.html.

@jw-s
Copy link
Contributor Author

jw-s commented Feb 14, 2022

Thank you for working on this enhancement @jw-s Overall the controllers code looks good to me.

The only thing I'm not comfortable with are integration tests that you wrote for "relatively rare" corner case when the operator's cache cannot catch up with K8s API Server state and parse logs.

What's more, I'm afraid that the integration tests might became flaky because we cannot reliably control Pods managed by JobController, which is part of K8s controller-manager. (See failed PR checks.)

To sum up, I'd just add unit tests whenever possible to keep integration (~ e2e) test nice and lean so we can afford running them on each PR. See this nice and detailed explanation why I think we should not validate everything at the integration tests phase martinfowler.com/articles/practical-test-pyramid.html.

Yeah I completely agree, this area is definitely flake provoking and i'm happy to remove these tests (Though the failure is not because of a flake but actually something else). Do we have an area where we can add unit tests to cover these cases? I couldn't seem to find it.

@danielpacak
Copy link
Contributor

Thank you for working on this enhancement @jw-s Overall the controllers code looks good to me.
The only thing I'm not comfortable with are integration tests that you wrote for "relatively rare" corner case when the operator's cache cannot catch up with K8s API Server state and parse logs.
What's more, I'm afraid that the integration tests might became flaky because we cannot reliably control Pods managed by JobController, which is part of K8s controller-manager. (See failed PR checks.)
To sum up, I'd just add unit tests whenever possible to keep integration (~ e2e) test nice and lean so we can afford running them on each PR. See this nice and detailed explanation why I think we should not validate everything at the integration tests phase martinfowler.com/articles/practical-test-pyramid.html.

Yeah I completely agree, this area is definitely flake provoking and i'm happy to remove these tests (Though the failure is not because of a flake but actually something else). Do we have an area where we can add unit tests to cover these cases? I couldn't seem to find it.

We'd have to add _test.go files to implement unit tests for reconcilers. So far we only have integration tests for controllers. If this requires too much effort I'm also happy to approve this PR without very strict requirement for unit testing.

@jw-s jw-s force-pushed the bugfix/delete-job-when-assoc-pod-not-found branch from ad246fc to 5e192a9 Compare February 16, 2022 12:47
jw-s added 2 commits February 16, 2022 13:48
…ollected

Signed-off-by: Joel Whittaker-Smith <jdws.dev@gmail.com>
Signed-off-by: Joel Whittaker-Smith <jdws.dev@gmail.com>
@jw-s jw-s force-pushed the bugfix/delete-job-when-assoc-pod-not-found branch from 5e192a9 to d5ffbe8 Compare February 16, 2022 12:48
@danielpacak danielpacak self-requested a review February 17, 2022 08:18
Copy link
Contributor

@danielpacak danielpacak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jw-s for the contribution. The code LGTM!

@danielpacak danielpacak merged commit 5c79162 into aquasecurity:main Feb 17, 2022
@jw-s jw-s deleted the bugfix/delete-job-when-assoc-pod-not-found branch February 17, 2022 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignore parsing logs of scan jobs have been deleted
4 participants