Reconciler error due to unavailable secrets #1113

Open
timbuchwaldt opened this issue Apr 5, 2022 · 2 comments
Labels
🚀 enhancement New feature or request

Comments

timbuchwaldt commented Apr 5, 2022

What steps did you take and what happened:

We are running a fairly default Starboard installation but continuously see errors in our logs triggered by unavailable secrets.
This only seems to happen for our GitLab CI jobs, which are very short-lived; their secrets are auto-deleted accordingly.

What did you expect to happen:

No errors, or at least no repeated errors, for short-lived pods whose secrets have been deleted.

Anything else you would like to add:

We see the following log message repeated:

{
   "level":"error",
   "ts":1649151260.958836,
   "logger":"controller.pod",
   "msg":"Reconciler error",
   "reconciler group":"",
   "reconciler kind":"Pod",
   "name":"runner-my-secret-runner-123",
   "namespace":"gitlab-runner-legacy",
   "error":"getting secret by name: gitlab-runnerrunner-my-secret-runner-123: Secret \"runner-my-secret-runner-123-1234\" not found",
   "stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227"
}

Environment:

  • Starboard version (use starboard version): 0.15.1
  • Kubernetes version (use kubectl version): 1.22.5
  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc): Ubuntu 20.04 (server)

danielpacak (Contributor) commented Apr 5, 2022

Thank you for the feedback @timbuchwaldt. I think this might be related to #808, which makes me think about a few solutions:

  1. As suggested in Reconciler fast workload issue. #808, we could check whether a Job has been running for some time, but chances are it will still get deleted right after we check its age.
  2. Don't scan Jobs at all; but if you have long-running Jobs, you probably want to check them for vulnerabilities anyway.
  3. Add exclusion logic based on label selectors (see the sketch after this list):
    • In 0.15 we added a new environment variable to exclude certain namespaces, i.e. OPERATOR_EXCLUDE_NAMESPACES, but maybe we need more granularity to exclude GitLab Jobs and similar workloads.
    • Exclude workloads from scanning #670 is where we discussed similar ideas.
    • @timbuchwaldt are GitLab Jobs created in a specific namespace, or might they be created in any namespace?
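
For illustration, here is a rough sketch (not Starboard's actual code) of how namespace exclusion could be wired up as a controller-runtime predicate; the OPERATOR_EXCLUDE_NAMESPACES name matches the option above, while the function name and wiring are hypothetical:

```go
package controller

import (
	"strings"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// excludeNamespaces builds a predicate that drops watch events for objects
// living in any of the comma-separated namespaces, so the reconciler never
// sees short-lived CI pods created there.
func excludeNamespaces(commaSeparated string) predicate.Predicate {
	excluded := map[string]bool{}
	for _, ns := range strings.Split(commaSeparated, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			excluded[ns] = true
		}
	}
	return predicate.NewPredicateFuncs(func(obj client.Object) bool {
		// Returning false filters the event out before it is ever reconciled.
		return !excluded[obj.GetNamespace()]
	})
}

// Hypothetical wiring when building the Pod controller:
//
//	ctrl.NewControllerManagedBy(mgr).
//		For(&corev1.Pod{}, builder.WithPredicates(
//			excludeNamespaces(os.Getenv("OPERATOR_EXCLUDE_NAMESPACES")))).
//		Complete(r)
```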

Please let us know if you have any other ideas!

danielpacak added the 🚀 enhancement (New feature or request) label Apr 5, 2022
timbuchwaldt (Author) commented

Oh yeah, that sounds like exactly the same issue.

  1. Yeah, that is certainly not as stable, but it could be better.
  2. I think I'd want some Jobs scanned, although most of ours are short-lived, too.
  3. Excluding namespaces sounds feasible for now, yeah! Those jobs live in very specific namespaces, with nothing else in them.

In general, more lenient failure handling seems appropriate to me (see the sketch below): any pod could die before its scans are done or in between, so I think the operator should stop retrying after some time, or once the pod or its secrets are gone.
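
For example, a minimal sketch (assuming a controller-runtime based reconciler, not Starboard's actual code) of what treating deleted pods and secrets as terminal could look like; the secret naming below is made up for illustration:

```go
package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type PodReconciler struct {
	client.Client
}

func (r *PodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	log := ctrl.LoggerFrom(ctx)

	var pod corev1.Pod
	if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
		// Pod already gone (e.g. a short-lived GitLab CI job): nothing left to scan.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// secretName is hypothetical; Starboard derives the name differently.
	secretName := pod.Name + "-scan"
	var secret corev1.Secret
	if err := r.Get(ctx, types.NamespacedName{Namespace: pod.Namespace, Name: secretName}, &secret); err != nil {
		if apierrors.IsNotFound(err) {
			// Secret was auto-deleted together with the job: treat as terminal
			// and return nil so controller-runtime does not requeue forever.
			log.V(1).Info("secret already deleted, skipping scan", "secret", secretName)
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}

	// ... continue with the scan using the secret ...
	return ctrl.Result{}, nil
}
```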
