-
Notifications
You must be signed in to change notification settings - Fork 3.4k
fix(workflow/sync): use RWMutex to prevent concurrent map access #14321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(workflow/sync): use RWMutex to prevent concurrent map access #14321
Conversation
This resolves a fatal "concurrent map iteration and map write" error by synchronizing reads and writes to syncLockMap with a read-write mutex. This change has been running in our environment for 7 days (8,764 workflows) without reoccurrence. Issue: argoprojGH-14300 Signed-off-by: Ryan Currah <ryan@currah.ca>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses a fatal "concurrent map iteration and map write" error by switching from a basic mutex to a read-write mutex in order to better synchronize access when workflows are checked, released, or released all at once.
- The Manager struct now uses a sync.RWMutex.
- CheckWorkflowExistence now acquires a read lock.
- The Release and ReleaseAll methods have been updated to use read locks.
Comments suppressed due to low confidence (1)
workflow/sync/sync_manager.go:342
- If the ReleaseAll method modifies the syncLockMap or its contents, consider using a write lock to ensure safe concurrent modifications.
sm.lock.RLock()
|
/cherry-pick release-3.6 |
…oproj#14321) Signed-off-by: Ryan Currah <ryan@currah.ca>
…abilities fixes (Cr 28355) (#358) * fix: bump deps for k8schain to fix ecr-login (argoproj#14008) (release-3.6 cherry-pick) (argoproj#14174) * fix(ci): python sdk release process (release-3.6) (argoproj#14183) Signed-off-by: Alan Clucas <alan@clucas.org> * docs: clarify qps/burst on controller (cherry-pick argoproj#14190) (argoproj#14192) Signed-off-by: Tim Collins <tim@thecollins.team> Co-authored-by: Tim Collins <45351296+tico24@users.noreply.github.com> * fix(api/jsonschema): use unchanging JSON Schema version (cherry-pick argoproj#14092) (argoproj#14256) Signed-off-by: Roger Peppe <rogpeppe@gmail.com> Co-authored-by: Roger Peppe <rogpeppe@gmail.com> * fix(api/jsonschema): use working `$id` (cherry-pick argoproj#14257) (argoproj#14258) Signed-off-by: Roger Peppe <rogpeppe@gmail.com> Co-authored-by: Roger Peppe <rogpeppe@gmail.com> * docs: autogenerate tested k8s versions and centralize config (argoproj#14176) (release-3.6) (argoproj#14262) Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com> Signed-off-by: Alan Clucas <alan@clucas.org> Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com> * chore(deps): bump minio-go to newer version (argoproj#14185) (release-3.6) (argoproj#14261) Co-authored-by: Vaibhav Kaushik <vaibhavkaushik@salesforce.com> * fix: split pod controller from workflow controller (argoproj#14129) (release-3.6) (argoproj#14263) * chore(deps): fix snyk (argoproj#14264) (release-3.6) (argoproj#14268) * chore: revert to correct k8s versions Accidental bump from argoproj#14176 cherry-pick Signed-off-by: Alan Clucas <alan@clucas.org> * chore(deps): bump github.com/go-jose/go-jose/v3 from 3.0.3 to 3.0.4 in the go_modules group (cherry-pick argoproj#14231) (argoproj#14269) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: wait for workflow informer to sync before pod informer (cherry-pick argoproj#14248) (argoproj#14266) Signed-off-by: Rohan K <rohankmr414@gmail.com> Co-authored-by: Rohan K <rohankmr414@gmail.com> * fix(cli): remove red from log colour selection. Fixes argoproj#6740 (cherry-pick argoproj#14215) (argoproj#14278) Signed-off-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com> Co-authored-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com> * fix: correct semaphore configmap keys for multiple semaphores (argoproj#14184) (release-3.6) (argoproj#14281) * fix: don't print help for non-validation errors. Fixes argoproj#14234 (cherry-pick argoproj#14249) (argoproj#14283) Signed-off-by: Koichi Shimada <jumpe1programming@gmail.com> Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com> Co-authored-by: koichi <51446844+jumpe1@users.noreply.github.com> Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com> * docs: fix kubernetes versions (release-3.6) (argoproj#14273) Signed-off-by: Alan Clucas <alan@clucas.org> * fix(workflow/sync): use RWMutex to prevent concurrent map access (cherry-pick argoproj#14321) (argoproj#14322) Signed-off-by: Ryan Currah <ryan@currah.ca> Co-authored-by: Ryan Currah <ryan@currah.ca> * chore(lint): update golangci-lint to 2.1.1 (argoproj#14390) (cherry-pick release-3.6) (argoproj#14417) * chore: bump golang 1.23->1.24 (argoproj#14385) (cherry-pick release-3.6) (argoproj#14418) * fix: gracefully handle invalid CronWorkflows and simplify logic. (cherry-pick argoproj#14197) (argoproj#14419) Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com> * fix: prevent dfs sorter infinite recursion on cycle. Fixes argoproj#13395 (cherry-pick argoproj#14391) (argoproj#14420) Signed-off-by: Adrien Delannoy <a.delannoyfr@gmail.com> Co-authored-by: Adrien Delannoy <a.delannoyfr@gmail.com> * chore(deps): bump github.com/expr-lang/expr from 1.16.9 to 1.17.0 (argoproj#14307) (release-3.6) (argoproj#14421) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps)!: update k8s and argo-events (release-3.6) (argoproj#14424) Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: william.vanhevelingen <william.vanhevelingen@acquia.com> Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com> Signed-off-by: William Van Hevelingen <william.vanhevelingen@acquia.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: William Van Hevelingen <William.VanHevelingen@acquia.com> Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com> * fix: correct retry logic (argoproj#13734) (release-3.6) (argoproj#14428) Signed-off-by: isubasinghe <isitha@pipekit.io> Signed-off-by: Alan Clucas <alan@clucas.org> Co-authored-by: Isitha Subasinghe <isitha@pipekit.io> * fix: manual retries exit handler cleanup. Fixes argoproj#14180 (argoproj#14181) (release-3.6) (argoproj#14429) Signed-off-by: isubasinghe <isitha@pipekit.io> Signed-off-by: Alan Clucas <alan@clucas.org> Co-authored-by: Isitha Subasinghe <isitha@pipekit.io> * fix: correct manual retry logic. Fixes argoproj#14124 (argoproj#14328) (release-3.6) (argoproj#14430) Signed-off-by: oninowang <oninowang@tencent.com> Signed-off-by: Alan Clucas <alan@clucas.org> Co-authored-by: jswxstw <jswxstw@gmail.com> * fix: disable ALPN in argo-server as a workaround (argoproj#14433) Signed-off-by: Alan Clucas <alan@clucas.org> * result of codegen Signed-off-by: Kim <kim.aharfi@codefresh.io> * fix:lint Signed-off-by: Kim <kim.aharfi@codefresh.io> --------- Signed-off-by: Alan Clucas <alan@clucas.org> Signed-off-by: Tim Collins <tim@thecollins.team> Signed-off-by: Roger Peppe <rogpeppe@gmail.com> Signed-off-by: Mason Malone <651224+MasonM@users.noreply.github.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Rohan K <rohankmr414@gmail.com> Signed-off-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com> Signed-off-by: Koichi Shimada <jumpe1programming@gmail.com> Signed-off-by: Ryan Currah <ryan@currah.ca> Signed-off-by: Adrien Delannoy <a.delannoyfr@gmail.com> Signed-off-by: william.vanhevelingen <william.vanhevelingen@acquia.com> Signed-off-by: William Van Hevelingen <william.vanhevelingen@acquia.com> Signed-off-by: isubasinghe <isitha@pipekit.io> Signed-off-by: oninowang <oninowang@tencent.com> Signed-off-by: Kim <kim.aharfi@codefresh.io> Co-authored-by: Alan Clucas <alan@clucas.org> Co-authored-by: gcp-cherry-pick-bot[bot] <98988430+gcp-cherry-pick-bot[bot]@users.noreply.github.com> Co-authored-by: Tim Collins <45351296+tico24@users.noreply.github.com> Co-authored-by: Roger Peppe <rogpeppe@gmail.com> Co-authored-by: Mason Malone <651224+MasonM@users.noreply.github.com> Co-authored-by: Vaibhav Kaushik <vaibhavkaushik@salesforce.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Rohan K <rohankmr414@gmail.com> Co-authored-by: Prabakaran Kumaresshan <4676330+nixphix@users.noreply.github.com> Co-authored-by: koichi <51446844+jumpe1@users.noreply.github.com> Co-authored-by: Ryan Currah <ryan@currah.ca> Co-authored-by: Adrien Delannoy <a.delannoyfr@gmail.com> Co-authored-by: William Van Hevelingen <William.VanHevelingen@acquia.com> Co-authored-by: Isitha Subasinghe <isitha@pipekit.io> Co-authored-by: jswxstw <jswxstw@gmail.com>
This resolves a fatal "concurrent map iteration and map write" error by synchronizing reads and writes to syncLockMap with a read-write mutex. This change has been running in our environment for 7 days (8,764 workflows) without reoccurrence.
Fixes #14300
Motivation
To stop fatal panics when reading and writing to the
syncLockMapmap at the same time.Modifications
CheckWorkflowExistencefunction.Verification
make workflow-controller-image.Documentation
N/A