-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cephfs: safeguard localClusterState struct from race conditions #4163
Conversation
/test ci/centos/mini-e2e/k8s-1.27 |
This fixes #4162. feel free to attach the issue to the PR |
internal/cephfs/core/volume.go
Outdated
var clusterAdditionalInfo = make(map[string]*localClusterState) | ||
var clusterAdditionalInfoMutex sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early suggestion:- use sync.Map instead of map and sync.Mutex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll make things more complex.
We'll need to get variable and convert to clusterState type every time and then store it back.
The race condition happens only when we are writing so current code should be good.
The upper map uses a mutex and members themselves are atomic.
/retest ci/centos/mini-e2e/k8s-1.27 |
61b19fa
to
0f0d56a
Compare
internal/cephfs/core/volume.go
Outdated
const ( | ||
unknown operationState = iota | ||
unknown int64 = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why this was int64 before, as we have minimal values we should have used uint32.
internal/cephfs/core/volume.go
Outdated
if clusterAdditionalInfo[s.clusterID].resizeState.Load() == unknown || | ||
clusterAdditionalInfo[s.clusterID].resizeState.Load() == supported { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also go to the helper function like in metadata.go as an enhancement.
clusterAdditionalInfo = make(map[string]*localClusterState) | ||
// clusterAdditionalInfoMutex is used to synchronize access to | ||
// clusterAdditionalInfo map. | ||
clusterAdditionalInfoMutex = sync.Mutex{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a sync.RWMutex
not be better?
If you only mean to protect against concurrent writes, please state it clearly.
internal/cephfs/core/volume.go
Outdated
@@ -232,7 +238,7 @@ func (s *subVolumeClient) CreateVolume(ctx context.Context) error { | |||
} | |||
|
|||
// create subvolumegroup if not already created for the cluster. | |||
if !clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated[s.FsName] { | |||
if _, found := clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated.Load(s.FsName); !found { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still racy. If the subVolumeGroup is not found, there is a timespan where multiple processes can try to create it. This is something that should be protected by a Mutex.
If creating the subVolumeGroup multiple times is not problematic, please leave a comment in the code about it.
internal/cephfs/core/volume.go
Outdated
resizeState operationState | ||
subVolMetadataState operationState | ||
subVolSnapshotMetadataState operationState | ||
resizeState atomic.Int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why using an atomic int is safer here. Can you explain that in the commit message?
Pull request has been modified.
On deeper inspection, indeed atomic really didn't serve the purpose and subVolgroupCreated was still kinda buggy. Rewrote the code to use RWMutex locks individually with lot of helpers. Please take a look again. |
/retest ci/centos/mini-e2e/k8s-1.27 |
internal/cephfs/core/volume.go
Outdated
subVolSnapshotMetadataState operationState | ||
resizeState operationStateMutex | ||
subVolMetadataState operationStateMutex | ||
subVolSnapshotMetadataState operationStateMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may be overdoing it, is the subVolumeGroupsRWMutex
of this struct not sufficient to protect its members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may be overdoing it, is the
subVolumeGroupsRWMutex
of this struct not sufficient to protect its members?
No, all are independent parameters.
- with R-Locks it should basically not have much overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not about the performance overhead. I am worried about the complexity that looks unneeded.
internal/cephfs/core/volume.go
Outdated
@@ -240,7 +247,7 @@ func (s *subVolumeClient) CreateVolume(ctx context.Context) error { | |||
} | |||
|
|||
// create subvolumegroup if not already created for the cluster. | |||
if !clusterAdditionalInfo[s.clusterID].subVolumeGroupsCreated[s.FsName] { | |||
if !s.isSubVolumeGroupCreated(s.SubvolumeGroup) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still racy. two or more go routines can call the function, and both can continue to create the SubVolumeGroup.
In order to prevent races, you will need to take a lock on the clusterAdditionalInfo[s.clusterID]
object (a single lock for all members to keep it simple would be fine), or
- lock the
.subVolumeGroupsRWMutex
- create the SubVolumeGroup
- unlock the
.subVolumeGroupsRWMutex
If access to the different members needs similar serialization, use a single Mutex to update any of the members and have other go routines block until the updating is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still racy. two or more go routines can call the function, and both can continue to create the SubVolumeGroup.
In order to prevent races, you will need to take a lock on the
clusterAdditionalInfo[s.clusterID]
object (a single lock for all members to keep it simple would be fine), or
- lock the
.subVolumeGroupsRWMutex
- create the SubVolumeGroup
- unlock the
.subVolumeGroupsRWMutex
If access to the different members needs similar serialization, use a single Mutex to update any of the members and have other go routines block until the updating is done.
Yes that is intentional,
The mentioned parallel create will happen only once and there's no side affects to svg create cmd being called more than once. The follow on update will still be guarded by rwlock so we're fine.
I don't want to hold locks while we wait for a response from ceph.
And I want don't want other members to wait while one is being updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but that is not what the commit message describes:
Multiple go-routines may simultaneously check for
presence of a clusterID's metadata such as
subvolumegroup created state, resize state,
metadata state and snapshot metadata state in the
clusterAdditionalInfo map and update an entry after creation
if it is absent. This set of operation needs to be serialized.
There is no serialization in this commit. This commit add atomically setting/reading a value. Usually an int is atomic already, and sufficient if there are no calculations done with it. I do not think the mutex per member in the operationState
contributes to preventing race conditions.
👍 |
@Mergifyio queue |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at d516a1d |
Multiple go-routines may simultaneously check for a clusterID's presence in clusterAdditionalInfo and create an entry if it is absent. This set of operation needs to be serialized. Therefore, this commit safeguards clusterAdditionalInfo map from concurrent writes with a mutex to prevent the above problem. Signed-off-by: Rakshith R <rar@redhat.com>
Multiple go-routines may simultaneously create the subVolumeGroupCreated map or write into it for a particular group. This commit safeguards subVolumeGroupCreated map from concurrent creation/writes while allowing for multiple readers. Signed-off-by: Rakshith R <rar@redhat.com>
/test ci/centos/upgrade-tests-cephfs |
/test ci/centos/upgrade-tests-rbd |
/test ci/centos/k8s-e2e-external-storage/1.26 |
/test ci/centos/k8s-e2e-external-storage/1.27 |
/test ci/centos/mini-e2e-helm/k8s-1.26 |
/test ci/centos/mini-e2e-helm/k8s-1.27 |
/test ci/centos/k8s-e2e-external-storage/1.28 |
/test ci/centos/mini-e2e/k8s-1.26 |
/test ci/centos/mini-e2e/k8s-1.27 |
/test ci/centos/mini-e2e-helm/k8s-1.28 |
/test ci/centos/mini-e2e/k8s-1.28 |
@Mergifyio backport release-3.9 |
❌ No backport have been created
GitHub error: |
@Mergifyio backport release-v3.9 |
🟠 Pending
|
Describe what this PR does
This commit uses atomic.Int64 and sync.Map with members of localClusterState and safeguards clusterAdditionalInfo map operations with a mutex.
Is the change backward compatible?
yes
Are there concerns around backward compatibility?
no
Related issues
Fixes: #4162
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>
: retest the<job-name>
after unrelatedfailure (please report the failure too!)