-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default multiwrite blockmode #261
Default multiwrite blockmode #261
Conversation
This reverts commit b5b8e46.
Need to implement rpc status codes in the return for the case of users trying to provision a file based rbd volume with multi-node-multi-writer set |
47fab4b
to
6a3af1f
Compare
@j-griffith Do we have a test case for this PR ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@j-griffith as @humblec suggested it would be great to have a test case of this one, unfortunately we don't have an e2e test framework, it would be great if you give some points on how to test this feature (+ve and -ve scenarios) so that we would like use these points to write some test cases later.
@Madhu-1 Yeah, e2e testing could be as simple as create a block mode pvc, and attach it to a pod, ensure that it's mounted at the provided device path (eg: 6a3af1f#diff-1c0f522a61adc59307209c8e0296db49R118). Feel free to ping me if you want to chat, if we get an infrastructure in place the test itself is fairly easy. |
iic, we already tried this and it was working even without this patch.. I think thats the base for test case question :).. @j-griffith it seems that, the main concern should be on data consistency , Isnt it ? |
@humblec Sorry, not sure I'm following; this did work when we merged b5b8e46 and that was cool; but consistency was a concern for file based mode (for good reason). So this is basically a revert of the file based option, and only allows RWX with Block mode. Consistency checks would have to be done with something like dd, or direct IO calls. |
@j-griffith IMO, file based PVCs should also have support for RWX , isnt it ? if there are consistency issues on file, I think we have to track it down without delay. ps# looks like we have some disconnect about this PR.. lets discuss about it first :P, |
It was my opinion that RWX was reasonable for file mode as well (tbc rbd block with k8s file mode); the concerns that came up pointed out that since Kubernetes and the CRI in use will mkfs and mount these on the node; and that it's typically using ext-4 or xfs, this is not going to be consistent if two nodes try and use the volume at the same time. There are valid use-cases IMO, but they are limited, and of course they're dependent on very good application coordination.
For sure, it's great to flush things out and make sure we get what we want long term here :) |
6a3af1f
to
932b91b
Compare
fixed the rpc response, as far as tests; given we don't have a framework in place perhaps that should be an independent item? |
@Madhu-1 thoughts and the DNM label? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good, I have few comments.
Makefile
Outdated
@@ -31,7 +31,7 @@ go-test: | |||
./scripts/test-go.sh | |||
|
|||
static-check: | |||
./scripts/lint-go.sh | |||
./scripts/lint-go.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to remove extra space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
examples/README.md
Outdated
``` | ||
|
||
Create a POD that uses this PVC: | ||
|
||
```yaml | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding yaml results in a good rendering in markdown files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point adding the tag to all the yaml items I added, thanks!
examples/README.md
Outdated
userid: admin | ||
fsType: xfs | ||
multiNodeWritable: "enabled" | ||
reclaimPolicy: Delete | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add yaml here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
examples/README.md
Outdated
Wait for the POD to enter Running state, write some data to | ||
`/var/lib/www/html` | ||
Wait for the POD to enter Running state, check that our block device | ||
is available in the container at `/dev/rdbblock` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add an example here how to check the block device and a sample output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, added!
examples/README.md
Outdated
resources: | ||
requests: | ||
storage: 1Gi | ||
storageClassName: csi-rbd | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we require storage class name in PVC template?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, probably make this match https://github.com/ceph/ceph-csi/blob/csi-v1.0/examples/rbd/raw-block-pvc.yaml except s/ReadWriteOnce/ReadWriteMany/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required if you set it to be the default SC (I should've had a note in there pointing out the doc assumes that). I'll put the SC in there explicitly though so it's "safe" to copy/paste :)
pkg/rbd/nodeserver.go
Outdated
disableInUseChecks = true | ||
} else { | ||
klog.Warningf("MULTI_NODE_MULTI_WRITER currently only supported with volumes of access type `block`, invalid AccessMode for volume: %v", req.GetVolumeId()) | ||
return nil, fmt.Errorf("rbd: MULTI_NODE_MULTI_WRITER access mode only allowed with BLOCK access type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to return status.Error
with codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, good catch
examples/README.md
Outdated
Volumes that are of access_type `block` | ||
|
||
*WARNING* This feature is strictly for workloads that know how to deal | ||
with concurrent acces to the Volume (eg Active/Passive applications). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: access
is available in the container at `/dev/rdbblock` | ||
|
||
Now, we can create a second POD (ensure the POD is scheduled on a different | ||
node; multiwriter single node works without this feature) that also uses this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, is there a way to use some kind of anti-affinity to ensure that the second POD is scheduled on a different node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you could use anti-affinity or node-selector. See Kube docs here: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
pkg/rbd/controllerserver.go
Outdated
isMultiNode := false | ||
isBlock := false | ||
for _, cap := range req.VolumeCapabilities { | ||
// Only checking SINGLE_NODE_SINGLE_WRITER here because regardless of the other types (MULTI READER) we need to implement the same logic to ignore the in-use response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work with today's advertised capabilities but it is fragile if future capabilites are added in k8s. SINGLE_NODE_READER_ONLY is not MultiNode. MULTI_NODE_READER_ONLY & MULTI_NODE_SINGLE_WRITER may be safe because there is no concurrent writing going on.
Would you consider testing explicitly for MULTI_NODE_MULTI_WRITER doing the fail early at line 108ff if that is true ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is it's not a check of whether it's "safe" or not, it's a check of whether K8s and the plugin will let you even request the mode. I'm not sure I follow the "fragile" bit? We're explicitly supporting a single capability here, if other capabilities are added to K8s or if we choose to support additional capabilities in the plugin we should do that explicitly.
By the way, one of the reasons I thought about this and changed it was because you raised concerns about not being explicit and I was using an "any multi-node" logic; so I'm sort of confused. Let me know if I'm missing something; I might just not be seeing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I didn't express my concern well and that what bothers me here is the name of the flag is_multinode
. One of the access_modes that isn't SINGLE_MODE_SINGLE_WRITER is SINGLE_NODE_SINGLE_READER so it seems wrong to say is_multinode
would be true. In fact SINGLE_NODE_SINGLE_READER should be safe w/o block mode too, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey John -- a couple remarks inline for your consideration -- overall looks good.
b16a039
to
ef54c11
Compare
This commit reverts the initial implementation of the multi-node-multi-writer feature: commit: b5b8e46 It replaces that implementation with a more restrictive version that only allows multi-node-multi-writer for volumes of type `block` With this change there are no volume parameters required in the stoarge class, we also fail any attempt to create a file based device with multi-node-multi-write being specified, this way a user doesn't have to wait until they try and do the publish before realizing it doesn't work.
ef54c11
to
6ec1196
Compare
for _, cap := range req.VolumeCapabilities { | ||
// Only checking SINGLE_NODE_SINGLE_WRITER here because regardless of the other types (MULTI READER) we need to implement the same logic to ignore the in-use response | ||
if cap.GetAccessMode().GetMode() != csi.VolumeCapability_AccessMode_SINGLE_NODE_WRITER { | ||
isMultiNode = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what about setting isMultiNode to true if the access mode is neither SINGLE_MODE_WRITER or SINGLE_MODE_READER_ONLY ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's "NODE", not "MODE.. If the mode is "neither" SINGLE_NODE_xxxx" then by elimination it's "MULTI_NODE_xxx".
I'm not exactly sure what you're looking for here, but I think the safest answer is to just make this explicitly for MULTI_NODE_MULTI_WRITER option only, and then I can handle other cases involving RO and implementing the RO option separately. Trying to infer the fix or work around the issues with the single-node-multi cases doesn't belong in this PR I don't think, and I'd prefer to see how those things end up sorting out upstream. Does that work?
Note that the valid options are:
message AccessMode {
enum Mode {
UNKNOWN = 0;
// Can only be published once as read/write on a single node, at
// any given time.
SINGLE_NODE_WRITER = 1;
// Can only be published once as readonly on a single node, at
// any given time.
SINGLE_NODE_READER_ONLY = 2;
// Can be published as readonly at multiple nodes simultaneously.
MULTI_NODE_READER_ONLY = 3;
// Can be published at multiple nodes simultaneously. Only one of
// the node can be used as read/write. The rest will be readonly.
MULTI_NODE_SINGLE_WRITER = 4;
// Can be published as read/write at multiple nodes
// simultaneously.
MULTI_NODE_MULTI_WRITER = 5;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's "NODE", not "MODE.. If the mode is "neither" SINGLE_NODE_xxxx" then by elimination it's "MULTI_NODE_xxx".
I think the safest answer is to just make this explicitly for MULTI_NODE_MULTI_WRITER option only, and then I can handle other cases involving RO and implementing the RO option separately. Trying to infer the fix or work around the issues with the single-node-multi cases doesn't belong in this PR I don't think, and I'd prefer to see how those things end up sorting out upstream. Does that work?
Yes, and thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are you good here then (https://github.com/ceph/ceph-csi/pull/261/files#diff-35c77cc181d49675cd67a6095520b41fR99) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yessir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry to prolong this, but don't you want to change the sense of the comparison as well?
if cap.GetAccessMode().GetMode() == csi.VolumeCapability_AccessMode_MULTI_NODE_MULTI_WRITER {
isMultiNode = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gahhh, yes!
fd73ee5
to
3951b4a
Compare
pkg/rbd/nodeserver.go
Outdated
} else { | ||
klog.Warningf("MULTI_NODE_MULTI_WRITER currently only supported with volumes of access type `block`, invalid AccessMode for volume: %v", req.GetVolumeId()) | ||
e := fmt.Errorf("rbd: MULTI_NODE_MULTI_WRITER access mode only allowed with BLOCK access type") | ||
return nil, status.Error(codes.InvalidArgument, e.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can just return status.Error(codes.InvalidArgument, "rbd: MULTI_NODE_MULTI_WRITER access mode only allowed with BLOCK access type")
instead of creating a new variable e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, changed
3951b4a
to
ab7b6a0
Compare
@tombarron do you have any other comments on this one? |
ab7b6a0
to
d79ca5b
Compare
No, I'm OK with this PR now and thanks John for working on this! |
thanks @j-griffith @tombarron |
Default multiwrite blockmode
Syncing latest changes from upstream devel for ceph-csi
Rework multi-node-multi-writer feature
This commit reverts the initial implementation of the
multi-node-multi-writer feature:
commit: b5b8e46
It replaces that implementation with a more restrictive version that
only allows multi-node-multi-writer for volumes of type
block
With this change there are no volume parameters required in the stoarge
class, we also fail any attempt to create a file based device with
multi-node-multi-write being specified, this way a user doesn't have to
wait until they try and do the publish before realizing it doesn't work.