-
Notifications
You must be signed in to change notification settings - Fork 54
Conversation
0583acd
to
2d4b5f0
Compare
Rebased and updated to reflect that This changes the value proposition for raw block volumes. I'm not sure anymore how useful they really are. |
It turned out that Google made the same change in their own fork of Kubernetes, so there is support upstream for this API change. It's now just a matter of getting it reviewed and tested. I think we could merge this PR if we agree that the test is worth it (IMHO it is). |
I started extending the test to raw block volumes. Good that I did that, because there are additional complications:
|
The original API doesn't support implementing a test suite outside of Kubernetes. Change submitted upstream in kubernetes/kubernetes#85540
We could check mount options, but that wouldn't work for raw block volumes. A better approach is to do exactly what an application would do, which is calling mmap(MAP_SYNC). A new Go program is used for that because then we can compile on the host and run on the target. This new pmem-dax-check might also be useful as stand-alone program in _output, but for now it just gets build on demand when running tests. The check is implemented as a test suite that applies the same test to several differen test patterns (different file systems, PVC/inline, raw blocks). Raw blocks need special treatment because dax only works after mounting.
Our documentation did not specify how exactly PMEM-CSI provides raw block volumes and thus what users can expect from them. Actually using them also isn't trivial. This now gets explained in the documentation and in the updated example. Ubuntu is used as base image because the "mknod" in busybox does not accept hex device numbers as printed by "stat" and because mkfs.ext4 is in the image. With Clear Linux, mkfs.ext4 would have to be installed from a rather large bundle.
This has passed CI testing, please review. |
The reason was the default block size chosen by mkfs.ext4 when |
README.md
Outdated
and mount the raw block volume themselves if they want dax. The | ||
advantage then is that they have full control over that part. | ||
|
||
Volumes with "devdax" namespace mode are not supported. That mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like mentioning here in Blockvolumes section about devdax
is not appropriate as we have not mentioned devdax
as supported namespace mode anywhere in our documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is exactly the problem that I am trying to solve here: if we don't mention it anywhere, how does a reader know that it really isn't supported?
This felt like a good place because in some ways it would be more natural to use "devdax" for raw block devices. Some users might come here with that expectation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have listed supported namespace modes under Architecture and Operation
section.
|Namespace modes |fsdax, sector mode3 namespaces pre-created as pools |namespace in required mode created directly, no need to pre-create pools |
If you would like to add a note about why we are not supporting devdax
that would be the right place in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved.
cmd/pmem-dax-check/main.go
Outdated
@@ -0,0 +1,107 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe, the better place for this code would be under test/
as it's only used for E2E tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently yes, but this might change. It's a rather useful tool that I have used quite a lot lately also outside of the automated tests. I'd prefer to keep it here, but if you disagree then I'll move it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep in test/
, It's good to know what @okartau feels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it.
It's only used for testing at the moment and thus not a "normal" command (intel#476 (comment)).
Might be a better place (intel#476 (comment)).
Privileged: &privileged, | ||
}, | ||
Name: containerName, | ||
Image: os.Getenv("PMEM_CSI_IMAGE"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may use any generic(maybe ubuntu) image here which has mknod
and other filesystem utilities installed right. PMEM_CSI_IMAGE
here is bit misleading. Running this dax test outside our driver image is preferable in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we need to pull another, potentially large image into the cluster. We also don't have control over the content of that image and would have to lock onto a specific, soon out-dated version to avoid a potential regression should "ubuntu:latest" ever drop required tools (like mkfs.ext4).
I chose the PMEM-CSI image because it's already on the cluster and we know that it has the necessary tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's keep this way.
} | ||
|
||
By("checking that missing DAX support is detected") | ||
pmempod.RunInPod(f, l.root, []string{l.daxCheckBinary}, l.daxCheckBinary+" /no-dax; if [ $? -ne 1 ]; then echo should have reported missing DAX >&2; exit 1; fi", ns, pod.Name, containerName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pohly I couldn't get this point. Where we are copying the test binary to the target container? Can you please help me to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done by RunInPod
, see
Lines 24 to 41 in 46259b5
if len(items) > 0 { | |
args := []string{"-cf", "-"} | |
args = append(args, items...) | |
tar := exec.Command("tar", args...) | |
tar.Stderr = GinkgoWriter | |
tar.Dir = rootdir | |
pipe, err := tar.StdoutPipe() | |
framework.ExpectNoError(err, "create pipe for tar") | |
err = tar.Start() | |
framework.ExpectNoError(err, "run tar") | |
defer func() { | |
err = tar.Wait() | |
framework.ExpectNoError(err, "tar runtime error") | |
}() | |
input = pipe | |
cmdPrefix = "tar -xf - &&" | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry I was bit lazy :). Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
We want to know whether mmap(MAP_SYNC) really works.
This builds on top of the infrastructure in PR #473, so that needs to be merged first.
Having to fork Kubernetes is unfortunate, but simpler than copying the entire code that we need. Let's wait for some feedback on kubernetes/kubernetes#85540 before merging this.
Fixes: #450