Skip to content
This repository was archived by the owner on Oct 22, 2024. It is now read-only.

dax tests + raw block volumes #476

Merged
merged 5 commits into from
Dec 5, 2019
Merged

dax tests + raw block volumes #476

merged 5 commits into from
Dec 5, 2019

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Nov 22, 2019

We want to know whether mmap(MAP_SYNC) really works.

This builds on top of the infrastructure in PR #473, so that needs to be merged first.

Having to fork Kubernetes is unfortunate, but simpler than copying the entire code that we need. Let's wait for some feedback on kubernetes/kubernetes#85540 before merging this.

Fixes: #450

@pohly pohly force-pushed the dax-test branch 3 times, most recently from 0583acd to 2d4b5f0 Compare November 29, 2019 16:46
@pohly pohly changed the title WIP: dax tests dax tests Nov 29, 2019
@pohly
Copy link
Contributor Author

pohly commented Nov 29, 2019

Rebased and updated to reflect that mmap(MAP_SYNC) does not work directly for the raw block device. At least I didn't know that - learned something new this week. 🤷‍♂️

This changes the value proposition for raw block volumes. I'm not sure anymore how useful they really are.

@pohly pohly changed the title dax tests dax tests + raw block volumes Nov 29, 2019
@pohly
Copy link
Contributor Author

pohly commented Nov 29, 2019

Having to fork Kubernetes is unfortunate, but simpler than copying the entire code that we need. Let's wait for some feedback on kubernetes/kubernetes#85540 before merging this.

It turned out that Google made the same change in their own fork of Kubernetes, so there is support upstream for this API change. It's now just a matter of getting it reviewed and tested. I think we could merge this PR if we agree that the test is worth it (IMHO it is).

@pohly pohly changed the title dax tests + raw block volumes WIP: dax tests + raw block volumes Dec 2, 2019
@pohly
Copy link
Contributor Author

pohly commented Dec 2, 2019

I started extending the test to raw block volumes. Good that I did that, because there are additional complications:

pohly added 3 commits December 3, 2019 15:29
The original API doesn't support implementing a test suite outside of
Kubernetes. Change submitted upstream in
kubernetes/kubernetes#85540
We could check mount options, but that wouldn't work for raw block
volumes. A better approach is to do exactly what an application would
do, which is calling mmap(MAP_SYNC). A new Go program is used for that
because then we can compile on the host and run on the target. This
new pmem-dax-check might also be useful as stand-alone program in
_output, but for now it just gets build on demand when running tests.

The check is implemented as a test suite that applies the same test to
several differen test patterns (different file systems,
PVC/inline, raw blocks). Raw blocks need special treatment because
dax only works after mounting.
Our documentation did not specify how exactly PMEM-CSI provides raw
block volumes and thus what users can expect from them. Actually using
them also isn't trivial. This now gets explained in the documentation
and in the updated example.

Ubuntu is used as base image because the "mknod" in busybox does not
accept hex device numbers as printed by "stat" and because mkfs.ext4
is in the image. With Clear Linux, mkfs.ext4 would have to be
installed from a rather large bundle.
@pohly
Copy link
Contributor Author

pohly commented Dec 4, 2019

This has passed CI testing, please review.

@pohly pohly changed the title WIP: dax tests + raw block volumes dax tests + raw block volumes Dec 4, 2019
@pohly
Copy link
Contributor Author

pohly commented Dec 4, 2019

with the default 110Mi claim size the resulting raw block volume doesn't support -odax. 8Gi works. To be investigated further...

The reason was the default block size chosen by mkfs.ext4 when -b is not given. Adding -b as we do in PMEM-CSI itself made it work also for smaller volumes.

README.md Outdated
and mount the raw block volume themselves if they want dax. The
advantage then is that they have full control over that part.

Volumes with "devdax" namespace mode are not supported. That mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like mentioning here in Blockvolumes section about devdax is not appropriate as we have not mentioned devdax as supported namespace mode anywhere in our documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is exactly the problem that I am trying to solve here: if we don't mention it anywhere, how does a reader know that it really isn't supported?

This felt like a good place because in some ways it would be more natural to use "devdax" for raw block devices. Some users might come here with that expectation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have listed supported namespace modes under Architecture and Operation section.

|Namespace modes |fsdax, sector mode3 namespaces pre-created as pools |namespace in required mode created directly, no need to pre-create pools |

If you would like to add a note about why we are not supporting devdax that would be the right place in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

@@ -0,0 +1,107 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, the better place for this code would be under test/ as it's only used for E2E tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently yes, but this might change. It's a rather useful tool that I have used quite a lot lately also outside of the automated tests. I'd prefer to keep it here, but if you disagree then I'll move it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep in test/, It's good to know what @okartau feels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it.

pohly added 2 commits December 5, 2019 08:29
It's only used for testing at the moment and thus not a "normal"
command (intel#476 (comment)).
Privileged: &privileged,
},
Name: containerName,
Image: os.Getenv("PMEM_CSI_IMAGE"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may use any generic(maybe ubuntu) image here which has mknod and other filesystem utilities installed right. PMEM_CSI_IMAGE here is bit misleading. Running this dax test outside our driver image is preferable in my opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we need to pull another, potentially large image into the cluster. We also don't have control over the content of that image and would have to lock onto a specific, soon out-dated version to avoid a potential regression should "ubuntu:latest" ever drop required tools (like mkfs.ext4).

I chose the PMEM-CSI image because it's already on the cluster and we know that it has the necessary tools.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's keep this way.

}

By("checking that missing DAX support is detected")
pmempod.RunInPod(f, l.root, []string{l.daxCheckBinary}, l.daxCheckBinary+" /no-dax; if [ $? -ne 1 ]; then echo should have reported missing DAX >&2; exit 1; fi", ns, pod.Name, containerName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly I couldn't get this point. Where we are copying the test binary to the target container? Can you please help me to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's done by RunInPod, see

if len(items) > 0 {
args := []string{"-cf", "-"}
args = append(args, items...)
tar := exec.Command("tar", args...)
tar.Stderr = GinkgoWriter
tar.Dir = rootdir
pipe, err := tar.StdoutPipe()
framework.ExpectNoError(err, "create pipe for tar")
err = tar.Start()
framework.ExpectNoError(err, "run tar")
defer func() {
err = tar.Wait()
framework.ExpectNoError(err, "tar runtime error")
}()
input = pipe
cmdPrefix = "tar -xf - &&"
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry I was bit lazy :). Thanks.

Copy link
Contributor

@avalluri avalluri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@pohly pohly merged commit 4d3ca30 into intel:devel Dec 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test that MAP_SYNC works
2 participants