-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support more methods of WDL task disk specification #5001
Conversation
…toil into issues/4995-disk-spec-wdl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need to go back and get clarification from the WDL folks before we can implement this properly.
The spec talks about "persistent volumes", but doesn't really explain what those are or which tasks would expect to be able to read which other tasks' writes. The implementation here doesn't actually provide any kind of persistence that I can see, unless running somewhere where the worker nodes have persistent filesystems.
It's not really clear to me whether we're meant to be mounting arbitrary host paths into the containers, or doing something more like Docker volumes. There's a requirement for the host-side path to exist but no other evidence that the task would be able to expect to actually access anything at that host-side path, and the execution engine is somehow responsible for making the volumes be the right size, which is impossible if it is just meant to mount across whatever's already there.
Can we dig up any workflows that genuinely use the mountpoint feature as more than just a test case, to see how they expect it to behave? Can we find or elicit any more explanation from the spec authors as to what the mount point feature is meant to accomplish?
It also might not really make sense to implement this on our own in Toil without some support from MiniWDL. We rely on MiniWDL for ordering up an appropriate container given the runtime spec, and unless we need to hook it into Toil's job requirements logic or make the batch system do special things with batch-system-specific persistent storage, it would be best if we could just get this feature for free when it shows up in MiniWDL.
src/toil/wdl/wdltoil.py
Outdated
if not os.path.exists(part_mount_point): | ||
# this isn't a valid mount point | ||
raise NotImplementedError(f"Cannot use mount point {part_mount_point} as it does not exist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the spec says we're supposed to require (the mount point needs to exist on the "host system"), but I've read the disks section of the spec twice and that requirement doesn't really make any sense, because we're mounting storage into the container. The spec doesn't say we actually do anything with this path on the host.
src/toil/wdl/wdltoil.py
Outdated
singularity_original_prepare_mounts = task_container.prepare_mounts | ||
|
||
def patch_prepare_mounts_singularity() -> List[Tuple[str, str, bool]]: | ||
""" | ||
Mount the mount points specified from the disk requirements. | ||
|
||
The singularity and docker patch are separate as they have different function signatures | ||
""" | ||
# todo: support AWS EBS/Kubernetes persistent volumes | ||
# this logic likely only works for local clusters as we don't deal with the size of each mount point | ||
mounts: List[Tuple[str, str, bool]] = singularity_original_prepare_mounts() | ||
# todo: support AWS EBS/Kubernetes persistent volumes | ||
# this logic likely only works for local clusters as we don't deal with the size of each mount point | ||
for mount_point, _ in self._mount_spec.items(): | ||
abs_mount_point = os.path.abspath(mount_point) | ||
mounts.append((abs_mount_point, abs_mount_point, True)) | ||
return mounts | ||
task_container.prepare_mounts = patch_prepare_mounts_singularity # type: ignore[method-assign] | ||
elif isinstance(task_container, SwarmContainer): | ||
docker_original_prepare_mounts = task_container.prepare_mounts | ||
|
||
try: | ||
# miniwdl depends on docker so this should be available but check just in case | ||
import docker | ||
# docker stubs are still WIP: https://github.com/docker/docker-py/issues/2796 | ||
from docker.types import Mount # type: ignore[import-untyped] | ||
|
||
def patch_prepare_mounts_docker(logger: logging.Logger) -> List[Mount]: | ||
""" | ||
Same as the singularity patch but for docker | ||
""" | ||
mounts: List[Mount] = docker_original_prepare_mounts(logger) | ||
for mount_point, _ in self._mount_spec.items(): | ||
abs_mount_point = os.path.abspath(mount_point) | ||
mounts.append( | ||
Mount( | ||
abs_mount_point.rstrip("/").replace("{{", '{{"{{"}}'), | ||
abs_mount_point.rstrip("/").replace("{{", '{{"{{"}}'), | ||
type="bind", | ||
) | ||
) | ||
return mounts | ||
task_container.prepare_mounts = patch_prepare_mounts_docker # type: ignore[method-assign] | ||
except ImportError: | ||
logger.warning("Docker package not installed. Unable to add mount points.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it makes sense to add the ability to make these mounts in Toil as a monkey-patch. There's nothing here that wouldn't make just as much sense in MiniWDL (or more, since MiniWDL actually has a shared filesystem as an assumption), so instead of doing monkey-patches we should PR this machinery to MiniWDL.
Or if we're starting to add multiple monkey-patches to the TaskContainers, maybe we really want to extend them instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably is a good idea to PR this machinery to MiniWDL. The main reason why I had to monkeypatch this in the first place is because MiniWDL doesn't actually support the disks
runtime attribute. Instead of PR-ing the code one-to-one, I think a PR that adds functionality to extend the list of mount points for both docker and singularity is best
src/toil/wdl/wdltoil.py
Outdated
# this logic likely only works for local clusters as we don't deal with the size of each mount point | ||
for mount_point, _ in self._mount_spec.items(): | ||
abs_mount_point = os.path.abspath(mount_point) | ||
mounts.append((abs_mount_point, abs_mount_point, True)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How sure are you that the spec actually means you're supposed to mount this particular host path through to the container?
It does say that the host path is required to exist. But it also says a lot about mounting "persistent volumes" at these mount points, and making them the required size. It doesn't seem to say anything about mounting those host paths specifically into the container.
What if you ask for a 100 GiB /tmp
mount, and /tmp
exists on the host, but /tmp
on the host is only 10 GiB and Toil is actually doing all its work on a much larger /scratch/tmp
? Shouldn't you get a 100 GiB /tmp
mount in the container that actually refers to some location in /scratch/tmp
on the host?
If the mount point feature was really meant to mount particular host paths across, wouldn't it take both a host-side path and a container-side path like the actual underlying mount functionality uses?
@stxue1 Don't we now have the clarification we need so that we can finish this? I think the WDL spec is being revised to make it clearer that the mounts are meant to request so much storage available at such-and-such a path in the container, and that they are not actually meant to mount specific paths into the container. But we still need the changes in this PR adding array-of-mounts support. |
Co-authored-by: Adam Novak <anovak@soe.ucsc.edu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks better but I think there are still ways to break it.
If the parsing was broken out into its own function with tests for its behavior on both allowed and disallowed disk specification sets, it would be easier to convincingly say that the implementation is correct.
@@ -1979,15 +1984,23 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]: | |||
|
|||
per_part_size = convert_units(part_size, part_suffix) | |||
total_bytes += per_part_size | |||
if specified_mount_point is not None: | |||
if mount_spec.get(specified_mount_point) is not None: | |||
if mount_spec.get(specified_mount_point) is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check doesn't account for how local-disk
and no mount point mean the same thing, so will still let you specify both.
If there was a test for this validation logic, that would catch the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mapped local-disk
and no mount point to the same thing in a conditional branch below. I changed the logic a bit so it's more clear and so there's only one conditional branch that does this duplication check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good now.
source_location = os.path.join(tmpdir, str(uuid.uuid4())) | ||
os.mkdir(source_location) | ||
if mount_target is not None: | ||
# None represents an omitted mount point, which will default to the task's work directory. MiniWDL's internals will mount the task's work directory by itself | ||
mount_src_mapping[mount_target] = source_location |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes a directory for omitted mount points even if it never uses them.
We also sort of assume Docker/Singularity is pulling from the same storage we are for /
, but never really enforce/check that.
Closes #4995
Changelog Entry
To be copied to the draft changelog by merger:
Reviewer Checklist
issues/XXXX-fix-the-thing
in the Toil repo, or from an external repo.camelCase
that want to be insnake_case
.docs/running/{cliOptions,cwl,wdl}.rst
Merger Checklist