Skip to content

Conversation

jfennick
Copy link
Contributor

@jfennick jfennick commented Oct 3, 2023

The primary purpose of cwl-docker-extract is to pull all images in a workflow (and subworkflows, etc) and populate the default docker / --container-engine cache. Additionally copying the images to another directory is often unnecessary and takes up 2X the disk space. For large workflows that have many unoptimized images, this could approach 100GB. This PR makes copying optional (for docker only).

@codecov
Copy link

codecov bot commented Oct 3, 2023

Codecov Report

Merging #247 (ea43afc) into main (823b687) will decrease coverage by 0.01%.
The diff coverage is 68.42%.

@@            Coverage Diff             @@
##             main     #247      +/-   ##
==========================================
- Coverage   32.03%   32.03%   -0.01%     
==========================================
  Files          30       30              
  Lines       30476    30484       +8     
  Branches     9133     9137       +4     
==========================================
+ Hits         9764     9766       +2     
- Misses      18160    18169       +9     
+ Partials     2552     2549       -3     
Files Coverage Δ
cwl_utils/image_puller.py 82.89% <84.61%> (-1.83%) ⬇️
cwl_utils/docker_extract.py 87.50% <33.33%> (-5.84%) ⬇️

... and 5 files with indirect coverage changes

@mr-c
Copy link
Member

mr-c commented Oct 3, 2023

The primary purpose of cwl-docker-extract is to pull all images in a workflow (and subworkflows, etc) and populate the default docker / --container-engine cache. Additionally copying the images to another directory is often unnecessary and takes up 2X the disk space. For large workflows that have many unoptimized images, this could approach 100GB. This PR makes copying optional (for docker only).

Thanks for the PR. I'm glad to hear about your use case, though it is not the original use case for this script. There are HPC systems where the compute nodes have no internet access, and thus the docket containers have to be downloaded on a different system and transferred out of band to the compute nodes.

@jfennick
Copy link
Contributor Author

jfennick commented Oct 3, 2023

The primary purpose of cwl-docker-extract is to pull all images in a workflow (and subworkflows, etc) and populate the default docker / --container-engine cache. Additionally copying the images to another directory is often unnecessary and takes up 2X the disk space. For large workflows that have many unoptimized images, this could approach 100GB. This PR makes copying optional (for docker only).

Thanks for the PR. I'm glad to hear about your use case, though it is not the original use case for this script. There are HPC systems where the compute nodes have no internet access, and thus the docket containers have to be downloaded on a different system and transferred out of band to the compute nodes.

Right, I should have phrased that differently. I also have to deal with an out of band HPC machine, and I agree that is one use case.

The use case I mentioned above is a workaround for the fact that cwltool only uses docker run when executing workflows, and by default docker run will only pull an image if one does not exist locally (--pull missing). If an image exists locally, but it is not in fact the latest image, docker run will execute the old image. This can lead to "works for me" situations where someone is developing a docker image on their local machine and periodically pushing, but everyone else gets one of the stale images and thus weird errors. Perhaps I'll make a PR for cwltool and/or Toil as well.

@jfennick
Copy link
Contributor Author

jfennick commented Oct 3, 2023

https://cwltool.readthedocs.io/en/latest/cli.html#cmdoption-cwltool-force-docker-pull

Nevermind, I guess there's already an option for cwltool. It would be nice if --force-docker-pull is enabled by default.

@mr-c mr-c enabled auto-merge (rebase) October 6, 2023 08:41
@mr-c mr-c merged commit a104e9d into common-workflow-language:main Oct 6, 2023
@jfennick jfennick deleted the docker_save_optional branch December 1, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants