Is it worth adding a couple of util scripts that would export the user's current environment (conda or virtualenv) and build a docker image on top of `daskdev/dask:latest`?
Being able to pass a list of conda/pip packages is fine for relatively simple environments / prototyping, but I can see value in something slightly more stable. Building a new image (which shouldn't need to be done too often) will increase the connection time to the KubeCluster, but will reduce worker start-up time.
I have a basic POC of this, which I am currently using; it looks something like:
- On JupyterLab, build my conda env / validate locally in a notebook
- Export the environment
- Build a docker image on top of `daskdev/dask:latest`, using something like:
```python
import pathlib
import tempfile

import docker
from dask_kubernetes import KubeCluster

DOCKER_HUB_REPO = 'my-dockerhub-user'  # placeholder: your registry namespace
client = docker.from_env()

dockerfile_template = (
    'FROM daskdev/dask:latest\n'
    'ADD {environment_file} /opt/app/environment.yml\n'
    'RUN /opt/conda/bin/conda env update -n dask -f /opt/app/environment.yml && \\\n'
    '    conda clean -tipsy'
)

def build_publish_dockerfile(context_dir, dockerfile_txt, tag):
    # Write the Dockerfile into the build context, then build and push the image.
    with pathlib.Path(context_dir).joinpath('dockerfile').open('w') as f:
        f.write(dockerfile_txt)
    client.images.build(
        path=str(context_dir), dockerfile='dockerfile',
        tag='%s/%s' % (DOCKER_HUB_REPO, tag), nocache=True,
    )
    client.images.push('%s/%s' % (DOCKER_HUB_REPO, tag))

def image_from_conda_env(env_name, tag, conda_bin='conda'):
    with tempfile.TemporaryDirectory() as tmp_dir:
        env_file = pathlib.Path(tmp_dir).joinpath('environment.yml')
        export_conda_env(env_name, env_file, conda_bin)
        # ADD paths are relative to the build context, so pass the file name only
        dockerfile = dockerfile_template.format(environment_file=env_file.name)
        build_publish_dockerfile(tmp_dir, dockerfile_txt=dockerfile, tag=tag)

image_from_conda_env('myenv', 'dask-worker-myenv')
k = KubeCluster(image='%s/dask-worker-myenv' % DOCKER_HUB_REPO)
```
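(`export_conda_env` isn't shown above; a minimal sketch, assuming it just shells out to `conda env export` and writes the spec to the given path — the actual POC may do this differently:)

```python
import pathlib
import subprocess

def export_conda_env(env_name, env_file, conda_bin='conda'):
    # `conda env export` prints the environment spec as YAML on stdout;
    # capture it and write it to the requested file.
    spec = subprocess.check_output([conda_bin, 'env', 'export', '-n', env_name])
    pathlib.Path(env_file).write_bytes(spec)
```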
Is this in the works? Or any thoughts on the above?