Add Deep Learning VM (DLVM) images as a base image option for Caliban.#20
Add Deep Learning VM (DLVM) images as a base image option for Caliban.#20sagravat wants to merge 14 commits intogoogle:mainfrom
Conversation
…elp function to list the DLVM types.
…e-auth release issue.
Codecov Report
@@ Coverage Diff @@
## master #20 +/- ##
==========================================
- Coverage 45.67% 44.97% -0.70%
==========================================
Files 17 17
Lines 2728 2777 +49
==========================================
+ Hits 1246 1249 +3
- Misses 1482 1528 +46
Continue to review full report at Codecov.
|
sritchie
left a comment
There was a problem hiding this comment.
@sagravat , this is great! I want to see if we can separate the logic here a bit more. Ideally, we would be able to reuse the existing run_interactive and run_notebook commands (I think that's the names), and make this PR into TWO changes.
The first is the --base_image support, and the second is the GCP notebook submitter.
The second is the GCP scheduler, which I think will work without the DLVM base images.
I'm having trouble parsing the new entrypoint that the second change requires. Is it really different than how we launch jupyterlab normally? If not, I wonder if we could actually enable this by making it easier for folks to install ANY jupyterlab extension.
Happy to chat more. Take a look and let me know if these notes make sense.
| "--dlvm", | ||
| help="DLVM base image type. Must be one of " | ||
| "{}".format(dlvm_types) + ". If supplied, " | ||
| "Caliban will skip the build and push steps and use this image tag.") |
There was a problem hiding this comment.
hey, I'm going to add comments as I read, so this may be clear later! Naively I would have assumed that the DLVM would be a base image, and that you could still install dependencies on top, right? If that is true, can we call this argument --base_image?
Right now, --image_id removes any requirement.txt installation; it's truly a flag to skip any build, including even getting your code into the image.
Really, the syntax should be caliban run e2a4af785bdb...
|
|
||
|
|
||
| def generate_image_tag(project_id, docker_args, dry_run: bool = False): | ||
| def generate_image_tag(project_id, docker_args, dlvm: str = None, dry_run: bool = False): |
There was a problem hiding this comment.
if we call it --base_image we can add that key to the generate_docker_args function here: https://github.com/google/caliban/blob/master/caliban/cli.py#L536
| return _dlvm_config(job_mode).get(dlvm_arg) | ||
|
|
||
|
|
||
| def _dlvm_config(job_mode: JobMode) -> Dict[str, str]: |
There was a problem hiding this comment.
nice, this is good. I think I see now why you have --dlvm vs --base_image. What we COULD do is have a prefix for --base_image, something like --base_image dlvm:pytorch, that would force a lookup here. That would let this feature give us general base images too.
| return """ | ||
| if not dlvm: | ||
| return """ | ||
| RUN pip install {}{} |
There was a problem hiding this comment.
oh, wait, good catch that we need up date this pip too! I can do that separately.
There was a problem hiding this comment.
We have this --lab flag. I'm thinking that we should only do this special installation if --lab is specified. If not, even with a dlvm base image, we should just install normal jupyter. wdyt?
| """.format(library, version_suffix) | ||
| else: | ||
| return """ | ||
| RUN /opt/conda/bin/pip install \ |
There was a problem hiding this comment.
does this only work on the deep learning VMs? and does it work on ALL the deep learning VMs?
If it works without the dlvms, maybe it needs its own flag.
caliban/docker.py
Outdated
| if base_image_fn is None: | ||
| base_image_fn = base_image_id | ||
|
|
||
| base_image = base_image_fn(job_mode) |
There was a problem hiding this comment.
Can you instead pass
base_image_fn = lambda job_mode: _dlvm_id(job_mode, dlvm)
Then we can keep to the API.
|
|
||
|
|
||
| def _notebook_entries(lab: bool = False, version: Optional[str] = None) -> str: | ||
| def _notebook_entries(lab: bool = False, version: Optional[str] = None, dlvm: bool = False) -> str: |
There was a problem hiding this comment.
Can we make this flag called scheduler instead? I think it doesn't depend on dlvm and COULD be a separate flag.
| if inject_notebook.value != 'none': | ||
| install_lab = inject_notebook == NotebookInstall.lab | ||
| if dlvm is None: | ||
| dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=False) |
There was a problem hiding this comment.
how about remove the if/else and make the line
dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=bool(dlvm))
|
|
||
| dockerfile += """ | ||
|
|
||
| USER {uid}:{gid} |
There was a problem hiding this comment.
nice! Now that we're on python 3.6, we can actually make this
dockerfile += f"""
USER {uid}:{gid}
"""using f-strings.
This contribution enables Deep Learning VM (DLVM) images from the Google Cloud AI Platform to be used as base images for execution in local, cloud, shell, and notebook modes. The notebook mode includes a Jupyterlab extension widget that allows the user to directly submit the notebook for training on the AI Platform with configurable hardware accelerators like GPUs and TPUs.