Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Add docker run options #3682

Merged
merged 10 commits into from
Jun 26, 2024
Merged

[Core] Add docker run options #3682

merged 10 commits into from
Jun 26, 2024

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Jun 21, 2024

This PR adds a new field allowing users to customize the run_options for the docker image.

docker:
  run_options:
      - -v /var/run/docker.sock:/var/run/docker.sock

TODO:
- [ ] propagate the run_options for k8s. (We cannot apply docker run_options to k8s pod containers)

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • sky launch -c test-dind --cloud aws --cpus 2 --image-id docker:continuumio/miniconda3:24.1.2-0 "apt update; apt install -y docker.io; docker run hello-world" with config above.
    • sky launch -c test-dind --cloud gcp --cpus 2 --image-id docker:continuumio/miniconda3:24.1.2-0 "apt update; apt install -y docker.io; docker run hello-world" with config above.
    • sky launch -c test-dind --cloud azure --cpus 2 --image-id docker:continuumio/miniconda3:24.1.2-0 "apt update; apt install -y docker.io; docker run hello-world" with config above.
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll! - haven't tried but the code lgtm. Left some comments on the documentation.

docs/source/reference/config.rst Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
@romilbhardwaj romilbhardwaj added this to the v0.6.1 milestone Jun 25, 2024
@Michaelvll Michaelvll mentioned this pull request Jun 25, 2024
6 tasks
Michaelvll and others added 4 commits June 25, 2024 12:02
Copy link
Collaborator

@romilbhardwaj romilbhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll! LGTM - this will be quite helpful for our users.

Nit - bad args in run_config aren't surfaced nicely, but can be debugged by the user if they go through the logs. We may want to improve the error surfacing here in the future. Example.

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @Michaelvll ! It would be useful for our user :) Left one thing to discuss and other parts looks great to me!

# These options will be passed directly as command line args to `docker run`,
# see: https://docs.docker.com/reference/cli/docker/container/run/
#
# The following run options are applied by default and cannot be overridden:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where have we guarantee this "cannot be override"? In the code it seems like we are just appending them in the docker run cmds and follow the default behaviour of multiple same argument to the command. Do we need some checks on the docker run options to make sure it does not conflict w/ existing options?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to check the arguments for docker. It might be fine for now to allow the program to proceed and let it error out after the cluster is launched. Since we have allowed the change of run_options for an existing cluster, a user can quickly fix it according to the error message. Wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Jun 26, 2024

Thanks @Michaelvll! LGTM - this will be quite helpful for our users.

Nit - bad args in run_config aren't surfaced nicely, but can be debugged by the user if they go through the logs. We may want to improve the error surfacing here in the future. Example.

The error surfacing should be better now, by avoiding unnecessary retries and printing out the error:

$ sky launch -c doc --cloud aws --image-id docker:continuumio/miniconda3 "apt update; apt install -y docker.io; docker run hello-world"
I 06-26 08:30:01 provisioner.py:458] Successfully provisioned or found existing instance.
E 06-26 08:31:00 subprocess_utils.py:84] NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
E 06-26 08:31:00 subprocess_utils.py:84] 
E 06-26 08:31:00 subprocess_utils.py:84] 
E 06-26 08:31:00 subprocess_utils.py:84] unknown flag: --zhwu
E 06-26 08:31:00 subprocess_utils.py:84] See 'docker run --help'.
E 06-26 08:31:00 subprocess_utils.py:84] 
E 06-26 08:31:00 provisioner.py:591] *** Failed setting up cluster. ***
Clusters
NAME  LAUNCHED    RESOURCES                                                                  STATUS  AUTOSTOP  COMMAND                           
doc   2 mins ago  1x AWS(m6i.2xlarge, image_id={'us-east-1': 'docker:continuumio/minicon...  INIT    -         sky launch -c doc --cloud aws...  

sky.exceptions.CommandError: Command docker run --name sky_container -d -it -e LC_ALL=C.UTF-8 -e LANG=C.UTF-8 --ulimit nofile=1048576:1048576 -v /var/run/docker.sock:/var/run/docker.sock --zhwu=True --shm-size=2g --net=host --cap-add=SYS_ADMIN --device=/dev/fuse --security-opt=apparmor:unconfined continuumio/miniconda3 bash failed with return code 125.
Failed to run docker setup commands.

Remove the problematic arguments, and sky launch again on the existing cluster, and it works.

Tested:

  • sky launch -c doc --cloud aws --image-id docker:continuumio/miniconda3 "apt update; apt install -y docker.io; docker run hello-world" with
    docker:
      run_options:
          - -v /var/run/docker.sock:/var/run/docker.sock
          - --shm-size=2g
    
  • sky launch -c doc --cloud aws --image-id docker:continuumio/miniconda3 "echo hi" without config.yaml

@Michaelvll Michaelvll merged commit bd383e9 into master Jun 26, 2024
20 checks passed
@Michaelvll Michaelvll deleted the docker-run-options branch June 26, 2024 08:47
Michaelvll added a commit that referenced this pull request Aug 23, 2024
* Add docker run options

* Add docs

* Add warning for docker run options in kubernetes

* Update docs/source/reference/config.rst

Co-authored-by: Romil Bhardwaj <romil.bhardwaj@berkeley.edu>

* update

* update doc

* Stream logs

* allow changing the `run_options`

---------

Co-authored-by: Romil Bhardwaj <romil.bhardwaj@berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants