[Core] Add docker run options #3682

Merged
merged 10 commits into from
Jun 26, 2024
25 changes: 25 additions & 0 deletions docs/source/reference/config.rst
@@ -40,6 +40,31 @@ Available fields and semantics:
- gcp
- kubernetes

docker:
# Additional Docker run options (optional).
#
# When image_id: docker:<docker_image> is used in a task YAML, additional
# run options for starting the Docker container can be specified here.
# These options will be passed directly as command line args to `docker run`,
# see: https://docs.docker.com/reference/cli/docker/container/run/
#
# The following run options are applied by default and cannot be overridden:

Collaborator:

Where do we guarantee that these "cannot be overridden"? In the code, it seems we just append them to the docker run command and rely on the default behavior when the same argument is passed multiple times. Do we need checks on the docker run options to make sure they do not conflict with existing options?

Collaborator (author):

It is hard to validate the arguments for docker. It might be fine for now to let the program proceed and error out after the cluster is launched. Since we allow run_options to be changed for an existing cluster, a user can quickly fix the problem based on the error message. What do you think?

Collaborator:

Sounds good!

# --net=host
# --cap-add=SYS_ADMIN
# --device=/dev/fuse
# --security-opt=apparmor:unconfined
# --runtime=nvidia # Applied if nvidia GPUs are detected on the host
#
# This field can be useful for mounting volumes and other advanced Docker
# configurations. You can specify a list of arguments or a string, where the
# former will be combined into a single string with spaces. The following
# example options allow running Docker inside Docker and increase the size
# of /dev/shm:
# sky launch --cloud aws --image-id docker:continuumio/miniconda3 "apt update; apt install -y docker.io; docker run hello-world"
run_options:
- -v /var/run/docker.sock:/var/run/docker.sock
- --shm-size=2g

nvidia_gpus:
# Disable ECC for NVIDIA GPUs (optional).
#
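
The review thread above asks whether user-supplied run options should be checked against the defaults, and the PR settles on not checking. Purely as an illustration of what such a check might look like, here is a minimal sketch. The names `DEFAULT_RUN_OPTIONS`, `flag_name`, and `find_conflicts` are hypothetical and are not part of this PR or of SkyPilot; the default list is copied from the docs text in this diff.

```python
import shlex

# Defaults copied from the docs in this PR. The checker below is hypothetical
# and is NOT part of the merged change, which performs no such validation.
DEFAULT_RUN_OPTIONS = [
    '--net=host',
    '--cap-add=SYS_ADMIN',
    '--device=/dev/fuse',
    '--security-opt=apparmor:unconfined',
]


def flag_name(token):
    """Return the flag part of a token, e.g. '--net=host' -> '--net'."""
    return token.split('=', 1)[0]


def find_conflicts(run_options, defaults=DEFAULT_RUN_OPTIONS):
    """List user tokens whose flag name also appears in the defaults."""
    default_flags = {flag_name(opt) for opt in defaults}
    conflicts = []
    for option in run_options:
        # One config entry may hold several tokens, e.g. '-v src:dst'.
        for token in shlex.split(option):
            if token.startswith('-') and flag_name(token) in default_flags:
                conflicts.append(token)
    return conflicts


if __name__ == '__main__':
    print(find_conflicts(['--net=bridge', '--shm-size=2g']))
```

This only compares flag names literally, so it would miss aliases (for example a different spelling of the same Docker flag) and it says nothing about how Docker resolves a repeated flag. That limitation is consistent with the author's point that validating docker arguments is hard.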
14 changes: 14 additions & 0 deletions sky/backends/backend_utils.py
@@ -873,6 +873,17 @@ def write_cluster_config(
f'open(os.path.expanduser("{constants.SKY_REMOTE_RAY_PORT_FILE}"), "w", encoding="utf-8"))\''
)

# Docker run options
docker_run_options = skypilot_config.get_nested(('docker', 'run_options'),
                                                [])
if isinstance(docker_run_options, str):
    docker_run_options = [docker_run_options]
if docker_run_options and isinstance(to_provision.cloud, clouds.Kubernetes):
    logger.warning(f'{colorama.Style.DIM}Docker run options are specified, '
                   'but ignored for Kubernetes: '
                   f'{" ".join(docker_run_options)}'
                   f'{colorama.Style.RESET_ALL}')

# Use a tmp file path to avoid incomplete YAML file being re-used in the
# future.
initial_setup_commands = []
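
The added block in this hunk accepts `run_options` as either a string or a list and warns when the target cloud is Kubernetes. A standalone sketch of that logic, assuming plain arguments in place of `skypilot_config` and `to_provision` (the function names `normalize_run_options` and `resolve_run_options` are hypothetical), might look like this:

```python
import logging
from typing import List, Optional, Union

logger = logging.getLogger(__name__)


def normalize_run_options(value: Optional[Union[str, List[str]]]) -> List[str]:
    """Return run options as a list, wrapping a bare string in a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def resolve_run_options(value, cloud_name: str) -> List[str]:
    """Normalize the options and warn if they will be ignored.

    `cloud_name` is a stand-in for the isinstance check against
    clouds.Kubernetes in the diff.
    """
    options = normalize_run_options(value)
    if options and cloud_name.lower() == 'kubernetes':
        logger.warning(
            'Docker run options are specified, but ignored for '
            'Kubernetes: %s', ' '.join(options))
    # As in the diff, the options are returned unchanged after the warning.
    return options


if __name__ == '__main__':
    print(resolve_run_options('--shm-size=2g', 'aws'))
```

As in the diff, the warning does not drop the options. They are still passed along as a template variable, and it is the template for the chosen cloud that decides whether they are used.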
@@ -923,6 +934,9 @@ def write_cluster_config(
wheel_hash).replace('{cloud}',
str(cloud).lower())),

# Docker
'docker_run_options': docker_run_options,

# Port of Ray (GCS server).
# Ray's default port 6379 is conflicted with Redis.
'ray_port': constants.SKY_REMOTE_RAY_PORT,
3 changes: 3 additions & 0 deletions sky/templates/aws-ray.yml.j2
@@ -14,6 +14,9 @@ docker:
{%- if custom_resources is not none %}
--gpus all
{%- endif %}
{%- for run_option in docker_run_options %}
- {{run_option}}
{%- endfor %}
{%- if docker_login_config is not none %}
docker_login_config:
username: |-
3 changes: 3 additions & 0 deletions sky/templates/azure-ray.yml.j2
@@ -14,6 +14,9 @@ docker:
{%- if custom_resources is not none %}
--gpus all
{%- endif %}
{%- for run_option in docker_run_options %}
- {{run_option}}
{%- endfor %}
{%- endif %}

provider:
3 changes: 3 additions & 0 deletions sky/templates/gcp-ray.yml.j2
@@ -15,6 +15,9 @@ docker:
{%- if gpu is not none %}
--gpus all
{%- endif %}
{%- for run_option in docker_run_options %}
- {{run_option}}
{%- endfor %}
{%- if docker_login_config is not none %}
docker_login_config:
username: |-
3 changes: 3 additions & 0 deletions sky/templates/paperspace-ray.yml.j2
@@ -14,6 +14,9 @@ docker:
{%- if custom_resources is not none %}
--gpus all
{%- endif %}
{%- for run_option in docker_run_options %}
- {{run_option}}
{%- endfor %}
{%- if docker_login_config is not none %}
docker_login_config:
username: |-
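
Each of the four templates gains the same loop, which emits one YAML list item per entry in `docker_run_options`. The following pure-Python sketch approximates what that loop produces. It is not the Jinja rendering itself, and `render_run_options`, the `base_options` argument, and the two-space indentation are assumptions made for illustration.

```python
def render_run_options(base_options, docker_run_options):
    """Approximate the YAML list produced by the template loop.

    base_options stands in for whatever the template already emits
    (hypothetical here); docker_run_options comes from the user config.
    """
    lines = ['run_options:']
    for option in list(base_options) + list(docker_run_options):
        # Mirrors `- {{run_option}}`: each entry becomes one list item,
        # even if it contains a space such as '-v src:dst'.
        lines.append(f'  - {option}')
    return '\n'.join(lines)


if __name__ == '__main__':
    print(render_run_options(
        ['--gpus all'],
        ['-v /var/run/docker.sock:/var/run/docker.sock', '--shm-size=2g'],
    ))
```

Because user entries are appended after the options the template already emits, they appear later in the resulting list, which matches the reviewer's observation that the options are simply appended.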
18 changes: 18 additions & 0 deletions sky/utils/schemas.py
@@ -757,6 +757,23 @@ def get_config_schema():
}
}

docker_configs = {
    'type': 'object',
    'required': [],
    'additionalProperties': False,
    'properties': {
        'run_options': {
            'anyOf': [{
                'type': 'string',
            }, {
                'type': 'array',
                'items': {
                    'type': 'string',
                }
            }]
        }
    }
}
gpu_configs = {
'type': 'object',
'required': [],
@@ -785,6 +802,7 @@ def get_config_schema():
'spot': controller_resources_schema,
'serve': controller_resources_schema,
'allowed_clouds': allowed_clouds,
'docker': docker_configs,
'nvidia_gpus': gpu_configs,
**cloud_configs,
},
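
The `docker_configs` schema added in this file allows `run_options` to be a string or an array of strings, and rejects any other key under `docker`. To make those rules concrete, here is a hand-written check that mirrors them without a JSON Schema library. The function names are hypothetical, and the real code presumably validates against the schema itself rather than with logic like this.

```python
def is_valid_run_options(value) -> bool:
    """Mirror the anyOf: a string, or a list whose items are all strings."""
    if isinstance(value, str):
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def is_valid_docker_config(config) -> bool:
    """Mirror docker_configs: an object with only an optional run_options."""
    if not isinstance(config, dict):
        return False
    # 'additionalProperties': False means no keys besides run_options.
    if set(config) - {'run_options'}:
        return False
    # 'required': [] means run_options may be absent.
    if 'run_options' in config:
        return is_valid_run_options(config['run_options'])
    return True


if __name__ == '__main__':
    print(is_valid_docker_config({'run_options': ['--shm-size=2g']}))
```

Note that the schema only constrains types. It does not inspect the content of each string, so an option that Docker rejects still passes validation and surfaces as an error only after launch, as discussed in the review thread.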