[k8s] Allow configuring /dev/shm size limit #3244
Thank you for adding this @romilbhardwaj! Question: could we mention the use case for this field and the best practice for specifying it, e.g., if users encounter some specific errors, they should change this field?
docs/source/reference/config.rst
Outdated
# Size of the /dev/shm shared memory for the pod (optional).
#
# Defaults to None, which means no size limits are set. If set, the value
# must be a string that is a valid Kubernetes quantity, e.g., "3Gi".
# https://kubernetes.io/docs/reference/kubernetes-api/common-definitions/quantity/
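To make the "valid Kubernetes quantity" requirement concrete, here is a minimal sketch of a sanity check for such strings. The helper name `looks_like_quantity` is hypothetical and is not part of SkyPilot or the Kubernetes client; the pattern is a deliberate simplification of the quantity grammar (exponent forms such as "1e3" are not handled).

```python
import re

# Simplified pattern (an assumption, not the full Kubernetes grammar):
# optional sign, a decimal number, then an optional binary suffix (Ki..Ei)
# or decimal suffix (m, k, M, G, T, P, E).
_QUANTITY_RE = re.compile(
    r'[+-]?(\d+(\.\d*)?|\.\d+)(Ki|Mi|Gi|Ti|Pi|Ei|m|k|M|G|T|P|E)?')


def looks_like_quantity(value: object) -> bool:
    """Return True if value is a string that resembles a Kubernetes quantity."""
    return isinstance(value, str) and _QUANTITY_RE.fullmatch(value) is not None


if __name__ == '__main__':
    for candidate in ('3Gi', '512Mi', '3GB', 3):
        print(repr(candidate), looks_like_quantity(candidate))
```

A check like this would only catch obvious mistakes (for example "3GB" or a bare integer); the API server remains the authority on what it accepts.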
Could we mention the use case for specifying this in the comment?
Also, it seems a user could specify this directly in the pod_config above. Is it necessary to have a separate field in the config for this?
Added a comment. This is required by one of our users, since their production environment uses an AdmissionController that denies any pods which do not have a specific limit on /dev/shm size.
Using pod_config is possible, but requires the user to understand our pod template and use the exact fields (name: dshm, ...) when overriding the pod_config.
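For illustration, the kind of override being discussed could look like the following, written as a Python dict that mirrors the YAML. The nesting under `kubernetes` / `pod_config` / `spec` / `volumes` is an assumption about the config layout; only the `name: dshm` entry and the `emptyDir` fields (`medium`, `sizeLimit`) come from this PR.

```python
import json

# Sketch of a pod_config override that sets a size limit on /dev/shm.
# The outer keys are assumed; the volume entry mirrors the snippet in this PR.
pod_config_override = {
    'kubernetes': {
        'pod_config': {
            'spec': {
                'volumes': [{
                    # Must match the volume name used by the pod template.
                    'name': 'dshm',
                    'emptyDir': {
                        'medium': 'Memory',
                        'sizeLimit': '3Gi',
                    },
                }],
            },
        },
    },
}

if __name__ == '__main__':
    print(json.dumps(pod_config_override, indent=2))
```

The point of the comment above is visible here: the override only takes effect if the user knows to reuse the exact volume name `dshm`.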
Thanks for the explanation! How frequently will this field be used? If it is not used that frequently, we could instead just show the full config in the pod_config in this doc, and put the comment there.
I feel that including two ways to change the same config might be a bit confusing.
Ahh good point. I thought more about it and indeed it likely won't be used that frequently. I added a comment to our config.yaml docs and updated the PR description.
Thanks for updating this PR @romilbhardwaj! LGTM.
- name: dshm  # Use this to modify the /dev/shm volume mounted by SkyPilot
  emptyDir:
    medium: Memory
    sizeLimit: 3Gi  # Set a size limit for the /dev/shm volume
Is /dev/shm a conventional name that people would know? Just wondering if we should elaborate a bit on what this is for.
/dev/shm is generally well understood as shared memory, and explaining it here might add more clutter. Its usage depends on the application (e.g., Ray stores GCS state there, PyTorch uses it for IPC), so it's hard to generalize. Folks running Docker containers would usually be familiar with this, so it might be fine to leave it be?
destination_volume = next(
    (v for v in destination[key]
     if v.get('name') == new_volume_name), None)
nit: using filter might be easier to read. not feeling strongly, please feel free to ignore.
- destination_volume = next(
-     (v for v in destination[key]
-      if v.get('name') == new_volume_name), None)
+ destination_volume = next(filter(lambda v: v.get('name') == new_volume_name, destination[key]), None)
the generator might be more pythonic :)
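For reference, the two spellings discussed in this thread behave the same way. The sketch below shows both lookups plus a hypothetical `merge_volume` helper that updates a same-named volume or appends a new one. The function names and the shallow-update behavior are assumptions for illustration; this is not SkyPilot's actual merge code.

```python
from typing import Any, Dict, List, Optional

Volume = Dict[str, Any]


def find_volume(volumes: List[Volume], name: str) -> Optional[Volume]:
    """Generator-expression form, as written in the PR."""
    return next((v for v in volumes if v.get('name') == name), None)


def find_volume_filter(volumes: List[Volume], name: str) -> Optional[Volume]:
    """filter() form, as suggested in review."""
    return next(filter(lambda v: v.get('name') == name, volumes), None)


def merge_volume(destination: Dict[str, Any], new_volume: Volume,
                 key: str = 'volumes') -> None:
    """Hypothetical helper: update the same-named volume, else append.

    The update is shallow: top-level keys of new_volume replace those of the
    existing entry.
    """
    volumes = destination.setdefault(key, [])
    existing = find_volume(volumes, new_volume.get('name'))
    if existing is not None:
        existing.update(new_volume)
    else:
        volumes.append(new_volume)


# A pod spec whose template already mounts an unbounded dshm volume.
spec = {'volumes': [{'name': 'dshm', 'emptyDir': {'medium': 'Memory'}}]}
merge_volume(spec, {
    'name': 'dshm',
    'emptyDir': {'medium': 'Memory', 'sizeLimit': '3Gi'},
})
merge_volume(spec, {'name': 'data', 'emptyDir': {}})
```

Both lookups return the first matching dict or `None`, so the choice between them is purely stylistic.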
Many production environments require limits on the size of the /dev/shm partition allocated to a pod. This PR adds support for this configuration through pod_config in ~/.sky/config.yaml.
Tested (run the relevant ones):
bash format.sh
sky launch with config.yaml set and inspect pod with kubectl describe pods
config.yaml used for testing: