-
Notifications
You must be signed in to change notification settings - Fork 792
[server] remove file_mounts_mapping after use #7395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/smoke-test --remote-server |
I'm slightly concerned about the failing |
/smoke-test --managed-jobs |
Current hypothesis on the test failure:
|
/smoke-test -k test_ignore_exclusions |
/smoke-test -k test_ignore_exclusions --remote-server |
Smoke tests pass after merging master, which makes me think that #7379 may have resolved that failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?????
Thanks @cg505! Can we add unit test for the task YAML got by API server / jobs controller and further the API server submitted by jobs controller?
Regarding smoke tests, this case should already be covered by using Regarding unit tests, we can't easily test the integration flow you mentioned. But we can add some test on this function directly. |
I found another (might be the same underlying issue) issue with |
/smoke-test -k test_managed_jobs_force_disable_cloud_bucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests @cg505!
task.file_mounts_mapping
is set when uploading a task to a remote API server to help the server to use the right files in the uploaded zip. This changes the translation code to remove this internal field from the translated task. Normally, this wouldn't matter, but in jobs the task may be sent in a future API request, which can cause issues.If you call jobs.launch on a remote API server, the file mounts will be translated using
file_mounts_mapping
as with any other task submitted to a remote API server. This is expected.However, later, we submit the same translated task YAML to the jobs controller, which will submit it back to an API server (either a local API server on the jobs controller in non-consolidation mode, or back to the same API server in consolidation mode). However, both of these calls will be to a "local" API server (local to the job controller process), so we will not do any file zipping/uploading/file mount translation. However, the old
file_mount_mapping
from the first translation will persist, even though it is now irrelevant.The presence of the field will cause the API server to try translate the file mounts for this second API call, even though they should not be translated.
skypilot/sky/server/common.py
Lines 845 to 848 in fda3757
KeyError
inprocess_mounts_in_task_on_api_server
when the job controller is callingsdk.launch
- heresrc
/original_path
will not exist in thefile_mounts_mapping
skypilot/sky/server/common.py
Lines 862 to 863 in fda3757
skypilot/sky/server/common.py
Lines 835 to 838 in fda3757
Before #7051, this was not an issue because the job controller called
core.launch
directly instead of callingsdk.launch
.This is not an issue if you're not using a remote API server, because we won't upload files and write the
file_mounts_mapping
for a local API server.This is not an issue if an intermediate bucket is used for the jobs controller. That will be used by default unless one of the following is true:
jobs.force_disable_cloud_bucket
is set totrue
in the configTested (run the relevant ones):
bash format.sh
/smoke-test
(CI) orpytest tests/test_smoke.py
(local)/smoke-test -k test_name
(CI) orpytest tests/test_smoke.py::test_name
(local)/quicktest-core
(CI) orpytest tests/smoke_tests/test_backward_compat.py
(local)