-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SkyServe] Support mixture of spot and on-demand #3194
Conversation
@cblmemo Thanks! resolved all the comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just finished a pass : )
# blocking probe to other replicas. | ||
is_preempted = self._handle_preemption(info) | ||
if is_preempted: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole for loop of the future in probe_futures
is quite weird, why can't we just have the whole "probe and process probe results" in a same function and have that function to be parallelized, instead of only parallelizing info.probe
. cc'ing @cblmemo
Otherwise the processing and termination of the unhealthy replica could block the whole process of the probing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Actually this is taking a lot of time and once caused some bugs in our SpotHedge e2e experiment; we should parallelize the preemption handling as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump for this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we spawn another thread to handle preemption (for modularity)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm feeling like we could spawn a thread to do all of the following:
info.probe
- Update all attributes in
info
, e.g.info.status_property
- Handle preemption
- Update database
Does that sounds good to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we should mix all these tasks under one thread. This probably requires more discussion and I think can be a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the smoke test! Left some comments about that 🫡
tests/test_smoke.py
Outdated
def _check_both_spot_on_demand_in_status(name: str) -> str: | ||
return (f'output=$(sky serve status {name});' | ||
'echo "$output" | grep "GCP(\[Spot\]vCPU=2)" && ' | ||
'echo "$output" | grep "GCP(vCPU=2)";') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that makes me think, should we print the is_spot
column in our sky serve status table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The row will have \[Spot\]
so I think we don't need a is_spot
column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this PR @MaoZiming! The code looks mostly good to me.
…o serve-spot-no-placer
#2793
without spot placer.
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh