[Lambda Cloud] Update Regex of Internal IP for H100 Support #1969
According to RFC 1918, the IP range `172.16.0.0` to `172.31.255.255` can be used for internal IP addresses, and Lambda Cloud appears to do so for the new H100 GPU instances. As a result, users may get a `Failed to obtain private IP from node` error while launching those new instances on Lambda Cloud (#1948).

This PR modifies the regex used by `ray get-head-ip` to also match IPs from `172.16.*` to `172.31.*`.

The new regex was tested at https://regex101.com/r/jHHWti/2 and on a Lambda Cloud H100 instance.
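For illustration, here is a minimal sketch of how a pattern covering both `10.x.x.x` and the RFC 1918 `172.16.0.0/12` block could look. This is not the regex from this PR (that one is at the regex101 link above); the names `PRIVATE_IP_PATTERN` and `find_private_ips` are hypothetical, and the `ipaddress` check is an extra safeguard assumed here rather than something the PR does.

```python
import ipaddress
import re
from typing import List

# Hypothetical pattern (NOT the one from this PR): matches 10.x.x.x and
# 172.16.x.x through 172.31.x.x. The second octet of the 172 block is
# limited to 16-31 by the alternation 1[6-9] | 2\d | 3[01].
PRIVATE_IP_PATTERN = re.compile(
    r"(?<![\d.])(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01]))\.\d{1,3}\.\d{1,3}(?!\d)"
)

# The two private ranges discussed in this PR.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def find_private_ips(text: str) -> List[str]:
    """Return candidate private IPs found in `text`, in order of appearance."""
    results = []
    for candidate in PRIVATE_IP_PATTERN.findall(text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            # The regex allows octets such as 999; drop those here.
            continue
        if any(address in network for network in PRIVATE_NETWORKS):
            results.append(candidate)
    return results
```

With this sketch, `172.26.134.5` is accepted while `172.15.0.1` and `172.32.0.1` are rejected, which matches the `/12` boundary given in RFC 1918.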
Breaking Changes / Possible Issues
With old instance types that do have a `10.x.x.x` internal IP, the updated command tends to use the later-listed `172.x.x.x` IP instead of the original `10.x.x.x` one. I haven't used multiple clusters yet, and I have not tested whether this will break anything.
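One possible way to keep the old behaviour on such instances, sketched under assumptions and not part of this PR, is to prefer a `10.x.x.x` address whenever one is among the matched candidates and fall back to the last-listed match otherwise. The function name `pick_internal_ip` is hypothetical.

```python
import ipaddress
from typing import List, Optional

TEN_NETWORK = ipaddress.ip_network("10.0.0.0/8")


def pick_internal_ip(candidates: List[str]) -> Optional[str]:
    """Prefer a 10.x.x.x address; otherwise return the last candidate.

    `candidates` is assumed to hold valid IPv4 strings in the order they
    were listed. Returns None when the list is empty.
    """
    for candidate in candidates:
        if ipaddress.ip_address(candidate) in TEN_NETWORK:
            return candidate
    return candidates[-1] if candidates else None
```

For example, `pick_internal_ip(["10.19.3.4", "172.17.0.1"])` returns the `10.x` address, while an H100 instance that lists only `172.26.134.5` still resolves to that address.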
Still trying to resolve `tests/test_smoke.py:41 ModuleNotFoundError: No module named 'sky'` while running `pytest tests/test_smoke.py`.

Tested (run the relevant ones):
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh