-
Notifications
You must be signed in to change notification settings - Fork 7.2k
[ci] refactor auth out of copy_files #60614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: andrew/revup/master/upload-guards
Are you sure you want to change the base?
[ci] refactor auth out of copy_files #60614
Conversation
|
Reviews in this chain: |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors authentication logic into a new centralized library, ci/ray_ci/rayci_auth.py, which is a great improvement for code organization and reusability. The new implementation for Docker Hub login also enhances security by using --password-stdin, preventing password exposure in the process list. My review focuses on the new rayci_auth.py file, offering suggestions to improve its robustness, maintainability, and error handling.
| def _retry(f): | ||
| """Retry decorator for API calls with exponential backoff on 5xx errors.""" | ||
|
|
||
| def inner(): | ||
| resp = None | ||
| for _ in range(5): | ||
| resp = f() | ||
| print("Getting Presigned URL, status_code", resp.status_code) | ||
| if resp.status_code >= 500: | ||
| print("errored, retrying...") | ||
| print(resp.text) | ||
| time.sleep(5) | ||
| else: | ||
| return resp | ||
| if resp is None or resp.status_code >= 500: | ||
| print("still errored after many retries") | ||
| sys.exit(1) | ||
|
|
||
| return inner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _retry decorator can be improved. As this logic is now in a shared library, making it more robust is important.
- Docstring Mismatch: The docstring claims "exponential backoff", but the implementation uses a fixed
time.sleep(5). - Exiting the process: The function calls
sys.exit(1)on failure. In a library module, it's better to raise an exception to allow the caller to handle the error. This makes the function more reusable.
| def _retry(f): | |
| """Retry decorator for API calls with exponential backoff on 5xx errors.""" | |
| def inner(): | |
| resp = None | |
| for _ in range(5): | |
| resp = f() | |
| print("Getting Presigned URL, status_code", resp.status_code) | |
| if resp.status_code >= 500: | |
| print("errored, retrying...") | |
| print(resp.text) | |
| time.sleep(5) | |
| else: | |
| return resp | |
| if resp is None or resp.status_code >= 500: | |
| print("still errored after many retries") | |
| sys.exit(1) | |
| return inner | |
| def _retry(f): | |
| """Retry decorator for API calls with a fixed-interval retry on 5xx errors.""" | |
| def inner(): | |
| resp = None | |
| for _ in range(5): | |
| resp = f() | |
| print("Getting Presigned URL, status_code", resp.status_code) | |
| if resp.status_code >= 500: | |
| print("errored, retrying...") | |
| print(resp.text) | |
| time.sleep(5) | |
| else: | |
| return resp | |
| if resp is None or resp.status_code >= 500: | |
| print("still errored after many retries") | |
| raise RuntimeError("Failed to get a valid response from RayCI API after multiple retries.") | |
| return inner |
ci/ray_ci/rayci_auth.py
Outdated
| aws_region="us-west-2", | ||
| aws_service="execute-api", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AWS region and service are hardcoded here. To improve maintainability and configurability, consider defining them as module-level constants, similar to RAYCI_API_HOST. You would then define RAYCI_API_REGION and RAYCI_API_SERVICE at the top of the file.
| aws_region="us-west-2", | |
| aws_service="execute-api", | |
| aws_region=RAYCI_API_REGION, | |
| aws_service=RAYCI_API_SERVICE, |
| resp = requests.get( | ||
| RAYCI_API_URL, | ||
| auth=auth, | ||
| params={"job_id": os.environ["BUILDKITE_JOB_ID"]}, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct access to os.environ["BUILDKITE_JOB_ID"] will raise a KeyError if the environment variable is not set. It's better practice to use .get() and raise a more informative ValueError if it's missing.
| resp = requests.get( | |
| RAYCI_API_URL, | |
| auth=auth, | |
| params={"job_id": os.environ["BUILDKITE_JOB_ID"]}, | |
| ) | |
| job_id = os.environ.get("BUILDKITE_JOB_ID") | |
| if not job_id: | |
| raise ValueError("BUILDKITE_JOB_ID environment variable must be set.") | |
| resp = requests.get( | |
| RAYCI_API_URL, | |
| auth=auth, | |
| params={"job_id": job_id}, | |
| ) |
ci/ray_ci/rayci_auth.py
Outdated
| Fetches credentials and runs `docker login` for the rayproject account. | ||
| """ | ||
| resp = get_rayci_api_response() | ||
| pwd = resp.json()["docker_password"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line can raise a KeyError or requests.exceptions.JSONDecodeError if the API response is not as expected. It's safer to wrap this in a try...except block to handle potential errors gracefully and provide a more informative error message.
try:
pwd = resp.json()["docker_password"]
except (requests.exceptions.JSONDecodeError, KeyError):
raise ValueError(f"Could not get docker password from API. Response: {resp.text}")68188cc to
7452f4c
Compare
For the Wanda version of building each Ray Image, we want to selectively upload to DockerHub only in certain postsubmit cases. Rather than calling on 'buildkite:copy_files --destination docker_login' separately, this refactor pushes the auth setup to a central library that can be called on certain conditions. Without this, logic would need to be pushed into the Rayci step itself, since current login is gated on a per-Pipeline-ID basis. Topic: auth-refactor Relative: upload-guards Signed-off-by: andrew <andrew@anyscale.com>
2195731 to
e8e58b5
Compare
54061a8 to
9fc08d8
Compare
e8e58b5 to
26fe123
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
| subprocess.check_call( | ||
| ["docker", "login", "--username", DOCKER_HUB_USERNAME, "--password-stdin"], | ||
| input=pwd.encode(), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subprocess.check_call does not support input parameter
High Severity
The docker_hub_login function uses subprocess.check_call() with an input parameter, but check_call doesn't support input. This parameter is only available in subprocess.run(). When called, this will raise a TypeError because the input kwarg gets passed to Popen, which doesn't accept it. The function needs to use subprocess.run(..., check=True) instead.


For the Wanda version of building each Ray Image, we want to selectively upload to DockerHub only in certain postsubmit cases. Rather than calling on 'buildkite:copy_files --destination docker_login' separately, this refactor pushes the auth setup to a central library that can be called on certain conditions.
Without this, logic would need to be pushed into the Rayci step itself, since current login is gated on a per-Pipeline-ID basis.
Topic: auth-refactor
Relative: upload-guards
Signed-off-by: andrew andrew@anyscale.com