-
Notifications
You must be signed in to change notification settings - Fork 81
feat(clp-package): Add support for more AWS authentication methods. #743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Warning Rate limit exceeded@Eden-D-Zhang has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 46 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThis set of changes implements a comprehensive refactor and extension of AWS authentication handling across several components of the codebase. A new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI/Script
participant Docker
participant Container
participant AWS
User->>CLI/Script: Start component (e.g., compression worker)
CLI/Script->>CLI/Script: Load CLPConfig (incl. aws_authentication, aws_config_directory)
CLI/Script->>CLI/Script: generate_container_auth_options()
CLI/Script->>Docker: Build Docker command with env vars & mounts (incl. AWS creds/config)
Docker->>Container: Start with provided env vars and mounts
Container->>AWS: Access S3 using provided credentials/profile/config
Possibly related PRs
Suggested reviewers
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I think your apporach makes sense. we need to agree on where to put the .aws directory config in the config file.
Please also have a table where each row as a authentication method supported by boto3, with the following columns, so we can see the big picture
- Whether supported by clp package or not
- How should user specify such credentials in boto3 (and maybe they don't need to do anything)
- How should user specify such authentication info in clp package
- How CLP supports the method (i.e. via env, or mount)
- Any extra note(for example, webui viewer support)
It's also worthwhile to think about:
On failure, which component will run into the error and what user will see
components/job-orchestration/job_orchestration/scheduler/job_config.py
Outdated
Show resolved
Hide resolved
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
Outdated
Show resolved
Hide resolved
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py
Outdated
Show resolved
Hide resolved
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/start_clp.py
Outdated
Show resolved
Hide resolved
Ignore latest commit doesn't work atm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at start_clp.py carefully,. but let me leave the first batch of comments
components/clp-package-utils/clp_package_utils/scripts/compress.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/compress.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/compress.py
Outdated
Show resolved
Hide resolved
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
Outdated
Show resolved
Hide resolved
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py
Outdated
Show resolved
Hide resolved
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py
Outdated
Show resolved
Hide resolved
Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
docs/src/user-guide/guides-using-object-storage/clp-config.md (7)
22-24
: Add context foraws_config_directory
with profile authentication
Thelogs_input.s3_config
snippet shows only theaws_authentication
block. It would help to include a brief inline reminder (or comment) that when using theprofile
type, anaws_config_directory
top-level key must also be set.
32-33
: Rephrase link for readability
Minor nit: consider rewording the bullet for smoother flow, for example:-* `<type>` and the type-specific settings are described in the [configuring AWS authentication](#configuring-aws-authentication) section. +* See the [Configuring AWS authentication](#configuring-aws-authentication) section for type-specific settings.
50-52
: Add context foraws_config_directory
with profile authentication
Similarly, thearchive_output.storage.s3_config
example shows onlyaws_authentication
. Including a note about settingaws_config_directory
at the top level when usingprofile
authentication would aid reader comprehension.
65-66
: Rephrase link for consistency
Same nit as above: you might reword the bullet for consistency:-* `<type>` and the type-specific settings are described in the [configuring AWS authentication](#configuring-aws-authentication) section. +* See the [Configuring AWS authentication](#configuring-aws-authentication) section for type-specific settings.
83-85
: Add context foraws_config_directory
with profile authentication
In thestream_output.storage.s3_config
example, a reminder to setaws_config_directory
at the top level for theprofile
type would prevent readers from missing that requirement.
98-99
: Rephrase link for consistency
A minor style suggestion to keep the phrasing consistent across bullets:-* `<type>` and the type-specific settings are described in the [configuring AWS authentication](#configuring-aws-authentication) section. +* See the [Configuring AWS authentication](#configuring-aws-authentication) section for type-specific settings.
106-116
: Clarify nesting admonition andaws_config_directory
scope
The note correctly warns thataws_authentication
is shown at the top level but should be nested under*.s3_config
. To avoid confusion with the separateaws_config_directory
key (which is always top level), you could explicitly state that only the authentication block is illustrated top level here and thataws_config_directory
remains a top-level sibling oflogs_input
,archive_output
, etc.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-guide/guides-using-object-storage/clp-config.md
(5 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/user-guide/guides-using-object-storage/clp-config.md
[style] ~30-~30: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...s the bucket's name. * <key-prefix>
is the prefix of all logs you wish to comp...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~152-~152: To form a complete sentence, be sure to include a subject or ‘there’.
Context: ... "" ``` <profile-name>
should be the name of an existing [AWS CLI pro...
(MISSING_IT_THERE)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: build (macos-latest)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (1)
docs/src/user-guide/guides-using-object-storage/clp-config.md (1)
118-118
: Settings list is clear and consistent
The introduction "Settings for each type are described below:" and the subsequent list of authentication types is clear and follows the order of the detailed sections.
Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
components/clp-py-utils/clp_py_utils/s3_utils.py (4)
27-46
: Function implementation looks good, but could improve error handlingThe function correctly retrieves credentials from a boto3 session, but it doesn't handle specific boto3 exceptions that might occur (like ProfileNotFound). Also, the docstring should clarify what "None" return means.
def _get_session_credentials(aws_profile: Optional[str] = None) -> Optional[S3Credentials]: """ Generates AWS credentials created by boto3 when starting a session with given profile. :param aws_profile: Name of profile configured in ~/.aws directory :return: An S3Credentials object with access key pair and session token if applicable. + Returns None if credentials cannot be retrieved (e.g., missing profile, + insufficient permissions, or expired credentials). + :raises: ProfileNotFound if the specified profile does not exist """ aws_session: Optional[boto3.Session] = None - if aws_profile is not None: - aws_session = boto3.Session(profile_name=aws_profile) - else: - aws_session = boto3.Session() + try: + if aws_profile is not None: + aws_session = boto3.Session(profile_name=aws_profile) + else: + aws_session = boto3.Session() + except boto3.exceptions.ProfileNotFound: + raise ValueError(f"AWS profile '{aws_profile}' not found") + credentials = aws_session.get_credentials() if credentials is None: return None
90-153
: Well-structured container auth options functionThe function correctly handles different component types and authentication methods, with appropriate validations and error handling. This implementation aligns well with the PR objectives for supporting various AWS authentication methods.
Consider adding more inline comments to explain the logic flow, particularly about when AWS config mounting is needed versus when environment variables are used.
156-180
: Good implementation of S3 client creationThe function handles all the required authentication types and creates the appropriate boto3 session and client.
For consistency with other functions in the boto3 ecosystem, consider renaming to
_get_s3_client
instead of_create_s3_client
.
230-252
: Good refactoring of s3_put with retry configurationThe function now uses the centralized S3 client creation and adds retry configuration with appropriate defaults.
Consider making the retry parameters configurable through additional optional parameters instead of hardcoding "3" attempts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/clp-py-utils/clp_py_utils/s3_utils.py
(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-py-utils/clp_py_utils/s3_utils.py (3)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
S3InputConfig
(31-34)components/clp-py-utils/clp_py_utils/clp_config.py (4)
AwsAuthType
(59-63)S3Config
(372-396)S3Credentials
(322-337)StorageType
(54-56)components/clp-py-utils/clp_py_utils/compression.py (1)
FileMetadata
(10-25)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (2)
components/clp-py-utils/clp_py_utils/s3_utils.py (2)
49-87
: Comprehensive implementation of credential env vars functionThe function handles all authentication types described in the PR objectives (credentials, profile, ec2, env_vars) and correctly manages session tokens. The code is well-structured with appropriate error handling.
209-209
: LGTM - Good refactoring to use centralized client creationUsing the new
_create_s3_client
function improves code reuse and consistency in authentication handling.
The webui changes lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code change looks good to me. Added one more suggestion to the doc.
Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
components/clp-py-utils/clp_py_utils/s3_utils.py (3)
32-52
: Simplify session creation logic.The function works correctly, but session creation can be simplified.
def _get_session_credentials(aws_profile: Optional[str] = None) -> Optional[S3Credentials]: """ Generates AWS credentials created by boto3 when starting a session with given profile. :param aws_profile: Name of profile configured in ~/.aws directory :return: An S3Credentials object with access key pair and session token if applicable. """ - aws_session: Optional[boto3.Session] = None - if aws_profile is not None: - aws_session = boto3.Session(profile_name=aws_profile) - else: - aws_session = boto3.Session() + aws_session = boto3.Session(profile_name=aws_profile) credentials = aws_session.get_credentials() if credentials is None: return None
54-92
: Clean implementation of credential environment variables generation.The function appropriately handles different authentication types and properly handles errors with descriptive messages.
There's a small inconsistency in string formatting on line 91:
if aws_session_token is not None: - env_vars[f"{AWS_ENV_VAR_SESSION_TOKEN}"] = aws_session_token + env_vars[AWS_ENV_VAR_SESSION_TOKEN] = aws_session_token
95-160
: Well-structured container authentication options generation.The function properly generates the necessary authentication options based on component type and storage configuration.
Consider adding an early return when no S3 storages are found for better readability:
for storage in storages_by_component_type: if StorageType.S3 == storage.type: auth = storage.s3_config.aws_authentication if AwsAuthType.profile == auth.type: config_mount = True elif AwsAuthType.env_vars == auth.type: add_env_vars = True + if not config_mount and not add_env_vars: + return (False, []) credentials_env_vars = []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/clp-py-utils/clp_py_utils/s3_utils.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (6)
components/clp-py-utils/clp_py_utils/s3_utils.py (6)
1-4
: Imports and type hints look good.The addition of the
typing
imports and the switch to using constants for AWS environment variables are solid improvements.
9-22
: Clean import structure from clp_config.The explicit imports of specific classes and constants from clp_config is a good practice, making dependencies clear.
27-29
: Good use of constants for AWS environment variables.Using constants for AWS environment variables helps prevent typos and makes the code more maintainable.
162-186
: Well-implemented S3 client creation with proper authentication handling.The function handles different authentication types correctly and provides clear error messages for unsupported types.
215-215
: Good use of centralized client creation.Replacing direct client creation with the new helper function improves consistency.
236-258
: Improved S3 put implementation.Moving the retry configuration inline and using the centralized client creation function improves code consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I edited the PR title slightly. Let me know if y'all disagree.
Description
This PR introduces significant improvements to AWS authentication handling and other improvements to the CLP package. These changes include:
ec2
)env_vars
)profile
)credentials
)aws_config_directory
field toCLPConfig
to specify the location of the.aws
where profiles are configured (defaults to~/.aws
).aws_config_dir
Docker mount for authentication through configured profiles.S3Manager
class in thelog_viewer_webui
component to support authentication through profiles.S3InputConfig
class to inheritS3Config
class to allow reusing similar code.start_clp.py
:append_docker_mounts
andappend_docker_env_vars
functions to simplify construction of Docker container start commands.append_docker_env_vars
function and improve maintainabiility.Checklist
breaking change.
Validation performed
Manually verified compression, search, and log viewer both locally and on EC2 using all combinations of authentication methods.
Summary by CodeRabbit
New Features
Improvements
Documentation