feat(clp-package): Add support for clp-json to ingest logs from S3. #651

Merged on Jan 16, 2025 (79 commits)
Changes from 1 commit
Commits
ca46dca
First version backup
haiqi96 Dec 11, 2024
b763e8b
Small refactor
haiqi96 Dec 11, 2024
4e9529c
First trial for new config
haiqi96 Dec 11, 2024
e9cdea4
Further refactor and polishing
haiqi96 Dec 11, 2024
9ba0a38
Another small refactor
haiqi96 Dec 12, 2024
58befef
small refactor again
haiqi96 Dec 12, 2024
35ec0c3
Combine s3 utils
haiqi96 Dec 12, 2024
5d57b10
Support handling S3 error message
haiqi96 Dec 12, 2024
9991307
Slight logging modification
haiqi96 Dec 12, 2024
5d23790
Linter
haiqi96 Dec 12, 2024
b4bb2af
Add extra verification
haiqi96 Dec 12, 2024
f41c558
Update components/clp-py-utils/clp_py_utils/clp_config.py
haiqi96 Dec 12, 2024
ce5a667
do nothing for now
haiqi96 Dec 12, 2024
f05dc88
backup changes for worker config
haiqi96 Dec 12, 2024
abf5dde
More support
haiqi96 Dec 13, 2024
7d34456
Remove unnecessary change
haiqi96 Dec 13, 2024
a7afd0d
Linter
haiqi96 Dec 13, 2024
99d3094
Handle mount for fs & S3
haiqi96 Dec 13, 2024
1afed1a
Linter
haiqi96 Dec 13, 2024
1de661a
Remove unused functions
haiqi96 Dec 13, 2024
ce3de98
Update components/job-orchestration/job_orchestration/executor/compre…
haiqi96 Dec 13, 2024
f49664f
simplify worker config
haiqi96 Dec 13, 2024
046cdcb
polishing
haiqi96 Dec 13, 2024
242dec2
linter
haiqi96 Dec 14, 2024
ed280cb
Apply suggestions from code review
haiqi96 Dec 16, 2024
0788e59
Fix easier ones
haiqi96 Dec 16, 2024
c198f27
Backup changes
haiqi96 Dec 16, 2024
4819f76
Small fixes
haiqi96 Dec 16, 2024
e5f43fb
fixes
haiqi96 Dec 16, 2024
1246062
add safeguard for archive update failure
haiqi96 Dec 17, 2024
3b870a4
Add docstrings
haiqi96 Dec 17, 2024
214ae3f
Apply suggestions from code review
haiqi96 Dec 18, 2024
6ff92fc
Clean up
haiqi96 Dec 18, 2024
9e07d37
update pyproject.toml
haiqi96 Dec 18, 2024
915b49d
Add docstrings
haiqi96 Dec 18, 2024
a061a29
Apply suggestions from code review
haiqi96 Dec 18, 2024
8301748
Update name as suggested by the code review
haiqi96 Dec 18, 2024
2ada464
a few small fixes to ensure other scripts still work
haiqi96 Dec 18, 2024
6e5aad5
adding safeguard for empty stdout line from clp.
haiqi96 Dec 18, 2024
55c0f36
add safe guard for search
haiqi96 Dec 18, 2024
2d7443e
Polish error messages.
haiqi96 Dec 18, 2024
6f907b2
Linter
haiqi96 Dec 18, 2024
120ffec
Slightly improve the error message
haiqi96 Dec 18, 2024
d5eae21
Back up
haiqi96 Dec 17, 2024
ce2b440
Backup
haiqi96 Dec 19, 2024
6d2b815
Merge branch 'main' into s3_scheduler
haiqi96 Dec 19, 2024
b8f715d
Update execution image dependency
haiqi96 Dec 19, 2024
57e1912
simplify the code a little bit
haiqi96 Dec 19, 2024
27b8612
fix a previous mistake
haiqi96 Dec 19, 2024
d55f1ad
Keep fixing previous mistake
haiqi96 Dec 19, 2024
4de4fee
add url parsing helper
haiqi96 Dec 19, 2024
4224bd6
Linter
haiqi96 Dec 20, 2024
1cf3d01
Some refactor
haiqi96 Dec 20, 2024
b1655cd
Refactor compress scripts
haiqi96 Dec 20, 2024
d12e173
Initial support for cmdline
haiqi96 Jan 2, 2025
6833ee9
Linter fixes
haiqi96 Jan 2, 2025
a4e92ae
add argument checks
haiqi96 Jan 2, 2025
a638f2d
Polishing
haiqi96 Jan 3, 2025
5685224
Add some docstrings
haiqi96 Jan 3, 2025
fd9dba2
fixes
haiqi96 Jan 3, 2025
f7a175c
Rename task script
haiqi96 Jan 3, 2025
20488a1
fixes
haiqi96 Jan 3, 2025
12d6b97
Some capitalization and update to the docstrings
haiqi96 Jan 3, 2025
709dfb0
Merge branch 'main' into s3_scheduler
haiqi96 Jan 13, 2025
fa5ace1
Apply suggestions from code review
haiqi96 Jan 15, 2025
a4675fd
First batch of required updates
haiqi96 Jan 15, 2025
642eb81
Changes missing from the first batch
haiqi96 Jan 15, 2025
d3c0065
Fixing path processing logic
haiqi96 Jan 15, 2025
c7d629e
Fix
haiqi96 Jan 15, 2025
2ddd9c3
Use config parser (tested)
haiqi96 Jan 15, 2025
b7c746d
move config argument around
haiqi96 Jan 15, 2025
3805d36
Linter
haiqi96 Jan 15, 2025
16ad596
Update error reporting
haiqi96 Jan 15, 2025
af732b6
Small touches
haiqi96 Jan 15, 2025
356a1c3
Update docstrings
haiqi96 Jan 15, 2025
57afa5f
Use default user for now
haiqi96 Jan 15, 2025
99ea6dd
Linter
haiqi96 Jan 15, 2025
996692e
Fix
haiqi96 Jan 16, 2025
351f239
Apply suggestions from code review
haiqi96 Jan 16, 2025
Use config parser (tested)
haiqi96 committed Jan 15, 2025
commit 2ddd9c3544488b075d53b7d5ccb988c420790f44
41 changes: 27 additions & 14 deletions components/clp-py-utils/clp_py_utils/s3_utils.py
@@ -1,6 +1,7 @@
 import re
 from pathlib import Path
 from typing import List, Tuple
+import configparser

 import boto3
 from botocore.config import Config
@@ -15,33 +16,38 @@
 AWS_ENDPOINT = "amazonaws.com"


-def parse_aws_credentials_file(credentials_file_path: Path) -> Tuple[str, str]:
+def parse_aws_credentials_file(credentials_file_path: Path, user: str = "default") -> Tuple[str, str]:
     """
-    Parses the `aws_access_key_id` and `aws_secret_access_key` from the given credentials_file_path.
+    Parses the `aws_access_key_id` and `aws_secret_access_key` of 'user' from the given
+    credentials_file_path.
     :param credentials_file_path:
+    :param user:
     :return: A tuple of (aws_access_key_id, aws_secret_access_key)
-    :raise: ValueError if the file doesn't exist, or doesn't contain the aws credentials.
+    :raise: ValueError if the file doesn't exist, or doesn't contain valid aws credentials.
     """

-    aws_access_key_id = None
-    aws_secret_access_key = None

     if not credentials_file_path.exists():
         raise ValueError(f"'{credentials_file_path}' doesn't exist.")

-    with open(credentials_file_path, "r") as f:
-        for line in f:
-            line = line.strip()
-            if line.startswith("aws_access_key_id"):
-                aws_access_key_id = line.split("=", 1)[1].strip()
-            elif line.startswith("aws_secret_access_key"):
-                aws_secret_access_key = line.split("=", 1)[1].strip()
+    config_reader = configparser.ConfigParser()
+    config_reader.read(credentials_file_path)
+
+    if not config_reader.has_section(user):
+        raise ValueError(f"User '{user}' doesn't exist.")
+
+    user_credentials = config_reader[user]
+    if "aws_session_token" in user_credentials:
+        raise ValueError(f"Short-term credentials with session token is not supported.")
+
+    aws_access_key_id = user_credentials.get("aws_access_key_id")
+    aws_secret_access_key = user_credentials.get("aws_secret_access_key")

     if aws_access_key_id is None or aws_secret_access_key is None:
         raise ValueError(
-            "The credentials file must contain aws_access_key_id and aws_secret_access_key."
+            "The credentials file must contain both aws_access_key_id and aws_secret_access_key."
         )


     return aws_access_key_id, aws_secret_access_key
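
For readers who want to try the profile lookup in isolation, here is a minimal sketch of the same approach, assuming an AWS-style INI credentials file. The function name `read_profile_credentials`, the sample profiles, and the key values are hypothetical and are not part of this PR; only the `configparser` calls mirror the change in this hunk.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Tuple


def read_profile_credentials(path: Path, profile: str = "default") -> Tuple[str, str]:
    """Hypothetical stand-in for the function in the diff; not the project's code."""
    if not path.exists():
        raise ValueError(f"'{path}' doesn't exist.")

    parser = configparser.ConfigParser()
    parser.read(path)

    # Section names are case-sensitive, so "[default]" is an ordinary section
    # here (only the upper-case "DEFAULT" section is special to configparser).
    if not parser.has_section(profile):
        raise ValueError(f"Profile '{profile}' doesn't exist.")

    section = parser[profile]
    if "aws_session_token" in section:
        raise ValueError("Short-term credentials with a session token are not supported.")

    key_id = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    if key_id is None or secret is None:
        raise ValueError("Both aws_access_key_id and aws_secret_access_key are required.")
    return key_id, secret


# Made-up sample file contents, for illustration only.
SAMPLE = """\
[default]
aws_access_key_id = EXAMPLEKEYID
aws_secret_access_key = exampleSecret

[ci]
aws_access_key_id = CIKEYID
aws_secret_access_key = ciSecret
aws_session_token = token
"""

with tempfile.TemporaryDirectory() as tmp:
    creds_path = Path(tmp) / "credentials"
    creds_path.write_text(SAMPLE)

    default_creds = read_profile_credentials(creds_path)
    try:
        read_profile_credentials(creds_path, "ci")
        ci_error = None
    except ValueError as e:
        ci_error = str(e)
```

Compared with the removed line-by-line `startswith` scan, a section-aware parser keeps keys from different profiles apart, which is what makes the `user` parameter meaningful.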


@@ -83,6 +89,13 @@ def parse_s3_url(s3_url: str) -> Tuple[str, str, str]:
 def generate_s3_virtual_hosted_style_url(
     region_code: str, bucket_name: str, object_key: str
 ) -> str:
+    if region_code is None or "" == region_code:
+        raise ValueError("Region code is not specified")
+    if bucket_name is None or "" == bucket_name:
+        raise ValueError("Bucket name is not specified")
+    if object_key is None or "" == object_key:
+        raise ValueError("Object key is not specified")
+
     return f"https://{bucket_name}.s3.{region_code}.{AWS_ENDPOINT}/{object_key}"
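
As a quick illustration of what the added guards protect, the following sketch builds a virtual-hosted-style URL from made-up inputs. The function name `build_virtual_hosted_style_url`, the bucket, the region, and the key are assumptions for the example; the URL template and the error messages follow the hunk above. The sketch uses a truthiness check, which is slightly broader than the explicit `None` / `""` comparisons in the PR.

```python
AWS_ENDPOINT = "amazonaws.com"


def build_virtual_hosted_style_url(region_code: str, bucket_name: str, object_key: str) -> str:
    """Hypothetical stand-in for the function in the diff; not the project's code."""
    # Reject missing parts up front so a malformed URL is never produced.
    for label, value in (
        ("Region code", region_code),
        ("Bucket name", bucket_name),
        ("Object key", object_key),
    ):
        if not value:
            raise ValueError(f"{label} is not specified")
    return f"https://{bucket_name}.s3.{region_code}.{AWS_ENDPOINT}/{object_key}"


url = build_virtual_hosted_style_url("us-east-2", "example-bucket", "logs/app.jsonl")
print(url)  # https://example-bucket.s3.us-east-2.amazonaws.com/logs/app.jsonl
```

Without such checks, an empty region would silently yield a host like `example-bucket.s3..amazonaws.com`, and the failure would surface only later as a network or DNS error.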

