-
Notifications
You must be signed in to change notification settings - Fork 82
clp-package: Add support for extracting files as IR streams through the CLI. #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
def generate_container_start_cmd( | ||
container_name: str, container_mounts: List[CLPDockerMounts], container_image: str | ||
container_name: str, container_mounts: List[Optional[DockerMount]], container_image: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a type fix
from strenum import KebabCaseStrEnum | ||
|
||
# CONSTANTS | ||
DECOMPRESSION_COMMAND = "x" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was thinking about using a enum similar to JobType, but will need to manually assign a value instead of using auto.
A high level concern is that what would be good terms to distinguish, (1) a general job name launched by decompress.py script" (2) a job that decompress a file, and (3) a job that extracts an IR. I am currently using "x" as the decompression command following CLP and CLO, but internally still use decompression as job type, which is inconsistent. Should we call both job as decompression job, and internally distinguish them as "extraction" and "ir_extraction" command? |
components/clp-package-utils/clp_package_utils/scripts/decompress.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/decompress.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
Outdated
Show resolved
Hide resolved
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
Outdated
Show resolved
Hide resolved
How do we specify
which both yielded
|
Co-authored-by: Junhao Liao <junhao@junhao.ca>
Sorry, it looks like I forgot to do a type conversion. I included it in https://github.com/y-scope/clp/pull/472/files#diff-c3c708ca5b9cee2be7634fcb5966f6fc622b85ec0ad8575491540e676414cbbaL48. if you need this change urgently, I can also make a separate PR for it, |
components/clp-package-utils/clp_package_utils/scripts/decompress.py
Outdated
Show resolved
Hide resolved
… Make command handlers return consistently rather than throwing exceptions in some cases.
logger.error(f"job {job_id} finished with unexpected status: {job_status}") | ||
|
||
|
||
async def do_extract( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need this method, right? (Since it's purely wrapping another async method.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I tried to remove it but then it seems that I have to turn the entire call trees to async, including main. Maybe my knowledge about async is not enough to make a neat change quickly.
Given that this code is copied from search.py, can we turn this refactor into a separate PR and apply to both decompress.py and search.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work (in handle_extract_ir_cmd
)?
return asyncio.run(
run_function_in_process(
submit_and_monitor_ir_extraction_job_in_db,
clp_config.database,
orig_file_id,
parsed_args.msg_ix,
parsed_args.target_uncompressed_size,
)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right,, this one just works. don't remeber what I wrote but not as neat as this one.
components/clp-package-utils/clp_package_utils/scripts/decompress.py
Outdated
Show resolved
Hide resolved
from strenum import KebabCaseStrEnum | ||
|
||
# CONSTANTS | ||
DECOMPRESSION_COMMAND = "x" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
EXTRACT_FILE_CMD
andEXTRACT_IR_CMD
handle_extract_file_cmd
andhandle_extract_ir_cmd
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py
Show resolved
Hide resolved
|
||
group = ir_extraction_parser.add_mutually_exclusive_group(required=True) | ||
group.add_argument("--orig-file-id", type=str, help="Original file's ID.") | ||
group.add_argument("--path", type=str, help="Original file's path.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about --orig-file-path? Makes it more consistent with the other arg.
…ess.py Co-authored-by: Junhao Liao <junhao@junhao.ca>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
clp-package: Add support for extracting files as IR streams through the CLI.
Description
Let decompress cmd script in package handle both decompression and ir extraction
Validation performed
Launched the package, verified that compression, decompression, ir extraction and search still work from command line