-
Notifications
You must be signed in to change notification settings - Fork 81
feat(integration-tests): Add loseless round-trip compression tests for xz
, zstd
, lz4
log archives.
#1311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds dependency env vars and wiring for integration tests, introduces DepsConfig, updates taskfiles to expose install prefixes, reworks log-archive fixture to generate multiple compressed formats from a downloaded tar.gz, and updates tests/configs to use a new compression_input field and validate absolute paths. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant PyTest as Pytest Runner
participant FixCfg as Fixture: integration_test_config
participant Deps as DepsConfig
participant LogFix as Fixture: integration_test_logs
participant Tools as External Tools (curl/gzip/lz4/xz/zstd)
participant FS as Filesystem
participant Tests as Tests (identity)
PyTest->>FixCfg: Build IntegrationTestConfig()
FixCfg->>Deps: Construct from env (CLP_* roots)
FixCfg-->>PyTest: IntegrationTestConfig (includes deps_config)
PyTest->>LogFix: Prepare archives
LogFix->>Tools: curl → download .tar.gz
Tools-->>FS: Write tar_gz_path
LogFix->>Tools: gzip -d <tar.gz> → base .tar
Tools-->>FS: Write base_tar_path
LogFix->>Tools: lz4/xz/zstd compress base .tar
Tools-->>FS: Write .tar.lz4/.tar.xz/.tar.zstd
Note over LogFix,Tools: Binary paths resolved via DepsConfig
PyTest->>Tests: Execute identity tests across inputs
Tests->>FS: Read sources (dir/.tar.*)
Tests-->>PyTest: Assertions pass/fail
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Ruff (0.12.2)integration-tests/tests/fixtures/integration_test_config.py�[1;31mruff failed�[0m integration-tests/tests/fixtures/integration_test_logs.py�[1;31mruff failed�[0m integration-tests/tests/test_identity_transformation.py�[1;31mruff failed�[0m
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
fe5a585
to
70a7529
Compare
xz
, zstd
, lz4
log archives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
integration-tests/tests/utils/utils.py (1)
54-76
: Make unlink truly handle both files and directories (doc/impl mismatch).The function promises to remove “file or directory” but uses rmtree only; it will raise on regular files. Handle files/symlinks explicitly.
Apply this diff:
def unlink(rm_path: Path, force: bool = True) -> None: @@ - try: - shutil.rmtree(rm_path) + # Files/symlinks + if rm_path.is_file() or rm_path.is_symlink(): + try: + rm_path.unlink() + except FileNotFoundError: + return + except PermissionError: + if not force: + raise + try: + subprocess.run(["sudo", "rm", "-f", str(rm_path)], check=True) + except subprocess.CalledProcessError as e: + raise OSError(f"Failed to remove {rm_path} due to lack of sudo.") from e + return + # Directories + try: + shutil.rmtree(rm_path)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
integration-tests/.pytest.ini
(1 hunks)integration-tests/tests/fixtures/integration_test_config.py
(2 hunks)integration-tests/tests/fixtures/integration_test_logs.py
(3 hunks)integration-tests/tests/test_identity_transformation.py
(5 hunks)integration-tests/tests/utils/config.py
(7 hunks)integration-tests/tests/utils/utils.py
(1 hunks)taskfiles/deps/main.yaml
(4 hunks)taskfiles/tests/integration.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
taskfiles/deps/main.yaml
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfiles/deps/main.yaml
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-13T19:58:26.058Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1184
File: taskfiles/deps/main.yaml:0-0
Timestamp: 2025-08-13T19:58:26.058Z
Learning: In the CLP project, custom CMake find modules (like FindLibLZMA.cmake) are designed to flexibly link against either shared or static libraries based on availability, rather than requiring both variants to be built. The find modules use flags like LibLZMA_USE_STATIC_LIBS and temporarily modify CMAKE_FIND_LIBRARY_SUFFIXES to prefer the desired library type while gracefully falling back if needed.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-06T07:32:28.462Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: taskfiles/deps/main.yaml:70-71
Timestamp: 2025-08-06T07:32:28.462Z
Learning: In LZ4's CMake build system (build/cmake/CMakeLists.txt), the option to build static libraries is `BUILD_STATIC_LIBS=ON`, not `LZ4_BUILD_STATIC=ON`. The correct options for building both shared and static LZ4 libraries are `-DBUILD_SHARED_LIBS=ON` and `-DBUILD_STATIC_LIBS=ON`.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-06T08:10:26.381Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:195-202
Timestamp: 2025-08-06T08:10:26.381Z
Learning: According to Bill-hbrhbr, in LZ4's CMake configuration, the correct package name for find_package is lowercase "lz4", and the exported targets use uppercase namespace "LZ4::" with lowercase prefixes (LZ4::lz4_static, LZ4::lz4_shared), not lowercase namespace "lz4::" as initially assumed.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/.pytest.ini
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Applied to files:
integration-tests/.pytest.ini
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
integration-tests/tests/test_identity_transformation.py
📚 Learning: 2025-08-20T22:07:04.953Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/utils/config.py:121-123
Timestamp: 2025-08-20T22:07:04.953Z
Learning: For the CLP integration tests codebase, do not suggest deriving tarball filenames from URLs instead of hard-coding ".tar.gz" extension. The user has explicitly rejected this suggestion.
Applied to files:
integration-tests/tests/fixtures/integration_test_logs.py
🧬 Code graph analysis (4)
integration-tests/tests/fixtures/integration_test_config.py (2)
integration-tests/tests/utils/config.py (3)
DepsConfig
(51-83)PackageConfig
(87-106)IntegrationTestConfig
(110-133)integration-tests/tests/utils/utils.py (1)
get_env_var
(11-21)
integration-tests/tests/test_identity_transformation.py (4)
integration-tests/tests/utils/config.py (9)
IntegrationTestLogs
(137-184)tar_gz_path
(167-169)tar_lz4_path
(172-174)tar_xz_path
(177-179)tar_zstd_path
(182-184)IntegrationTestConfig
(110-133)CompressionTestConfig
(188-219)clear_test_outputs
(216-219)clp_binary_path
(40-42)integration-tests/tests/fixtures/integration_test_config.py (1)
integration_test_config
(17-37)integration-tests/tests/utils/asserting_utils.py (1)
run_and_assert
(9-22)integration-tests/tests/utils/utils.py (1)
is_dir_tree_content_equal
(24-38)
integration-tests/tests/utils/config.py (2)
integration-tests/tests/utils/utils.py (1)
validate_dir_exists_and_is_absolute
(78-88)integration-tests/tests/fixtures/integration_test_config.py (1)
integration_test_config
(17-37)
integration-tests/tests/fixtures/integration_test_logs.py (2)
integration-tests/tests/utils/config.py (8)
tar_gz_path
(167-169)base_tar_path
(162-164)lz4_binary_path
(71-73)tar_lz4_path
(172-174)xz_binary_path
(76-78)tar_xz_path
(177-179)zstd_binary_path
(81-83)tar_zstd_path
(182-184)integration-tests/tests/fixtures/integration_test_config.py (1)
integration_test_config
(17-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (21)
integration-tests/tests/utils/utils.py (1)
78-89
: Absolute-path validator is good and precise.Requiring absolute dirs early will prevent subtle cwd‑dependent failures.
taskfiles/deps/main.yaml (5)
24-27
: Good: centralised lib name constants.This reduces string drift across tasks.
28-33
: Install-prefix vars are clear and consistent.Naming aligns with downstream env mapping.
307-315
: Dynamic LibLZMA INSTALL_PREFIX selection is correct.Matches BUILD_SHARED_LIBS, and the LIB_NAME suffix is informative.
Ensure at least one of the LibLZMA installs includes the xz CLI as expected.
349-351
: LZ4 task uses correct options and prefixes.BUILD_STATIC_LIBS/BUILD_SHARED_LIBS are the right flags; prefix/name wiring looks good.
576-578
: Zstd task wiring looks good.INSTALL_PREFIX and LIB_NAME match the new scheme.
integration-tests/tests/fixtures/integration_test_config.py (1)
22-27
: DepsConfig construction is correct; absolute resolution avoids cwd issues.Env→Path→resolve is the right order. With the pytest.ini fix, this should be robust for direct runs too.
If running tests outside Task, confirm CLP_LIBLZMA_ROOT points to the “-static-install” (or “-shared-install”) directory that actually exists.
integration-tests/tests/fixtures/integration_test_logs.py (3)
72-75
: Using tar_gz_path as the curl output target is consistent with the new API.
81-82
: Unpack from tar_gz_path into extraction_dir looks good.Works with shutil.unpack_archive for “.tar.gz”.
92-92
: chmod path passed as str fixes common subprocess arg issues.integration-tests/tests/test_identity_transformation.py (6)
54-67
: Testing both directory and multiple tar. inputs is a strong coverage bump.*The list covers the intended formats; not testing the raw .tar is reasonable.
93-98
: Double‑check remove‑path‑prefix semantics for file inputs.For tar.* inputs, you remove the parent dir. Confirm
clp c --remove-path-prefix
applies as intended when the source is a single archive file (not a dir). If behaviour differs by format, conditionally set the prefix.
100-107
: Command assembly looks correct and stable.Arg ordering for clp “c” matches prior tests; using explicit strings avoids shell quoting issues.
114-119
: Good: compare decompressed output to the canonical uncompressed source.This asserts true losslessness across all input formats.
142-146
: CLP‑S test updated to use compression_input surface; consistent with refactor.
176-180
: CLP‑S helper now sources from compression_input; aligns with the new API.integration-tests/tests/utils/config.py (3)
27-28
: Absolute path validation for core bins dir is appropriate.
155-160
: Validation of logs_download_dir before computing paths is good.Prevents surprises if the parent dir ever goes missing.
207-214
: Validating test_root_dir and deriving per‑test dirs is clean and predictable.taskfiles/tests/integration.yaml (1)
26-30
: Verify compression binaries under configured rootsEnv wiring looks correct; sandbox verification failed because printenv is unavailable and CLP_LIBLZMA_ROOT, CLP_LZ4_ROOT and CLP_ZSTD_ROOT were unset — run the verification script locally or set those env vars and confirm bin/xz, bin/lz4 and bin/zstd exist under the configured roots.
integration-tests/.pytest.ini (1)
12-16
: Fix default CLP_LIBLZMA_ROOT in integration-tests/.pytest.iniintegration-tests/.pytest.ini sets D:CLP_LIBLZMA_ROOT=../build/deps/core/LibLZMA-install (line 13); Taskfiles use split installs (LibLZMA-shared-install / LibLZMA-static-install) — update CLP_LIBLZMA_ROOT to the split-install name produced by your Taskfiles (for example ../build/deps/core/LibLZMA-static-install) or make the path configurable. Repository search returned no matches for the split-install names; verify which name Taskfiles create before changing.
# Create base tar stream object to be compressed into different formats | ||
gzip_bin = shutil.which("gzip") | ||
if gzip_bin is None: | ||
err_msg = "gzip executable not found" | ||
raise RuntimeError(err_msg) | ||
gzip_cmds = [gzip_bin, "--decompress", "--stdout", str(integration_test_logs.tar_gz_path)] | ||
with integration_test_logs.base_tar_path.open(mode="wb") as fout: | ||
subprocess.run(gzip_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL) | ||
|
||
# Create lz4 tar | ||
lz4_bin = str(integration_test_config.deps_config.lz4_binary_path) | ||
lz4_cmds = [ | ||
lz4_bin, | ||
str(integration_test_logs.base_tar_path), | ||
str(integration_test_logs.tar_lz4_path), | ||
] | ||
subprocess.run(lz4_cmds, check=True) | ||
|
||
# Create xz tar | ||
xz_bin = str(integration_test_config.deps_config.xz_binary_path) | ||
xz_cmds = [xz_bin, "--compress", "--stdout", str(integration_test_logs.base_tar_path)] | ||
with integration_test_logs.tar_xz_path.open(mode="wb") as fout: | ||
subprocess.run(xz_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL) | ||
|
||
# Create zstd tar | ||
zstd_bin = str(integration_test_config.deps_config.zstd_binary_path) | ||
zstd_cmds = [zstd_bin, "--stdout", str(integration_test_logs.base_tar_path)] | ||
with integration_test_logs.tar_zstd_path.open(mode="wb") as fout: | ||
subprocess.run(zstd_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Archive fan‑out pipeline is sound; consider quiet flags and a clearer log.
The gzip→lz4/xz/zstd steps are correct and stream‑friendly. Minor polish:
- Add quiet flags (-q where available) to reduce test noise.
- Update the final log message to reflect that compressed archives were generated, not just “extracted”.
Apply this diff for the log:
- logger.info("Downloaded and extracted uncompressed logs for dataset `%s`.", name)
+ logger.info(
+ "Downloaded logs for `%s`, extracted, and generated .tar.gz/.tar.lz4/.tar.xz/.tar.zstd.",
+ name,
+ )
Optional: if you want reproducible downloads, we can wire in SHA256 checks for the datasets in a follow‑up.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In integration-tests/tests/fixtures/integration_test_logs.py around lines 94 to
123, the subprocess commands for creating compressed archives should run quieter
and the final log message should correctly state that compressed archives were
generated (not “extracted”); update the gzip/lz4/xz/zstd command argument lists
to include the appropriate quiet flags (e.g., gzip: --quiet or -q if supported,
lz4: -q, xz: -q, zstd: -q) and adjust the final log call text to say "compressed
archives generated" or similar to accurately reflect the operation.
@dataclass(frozen=True) | ||
class DepsConfig: | ||
"""The configuration for dependencies used by CLP package and binaries.""" | ||
|
||
#: Install directory for all core CLP dependencies. | ||
clp_deps_core_dir: Path | ||
#: Install prefix of LibLZMA used by CLP. | ||
clp_liblzma_root: Path | ||
#: Install prefix of lz4 used by CLP. | ||
clp_lz4_root: Path | ||
#: Install prefix of zstd used by CLP. | ||
clp_zstd_root: Path | ||
|
||
def __post_init__(self) -> None: | ||
"""Validates that dependency directories exist.""" | ||
validate_dir_exists_and_is_absolute(self.clp_deps_core_dir) | ||
validate_dir_exists_and_is_absolute(self.clp_liblzma_root) | ||
validate_dir_exists_and_is_absolute(self.clp_lz4_root) | ||
validate_dir_exists_and_is_absolute(self.clp_zstd_root) | ||
|
||
@property | ||
def lz4_binary_path(self) -> Path: | ||
""":return: The absolute path to the lz4 compression tool.""" | ||
return self.clp_lz4_root / "bin" / "lz4" | ||
|
||
@property | ||
def xz_binary_path(self) -> Path: | ||
""":return: The absolute path to the LibLZMA xz compression tool.""" | ||
return self.clp_liblzma_root / "bin" / "xz" | ||
|
||
@property | ||
def zstd_binary_path(self) -> Path: | ||
""":return: The absolute path to the zstd compression tool.""" | ||
return self.clp_zstd_root / "bin" / "zstd" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Consider validating tool binaries exist in DepsConfig.
Early failure with a clear message helps when a tool wasn’t built/installed.
Apply this diff:
def __post_init__(self) -> None:
"""Validates that dependency directories exist."""
validate_dir_exists_and_is_absolute(self.clp_deps_core_dir)
validate_dir_exists_and_is_absolute(self.clp_liblzma_root)
validate_dir_exists_and_is_absolute(self.clp_lz4_root)
validate_dir_exists_and_is_absolute(self.clp_zstd_root)
+ # Ensure compression tools are present
+ for tool in (self.lz4_binary_path, self.xz_binary_path, self.zstd_binary_path):
+ if not tool.is_file():
+ raise ValueError(f"Compression tool not found: {tool}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@dataclass(frozen=True) | |
class DepsConfig: | |
"""The configuration for dependencies used by CLP package and binaries.""" | |
#: Install directory for all core CLP dependencies. | |
clp_deps_core_dir: Path | |
#: Install prefix of LibLZMA used by CLP. | |
clp_liblzma_root: Path | |
#: Install prefix of lz4 used by CLP. | |
clp_lz4_root: Path | |
#: Install prefix of zstd used by CLP. | |
clp_zstd_root: Path | |
def __post_init__(self) -> None: | |
"""Validates that dependency directories exist.""" | |
validate_dir_exists_and_is_absolute(self.clp_deps_core_dir) | |
validate_dir_exists_and_is_absolute(self.clp_liblzma_root) | |
validate_dir_exists_and_is_absolute(self.clp_lz4_root) | |
validate_dir_exists_and_is_absolute(self.clp_zstd_root) | |
@property | |
def lz4_binary_path(self) -> Path: | |
""":return: The absolute path to the lz4 compression tool.""" | |
return self.clp_lz4_root / "bin" / "lz4" | |
@property | |
def xz_binary_path(self) -> Path: | |
""":return: The absolute path to the LibLZMA xz compression tool.""" | |
return self.clp_liblzma_root / "bin" / "xz" | |
@property | |
def zstd_binary_path(self) -> Path: | |
""":return: The absolute path to the zstd compression tool.""" | |
return self.clp_zstd_root / "bin" / "zstd" | |
@dataclass(frozen=True) | |
class DepsConfig: | |
"""The configuration for dependencies used by CLP package and binaries.""" | |
#: Install directory for all core CLP dependencies. | |
clp_deps_core_dir: Path | |
#: Install prefix of LibLZMA used by CLP. | |
clp_liblzma_root: Path | |
#: Install prefix of lz4 used by CLP. | |
clp_lz4_root: Path | |
#: Install prefix of zstd used by CLP. | |
clp_zstd_root: Path | |
def __post_init__(self) -> None: | |
"""Validates that dependency directories exist.""" | |
validate_dir_exists_and_is_absolute(self.clp_deps_core_dir) | |
validate_dir_exists_and_is_absolute(self.clp_liblzma_root) | |
validate_dir_exists_and_is_absolute(self.clp_lz4_root) | |
validate_dir_exists_and_is_absolute(self.clp_zstd_root) | |
# Ensure compression tools are present | |
for tool in (self.lz4_binary_path, self.xz_binary_path, self.zstd_binary_path): | |
if not tool.is_file(): | |
raise ValueError(f"Compression tool not found: {tool}") | |
@property | |
def lz4_binary_path(self) -> Path: | |
""":return: The absolute path to the lz4 compression tool.""" | |
return self.clp_lz4_root / "bin" / "lz4" | |
@property | |
def xz_binary_path(self) -> Path: | |
""":return: The absolute path to the LibLZMA xz compression tool.""" | |
return self.clp_liblzma_root / "bin" / "xz" | |
@property | |
def zstd_binary_path(self) -> Path: | |
""":return: The absolute path to the zstd compression tool.""" | |
return self.clp_zstd_root / "bin" / "zstd" |
🤖 Prompt for AI Agents
In integration-tests/tests/utils/config.py around lines 50 to 84, DepsConfig
currently only validates the dependency root directories but does not verify
that the actual tool binaries exist or are executable; update __post_init__ to
also validate lz4_binary_path, xz_binary_path, and zstd_binary_path (either by
calling a new helper validate_file_exists_and_is_executable(path, name) or an
existing equivalent) so that each binary path is checked for existence and
executability at object construction, and compute the property paths before
calling the validation so failures produce a clear, early error message naming
the missing/non-executable binary.
#: Path to the CLP compressionm input archive or directory. | ||
compression_input: Path | ||
integration_test_config: InitVar[IntegrationTestConfig] | ||
#: Path to store compressed archives generated by the test. | ||
#: Directory to store generated compressed CLP archives. | ||
compression_dir: Path = field(init=False, repr=True) | ||
#: Path to store decompressed logs generated by the test. | ||
#: Directory to store logs decompressed from CLP archives. | ||
decompression_dir: Path = field(init=False, repr=True) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Fix minor typo in docstring.
“compressionm” → “compression”.
Apply this diff:
- #: Path to the CLP compressionm input archive or directory.
+ #: Path to the CLP compression input archive or directory.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
#: Path to the CLP compressionm input archive or directory. | |
compression_input: Path | |
integration_test_config: InitVar[IntegrationTestConfig] | |
#: Path to store compressed archives generated by the test. | |
#: Directory to store generated compressed CLP archives. | |
compression_dir: Path = field(init=False, repr=True) | |
#: Path to store decompressed logs generated by the test. | |
#: Directory to store logs decompressed from CLP archives. | |
decompression_dir: Path = field(init=False, repr=True) | |
#: Path to the CLP compression input archive or directory. | |
compression_input: Path | |
integration_test_config: InitVar[IntegrationTestConfig] | |
#: Directory to store generated compressed CLP archives. | |
compression_dir: Path = field(init=False, repr=True) | |
#: Directory to store logs decompressed from CLP archives. | |
decompression_dir: Path = field(init=False, repr=True) |
🤖 Prompt for AI Agents
In integration-tests/tests/utils/config.py around lines 193 to 200, fix the
minor typo in the docstring for the compression_input field: change
"compressionm" to "compression" so the comment reads "Path to the CLP
compression input archive or directory." and preserve surrounding formatting and
punctuation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
integration-tests/.pytest.ini
(1 hunks)integration-tests/tests/fixtures/integration_test_config.py
(2 hunks)integration-tests/tests/fixtures/integration_test_logs.py
(3 hunks)integration-tests/tests/test_identity_transformation.py
(5 hunks)integration-tests/tests/utils/config.py
(7 hunks)integration-tests/tests/utils/utils.py
(1 hunks)taskfiles/deps/main.yaml
(4 hunks)taskfiles/tests/integration.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfiles/tests/integration.yaml
integration-tests/.pytest.ini
taskfiles/deps/main.yaml
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/tests/fixtures/integration_test_config.py
integration-tests/.pytest.ini
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Applied to files:
integration-tests/.pytest.ini
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-13T19:58:26.058Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1184
File: taskfiles/deps/main.yaml:0-0
Timestamp: 2025-08-13T19:58:26.058Z
Learning: In the CLP project, custom CMake find modules (like FindLibLZMA.cmake) are designed to flexibly link against either shared or static libraries based on availability, rather than requiring both variants to be built. The find modules use flags like LibLZMA_USE_STATIC_LIBS and temporarily modify CMAKE_FIND_LIBRARY_SUFFIXES to prefer the desired library type while gracefully falling back if needed.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-06T07:32:28.462Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: taskfiles/deps/main.yaml:70-71
Timestamp: 2025-08-06T07:32:28.462Z
Learning: In LZ4's CMake build system (build/cmake/CMakeLists.txt), the option to build static libraries is `BUILD_STATIC_LIBS=ON`, not `LZ4_BUILD_STATIC=ON`. The correct options for building both shared and static LZ4 libraries are `-DBUILD_SHARED_LIBS=ON` and `-DBUILD_STATIC_LIBS=ON`.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-06T08:10:26.381Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:195-202
Timestamp: 2025-08-06T08:10:26.381Z
Learning: According to Bill-hbrhbr, in LZ4's CMake configuration, the correct package name for find_package is lowercase "lz4", and the exported targets use uppercase namespace "LZ4::" with lowercase prefixes (LZ4::lz4_static, LZ4::lz4_shared), not lowercase namespace "lz4::" as initially assumed.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2024-11-19T17:30:04.970Z
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Applied to files:
integration-tests/tests/test_identity_transformation.py
🧬 Code graph analysis (4)
integration-tests/tests/fixtures/integration_test_config.py (2)
integration-tests/tests/utils/config.py (3)
DepsConfig
(51-83)PackageConfig
(87-106)IntegrationTestConfig
(110-133)integration-tests/tests/utils/utils.py (1)
get_env_var
(11-21)
integration-tests/tests/fixtures/integration_test_logs.py (2)
integration-tests/tests/utils/config.py (8)
tar_gz_path
(167-169)base_tar_path
(162-164)lz4_binary_path
(71-73)tar_lz4_path
(172-174)xz_binary_path
(76-78)tar_xz_path
(177-179)zstd_binary_path
(81-83)tar_zstd_path
(182-184)integration-tests/tests/fixtures/integration_test_config.py (1)
integration_test_config
(17-37)
integration-tests/tests/test_identity_transformation.py (4)
integration-tests/tests/utils/config.py (9)
IntegrationTestLogs
(137-184)tar_gz_path
(167-169)tar_lz4_path
(172-174)tar_xz_path
(177-179)tar_zstd_path
(182-184)IntegrationTestConfig
(110-133)CompressionTestConfig
(188-219)clear_test_outputs
(216-219)clp_binary_path
(40-42)integration-tests/tests/fixtures/integration_test_config.py (1)
integration_test_config
(17-37)integration-tests/tests/utils/asserting_utils.py (1)
run_and_assert
(9-22)integration-tests/tests/utils/utils.py (1)
is_dir_tree_content_equal
(24-38)
integration-tests/tests/utils/config.py (1)
integration-tests/tests/fixtures/integration_test_config.py (1)
integration_test_config
(17-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (15)
integration-tests/.pytest.ini (1)
12-16
: LGTM!The new environment variable definitions for dependency paths are well-structured and follow the existing naming convention. They properly expose the core dependency directory and individual library install prefixes needed for the compression tests.
taskfiles/tests/integration.yaml (1)
26-30
: LGTM!The environment variable mappings correctly wire the global dependency path variables to the integration test environment, enabling tests to locate the compression binaries.
integration-tests/tests/utils/utils.py (1)
78-88
: Good addition of absolute path validation.The renamed function and added absolute path check enhance robustness by ensuring all configuration paths are absolute, preventing potential path resolution issues during test execution.
integration-tests/tests/utils/config.py (3)
50-84
: Well-designedDepsConfig
class.The new
DepsConfig
dataclass properly encapsulates dependency paths with appropriate validation and provides clean property accessors for the compression tool binaries.
161-185
: Clever use of computed properties for tar archive paths.The tar path properties elegantly derive archive locations from the extraction directory, avoiding redundant path storage and ensuring consistency.
193-194
: Clear naming improvement.Renaming
logs_source_dir
tocompression_input
better reflects that the field can accept either archives or directories, improving API clarity.taskfiles/deps/main.yaml (5)
24-32
: Well-structured dependency configuration variables.The new library name variables and install prefix paths provide a clean, centralized configuration for dependency management, making the build system more maintainable.
307-316
: Good use of templating for install prefixes.The dynamic install prefix selection based on
BUILD_SHARED_LIBS
properly separates shared and static library installations, preventing conflicts.
271-271
: Keep xz enabled — integration tests use it.integration-tests/tests/utils/config.py:76 defines xz_binary_path and integration-tests/tests/fixtures/integration_test_logs.py:113 consumes xz_bin; enabling the xz tool is required for those tests.
576-577
: Keep zstd programs enabled — integration tests use the zstd CLI.
Integration tests reference zstd_binary_path / zstd_bin (integration-tests/tests/utils/config.py:81, integration-tests/tests/fixtures/integration_test_logs.py:119). Do not disable building the zstd programs.
349-350
: Keep lz4 CLI enabled — integration tests use the CLI.
integration-tests reference lz4_binary_path (integration-tests/tests/utils/config.py:71) and use it in fixtures (integration-tests/tests/fixtures/integration_test_logs.py:104); building the lz4 CLI is required for those compression tests.integration-tests/tests/fixtures/integration_test_config.py (1)
22-27
: LGTM! DepsConfig initialization properly handles environment variables.The implementation correctly retrieves and resolves dependency paths from environment variables, ensuring absolute paths are used throughout the configuration.
integration-tests/tests/fixtures/integration_test_logs.py (2)
72-72
: Good change: Using tar.gz path instead of generic tarball path.The rename from
tarball_path
totar_gz_path
provides better clarity about the specific compression format being handled.
103-122
: Consider consistent error handling across compression formats.While the subprocess calls use
check=True
which will raise on failure, adding consistent error handling across all compression operations would improve debugging when failures occur.Consider wrapping each compression operation in try-except blocks similar to the suggestion above, to provide format-specific error messages when compression fails.
⛔ Skipped due to learnings
Learnt from: haiqi96 PR: y-scope/clp#651 File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0 Timestamp: 2025-01-16T16:58:43.190Z Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
integration-tests/tests/test_identity_transformation.py (1)
56-66
: Good test coverage for multiple compression formats!The implementation properly tests round-trip compression for all supported archive formats (uncompressed directory, tar.gz, tar.lz4, tar.xz, tar.zstd), which aligns well with the PR's objective to verify lossless compression across formats.
@@ -89,7 +89,37 @@ def _download_and_extract_dataset( | |||
if chmod_bin is None: | |||
err_msg = "chmod executable not found" | |||
raise RuntimeError(err_msg) | |||
subprocess.run([chmod_bin, "-R", "gu+w", integration_test_logs.extraction_dir], check=True) | |||
subprocess.run([chmod_bin, "-R", "gu+w", str(integration_test_logs.extraction_dir)], check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Unnecessary string conversion for Path argument.
The subprocess.run
function accepts Path objects directly, so converting to string is redundant here.
Apply this diff to simplify:
- subprocess.run([chmod_bin, "-R", "gu+w", str(integration_test_logs.extraction_dir)], check=True)
+ subprocess.run([chmod_bin, "-R", "gu+w", integration_test_logs.extraction_dir], check=True)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
subprocess.run([chmod_bin, "-R", "gu+w", str(integration_test_logs.extraction_dir)], check=True) | |
subprocess.run([chmod_bin, "-R", "gu+w", integration_test_logs.extraction_dir], check=True) |
🤖 Prompt for AI Agents
In integration-tests/tests/fixtures/integration_test_logs.py around line 92, the
call to subprocess.run converts a Path to a string unnecessarily; change the
argument list to pass the Path object directly (remove str(...) around
integration_test_logs.extraction_dir) so subprocess.run receives the Path
instance, keeping check=True and preserving the rest of the call.
# Create base tar stream object to be compressed into different formats | ||
gzip_bin = shutil.which("gzip") | ||
if gzip_bin is None: | ||
err_msg = "gzip executable not found" | ||
raise RuntimeError(err_msg) | ||
gzip_cmds = [gzip_bin, "--decompress", "--stdout", str(integration_test_logs.tar_gz_path)] | ||
with integration_test_logs.base_tar_path.open(mode="wb") as fout: | ||
subprocess.run(gzip_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Consider error handling for base tar creation.
The base tar file creation could fail if disk space is insufficient or write permissions are missing. While the current implementation will raise an exception, a more specific error message would aid debugging.
Consider wrapping in a try-except to provide clearer error context:
# Create base tar stream object to be compressed into different formats
gzip_bin = shutil.which("gzip")
if gzip_bin is None:
err_msg = "gzip executable not found"
raise RuntimeError(err_msg)
gzip_cmds = [gzip_bin, "--decompress", "--stdout", str(integration_test_logs.tar_gz_path)]
- with integration_test_logs.base_tar_path.open(mode="wb") as fout:
- subprocess.run(gzip_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL)
+ try:
+ with integration_test_logs.base_tar_path.open(mode="wb") as fout:
+ subprocess.run(gzip_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL)
+ except (OSError, subprocess.CalledProcessError) as e:
+ err_msg = f"Failed to create base tar file at {integration_test_logs.base_tar_path}"
+ raise RuntimeError(err_msg) from e
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Create base tar stream object to be compressed into different formats | |
gzip_bin = shutil.which("gzip") | |
if gzip_bin is None: | |
err_msg = "gzip executable not found" | |
raise RuntimeError(err_msg) | |
gzip_cmds = [gzip_bin, "--decompress", "--stdout", str(integration_test_logs.tar_gz_path)] | |
with integration_test_logs.base_tar_path.open(mode="wb") as fout: | |
subprocess.run(gzip_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL) | |
# Create base tar stream object to be compressed into different formats | |
gzip_bin = shutil.which("gzip") | |
if gzip_bin is None: | |
err_msg = "gzip executable not found" | |
raise RuntimeError(err_msg) | |
gzip_cmds = [gzip_bin, "--decompress", "--stdout", str(integration_test_logs.tar_gz_path)] | |
try: | |
with integration_test_logs.base_tar_path.open(mode="wb") as fout: | |
subprocess.run(gzip_cmds, check=True, stdout=fout, stdin=subprocess.DEVNULL) | |
except (OSError, subprocess.CalledProcessError) as e: | |
err_msg = f"Failed to create base tar file at {integration_test_logs.base_tar_path}" | |
raise RuntimeError(err_msg) from e |
🤖 Prompt for AI Agents
In integration-tests/tests/fixtures/integration_test_logs.py around lines 94 to
102, the block that locates gzip and writes the decompressed tar to
base_tar_path can fail for reasons like missing disk space or permission errors
but currently surfaces only generic exceptions; wrap the file open and
subprocess.run call in a try-except that catches OSError, IOError and
subprocess.CalledProcessError and re-raises a RuntimeError with a clear
contextual message (including the original exception text) so callers get a
specific, actionable error; keep the existing gzip executable existence check
as-is and ensure the exception preserves the original traceback (e.g., using
"from e") when re-raising.
lz4_bin = str(integration_test_config.deps_config.lz4_binary_path) | ||
lz4_cmds = [ | ||
lz4_bin, | ||
str(integration_test_logs.base_tar_path), | ||
str(integration_test_logs.tar_lz4_path), | ||
] | ||
subprocess.run(lz4_cmds, check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify lz4 binary path existence before use.
Unlike the system binaries (gzip, curl, chmod), the dependency binaries from deps_config
aren't validated for existence before use. This could lead to less clear error messages if the paths are incorrect.
🏁 Script executed:
#!/bin/bash
# Verify that the binary path properties in DepsConfig return valid paths
# Check if the binary path properties exist and point to actual files
ast-grep --pattern 'def $_binary_path(self) -> Path:
$$$'
# Look for any validation of these binary paths before use
rg -n -C3 '(lz4|xz|zstd)_binary_path' --type=py
Length of output: 3540
Verify lz4 binary path exists before use.
integration-tests/tests/utils/config.py returns clp_lz4_root / "bin" / "lz4" but there’s no check that the file exists; integration-tests/tests/fixtures/integration_test_logs.py invokes it directly (lines 104–110). Add an explicit Path.exists()/is_file() check or validate clp_lz4_root in DepsConfig and raise a clear error before calling subprocess.run.
🤖 Prompt for AI Agents
In integration-tests/tests/fixtures/integration_test_logs.py around lines 104 to
110, the code calls the lz4 binary directly without verifying it exists; add an
explicit check using the Path for
integration_test_config.deps_config.lz4_binary_path (e.g., Path.exists() and
Path.is_file()) before building lz4_cmds and calling subprocess.run, and if the
check fails raise a clear, descriptive exception (or ValueError) that includes
the expected path so the test setup fails fast with an actionable message;
alternatively validate clp_lz4_root when constructing DepsConfig and raise there
so callers never receive an invalid path.
def _test_clp_identity_transform_single_archive( | ||
compression_input: Path, | ||
integration_test_config: IntegrationTestConfig, | ||
logs_source_dir: Path, | ||
) -> None: | ||
""" | ||
Validate that compression and decompression by the core binary `clp` run successfully and are | ||
lossless for a single archive input format. | ||
|
||
:param compression_input: Path to the archive for compression. | ||
:param integration_test_config: General config for the integration tests. | ||
:param logs_source_dir: Path to the uncompressed logs for comparison. | ||
""" | ||
test_paths = CompressionTestConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Add docstring for compression_input parameter formats.
The docstring should clarify what formats are expected for the compression_input
parameter to help future maintainers.
Update the docstring:
"""
Validate that compression and decompression by the core binary `clp` run successfully and are
lossless for a single archive input format.
- :param compression_input: Path to the archive for compression.
+ :param compression_input: Path to the archive for compression (directory, tar.gz, tar.lz4, tar.xz, or tar.zstd).
:param integration_test_config: General config for the integration tests.
:param logs_source_dir: Path to the uncompressed logs for comparison.
"""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _test_clp_identity_transform_single_archive( | |
compression_input: Path, | |
integration_test_config: IntegrationTestConfig, | |
logs_source_dir: Path, | |
) -> None: | |
""" | |
Validate that compression and decompression by the core binary `clp` run successfully and are | |
lossless for a single archive input format. | |
:param compression_input: Path to the archive for compression. | |
:param integration_test_config: General config for the integration tests. | |
:param logs_source_dir: Path to the uncompressed logs for comparison. | |
""" | |
test_paths = CompressionTestConfig( | |
def _test_clp_identity_transform_single_archive( | |
compression_input: Path, | |
integration_test_config: IntegrationTestConfig, | |
logs_source_dir: Path, | |
) -> None: | |
""" | |
Validate that compression and decompression by the core binary `clp` run successfully and are | |
lossless for a single archive input format. | |
:param compression_input: Path to the archive for compression (directory, tar.gz, tar.lz4, tar.xz, or tar.zstd). | |
:param integration_test_config: General config for the integration tests. | |
:param logs_source_dir: Path to the uncompressed logs for comparison. | |
""" | |
test_paths = CompressionTestConfig( |
🤖 Prompt for AI Agents
In integration-tests/tests/test_identity_transformation.py around lines 69 to
82, the docstring for the compression_input parameter is missing details about
allowed input formats; update the :param compression_input: entry to state that
it is a Path to the single archive or directory to be compressed/decompressed
and enumerate supported archive file formats (e.g., .tar, .tar.gz / .tgz, .zip)
and any expected plain-compressed inputs (e.g., .gz) or directory inputs used by
the test, plus a short example or note that the test only exercises
single-archive inputs so multi-archive patterns are not supported.
path_prefix_to_remove = ( | ||
input_path | ||
if test_paths.compression_input.is_dir() | ||
else str(test_paths.compression_input.parent) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Consider simplifying path prefix logic.
The path prefix removal logic could be simplified using a ternary operator for better readability.
Apply this diff to simplify:
- path_prefix_to_remove = (
- input_path
- if test_paths.compression_input.is_dir()
- else str(test_paths.compression_input.parent)
- )
+ path_prefix_to_remove = input_path if test_paths.compression_input.is_dir() else str(test_paths.compression_input.parent)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
path_prefix_to_remove = ( | |
input_path | |
if test_paths.compression_input.is_dir() | |
else str(test_paths.compression_input.parent) | |
) | |
path_prefix_to_remove = input_path if test_paths.compression_input.is_dir() else str(test_paths.compression_input.parent) |
🤖 Prompt for AI Agents
In integration-tests/tests/test_identity_transformation.py around lines 93 to
97, the current multi-line if/else that sets path_prefix_to_remove can be
simplified by replacing it with a single-line ternary expression; change the
block to assign path_prefix_to_remove = input_path if
test_paths.compression_input.is_dir() else
str(test_paths.compression_input.parent) to improve readability while preserving
the same logic.
Description
This PR is a precursor to the
LibArchive
PR:In short, we will remove the system-installed copies of the compression libraries that a system-installed
LibArchive
would use. Going forward, everything (LibArchive
and its compression backends) will beTask
-installed.To validate the migration, this PR adds integration tests that confirm CLP can ingest archives across multiple compression formats. These tests should pass now with the system libraries, and passing again after the follow-up PR will confirm the migration.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Tests
Chores