-
Notifications
You must be signed in to change notification settings - Fork 82
feat(integration-tests): Add infrastructure for running integration tests; Add basic integration tests for clp
& clp-s
compression.
#1100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(integration-tests): Add infrastructure for running integration tests; Add basic integration tests for clp
& clp-s
compression.
#1100
Conversation
WalkthroughAdds a new integration-tests Python project (pytest config, fixtures, utils, and tests), taskfile and lint-task updates for per-project linting and integration test orchestration, documentation for testing, and a Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Dev as Developer
participant UV as uv (task runner)
participant Pytest as pytest
participant Fixtures as Fixtures
participant Utils as TestUtils
participant Bins as CLP binaries
Dev->>UV: task tests:integration
UV->>Pytest: run pytest in integration-tests (env vars set)
Pytest->>Fixtures: load integration_test_config (session)
Fixtures->>Utils: get_env_var(), validate_dir_exists()
Pytest->>Fixtures: load dataset fixture (download & extract if needed)
Fixtures->>Utils: unlink / prepare test dirs
Pytest->>Bins: invoke clp / clp-s (compress/decompress)
Bins-->>Pytest: outputs (archives, decompressed dirs)
Pytest->>Utils: is_dir_tree_content_equal / is_json_file_structurally_equal
Utils-->>Pytest: comparison result
Pytest-->>Dev: test results
note right of Pytest: clp-s flow includes consolidation before structural JSON compare
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (4)📓 Common learnings
📚 Learning: 2025-08-17T16:10:38.722Z
Applied to files:
📚 Learning: 2025-08-08T21:15:10.905Z
Applied to files:
📚 Learning: 2025-08-20T09:01:25.825Z
Applied to files:
🧬 Code graph analysis (1)integration-tests/tests/fixtures/integration_test_logs.py (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
🔇 Additional comments (4)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Ruff (0.12.2)integration-tests/tests/fixtures/integration_test_logs.py�[1;31mruff failed�[0m Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
integration-tests/uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (14)
.gitignore
(1 hunks)docs/src/dev-guide/building-package.md
(3 hunks)integration-tests/.pytest.ini
(1 hunks)integration-tests/.python-version
(1 hunks)integration-tests/pyproject.toml
(1 hunks)integration-tests/src/integration_tests/__init__.py
(1 hunks)integration-tests/tests/conftest.py
(1 hunks)integration-tests/tests/fixtures/base_config.py
(1 hunks)integration-tests/tests/fixtures/dataset_logs.py
(1 hunks)integration-tests/tests/test_identity_transformation.py
(1 hunks)integration-tests/tests/utils/config.py
(1 hunks)integration-tests/tests/utils/utils.py
(1 hunks)taskfile.yaml
(2 hunks)taskfiles/lint.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
docs/src/dev-guide/building-package.md (6)
Learnt from: jackluo923
PR: y-scope/clp#718
File: components/core/tools/scripts/utils/create-debian-package.py:41-41
Timestamp: 2025-02-12T22:24:17.723Z
Learning: For the clp-core Debian package creation script, strict version format validation is considered unnecessary complexity and should be avoided.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: quinntaylormitchell
PR: y-scope/clp#1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
integration-tests/.pytest.ini (2)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In `components/core/tests/test-clp_s-end_to_end.cpp`, the success of `constructor.store()` is verified through `REQUIRE` statements and subsequent comparisons.
integration-tests/tests/test_identity_transformation.py (5)
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
🧬 Code Graph Analysis (1)
integration-tests/tests/conftest.py (2)
integration-tests/tests/fixtures/base_config.py (1)
base_config
(9-22)integration-tests/tests/fixtures/dataset_logs.py (4)
download_and_extract_dataset
(44-71)hive_24hr
(12-16)postgresql
(36-40)spark_event_logs
(28-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (20)
.gitignore (1)
5-5
: LGTM! Standard Python cache directory exclusion.The recursive
**/__pycache__/
pattern is appropriate for the new integration tests infrastructure and follows Python project conventions.integration-tests/tests/utils/config.py (1)
1-18
: LGTM! Well-structured configuration classes.The frozen dataclasses provide clean, immutable configuration structures with proper type hints. The use of
Path
objects for directory paths and clear field names makes the code maintainable.integration-tests/tests/utils/utils.py (2)
16-20
: LGTM! Clean environment variable retrieval.The function provides proper error handling with a descriptive assertion message.
22-26
: LGTM! Simple and effective command execution wrapper.The function provides a clean abstraction for running commands with assertion-based error handling.
integration-tests/.python-version (1)
1-1
: LGTM! Appropriate Python version specification.Python 3.10 is a good choice for the integration tests and aligns with the project's requirements.
docs/src/dev-guide/building-package.md (3)
18-18
: LGTM! Proper dependency documentation.The uv requirement is clearly documented with the appropriate version constraint.
65-72
: LGTM! Clear testing instructions.The Test section provides concise, actionable instructions for running integration tests that align with the new infrastructure.
83-83
: LGTM! Proper reference link.The uv reference link is correctly formatted and provides users with additional documentation.
taskfiles/lint.yaml (1)
779-779
: LGTM!Adding the integration-tests directory to the linting process is correct and maintains code quality consistency across the project.
integration-tests/tests/conftest.py (1)
1-7
: LGTM!The conftest.py file correctly aggregates pytest fixtures following standard pytest practices. The imports are consistent with the fixture modules described in the context.
integration-tests/.pytest.ini (2)
10-11
: Verify that relative paths work consistently across different execution contexts.The environment variables use relative paths (
../build
). Ensure these paths resolve correctly regardless of where the tests are executed from (e.g., from the root directory, from the integration-tests directory, or from CI/CD pipelines).
12-15
: Well-defined test markers.The custom markers provide clear categorization for different types of integration tests, which will be helpful for selective test execution.
integration-tests/tests/fixtures/base_config.py (2)
8-9
: Appropriate fixture scope.Using session scope for the base_config fixture is correct since this configuration should be shared across all tests in the session.
20-21
: Robust directory creation.The directory creation logic correctly uses
parents=True
andexist_ok=True
to handle cases where directories may already exist or need to be created recursively.integration-tests/pyproject.toml (1)
1-23
: LGTM! Well-structured Python project configuration.The pyproject.toml file is correctly configured with appropriate metadata, build system, and development dependencies for the integration test suite.
taskfile.yaml (2)
18-18
: LGTM! Consistent variable naming convention.The new variable follows the established naming pattern with
G_
prefix for global variables.
166-172
: LGTM! Well-structured integration test task.The task properly:
- Sets required environment variables for the test suite
- Uses
uv run
for isolated Python execution- Runs from the correct directory context
The environment variables
CLP_BUILD_DIR
andCLP_PACKAGE_DIR
will be available to the tests for locating binaries and packages.integration-tests/tests/fixtures/dataset_logs.py (1)
11-40
: LGTM! Well-structured dataset fixtures.The fixtures correctly define session-scoped datasets with clear naming and URLs pointing to Zenodo archives.
integration-tests/tests/test_identity_transformation.py (2)
36-67
: LGTM! Well-structured identity transformation test for CLP.The test correctly implements the identity transformation pattern:
- Proper cleanup before and after test execution
- Uses appropriate CLI flags (
--progress
,--remove-path-prefix
)- Validates that decompressed output matches original input
- Follows established cleanup patterns from learned context
69-108
: LGTM! Comprehensive identity transformation test for CLP-S.The test properly handles CLP-S specific requirements:
- Implements double compression/decompression cycle for consistency verification
- Accounts for JSON key and row order limitations as documented
- Includes appropriate TODO comment referencing official documentation
- Proper cleanup of all temporary directories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
taskfile.yaml
(2 hunks)taskfiles/lint.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build (macos-latest)
🔇 Additional comments (2)
taskfiles/lint.yaml (1)
777-778
: Integration-tests directory correctly added to Python lint scopeIncluding
integration-tests
ensures Black/Ruff run over the new test code. No issues spotted.taskfile.yaml (1)
18-18
: Variable addition looks good
G_INTEGRATION_TESTS_DIR
follows existing naming conventions and will simplify future path changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comment. I will need to take a closer look at uv and pytest to fully review this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (3)
integration-tests/tests/fixtures/dataset_logs.py (1)
44-73
: Address resource cleanup and error handling improvements.The download function has several areas that could be improved based on past review feedback:
- Resource cleanup: Downloaded archives are not cleaned up after extraction
- Error handling: The broad
except Exception
clause could mask specific issues- Security: Consider validating HTTPS URLs for the datasets
Consider this improvement:
try: download_path_str = str(package_config.uncompressed_logs_dir / f"{dataset_name}.tar.gz") # fmt: off curl_cmds = [ "curl", "--fail", "--location", "--output", download_path_str, "--show-error", dataset_config.tar_url, ] # fmt: on subprocess.run(curl_cmds, check=True) extract_path_str = str(package_config.uncompressed_logs_dir / dataset_name) shutil.unpack_archive(download_path_str, extract_path_str) + # Clean up downloaded archive + os.unlink(download_path_str) - except Exception as e: + except (subprocess.CalledProcessError, shutil.ReadError, OSError) as e: raise RuntimeError(f"Failed to download and extract dataset `{dataset_name}`: {e}")This addresses resource cleanup and provides more specific error handling.
integration-tests/tests/test_identity_transformation.py (2)
126-126
: Remove duplicate assignment that causes resource leakLine 126 reassigns
keys_and_rows_sorted
, overwriting the context-managed temporary file created on line 125. This causes a resource leak and breaks the intended context management.Remove this line:
- keys_and_rows_sorted = NamedTemporaryFile(mode="w+", delete=True)
96-99
: Inconsistent naming: "single-file" directories without single-file-archive modeThe variable names
single_file_archives_dir
andsingle_file_extract_dir
suggest single-file archive mode, but the compression commands don't use the--single-file-archive
flag. This creates confusion about the test's intent.Either rename the variables to remove "single-file" references or add the flag if single-file mode is intended.
Also applies to: 104-105
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
integration-tests/.pytest.ini
(1 hunks)integration-tests/tests/conftest.py
(1 hunks)integration-tests/tests/fixtures/dataset_logs.py
(1 hunks)integration-tests/tests/test_identity_transformation.py
(1 hunks)integration-tests/tests/utils/utils.py
(1 hunks)taskfile.yaml
(2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: jackluo923
PR: y-scope/clp#718
File: components/core/tools/scripts/utils/create-debian-package.py:41-41
Timestamp: 2025-02-12T22:24:17.723Z
Learning: For the clp-core Debian package creation script, strict version format validation is considered unnecessary complexity and should be avoided.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
integration-tests/.pytest.ini (2)
Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In components/core/tests/test-clp_s-end_to_end.cpp
, the success of constructor.store()
is verified through REQUIRE
statements and subsequent comparisons.
taskfile.yaml (7)
Learnt from: anlowee
PR: #925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.
Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like apt-get install -f
) when installing packages to give users more control over their system.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
integration-tests/tests/utils/utils.py (1)
Learnt from: anlowee
PR: #925
File: taskfile.yaml:132-142
Timestamp: 2025-05-27T19:54:46.662Z
Learning: When checking for uncommitted changes in generated code directories, git status --porcelain
is more comprehensive than git diff --quiet
because it detects both modified tracked files and new untracked files, while git diff
only detects changes to already-tracked files.
integration-tests/tests/test_identity_transformation.py (20)
Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: Bill-hbrhbr
PR: #1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add
, apt update && apt install
, yum update && yum install
) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In components/core/tests/test-clp_s-end_to_end.cpp
, the success of constructor.store()
is verified through REQUIRE
statements and subsequent comparisons.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp
, the run_serializer
function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp
, the run_serializer
function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: haiqi96
PR: #634
File: components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py:0-0
Timestamp: 2024-12-16T21:49:06.483Z
Learning: In fs_compression_task.py
, when refactoring the archive processing loop in Python, the src_archive_file.unlink()
operation should remain outside of the process_archive
function.
Learnt from: haiqi96
PR: #619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ FileDecompressor
implementations at components/core/src/clp/clp/FileDecompressor.cpp
and components/core/src/glt/glt/FileDecompressor.cpp
, the temp_output_path
variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Learnt from: haiqi96
PR: #651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/
directory, keep them simple and avoid adding additional validation code.
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
integration-tests/tests/fixtures/dataset_logs.py (2)
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
🧬 Code Graph Analysis (2)
integration-tests/tests/conftest.py (2)
integration-tests/tests/fixtures/dataset_logs.py (4)
download_and_extract_dataset
(45-73)hive_24hr
(13-17)postgresql
(37-41)spark_event_logs
(29-33)integration-tests/tests/fixtures/package_config.py (1)
package_config
(9-22)
integration-tests/tests/test_identity_transformation.py (4)
integration-tests/tests/utils/config.py (2)
DatasetLogs
(15-17)PackageConfig
(6-11)integration-tests/tests/utils/utils.py (2)
is_dir_tree_content_equal
(7-14)run_and_assert
(23-26)integration-tests/tests/fixtures/package_config.py (1)
package_config
(9-22)integration-tests/tests/fixtures/dataset_logs.py (1)
download_and_extract_dataset
(45-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (10)
integration-tests/tests/utils/utils.py (1)
17-20
: Clean implementation with helpful error messaging.The function properly handles missing environment variables with a clear assertion message.
integration-tests/tests/conftest.py (1)
1-7
: Standard pytest fixture aggregation pattern.Clean implementation that properly imports and exposes fixtures for the integration test suite.
integration-tests/.pytest.ini (1)
1-15
: Well-structured pytest configuration for integration tests.The configuration properly sets up environment variables, custom markers, and test execution options. The markers for
binaries
,clp
, andclp_s
provide good test categorization capabilities.taskfile.yaml (3)
18-18
: Integration tests directory variable follows existing naming convention.The variable name and path structure are consistent with other directory variables in the file.
166-174
: Well-implemented test task with proper dependency.The task correctly depends on the
package
task and properly sets environment variables that match the pytest configuration. This addresses the dependency issue mentioned in past reviews.
176-181
: Useful cache clearing task for test maintenance.The cache clearing task provides a clean way to reset test state, which is valuable for debugging and ensuring clean test runs.
integration-tests/tests/fixtures/dataset_logs.py (4)
12-17
: Dataset fixture follows consistent pattern.The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.
20-25
: Dataset fixture follows consistent pattern.The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.
28-33
: Dataset fixture follows consistent pattern.The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.
36-41
: Dataset fixture follows consistent pattern.The fixture properly returns a DatasetLogs instance with the correct Zenodo URL structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
integration-tests/tests/fixtures/dataset_logs.py (1)
52-84
: Improve error handling specificity and add archive cleanup.The download and extraction logic has areas for improvement:
- The bare
except Exception
clause is too broad and could mask unexpected errors- Downloaded archives are not cleaned up after extraction, potentially wasting disk space
Apply this diff to improve error handling and add cleanup:
- except Exception as e: - raise RuntimeError(f"Failed to download and extract dataset `{dataset_name}`: {e}") + except subprocess.CalledProcessError as e: + raise RuntimeError(f"Failed to download dataset `{dataset_name}`: {e}") + except shutil.ReadError as e: + raise RuntimeError(f"Failed to extract dataset `{dataset_name}`: {e}") + finally: + # Clean up downloaded archive to save disk space + download_path = package_config.uncompressed_logs_dir / f"{dataset_name}.tar.gz" + download_path.unlink(missing_ok=True)integration-tests/tests/test_identity_transformation.py (1)
127-139
: Fix redundant NamedTemporaryFile creation causing resource leak.Line 132 creates a new
NamedTemporaryFile
that shadows the one created in thewith
statement on line 131, causing a resource leak and incorrect behavior.Remove the redundant line:
@contextmanager def _sort_json_keys_and_rows(json_fp: Path) -> IO[str]: with NamedTemporaryFile(mode="w+", delete=True) as keys_sorted, NamedTemporaryFile( mode="w+", delete=True ) as flattened, NamedTemporaryFile(mode="w+", delete=True) as keys_and_rows_sorted: - keys_and_rows_sorted = NamedTemporaryFile(mode="w+", delete=True) subprocess.run(["jq", "--sort-keys", ".", str(json_fp)], check=True, stdout=keys_sorted) keys_sorted.flush() subprocess.run(["jq", ".", keys_sorted.name], check=True, stdout=flattened) flattened.flush() subprocess.run(["sort", flattened.name], check=True, stdout=keys_and_rows_sorted) keys_and_rows_sorted.flush() yield keys_and_rows_sorted
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
integration-tests/tests/conftest.py
(1 hunks)integration-tests/tests/fixtures/dataset_logs.py
(1 hunks)integration-tests/tests/test_identity_transformation.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: AVMatthews
PR: y-scope/clp#595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: jackluo923
PR: y-scope/clp#718
File: components/core/tools/scripts/utils/create-debian-package.py:41-41
Timestamp: 2025-02-12T22:24:17.723Z
Learning: For the clp-core Debian package creation script, strict version format validation is considered unnecessary complexity and should be avoided.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
integration-tests/tests/fixtures/dataset_logs.py (2)
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
integration-tests/tests/test_identity_transformation.py (20)
Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-end_to_end.cpp:59-65
Timestamp: 2024-11-19T17:30:04.970Z
Learning: In 'components/core/tests/test-end_to_end.cpp', during the 'clp-s_compression_and_extraction_no_floats' test, files and directories are intentionally removed at the beginning of the test to ensure that any existing content doesn't influence the test results.
Learnt from: haiqi96
PR: #651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add
, apt update && apt install
, yum update && yum install
) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Learnt from: Bill-hbrhbr
PR: #1004
File: components/clp-package-utils/clp_package_utils/scripts/native/decompress.py:139-144
Timestamp: 2025-06-24T08:54:14.438Z
Learning: In the CLP codebase, the get_orig_file_id function signature was changed after a recent merge to no longer accept a dataset parameter, making previous suggestions that reference this parameter invalid.
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: AVMatthews
PR: #595
File: components/core/tests/test-clp_s-end_to_end.cpp:109-110
Timestamp: 2024-11-29T22:50:17.206Z
Learning: In components/core/tests/test-clp_s-end_to_end.cpp
, the success of constructor.store()
is verified through REQUIRE
statements and subsequent comparisons.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-07T20:10:08.254Z
Learning: In clp-s.cpp
, the run_serializer
function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: AVMatthews
PR: #543
File: components/core/src/clp_s/clp-s.cpp:196-265
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In clp-s.cpp
, the run_serializer
function interleaves serialization and writing of IR files, making it difficult to restructure it into separate functions.
Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.149Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: haiqi96
PR: #634
File: components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py:0-0
Timestamp: 2024-12-16T21:49:06.483Z
Learning: In fs_compression_task.py
, when refactoring the archive processing loop in Python, the src_archive_file.unlink()
operation should remain outside of the process_archive
function.
Learnt from: haiqi96
PR: #619
File: components/core/src/clp/clp/decompression.cpp:313-313
Timestamp: 2024-12-05T16:32:21.507Z
Learning: In the C++ FileDecompressor
implementations at components/core/src/clp/clp/FileDecompressor.cpp
and components/core/src/glt/glt/FileDecompressor.cpp
, the temp_output_path
variable and associated logic are used to handle multiple compressed files with the same name, and should be kept. This logic is separate from the temporary output directory code removed in PR #619 and is necessary for proper file handling.
Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/
directory, keep them simple and avoid adding additional validation code.
Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: #1050
File: components/clp-package-utils/clp_package_utils/scripts/search.py:100-106
Timestamp: 2025-07-03T20:10:43.789Z
Learning: In the current CLP codebase implementation, dataset validation using validate_dataset() is performed within the native scripts (like clp_package_utils/scripts/native/search.py) rather than at the wrapper script level, where the native scripts handle their own parameter validation.
Learnt from: Bill-hbrhbr
PR: #1036
File: components/clp-package-utils/clp_package_utils/general.py:564-579
Timestamp: 2025-06-28T07:10:47.295Z
Learning: The validate_dataset function in components/clp-package-utils/clp_package_utils/general.py is designed to be called once upon function startup for dataset validation, not repeatedly during execution, making caching optimizations unnecessary.
🧬 Code Graph Analysis (1)
integration-tests/tests/conftest.py (2)
integration-tests/tests/fixtures/dataset_logs.py (4)
elasticsearch
(23-29)hive_24hr
(13-19)postgresql
(43-49)spark_event_logs
(33-39)integration-tests/tests/fixtures/package_config.py (1)
package_config
(9-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
🔇 Additional comments (1)
integration-tests/tests/conftest.py (1)
1-7
: LGTM! Standard pytest fixture aggregation pattern.The conftest.py correctly aggregates fixtures for use across integration tests.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/src/dev-docs/building-package.md (1)
11-12
: Fix grammar in Requirements sectionReplace “a some” and avoid the comma splice.
- * It should be possible to build a package for a different environment, it just requires a some - extra configuration. + * It should be possible to build a package for a different environment; it just requires + some extra configuration.
♻️ Duplicate comments (1)
docs/src/dev-docs/building-package.md (1)
16-18
: Add a short rationale for Task pin and uv lower boundPre-empt reader confusion by noting why Task is pinned to 3.44.0 and uv uses a lower bound.
* [Task] 3.44.0 * [uv] >= 0.8 + - Note: Task is pinned to 3.44.0 due to a known issue in 3.44.1; uv uses a lower bound (>= 0.8) to ensure minimum required functionality across platforms.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/dev-docs/building-package.md
(2 hunks)taskfile.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfile.yaml
📚 Learning: 2025-08-29T07:31:24.032Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1271
File: components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh:41-43
Timestamp: 2025-08-29T07:31:24.032Z
Learning: In CLP's build tool installation scripts, uv version constraints should use lower bound constraints (uv>=0.8) rather than exact version pinning, following the same philosophy used for other build tools to accommodate platform differences while ensuring minimum required functionality.
Applied to files:
docs/src/dev-docs/building-package.md
📚 Learning: 2025-08-25T00:45:05.464Z
Learnt from: junhaoliao
PR: y-scope/clp#1242
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:39-41
Timestamp: 2025-08-25T00:45:05.464Z
Learning: Task v3.44.1 causes issues in the clp-ffi-js project (issue #110), so CLP should avoid upgrading to v3.44.1 and stick with v3.44.0 for now.
Applied to files:
docs/src/dev-docs/building-package.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
🔇 Additional comments (2)
docs/src/dev-docs/building-package.md (2)
17-17
: uv requirement — LGTMLower-bound constraint (>= 0.8) matches our tooling policy and avoids over-pinning.
84-84
: uv reference link — LGTMLink is correct and consistent with other docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (7)
taskfiles/lint.yaml (1)
780-786
: Gate mypy and add uv preflight (prior feedback)
- Consider running
mypy
only when a config exists to avoid false negatives on untyped projects.- Add a quick
command -v uv
guard so failures are explicit when uv isn’t installed.These were suggested earlier; re-stating for this block’s final form.
Example gating (inline in this cmd block):
uv run --directory "{{.ITEM}}" ruff format {{.RUFF_FORMAT_FLAGS}} . uv run --directory "{{.ITEM}}" ruff check {{.RUFF_CHECK_FLAGS}} . -uv run --directory "{{.ITEM}}" mypy . +if [ -f "{{.ITEM}}/mypy.ini" ] || \ + ( [ -f "{{.ITEM}}/setup.cfg" ] && grep -q '^\[mypy\]' "{{.ITEM}}/setup.cfg" ) || \ + ( [ -f "{{.ITEM}}/pyproject.toml" ] && grep -q '^\[tool\.mypy\]' "{{.ITEM}}/pyproject.toml" ); then + uv run --directory "{{.ITEM}}" mypy . +fiAnd at the start of the task (one-time check):
+command -v uv >/dev/null 2>&1 || { echo "uv not found on PATH; install uv to run Python linting." >&2; exit 1; }
taskfiles/tests/integration.yaml (6)
21-25
: Ensure ::package is resolvable from this TaskfileIf not already included via a parent Taskfile, add an includes: to expose package.
version: "3" +includes: + - path: ../../taskfile.yaml + prefix: "" + tasks:Run to verify resolution:
#!/bin/bash rg -nP -C2 '^\s*package:\s*$' -g 'Taskfile*.yml' -g 'Taskfile*.yaml' || true
3-7
: Scope variables to the task; avoid global leakage and indirectionMake these values task-local and reference the global G_* directly in env.
vars: - CLP_BUILD_DIR: "{{.G_BUILD_DIR}}" - CLP_CORE_BINS_DIR: "{{.G_CORE_COMPONENT_BUILD_DIR}}" - CLP_PACKAGE_DIR: "{{.G_PACKAGE_BUILD_DIR}}" - INTEGRATION_TESTS_DIR: "{{.ROOT_DIR}}/integration-tests" + INTEGRATION_TESTS_DIR: "{{.ROOT_DIR}}/integration-tests" @@ core: @@ - env: - CLP_BUILD_DIR: "{{.CLP_BUILD_DIR}}" - CLP_CORE_BINS_DIR: "{{.CLP_CORE_BINS_DIR}}" - CLP_PACKAGE_DIR: "{{.CLP_PACKAGE_DIR}}" + env: + CLP_BUILD_DIR: "{{.G_BUILD_DIR}}" + CLP_CORE_BINS_DIR: "{{.G_CORE_COMPONENT_BUILD_DIR}}" + CLP_PACKAGE_DIR: "{{.G_PACKAGE_BUILD_DIR}}"Also applies to: 26-30
10-13
: Make default an explicit alias to coreSlightly clearer intent than a deps-only task.
default: - deps: - - task: "core" + cmds: + - task: "core"
14-19
: Don’t discard pytest collection outputRemoving the redirection keeps helpful diagnostics.
cache-clear: dir: "{{.INTEGRATION_TESTS_DIR}}" cmds: - >- - uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet - > /dev/null + uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet
14-19
: Add preconditions to fail fast when uv/pytest are missing (cache-clear)Improves DX and CI errors.
cache-clear: dir: "{{.INTEGRATION_TESTS_DIR}}" + preconditions: + - sh: 'command -v uv >/dev/null' + msg: 'uv is required; see dev env setup in docs.' + - sh: 'uv run python -c "import pytest" >/dev/null 2>&1' + msg: 'pytest not available in the uv environment.' cmds: - >- uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet
21-30
: Add preconditions for core task (uv/pytest presence)Prevents opaque failures before running tests.
core: deps: # TODO: https://github.com/y-scope/clp/issues/1281 - task: "::package" dir: "{{.INTEGRATION_TESTS_DIR}}" + preconditions: + - sh: 'command -v uv >/dev/null' + msg: 'uv is required; see dev env setup in docs.' + - sh: 'uv run python -c "import pytest" >/dev/null 2>&1' + msg: 'pytest not available in the uv environment.' env: CLP_BUILD_DIR: "{{.CLP_BUILD_DIR}}" CLP_CORE_BINS_DIR: "{{.CLP_CORE_BINS_DIR}}" CLP_PACKAGE_DIR: "{{.CLP_PACKAGE_DIR}}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
taskfiles/lint.yaml
(3 hunks)taskfiles/tests/integration.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
taskfiles/lint.yaml
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Applied to files:
taskfiles/lint.yaml
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-24T17:10:44.565Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: taskfiles/lint.yaml:780-781
Timestamp: 2025-08-24T17:10:44.565Z
Learning: The CLP repository is gradually migrating from Black to Ruff for Python formatting. Some modules haven't been set up for Ruff yet, so there's intentional mixed usage of Black and Ruff formatters during this transition period. This should not be flagged as an inconsistency in PR reviews.
Applied to files:
taskfiles/lint.yaml
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfiles/tests/integration.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (3)
taskfiles/lint.yaml (3)
81-83
: Confirm desired behaviour: ruff format with only --diff won’t fail CI
ruff format --diff
prints diffs but exits 0. If you want check-py to fail on formatting drift for UV projects, add--check
(or--exit-code
). If non-blocking is intentional, ignore.
89-91
: LGTM: fix flags align with auto-fixing
RUFF_CHECK_FLAGS="--fix"
with empty format flags will apply changes as intended.
761-766
: LGTM: per-project UV loop setupDeclaring
UV_PYTHON_PROJECTS
and requiring the flag vars matches the per-project linting model for integration-tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
integration-tests/pyproject.toml (3)
11-13
: Remove console script entry point — tests aren’t meant to be shipped.Per prior agreement for this PR, we’re not distributing an installable package here. Drop the script to avoid confusion and broken wheels.
-[project.scripts] -integration-tests = "integration_tests:main" -
61-61
: Fix Ruff isort configuration location — dotted key won’t be read.Move
order-by-type
under[tool.ruff.lint.isort]
; the currentisort.order-by-type
key is ignored.-isort.order-by-type = false + +[tool.ruff.lint.isort] +order-by-type = false
19-24
: Alphabetize dev dependency group for consistency.Keeps diffs clean and matches existing repo guidance.
dev = [ - "mypy>=1.16.0", - "ruff>=0.11.12", - "pytest>=8.4.1", - "pytest-env>=1.1.5", + "mypy>=1.16.0", + "pytest>=8.4.1", + "pytest-env>=1.1.5", + "ruff>=0.11.12", ]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
integration-tests/pyproject.toml
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Applied to files:
integration-tests/pyproject.toml
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/pyproject.toml
📚 Learning: 2025-08-24T17:10:44.565Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: taskfiles/lint.yaml:780-781
Timestamp: 2025-08-24T17:10:44.565Z
Learning: The CLP repository is gradually migrating from Black to Ruff for Python formatting. Some modules haven't been set up for Ruff yet, so there's intentional mixed usage of Black and Ruff formatters during this transition period. This should not be flagged as an inconsistency in PR reviews.
Applied to files:
integration-tests/pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
🔇 Additional comments (1)
integration-tests/pyproject.toml (1)
9-9
: Python 3.9 minimum looks correct.Matches CLP’s stated minimum; ensure CI runs the integration suite on 3.9 to keep it honest.
ignore = [ | ||
"ANN401", # Allow using `Any` type for function signatures | ||
"COM812", # Redundant and conflicts with ruff format | ||
"D203", # No blank line before docstrings (D211) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Clarify D203 comment to avoid contradiction with D211.
D203 conflicts with D211; we’re preferring D211. Update the note.
- "D203", # No blank line before docstrings (D211)
+ "D203", # Conflicts with D211; prefer no blank line before class docstring
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"D203", # No blank line before docstrings (D211) | |
"D203", # Conflicts with D211; prefer no blank line before class docstring |
🤖 Prompt for AI Agents
In integration-tests/pyproject.toml around line 43, the inline comment for
"D203" incorrectly states "No blank line before docstrings (D211)" causing a
contradiction with D211; update the comment to clearly state that D203 conflicts
with D211 and that the project prefers D211 (e.g., "D203 conflicts with D211; we
prefer D211 — ensure blank line before multi-line docstrings") or remove/replace
the D203 note so it unambiguously reflects the prefered rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to update the PR title.
shutil.unpack_archive( | ||
integration_test_logs.tarball_path, integration_test_logs.extraction_dir | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also chmod -R gu+w
the files? Since the hive-24hr dataset is read-only by default, task clean
will always fail after running the integration tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. Let me know if you need to open an issue for changing the implementation to use the os
library.
clp
& clp-s
compression.
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (5)
taskfiles/tests/integration.yaml (5)
11-16
: Don’t suppress pytest collection output; it hides useful failuresDrop the stdout redirection to /dev/null.
- - >- - uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet - > /dev/null + - >- + uv run python -m pytest --cache-clear --collect-only --override-ini addopts="" --quiet
7-9
: Prefer explicit alias over deps-only defaultMake default a thin alias to core for clarity.
default: - deps: - - task: "core" + cmds: + - task: "core"
27-27
: Use cmds list for consistencyAlign with cache-clear’s cmds array.
- cmd: "uv run python -m pytest -m core" + cmds: + - uv run python -m pytest -m core
18-27
: Add preconditions to fail fast when uv/pytest are missing (core)Mirror cache-clear checks here.
core: deps: # TODO: https://github.com/y-scope/clp/issues/1281 - task: "::package" dir: "{{.G_INTEGRATION_TESTS_DIR}}" + preconditions: + - sh: 'command -v uv >/dev/null' + msg: 'uv is required; see dev env setup in docs.' + - sh: 'uv run python -c "import pytest" >/dev/null 2>&1' + msg: 'pytest not available in the uv environment.' env:
11-16
: Add preconditions to fail fast when uv/pytest are missing (cache-clear)Early checks improve CI/dev ergonomics.
cache-clear: dir: "{{.G_INTEGRATION_TESTS_DIR}}" + preconditions: + - sh: 'command -v uv >/dev/null' + msg: 'uv is required; see dev env setup in docs.' + - sh: 'uv run python -c "import pytest" >/dev/null 2>&1' + msg: 'pytest not available in the uv environment.' cmds:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
taskfiles/tests/integration.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-07-07T17:41:15.655Z
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Applied to files:
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-16T10:24:29.316Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
Applied to files:
taskfiles/tests/integration.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfiles/tests/integration.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (2)
taskfiles/tests/integration.yaml (2)
3-5
: Good: global var follows naming convention and centralises pathG_INTEGRATION_TESTS_DIR is appropriately G_-prefixed and reused by tasks.
23-26
: Good: task-scoped env over inline exportsScoped CLP_* env keeps globals clean and matches house style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (2)
taskfile.yaml (1)
11-11
: Include wiring: verify thattests:integration:cache-clear
is actually exposed under thetests
namespaceThe include path and namespacing look right, but please confirm that:
taskfiles/tests/main.yaml
exists and hasversion: "3"
;- it (or its transitive includes) defines a task named
integration:cache-clear
so the root can calltests:integration:cache-clear
.Run this to verify:
#!/bin/bash set -euo pipefail # 1) File exists and is Task v3 test -f taskfiles/tests/main.yaml || { echo "Missing: taskfiles/tests/main.yaml"; exit 1; } rg -nP '^\s*version:\s*"?3"?\s*$' taskfiles/tests/main.yaml || { echo "taskfiles/tests/main.yaml: version != 3"; exit 1; } # 2) Check that some included file defines integration:cache-clear rg -nP '^\s{2,}integration:cache-clear\s*:' taskfiles/tests || { echo "No task named 'integration:cache-clear' found under taskfiles/tests/*"; exit 1; } # 3) If go-task is available, ensure the namespaced task is discoverable if command -v task >/dev/null 2>&1; then task -a | rg -n '^\s*tests:integration:cache-clear\b' || { echo "'tests:integration:cache-clear' not discoverable by go-task"; exit 1; } else echo "Note: go-task not installed; skipped discovery check." fiintegration-tests/tests/fixtures/integration_test_logs.py (1)
79-81
: Best-effort cleanup of partial artefacts on failureRe-raising is good; also delete any partial tarball to avoid corrupt leftovers.
@@ - except Exception as e: - err_msg = f"Failed to download and extract dataset `{name}`." - raise RuntimeError(err_msg) from e + except Exception as e: + # Best-effort cleanup of partial artefacts + try: + if integration_test_logs.tarball_path.exists(): + integration_test_logs.tarball_path.unlink() + except Exception: + pass + err_msg = f"Failed to download and extract dataset `{name}`." + raise RuntimeError(err_msg) from e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
integration-tests/tests/fixtures/integration_test_logs.py
(1 hunks)taskfile.yaml
(3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/tests/fixtures/integration_test_logs.py
taskfile.yaml
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
PR: y-scope/clp#1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.
Applied to files:
integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-20T09:01:25.825Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:60-71
Timestamp: 2025-08-20T09:01:25.825Z
Learning: For integration tests using curl downloads in PR #1100, curl's built-in timeout and retry mechanisms (with --fail and --show-error flags) are sufficient, and additional subprocess timeout parameters are not needed.
Applied to files:
integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfile.yaml
🧬 Code graph analysis (1)
integration-tests/tests/fixtures/integration_test_logs.py (2)
integration-tests/tests/utils/config.py (2)
IntegrationTestConfig
(74-95)IntegrationTestLogs
(99-123)integration-tests/tests/utils/utils.py (1)
unlink
(54-75)
🪛 GitHub Actions: clp-lint
integration-tests/tests/fixtures/integration_test_logs.py
[error] 75-75: lint:check-no-cpp failed: Ruff format would reformat 1 file (tests/fixtures/integration_test_logs.py). Run 'ruff format' to apply formatting changes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
🔇 Additional comments (4)
taskfile.yaml (1)
610-610
: No action needed.integration-tests/tests/fixtures/integration_test_logs.py (3)
18-29
: Good fixture: hive_24hrSession-scoped, typed, and delegates to the shared helper cleanly.
32-43
: Good fixture: postgresqlMirrors hive_24hr; consistent and reusable.
46-60
: Helper structure reads wellInstantiation + early return via pytest cache is straightforward; logging is clear.
unlink(integration_test_logs.extraction_dir) | ||
shutil.unpack_archive( | ||
integration_test_logs.tarball_path, integration_test_logs.extraction_dir | ||
) | ||
subprocess.run(["chmod","-R", "gu+w", integration_test_logs.extraction_dir], check=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Make chmod cross‑platform (or guard by OS)
Directly invoking chmod will fail on Windows runners. Either gate by OS, or switch to a Python walk that sets write bits. Minimal guard shown below.
@@
- shutil.unpack_archive(
- integration_test_logs.tarball_path, integration_test_logs.extraction_dir
- )
- subprocess.run(["chmod","-R", "gu+w", integration_test_logs.extraction_dir], check=True)
+ shutil.unpack_archive(
+ integration_test_logs.tarball_path, integration_test_logs.extraction_dir
+ )
+ # Avoid POSIX-only chmod on Windows
+ if os.name != "nt":
+ subprocess.run(
+ ["chmod", "-R", "gu+w", str(integration_test_logs.extraction_dir)],
+ check=True,
+ )
Add import (outside this hunk):
import os
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="integration-tests/tests/fixtures/integration_test_logs.py"
echo "== existence =="
if [ ! -f "$file" ]; then
echo "MISSING: $file"
exit 0
fi
echo "== file (with line numbers) =="
nl -ba "$file" | sed -n '1,240p'
echo "== grep for relevant tokens =="
rg -n --hidden --line-number "chmod" || true
rg -n --hidden --line-number "unpack_archive" || true
rg -n --hidden --line-number "subprocess.run" || true
rg -n --hidden --line-number "import os" || true
Length of output: 88
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="integration-tests/tests/fixtures/integration_test_logs.py"
echo "== existence =="
if [ ! -f "$file" ]; then
echo "MISSING: $file"
exit 0
fi
echo "== file (with line numbers 1..240) =="
awk 'NR<=240 {printf "%4d: %s\n", NR, $0}' "$file"
echo "== search for relevant tokens =="
if command -v rg >/dev/null 2>&1; then
rg -n --hidden --line-number "chmod|unpack_archive|subprocess.run|import os" "$file" || true
else
grep -n -E "chmod|unpack_archive|subprocess.run|import os" "$file" || true
fi
Length of output: 3420
Guard POSIX chmod (avoid calling chmod on Windows)
subprocess.run(["chmod","-R","gu+w", ...]) at integration-tests/tests/fixtures/integration_test_logs.py:78 is POSIX-only and will fail on Windows CI; gate the call or set write bits with a Python os.walk.
@@
- shutil.unpack_archive(
- integration_test_logs.tarball_path, integration_test_logs.extraction_dir
- )
- subprocess.run(["chmod","-R", "gu+w", integration_test_logs.extraction_dir], check=True)
+ shutil.unpack_archive(
+ integration_test_logs.tarball_path, integration_test_logs.extraction_dir
+ )
+ # Avoid POSIX-only chmod on Windows
+ if os.name != "nt":
+ subprocess.run(
+ ["chmod", "-R", "gu+w", str(integration_test_logs.extraction_dir)],
+ check=True,
+ )
Add import (outside this hunk):
import os
🧰 Tools
🪛 GitHub Actions: clp-lint
[error] 75-75: lint:check-no-cpp failed: Ruff format would reformat 1 file (tests/fixtures/integration_test_logs.py). Run 'ruff format' to apply formatting changes.
vars: | ||
COMPONENT: "job-orchestration" | ||
- task: "clean-webui" | ||
- task: "tests:integration:cache-clear" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid coupling clean
to Python/pytest; move cache-clear to an opt-in task
Making clean
invoke tests infra can fail on machines without uv/pytest and slows a common housekeeping command. Suggest splitting to a separate aggregate task and keeping clean
lightweight.
Apply:
clean:
cmds:
- "rm -rf '{{.G_BUILD_DIR}}'"
- task: "clean-python-component"
vars:
COMPONENT: "clp-package-utils"
- task: "clean-python-component"
vars:
COMPONENT: "clp-py-utils"
- task: "clean-python-component"
vars:
COMPONENT: "job-orchestration"
- task: "clean-webui"
- - task: "tests:integration:cache-clear"
+
+ clean-all:
+ desc: "Clean build artifacts and integration-test caches"
+ cmds:
+ - task: "clean"
+ - task: "tests:integration:cache-clear"
If you must keep it under clean
, at least guard it as a stopgap:
- - task: "tests:integration:cache-clear"
+ - task: "tests:integration:cache-clear"
+ ignore_error: true
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- task: "tests:integration:cache-clear" | |
clean: | |
cmds: | |
- "rm -rf '{{.G_BUILD_DIR}}'" | |
- task: "clean-python-component" | |
vars: | |
COMPONENT: "clp-package-utils" | |
- task: "clean-python-component" | |
vars: | |
COMPONENT: "clp-py-utils" | |
- task: "clean-python-component" | |
vars: | |
COMPONENT: "job-orchestration" | |
- task: "clean-webui" | |
clean-all: | |
desc: "Clean build artifacts and integration-test caches" | |
cmds: | |
- task: "clean" | |
- task: "tests:integration:cache-clear" |
🤖 Prompt for AI Agents
In taskfile.yaml around line 67, the clean task currently runs the tests
integration cache-clear task which couples clean to Python/pytest and slows a
common housekeeping command; remove "tests:integration:cache-clear" from the
clean aggregate and create a new opt-in aggregate task (e.g.,
"clean:with-cache-clear" or "maintenance:cache-clear") that includes the
tests:integration:cache-clear entry, or if you must keep it under clean
temporarily, wrap the cache-clear invocation behind an explicit flag/environment
check so it only runs when requested (e.g., a parameter or env var) and update
task descriptions accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
integration-tests/tests/fixtures/integration_test_logs.py (2)
82-88
: Make permission fix cross‑platform; avoid hard‑failing when chmod is absentDon’t error out on Windows or environments without chmod. Gate by OS and provide a Python fallback to set write bits.
Apply this diff:
@@ - # Allow the extracted content to be deletable or overwritable - chmod_bin = shutil.which("chmod") - if chmod_bin is None: - err_msg = "chmod executable not found" - raise RuntimeError(err_msg) - subprocess.run([chmod_bin, "-R", "gu+w", integration_test_logs.extraction_dir], check=True) + # Allow the extracted content to be deletable or overwritable + chmod_bin = shutil.which("chmod") + if os.name != "nt" and chmod_bin is not None: + subprocess.run( + [chmod_bin, "-R", "u+w,g+w", str(integration_test_logs.extraction_dir)], + check=True, + ) + else: + # Python fallback: ensure write bit for user (and group where supported) + for p in [integration_test_logs.extraction_dir, *integration_test_logs.extraction_dir.rglob("*")]: + try: + mode = os.stat(p).st_mode + os.chmod(p, mode | stat.S_IWUSR | getattr(stat, "S_IWGRP", 0)) + except Exception: + passAdd imports at the top of the file:
import os import stat
78-80
: Clean up partial artefacts on failureBest‑effort remove a partially downloaded tarball and any partially extracted directory before re‑raising.
@@ - except Exception as e: - err_msg = f"Failed to download and extract dataset `{name}`." - raise RuntimeError(err_msg) from e + except Exception as e: + # Best-effort cleanup of partial artefacts + try: + if integration_test_logs.tarball_path.exists(): + integration_test_logs.tarball_path.unlink() + unlink(integration_test_logs.extraction_dir) + except Exception: + pass + err_msg = f"Failed to download and extract dataset `{name}`." + raise RuntimeError(err_msg) from e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
integration-tests/tests/fixtures/integration_test_logs.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/pyproject.toml:11-13
Timestamp: 2025-08-16T10:24:29.316Z
Learning: In the CLP project's integration-tests directory, the integration tests package is not intended to be shipped or distributed - only the tests are leveraged for local testing purposes, so console script entry points should not be included in pyproject.toml.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-08T21:15:10.905Z
Learnt from: haiqi96
PR: y-scope/clp#1100
File: integration-tests/tests/test_identity_transformation.py:87-95
Timestamp: 2025-08-08T21:15:10.905Z
Learning: In the CLP project's integration tests (Python code), variable names should use "logs" (plural) rather than "log" (singular) when referring to test logs or log-related entities, as this aligns with the naming conventions used throughout the codebase.
Applied to files:
integration-tests/tests/fixtures/integration_test_logs.py
📚 Learning: 2025-08-20T09:01:25.825Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/fixtures/integration_test_logs.py:60-71
Timestamp: 2025-08-20T09:01:25.825Z
Learning: For integration tests using curl downloads in PR #1100, curl's built-in timeout and retry mechanisms (with --fail and --show-error flags) are sufficient, and additional subprocess timeout parameters are not needed.
Applied to files:
integration-tests/tests/fixtures/integration_test_logs.py
🧬 Code graph analysis (1)
integration-tests/tests/fixtures/integration_test_logs.py (3)
integration-tests/tests/utils/config.py (2)
IntegrationTestConfig
(74-95)IntegrationTestLogs
(99-123)integration-tests/tests/utils/utils.py (1)
unlink
(54-75)integration-tests/tests/fixtures/integration_test_config.py (1)
integration_test_config
(16-29)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
integration-tests/tests/fixtures/integration_test_logs.py (1)
18-30
: Fixtures look goodScopes, typing, and naming are consistent. No issues spotted here.
Also applies to: 32-44
chmod_bin = shutil.which("chmod") | ||
if chmod_bin is None: | ||
err_msg = "chmod executable not found" | ||
raise RuntimeError(err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this for chmod
but not for curl
? Won't subprocess.run
fail anyway if the command can't be found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess the linter didn't complain because I turned off the formatting for the curl block.
Description
As the CLP codebase grows, C++ unit tests alone are no longer sufficient. We now need integration tests that validate the correctness of the entire codebase and to ensure components work correctly together. This not only helps developers catch issues earlier but also reduces review overhead.
This PR establishes a foundation for future, more comprehensive integration testing. Specifically, it introduces:
pytest
black
,ruff
, andmypy
uv
Within the
pytest
setup, this PR adds:pytest.ini
with default options, logging, and markersChecklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Tests
Documentation
Chores