-
Notifications
You must be signed in to change notification settings - Fork 82
feat(clp-package): Add Spider and its dependencies to the build. #1318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughMigrates public/task/config surfaces from the "core" namespace to "cpp": adds Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CI as CI Workflow
participant Tasks as Taskfiles (deps)
participant CMake as CMake (build/deps/cpp)
participant Spider as Spider build
CI->>Tasks: run deps:cpp / cpp-all-parallel
Note right of Tasks #E7F6FF: Tasks now use G_DEPS_CPP_* and\nG_CPP_MAX_PARALLELISM_PER_BUILD_TASK
Tasks->>CMake: configure / build / install (CMAKE_SETTINGS_DIR=.../cpp)
CMake-->>Tasks: deps installed under build/deps/cpp
CI->>Tasks: run spider / spider-all-parallel
Tasks->>Spider: invoke spider build with cpp CMake args
Spider-->>Tasks: spider artifacts produced
CI->>CoreComponent: configure components/core
CoreComponent->>CMake: optionally include ../../build/deps/cpp/cmake-settings/all.cmake
CoreComponent-->>CI: continue build using cpp deps if included
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
taskfiles/lint.yaml (2)
763-765
: Include spider-orchestration in UV-driven lint/type-check passes.It’s added to the first loop (black/ruff), but not to UV_PYTHON_PROJECTS, so it will miss ruff format and mypy under uv.
py: internal: true vars: UV_PYTHON_PROJECTS: - - "{{.G_INTEGRATION_TESTS_DIR}}" + - "{{.G_INTEGRATION_TESTS_DIR}}" + - "components/spider-orchestration"
769-779
: Optional: run tools from the project root for consistency.You currently lint the package subdir "components/spider-orchestration/spider_orchestration". Consider running from the project root to let tools read pyproject.toml configs consistently.
- - "components/spider-orchestration/spider_orchestration" + - "components/spider-orchestration"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
components/spider-orchestration/poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (4)
components/spider-orchestration/README.md
(1 hunks)components/spider-orchestration/pyproject.toml
(1 hunks)taskfiles/deps/main.yaml
(9 hunks)taskfiles/lint.yaml
(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-20T05:38:30.720Z
Learnt from: junhaoliao
PR: y-scope/clp#1242
File: taskfile.yaml:545-547
Timestamp: 2025-08-20T05:38:30.720Z
Learning: The python component directories in the y-scope/clp repository are kept very clean and only contain essential files: source code directories, pyproject.toml, poetry.lock, README.md, and the dist/ directory. No cache directories, virtual environments, or other transient files exist in the component directories, so additional exclude patterns beyond "dist/**/*" are unnecessary.
Applied to files:
taskfiles/lint.yaml
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/spider-orchestration/README.md
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.
Applied to files:
taskfiles/deps/main.yaml
🪛 markdownlint-cli2 (0.18.1)
components/spider-orchestration/README.md
3-3: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (7)
components/spider-orchestration/pyproject.toml (1)
13-22
: Validate dependency pins and pre-release usage.Confirm that yscope-spider-py "~0.0.1-beta.2" is intended (pre-release) and compatible with Python 3.10+, and that Pydantic v1 is required (v2 is common now). If not intentional, update pins accordingly.
taskfiles/deps/main.yaml (6)
21-29
: LGTM: centralizing LIB_NAME variables improves maintainability.Good move; reduces duplication and eases future renames.
84-92
: Verify CI behaviour when spider/mariadb tasks are Linux/Ubuntu-gated.They’re declared as deps of core-all-parallel. Ensure the platform gating won’t hard-fail on non-Linux runners (Task should skip cleanly). If not, gate at the caller or split platform-specific aggregates.
578-593
: LGTM: ystdlib PIC enabled and LIB_NAME centralized.Matches prior learning about testing OFF; PIC is a good default for linkers.
210-217
: LGTM: adding Boost.Process target.Likely needed by Spider; no issues.
454-471
: LGTM: spdlog configured to use external fmt and centralized LIB_NAME.Consistent with fmt dep; good.
409-421
: LGTM: msgpack-cxx C++20 and CMP0074 NEW with centralized LIB_NAME.Settings look correct.
spider-orchestration
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
components/spider-orchestration/poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (2)
components/spider-orchestration/README.md
(1 hunks)components/spider-orchestration/pyproject.toml
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/spider-orchestration/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (6)
components/spider-orchestration/README.md (3)
1-1
: Header reads wellTitle is clear and scoped to Spider usage.
3-3
: Remove trailing whitespace (markdownlint MD009).There’s one trailing space at the end of Line 3.
-This Python module contains CLP's scheduler and worker to handle distributed compression.· +This Python module contains CLP's scheduler and worker to handle distributed compression.
3-4
: Fix README scope — don't claim the package ships a scheduler/worker.
This package orchestrates CLP via Spider and does not ship the scheduler or worker; tighten wording accordingly.-This Python module contains CLP's scheduler and worker to handle distributed compression. -CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler. +This Python module orchestrates CLP compression jobs via Spider, integrating with the CLP scheduler and workers. +The Compression Job Handler interfaces with the CLP scheduler to submit compression jobs.components/spider-orchestration/pyproject.toml (3)
11-13
: Confirm minimum Python version policy (3.9 EOL in Oct 2025).If you don’t need 3.9, consider bumping to ">=3.10,<4.0" to align with Ubuntu 22.04+ and reduce maintenance.
Would you like a follow-up patch to raise and add classifiers accordingly?
1-3
: Avoid duplicate Python version sources; drop the unused [project] block.With poetry-core, metadata comes from [tool.poetry]. Keeping [project] risks conflicting signals.
Apply:
-[project] -requires-python = ">= 3.9, <4.0" -
9-9
: README present — no action required.
components/spider-orchestration/README.md exists (verified); pyproject.toml's readme = "README.md" is valid.
[tool.poetry] | ||
name = "spider-orchestration" | ||
version = "0.5.1-dev" | ||
description = "Scheduler and worker cluster for CLP's distributed architecture." | ||
authors = ["YScope Inc. <dev@yscope.com>"] | ||
readme = "README.md" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Add license metadata.
Include the project licence in package metadata.
Apply:
readme = "README.md"
+license = "Apache-2.0"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
[tool.poetry] | |
name = "spider-orchestration" | |
version = "0.5.1-dev" | |
description = "Scheduler and worker cluster for CLP's distributed architecture." | |
authors = ["YScope Inc. <dev@yscope.com>"] | |
readme = "README.md" | |
[tool.poetry] | |
name = "spider-orchestration" | |
version = "0.5.1-dev" | |
description = "Scheduler and worker cluster for CLP's distributed architecture." | |
authors = ["YScope Inc. <dev@yscope.com>"] | |
readme = "README.md" | |
license = "Apache-2.0" |
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 4-9, the Poetry
project metadata is missing license information; add a license entry (use the
repository's SPDX identifier, e.g. license = "MIT") and include license-file =
"LICENSE" under the [tool.poetry] table so the package metadata exposes the
project's license and points to the LICENSE file.
|
||
[tool.poetry] | ||
name = "spider-orchestration" | ||
version = "0.5.1-dev" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a PEP 440–compliant version.
"0.5.1-dev" isn’t valid per PEP 440 and can break build/publish. Use a .devN pre-release.
Apply:
-version = "0.5.1-dev"
+version = "0.5.1.dev0"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
version = "0.5.1-dev" | |
version = "0.5.1.dev0" |
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around line 6, the version
"0.5.1-dev" is not PEP 440–compliant; replace it with a PEP 440 .dev pre-release
(for example "0.5.1.dev0" or "0.5.1.dev1") by updating the version field
accordingly so packaging and publishing tools accept it.
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime | ||
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6. | ||
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites | ||
mariadb = "~1.0.11" | ||
msgpack = "^1.0.7" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gate mariadb install to Linux to prevent cross‑platform failures.
The MariaDB Python package often relies on system libmariadb; on macOS/Windows this can fail unexpectedly.
Apply:
-mariadb = "~1.0.11"
+mariadb = { version = "~1.0.11", markers = "sys_platform == 'linux'" }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime | |
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6. | |
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites | |
mariadb = "~1.0.11" | |
msgpack = "^1.0.7" | |
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime | |
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6. | |
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites | |
mariadb = { version = "~1.0.11", markers = "sys_platform == 'linux'" } | |
msgpack = "^1.0.7" |
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 14 to 18, the
mariadb dependency must be gated to Linux to avoid install failures on
macOS/Windows; change the mariadb entry to be platform‑marked so it only
installs on Linux (use a PEP 508 marker such as sys_platform == "linux" on the
mariadb requirement), leaving msgpack unchanged.
# MariaDB version must be compatible with libmariadb-dev <3.3.1 installed in the runtime | ||
# environment. The default version of libmariadb-dev in Ubuntu 22.04 is 3.2.6. | ||
# See https://mariadb.com/docs/connectors/mariadb-connector-python/install#build-prerequisites |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Validate the libmariadb note and keep it current.
The comment references specific libmariadb-dev versions on Ubuntu 22.04. Please verify these constraints against the current MariaDB Connector/Python 1.0.x guidance.
🌐 Web query:
What libmariadb (Connector/C) versions are compatible with MariaDB Connector/Python 1.0.11 on Ubuntu 22.04, and is there any documented upper bound (e.g., <3.3.1)?
💡 Result:
Short answer
- MariaDB Connector/Python 1.0.11 requires MariaDB Connector/C (libmariadb) >= 3.1.5 per the official install docs for the 1.0.x line. [1][2]
- There was a reported installer/runtime check that required >= 3.2.4 for some installs (Jira CONPY-215), so in practice you may need Connector/C ≥ 3.2.4 on some systems. [3]
- There is no documented upper‑bound (e.g. “< 3.3.1”) listed for Connector/Python 1.0.11; later Connector/Python 1.1.x simply raised the documented minimum to Connector/C ≥ 3.3.1. [1][4]
Sources
- MariaDB Connector/Python installation / dependencies (shows 3.1.5 minimum for <1.1.0 and 3.3.1 for 1.1+). [1]
- Release notes for Connector/Python 1.0.11. [2]
- MariaDB Jira CONPY‑215 (installer reported requirement >= 3.2.4 for 1.0.11). [3]
- Connector/Python 1.1 install docs (Connector/C >= 3.3.1). [4]
If you want, I can check which libmariadb package versions are available on Ubuntu 22.04 (apt names/versions) and recommend a specific package to install.
Update libmariadb note: remove the '<3.3.1' upper bound and state the correct Connector/C minima
Connector/Python 1.0.11 documents MariaDB Connector/C (libmariadb) ≥ 3.1.5 (some installs reported needing ≥ 3.2.4); there is no documented upper bound such as "<3.3.1" — Connector/Python 1.1+ raises the minimum to ≥ 3.3.1. Update components/spider-orchestration/pyproject.toml (lines 14–16) to reflect this.
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 14 to 16, remove
the erroneous "<3.3.1" upper bound and update the note to state the correct
Connector/C (libmariadb) minima: for Connector/Python 1.0.11 require libmariadb
≥ 3.1.5 (with some installs reporting need for ≥ 3.2.4), and note that
Connector/Python 1.1+ raises the minimum to ≥ 3.3.1; keep the Ubuntu 22.04
default version note and link reference intact.
[build-system] | ||
requires = ["poetry-core"] | ||
build-backend = "poetry.core.masonry.api" No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Pin a minimum poetry-core for reproducibility.
Unpinned build backends can change behaviour subtly across environments.
Apply:
-requires = ["poetry-core"]
+requires = ["poetry-core>=1.8.0"]
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
[build-system] | |
requires = ["poetry-core"] | |
build-backend = "poetry.core.masonry.api" | |
[build-system] | |
requires = ["poetry-core>=1.8.0"] | |
build-backend = "poetry.core.masonry.api" |
🤖 Prompt for AI Agents
In components/spider-orchestration/pyproject.toml around lines 25 to 27, pin the
poetry-core build dependency to a minimum version to ensure reproducible builds;
change the requires entry from ["poetry-core"] to a minimum-pinned spec such as
["poetry-core>=1.5.0"] (keep build-backend = "poetry.core.masonry.api"
unchanged).
# CLP Job Orchestration using Spider | ||
|
||
This Python module contains CLP's scheduler and worker to handle distributed compression. | ||
CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
Add platform/build note for Spider and MariaDB Connector C++.
Readers should know these C++ deps are built only on Ubuntu as per this PR.
Suggested addition (new paragraph after Line 4):
+Note: As part of project setup, the Spider library and MariaDB Connector/C++ are built and installed on Ubuntu platforms only. Other platforms skip these native builds.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler. | |
CLP's Compression Job Handler can be used to interface and submit compression jobs to the CLP scheduler. | |
Note: As part of project setup, the Spider library and MariaDB Connector/C++ are built and installed on Ubuntu platforms only. Other platforms skip these native builds. |
🤖 Prompt for AI Agents
In components/spider-orchestration/README.md around line 4, add a new paragraph
after the existing sentence stating that the Spider and MariaDB Connector C++
dependencies are built only on Ubuntu (per the referenced PR); briefly note the
supported OS (Ubuntu), any required build/toolchain/version constraints if
known, and recommend using Ubuntu or an Ubuntu-based Docker image for local
builds or CI; include a link or reference to the PR for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we proceed: do we need a new namespace under components? I thought spider related subcomponents can also be located in the current job-orchestration
: the current job orchestration already contains non-celery related components.
But as I'm also new to the package side of thing, let's wait for @junhaoliao @kirkrodrigues to confirm.
Yeah, name-wise, I think it makes more sense to be under |
I have two reasons to use a separate "component":
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/dev-docs/components-core/index.md
(3 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
📚 Learning: 2025-08-18T05:43:07.868Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1184
File: components/core/cmake/Modules/FindLibLZMA.cmake:21-24
Timestamp: 2025-08-18T05:43:07.868Z
Learning: In the CLP project, all supplied `<lib>_ROOT` variables will live within the `CLP_CORE_DEPS_DIR` as part of their architectural design. This means that using CLP_CORE_DEPS_DIR for library discovery in custom Find modules is the intended approach, and prioritizing individual `<lib>_ROOT` variables over CLP_CORE_DEPS_DIR is not necessary.
Applied to files:
docs/src/dev-docs/components-core/index.md
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
docs/src/dev-docs/components-core/index.md
📚 Learning: 2025-06-18T20:48:48.990Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Applied to files:
docs/src/dev-docs/components-core/index.md
📚 Learning: 2025-08-25T00:45:05.464Z
Learnt from: junhaoliao
PR: y-scope/clp#1242
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:39-41
Timestamp: 2025-08-25T00:45:05.464Z
Learning: Task v3.44.1 causes issues in the clp-ffi-js project (issue #110), so CLP should avoid upgrading to v3.44.1 and stick with v3.44.0 for now.
Applied to files:
docs/src/dev-docs/components-core/index.md
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
docs/src/dev-docs/components-core/index.md
📚 Learning: 2025-08-29T07:31:24.032Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1271
File: components/core/tools/scripts/lib_install/centos-stream-9/install-prebuilt-packages.sh:41-43
Timestamp: 2025-08-29T07:31:24.032Z
Learning: In CLP's build tool installation scripts, uv version constraints should use lower bound constraints (uv>=0.8) rather than exact version pinning, following the same philosophy used for other build tools to accommodate platform differences while ensuring minimum required functionality.
Applied to files:
docs/src/dev-docs/components-core/index.md
🔇 Additional comments (1)
docs/src/dev-docs/components-core/index.md (1)
140-140
: Spider reference link LGTMLink is clear and matches usage above.
To get all dependencies required by CLP core, run the `deps:cpp` task: | ||
|
||
```shell | ||
task deps:core | ||
task deps:cpp | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
🧩 Analysis chain
deps:cpp rename looks good; add a brief migration note and sweep remaining docs/usages
Add a one‑liner noting the rename to help users updating local scripts. Also sweep the repo for any remaining references to deps:core/core-all-parallel.
Apply this diff:
-To get all dependencies required by CLP core, run the `deps:cpp` task:
+To get all dependencies required by CLP core, run the `deps:cpp` task:
+
+Note: Previously `deps:core`. Update any local scripts accordingly.
Run this script to find stale references:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Searching for deprecated task names..."
rg -n -C2 -S $'\bdeps:core\b|\bcore-all-parallel\b'
Length of output: 38
Add migration note for renamed deps:cpp
task
Add this after the example to help users update local scripts:
-To get all dependencies required by CLP core, run the `deps:cpp` task:
+To get all dependencies required by CLP core, run the `deps:cpp` task:
+
+Note: Previously `deps:core`. Update any local scripts accordingly.
No stale references to deps:core
or core-all-parallel
remain in the repo.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
To get all dependencies required by CLP core, run the `deps:cpp` task: | |
```shell | |
task deps:core | |
task deps:cpp | |
``` | |
To get all dependencies required by CLP core, run the `deps:cpp` task: | |
Note: Previously `deps:core`. Update any local scripts accordingly. | |
🤖 Prompt for AI Agents
In docs/src/dev-docs/components-core/index.md around lines 31 to 35, add a short
migration note immediately after the deps:cpp example explaining that the task
was renamed from deps:core (and previously core-all-parallel) to deps:cpp, show
the new command developers should use in their local scripts, and instruct them
to search the repo for and remove or update any stale references to deps:core or
core-all-parallel; keep the note brief and placed directly after the existing
example.
Notice that this task will also download, build, and install the following libraries that are not | ||
directly used by CLP core, but are required by some other C++ components such as [Spider]: | ||
|
||
| Library | Version/commit | Note | | ||
|---------------------------------------------------------------------------------------|-----------------|--------------------------------------------------------------------| | ||
| [mariadb-connector-cpp](https://github.com/mariadb-corporation/mariadb-connector-cpp) | v1.1.5 | Required by [Spider]. Built only when the host platform is Ubuntu. | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
🧩 Analysis chain
Clarify Ubuntu versions for mariadb-connector-cpp and verify task gating
Good call-out. Please specify supported Ubuntu releases (e.g., 22.04/24.04) to set expectations, and confirm the Task gating aligns.
Suggested wording tweak:
-| [mariadb-connector-cpp](https://github.com/mariadb-corporation/mariadb-connector-cpp) | v1.1.5 | Required by [Spider]. Built only when the host platform is Ubuntu. |
+| [mariadb-connector-cpp](https://github.com/mariadb-corporation/mariadb-connector-cpp) | v1.1.5 | Required by [Spider]. Built only when the host is Ubuntu (22.04/24.04). |
Verify gating and version consistency:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Locate mariadb-connector-cpp task definitions and OS gating..."
rg -n -C3 -S 'mariadb-connector-cpp|ubuntu|ID=ubuntu|lsb_release|/etc/os-release|UBUNTU' --type sh --type toml --type yaml --type make --type txt
echo "Confirm documented version matches installer scripts/config..."
rg -n -C2 -S 'mariadb-connector-cpp.*1\.1\.5|1\.1\.5.*mariadb-connector-cpp'
Length of output: 9027
Clarify Ubuntu support for mariadb-connector-cpp — document Ubuntu 22.04 (jammy) and confirm gating
Verified: docs list v1.1.5 and taskfiles/deps/main.yaml pins tags/1.1.5; the task is gated to run only on Ubuntu via /etc/os-release ID checks, but build scripts and docker images target jammy (Ubuntu 22.04) — no evidence of 24.04 support.
Files to check/update:
- docs/src/dev-docs/components-core/index.md (existing entry)
- taskfiles/deps/main.yaml (mariadb-connector-cpp block, ~lines 342–364)
- components/core/tools/scripts/lib_install/mariadb-connector-c.sh (uses UBUNTU_CODENAME)
- components/core/tools/docker-images/-ubuntu-jammy/
Suggested wording tweak:
| mariadb-connector-cpp | v1.1.5 | Required by [Spider]. Built only when the host is Ubuntu (tested on Ubuntu 22.04 "jammy"). |
🤖 Prompt for AI Agents
In docs/src/dev-docs/components-core/index.md around lines 64–70, the
mariadb-connector-cpp note is ambiguous about which Ubuntu release is supported;
update the table cell to explicitly state it is built only on Ubuntu and tested
on Ubuntu 22.04 "jammy" (suggested wording: "...Built only when the host is
Ubuntu (tested on Ubuntu 22.04 'jammy')."). Also verify and, if needed, update
taskfiles/deps/main.yaml (mariadb-connector-cpp block ~lines 342–364) to ensure
the gating logic matches the docs (OS check only runs on Ubuntu), confirm
components/core/tools/scripts/lib_install/mariadb-connector-c.sh uses
UBUNTU_CODENAME and documents jammy, and ensure any
components/core/tools/docker-images/*-ubuntu-jammy/* references remain
consistent; if 24.04 is not supported, do not claim it and keep docs pointing to
jammy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add deps:spider
to the package workflow? It's better that the changes to this type of PR can be tested immediately on the CI.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
taskfiles/deps/main.yaml
(17 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-05-28T18:33:30.155Z
Learnt from: anlowee
PR: y-scope/clp#925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-06-02T18:22:24.060Z
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-09-15T18:53:15.844Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1226
File: taskfiles/deps/main.yaml:522-527
Timestamp: 2025-09-15T18:53:15.844Z
Learning: When ystdlib is built with -Dystdlib_BUILD_TESTING=OFF in taskfiles/deps/main.yaml, it does not require Catch2 settings to be injected via CMAKE_GEN_ARGS since testing is disabled and Catch2 is not needed during the build process.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-08-19T07:08:15.583Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1229
File: taskfiles/deps/main.yaml:494-504
Timestamp: 2025-08-19T07:08:15.583Z
Learning: In zlib v1.3.1's CMake build system, only ZLIB_BUILD_EXAMPLES is a valid option. Other ZLIB_BUILD_* flags (MINIZIP, SHARED, STATIC, TESTING) and ZLIB_INSTALL are not defined in the CMakeLists.txt and have no effect. Use BUILD_SHARED_LIBS=ON for shared library control and ZLIB_BUILD_EXAMPLES=OFF to disable examples.
Applied to files:
taskfiles/deps/main.yaml
📚 Learning: 2025-07-23T09:54:45.185Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Applied to files:
taskfiles/deps/main.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: ubuntu-jammy-deps-image
- GitHub Check: musllinux_1_2-x86_64-deps-image
- GitHub Check: centos-stream-9-deps-image
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (6)
taskfiles/deps/main.yaml (6)
10-18
: LGTM! Directory structure changes are consistent.The migration from
core
tocpp
naming is applied consistently across all directory variables and paths. This aligns with the PR objective to reorganize the build layout.
91-111
: LGTM! Spider task implementation is correct.The spider task correctly uses
preconditions
instead ofstatus
, restricts to Ubuntu as intended, and includes proper dependency management through thespider-dep
task. The CMake arguments are appropriate for the spider build configuration.
113-125
: LGTM! Spider dependency task follows established patterns.The
spider-dep
task correctly mirrors the structure of thecore
task, ensuring spider dependencies are built and configured properly before the main spider build.
127-139
: LGTM! Spider parallel dependencies are correctly defined.The dependency list for
spider-all-parallel
includes all necessary libraries for building spider, including the MariaDB connector which is appropriately Ubuntu-gated.
391-411
: LGTM! MariaDB connector task correctly implements Ubuntu gating.The task properly uses
preconditions
with detailed comments explaining the Ubuntu limitation and DEB install layout requirement. The implementation addresses previous review feedback effectively.
596-596
: Remove unnecessary CMAKE_POSITION_INDEPENDENT_CODE flag.Based on the past review comments indicating this flag is not needed, consider removing it unless spider specifically requires position-independent code for shared library usage.
Apply this diff to remove the unnecessary flag:
CMAKE_GEN_ARGS: - "-C {{.G_DEPS_CPP_CMAKE_SETTINGS_DIR}}/{{.G_BOOST_LIB_NAME}}.cmake" - - "-DCMAKE_POSITION_INDEPENDENT_CODE=ON" - "-Dystdlib_BUILD_TESTING=OFF"
TARGETS: | ||
- "filesystem" | ||
- "iostreams" | ||
- "process" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick
❓ Verification inconclusive
Verify spider dependency on boost::process.
The addition of "process" to the boost targets suggests spider requires this library. Ensure this dependency is actually needed by spider's codebase.
The search results don't provide specific information about y-scope spider's boost::process dependency. Let me check for more specific information about the spider project directly.
🏁 Script executed:
#!/bin/bash
# Check if spider repository contains any references to boost process
rg -i "process" --type cpp --type hpp -C3 -g '!test*' | head -20
Length of output: 27
🏁 Script executed:
#!/bin/bash
# Search for any inclusion or direct usage of Boost.Process
rg -n "#include.*boost/process.hpp" -C2 -g '!test*'
rg -n "boost::process::" -C2 -g '!test*'
rg -n "Boost::process" -C2 -g '!test*'
Length of output: 446
🏁 Script executed:
#!/bin/bash
# Search entire repo for Boost.Process includes and namespace usages without file‐type filters
grep -RIn "#include\s*<boost/process.hpp>" .
grep -RIn "boost::process::" .
Length of output: 0
Remove unused Boost.Process dependency
Search across the repository (code and build files) shows no #include <boost/process.hpp>
or boost::process
usages; drop "process"
from the Boost targets in taskfiles/deps/main.yaml.
🤖 Prompt for AI Agents
In taskfiles/deps/main.yaml around line 254, the Boost target list includes
"process" but there are no usages of Boost.Process in the repository; remove the
"process" entry from the Boost targets list in that file so the dependency is no
longer declared in your taskfiles/deps/main.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
feat(clp-package): Add Spider and its dependencies to the build.
yscope-spider-py
as job-orchestration
dependency.- "core" | ||
- "clp-package-utils" | ||
- "clp-py-utils" | ||
- "deps:spider" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should let deps:spider
compute a checksum file and let add the checksum file to the package
task's source
list; otherwise, the package
might not be re-run if there are changes in the artifacts generated.
alternatively, we can simply add spider_scheduler
and spider_worker
to the package
task's source
list.
@coderabbitai create an issue to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create GitHub issue to track spider artifact checksum problem
gh issue create \
--title "Add checksum tracking for spider artifacts in package task" \
--body "## Problem
The \`package\` task in \`taskfile.yaml\` currently depends on \`deps:spider\` but does not track changes to spider artifacts through checksums or direct source dependencies. This means the \`package\` task might not re-run when spider artifacts (\`spider_scheduler\` and \`spider_worker\`) change.
## Proposed Solutions
1. **Option 1**: Let \`deps:spider\` compute a checksum file and add it to the \`package\` task's \`source\` list
2. **Option 2**: Add \`spider_scheduler\` and \`spider_worker\` binaries directly to the \`package\` task's \`source\` list
## Context
- PR: https://github.com/y-scope/clp/pull/1318
- Comment: https://github.com/y-scope/clp/pull/1318#discussion_r2425015201
- Requested by: @junhaoliao" \
--assignee junhaoliao \
--label "enhancement"
Length of output: 42
@junhaoliao, I've created issue #1409 to track adding checksum tracking for spider artifacts in the package task: #1409
The issue includes both proposed solutions and references this PR and comment as backlinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(first reported by @sitaowang1998 )
it was found that libmariadbcpp.so
, which both spider_scheduler
and spider_worker
dynamically link to, are not added into the package execution environment (i.e., the clp-package
Docker execution image)
@coderabbitai create an issue to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue for the missing libmariadbcpp.so library
gh issue create \
--title "Missing libmariadbcpp.so in clp-package Docker execution image" \
--body "## Description
The shared library \`libmariadbcpp.so\`, which both \`spider_scheduler\` and \`spider_worker\` dynamically link to, is not included in the package execution environment (the \`clp-package\` Docker execution image). This will cause runtime failures when attempting to execute these binaries.
## Context
- Originally reported by: @sitaowang1998
- Reported by: @junhaoliao
- Related PR: #1318
- Comment: https://github.com/y-scope/clp/pull/1318#discussion_r2425018371
## Impact
The spider scheduler and worker binaries cannot run in the Docker execution environment without this library present." \
--assignee "junhaoliao"
Length of output: 42
@junhaoliao, I've created issue #1410 to track the missing libmariadbcpp.so
library in the clp-package Docker execution environment. The issue includes references to this PR and the original reporters.
🐰✨
Description
This PR
yscope-spider-py
as a dependency ofjob-orchestration
.build/deps/cpp
.mariadb-connector-cpp
andspider
.Notice that
spider
andmariadb-connector-cpp
are both Ubuntu only tasks because of the build issue ofmariadb-connector-cpp
on other platforms and that we don't need to usespider
on other platforms.Checklist
breaking change.
Validation performed
spider
C++ package installed successfully in Ubuntu 24.04.deps:core
and thendeps:spider
or vice versa does not build libraries again in Ubuntu 24.04.job-orchestration
builds.Summary by CodeRabbit
New Features
Refactor
Chores