-
Notifications
You must be signed in to change notification settings - Fork 81
feat(presto-clp): Add Docker compose setup for Presto cluster that can connect to clp-json. #1132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces a comprehensive Presto deployment for CLP, including Docker Compose orchestration, configuration templates, environment variable management, and setup automation scripts. It also adds user documentation for using Presto with CLP, expands linting coverage, and provides environment and configuration files for both coordinator and worker nodes. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant SetupScript as set-up-config.sh
participant PyScript as generate-user-env-vars-file.py
participant DockerCompose
participant Coordinator
participant Worker
User->>SetupScript: Run with CLP package dir
SetupScript->>PyScript: Generate .env from CLP configs
PyScript->>SetupScript: Write .env file
User->>DockerCompose: docker-compose up
DockerCompose->>Coordinator: Start with entrypoint (generate-configs.sh)
DockerCompose->>Worker: Start with entrypoint (generate-configs.sh)
Coordinator->>Coordinator: Generate configs from templates
Worker->>Worker: Generate configs, fetch coordinator version, update properties
Coordinator->>Coordinator: Start Presto server
Worker->>Worker: Start Presto server
User->>Coordinator: Query logs via Presto CLI
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 17
🔭 Outside diff range comments (2)
tools/deployment/presto-clp/demo-assets/clp-config.yml (1)
39-40
: Remove trailing blank lines.Static analysis flagged excessive blank lines at the end of the file.
- - +tools/deployment/presto-clp/docker-compose.yaml (1)
39-40
: Remove trailing blank lines.Static analysis flagged excessive blank lines at the end of the file.
- - +
♻️ Duplicate comments (2)
tools/deployment/presto-clp/coordinator/scripts/$ (2)
8-8
: Fix typo in comment.Same spelling error as in the other script.
-# Substitute environemnt variables in config template +# Substitute environment variables in config template
8-11
: Security risk: Avoid shell evaluation for template processing.Same security issue as in the other generate-configs.sh script. The shell evaluation approach poses risks if template files contain malicious code.
Use
envsubst
for safer environment variable substitution as suggested in the other script review.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (18)
tools/deployment/presto-clp/.env
(1 hunks)tools/deployment/presto-clp/README.md
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/clp.properties
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/config.properties
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/jvm.config
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/log.properties
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/node.properties
(1 hunks)tools/deployment/presto-clp/coordinator/scripts/$
(1 hunks)tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh
(1 hunks)tools/deployment/presto-clp/demo-assets/clp-config.yml
(1 hunks)tools/deployment/presto-clp/demo-assets/init.sh
(1 hunks)tools/deployment/presto-clp/docker-compose.yaml
(1 hunks)tools/deployment/presto-clp/worker/config-template/clp.properties
(1 hunks)tools/deployment/presto-clp/worker/config-template/config.properties
(1 hunks)tools/deployment/presto-clp/worker/config-template/node.properties
(1 hunks)tools/deployment/presto-clp/worker/config-template/velox.properties
(1 hunks)tools/deployment/presto-clp/worker/scripts/generate-configs.sh
(1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
tools/deployment/presto-clp/worker/config-template/clp.properties (2)
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
tools/deployment/presto-clp/.env (2)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
tools/deployment/presto-clp/coordinator/config-template/clp.properties (1)
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
tools/deployment/presto-clp/coordinator/scripts/$ (6)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu
and set -o pipefail
commands rather than combining them into set -euo pipefail
, to ensure uniform script structure across all platform build scripts.
Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (5)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
tools/deployment/presto-clp/demo-assets/clp-config.yml (3)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
tools/deployment/presto-clp/worker/scripts/generate-configs.sh (7)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/
directory, keep them simple and avoid adding additional validation code.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
tools/deployment/presto-clp/demo-assets/init.sh (11)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add
, apt update && apt install
, yum update && yum install
) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu
and set -o pipefail
commands rather than combining them into set -euo pipefail
, to ensure uniform script structure across all platform build scripts.
tools/deployment/presto-clp/README.md (4)
Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
🪛 dotenv-linter (3.3.0)
tools/deployment/presto-clp/.env
[warning] 2-2: [QuoteCharacter] The value has quote characters (', ")
[warning] 3-3: [QuoteCharacter] The value has quote characters (', ")
[warning] 6-6: [QuoteCharacter] The value has quote characters (', ")
[warning] 7-7: [QuoteCharacter] The value has quote characters (', ")
[warning] 7-7: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key
[warning] 8-8: [QuoteCharacter] The value has quote characters (', ")
[warning] 8-8: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key
[warning] 9-9: [QuoteCharacter] The value has quote characters (', ")
[warning] 9-9: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key
[warning] 10-10: [QuoteCharacter] The value has quote characters (', ")
[warning] 10-10: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key
[warning] 11-11: [QuoteCharacter] The value has quote characters (', ")
[warning] 12-12: [QuoteCharacter] The value has quote characters (', ")
[warning] 15-15: [QuoteCharacter] The value has quote characters (', ")
[warning] 16-16: [QuoteCharacter] The value has quote characters (', ")
[warning] 19-19: [QuoteCharacter] The value has quote characters (', ")
[warning] 20-20: [QuoteCharacter] The value has quote characters (', ")
[warning] 20-20: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE key should go before the PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE key
[warning] 23-23: [QuoteCharacter] The value has quote characters (', ")
[warning] 26-26: [QuoteCharacter] The value has quote characters (', ")
[warning] 29-29: [QuoteCharacter] The value has quote characters (', ")
[warning] 32-32: [QuoteCharacter] The value has quote characters (', ")
🪛 Shellcheck (0.10.0)
tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh
[info] 10-10: Double quote to prevent globbing and word splitting.
(SC2086)
tools/deployment/presto-clp/worker/scripts/generate-configs.sh
[info] 10-10: Double quote to prevent globbing and word splitting.
(SC2086)
[warning] 81-81: In POSIX sh, HOSTNAME is undefined.
(SC3028)
[warning] 82-82: In POSIX sh, HOSTNAME is undefined.
(SC3028)
tools/deployment/presto-clp/demo-assets/init.sh
[warning] 6-6: Declare and assign separately to avoid masking return values.
(SC2155)
[warning] 20-20: Declare and assign separately to avoid masking return values.
(SC2155)
[info] 20-20: Double quote to prevent globbing and word splitting.
(SC2086)
[warning] 21-21: Declare and assign separately to avoid masking return values.
(SC2155)
[info] 21-21: Double quote to prevent globbing and word splitting.
(SC2086)
[info] 39-39: Double quote to prevent globbing and word splitting.
(SC2086)
🪛 YAMLlint (1.37.1)
tools/deployment/presto-clp/docker-compose.yaml
[error] 39-39: too many blank lines (1 > 0)
(empty-lines)
tools/deployment/presto-clp/demo-assets/clp-config.yml
[error] 39-39: too many blank lines (1 > 0)
(empty-lines)
🪛 markdownlint-cli2 (0.17.2)
tools/deployment/presto-clp/README.md
7-7: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
22-22: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
29-29: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1
(MD029, ol-prefix)
35-35: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
44-44: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
44-44: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
48-48: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
48-48: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
61-61: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
65-65: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
71-71: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
72-72: Multiple consecutive blank lines
Expected: 1; Actual: 3
(MD012, no-multiple-blanks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (16)
tools/deployment/presto-clp/worker/config-template/node.properties (1)
1-4
: VerifyREPLACE_ME
placeholders are always substituted at container start-upIf
generate-configs.sh
fails, leavingnode.internal-address
ornode.id
asREPLACE_ME
, the worker will refuse to start. Consider adding a defensive check in the script (e.g.,grep -q REPLACE_ME
withexit 1
) so mis-substitution is caught early in CI rather than at runtime.tools/deployment/presto-clp/worker/config-template/clp.properties (1)
1-1
: LGTM – matches Presto catalogue naming conventionstools/deployment/presto-clp/worker/config-template/velox.properties (1)
1-1
: Confirm mutable config is desired inside containers
mutable-config=true
allows live reloads but also permits runtime edits that diverge from the committed template. Double-check this aligns with your security/operational requirements.tools/deployment/presto-clp/coordinator/config-template/log.properties (1)
1-1
: LGTM – log level wired to env vartools/deployment/presto-clp/coordinator/config-template/clp.properties (1)
9-9
: Hard-wired path may break ifetc
is relocated
clp.metadata-filter-config
is fixed to/opt/presto-server/etc/metadata-filter.json
. If the image’slauncher-config
overridesPRESTO_HOME
or mounts configs elsewhere, Presto will fail to resolve the file. Please verify the path or parameterize it like the other fields.tools/deployment/presto-clp/worker/config-template/config.properties (1)
1-6
: Ensurepresto.version
placeholder is replaced at run timeA literal
REPLACE_ME
will cause workers to refuse to join the cluster. Confirm thatscripts/generate-configs.sh
always substitutes the value before Presto starts, even on restart scenarios.tools/deployment/presto-clp/.env (1)
35-36
:CLP_PACKAGE_ARCHIVES
placeholder left
CLP_PACKAGE_ARCHIVES=REPLACE_ME
must be set to a valid host path or the volume mount in docker-compose.yaml will fail.tools/deployment/presto-clp/coordinator/config-template/config.properties (1)
1-14
: Well-structured Presto coordinator configuration.The configuration properly sets up the coordinator with appropriate settings for discovery, memory management, and feature flags. The environment variable placeholders follow a consistent naming convention and will be properly substituted by the generation scripts.
tools/deployment/presto-clp/demo-assets/clp-config.yml (2)
33-38
: Clarify the localhost vs placeholder IP usage.The web UI services use "localhost" while other services use "${REPLACE_IP}" placeholders. This might be intentional for security (keeping web UIs local-only), but please verify this is the intended behaviour.
1-39
: Comprehensive CLP service configuration.The YAML structure properly defines all necessary CLP services with appropriate connection details and configuration parameters. The placeholder pattern for IP addresses will be correctly processed by the associated scripts.
tools/deployment/presto-clp/docker-compose.yaml (4)
5-5
: Verify the use of "dev" tagged images.The services use images with "dev" tags from the GitHub Container Registry. Ensure these are the intended images for the deployment and consider whether versioned tags would be more appropriate for production use.
Also applies to: 17-17
6-6
: Well-designed custom entrypoint approach.The entrypoint configuration that runs config generation before starting services is a good pattern for dynamic configuration management in containerized deployments.
Also applies to: 21-21
28-28
: Good use of environment variable for flexible volume mounting.Using
${CLP_PACKAGE_ARCHIVES}
for the volume mount path allows flexible configuration of the data directory location.
1-39
: Well-structured Docker Compose deployment.The configuration properly orchestrates a Presto cluster with appropriate service dependencies, networking, and volume management. The integration with configuration templates and generation scripts provides a flexible deployment solution.
tools/deployment/presto-clp/worker/scripts/generate-configs.sh (2)
67-83
: Usehostname
for node ID –$HOSTNAME
is undefined in POSIX shBusyBox
/bin/sh
doesn’t set$HOSTNAME
. Use$(hostname)
.-if grep -q '^node.id=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then - sed -i "s|^node.id=REPLACE_ME|node.id=$HOSTNAME|" "$NODE_PROPERTIES_FILE" +if grep -q '^node.id=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then + sed -i "s|^node.id=REPLACE_ME|node.id=$(hostname)|" "$NODE_PROPERTIES_FILE"
30-37
: Unquoted DISCOVERY_URI may break on spaces; quote variable in wgetIf the value in
config.properties
ever contains spaces,wget
receives split tokens. Wrap it:-if response=$(wget -qO- --timeout=10 "${DISCOVERY_URI}/v1/info" 2>/dev/null); then +if response=$(wget -qO- --timeout=10 "${DISCOVERY_URI}/v1/info" 2>/dev/null); then(no functional diff needed, but keep double quotes around variable)
Likely an incorrect or invalid review comment.
tools/deployment/presto-clp/.env
Outdated
PRESTO_COORDINATOR_HTTPPORT="8080" | ||
PRESTO_COORDINATOR_SERVICENAME="presto-coordinator" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoted values in .env
are passed verbatim – will include the quotes
Docker Compose does not strip "
in .env
, so PRESTO_COORDINATOR_HTTPPORT
will become "8080"
(quotes included).
That breaks integer expectations in config.properties and JDBC URLs.
Strip the quotes or switch to YAML-style environment:
blocks.
-PRESTO_COORDINATOR_HTTPPORT="8080"
-PRESTO_COORDINATOR_SERVICENAME="presto-coordinator"
+PRESTO_COORDINATOR_HTTPPORT=8080
+PRESTO_COORDINATOR_SERVICENAME=presto-coordinator
@@
-PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE="mysql"
-…
+PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE=mysql
+…
(Apply similarly to the remaining keys.)
Also applies to: 6-33
🧰 Tools
🪛 dotenv-linter (3.3.0)
[warning] 2-2: [QuoteCharacter] The value has quote characters (', ")
[warning] 3-3: [QuoteCharacter] The value has quote characters (', ")
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/.env lines 2 to 33, the environment variable
values are enclosed in quotes, causing the quotes to be included literally in
the values. Remove the surrounding double quotes from all variable assignments
in this range to ensure the values are interpreted correctly by Docker Compose
and downstream configurations.
-server | ||
-Xmx${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE} | ||
-XX:+UseG1GC | ||
-XX:G1HeapRegionSize=${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE} | ||
-XX:+UseGCOverheadLimit | ||
-XX:+ExplicitGCInvokesConcurrent | ||
-XX:+HeapDumpOnOutOfMemoryError | ||
-XX:+ExitOnOutOfMemoryError | ||
-Djdk.attach.allowAttachSelf=true | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Add container-aware memory flags & optional G1 tuning
-Xmx
hard-codes the ceiling and ignores the cgroup limit the container actually gets. Presto’s launcher (and Java ≥ 11) support:
-XX:+UseContainerSupport
-XX:MaxRAMPercentage=80
These keep the heap bounded to ~80 % of the container limit without manual tuning. Consider replacing -Xmx…
or combining it with these flags.
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/jvm.config lines 1 to
10, the current JVM options use a fixed -Xmx value which does not respect
container memory limits. To fix this, remove or complement the -Xmx flag with
container-aware flags by adding -XX:+UseContainerSupport and
-XX:MaxRAMPercentage=80. This will allow the JVM heap size to adapt dynamically
to the container's memory limit, improving resource management without manual
tuning.
{ | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Provide a minimal example or document expected schema
An empty JSON object is syntactically valid, but future maintainers may be unsure what keys are supported. A commented exemplar or pointer to docs beside this file would improve clarity without affecting runtime.
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json
at lines 1 to 2, the JSON file is currently empty, which may confuse future
maintainers about the expected keys. Add a minimal example JSON object with
typical keys and values or include comments or a reference to documentation
explaining the expected schema to improve clarity without impacting runtime
behavior.
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | ||
node.id=${PRESTO_COORDINATOR_SERVICENAME} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node.data-dir
missing & static node.id
risks collisions
Presto will refuse to start without node.data-dir
, and using a constant value for node.id
means every fresh container (or an additional coordinator) will register with the same ID, leading to “Node already exists” errors.
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
-node.id=${PRESTO_COORDINATOR_SERVICENAME}
+# A unique ID per container – fall back to HOSTNAME if not supplied
+node.id=${PRESTO_COORDINATOR_NODE_ID:-${HOSTNAME}}
+# Mandatory data directory
+node.data-dir=/opt/presto-server/data
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | |
node.id=${PRESTO_COORDINATOR_SERVICENAME} | |
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | |
# A unique ID per container – fall back to HOSTNAME if not supplied | |
node.id=${PRESTO_COORDINATOR_NODE_ID:-${HOSTNAME}} | |
# Mandatory data directory | |
node.data-dir=/opt/presto-server/data |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/node.properties at
lines 1 to 3, add the required property node.data-dir with an appropriate
environment variable or path to ensure Presto starts correctly. Also, replace
the static node.id value with a dynamic or unique identifier, such as an
environment variable or generated UUID, to prevent node ID collisions when
multiple coordinators or containers run simultaneously.
rm -rf ${PRESTO_CONFIG_DIR}/catalog | ||
mkdir -p ${PRESTO_CONFIG_DIR}/catalog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider consistency with the other generate-configs.sh script.
This script uses rm -rf
and mkdir -p
for catalog directory management, while the other version uses rm -f
. Consider standardizing the approach across both scripts.
Also fix the quoting issue:
-rm -rf ${PRESTO_CONFIG_DIR}/catalog
-mkdir -p ${PRESTO_CONFIG_DIR}/catalog
+rm -rf "${PRESTO_CONFIG_DIR}/catalog"
+mkdir -p "${PRESTO_CONFIG_DIR}/catalog"
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/scripts/$ around lines 14 to 15, the
script uses `rm -rf` to remove the catalog directory, which is inconsistent with
the other generate-configs.sh script that uses `rm -f`. Change the removal
command to use `rm -f` if only files are being removed or adjust both scripts to
use the same method for directory cleanup. Additionally, fix any quoting issues
by properly quoting variables like "${PRESTO_CONFIG_DIR}/catalog" to prevent
word splitting or globbing errors.
# Setup local docker stack for presto + clp | ||
|
||
## Install docker | ||
|
||
Follow the guide here: [docker] | ||
|
||
# Launch clp-package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Convert secondary titles to H2 and fix punctuation for Markdown lint-cleanliness
The document currently contains five #
-level headings (Setup…
, Launch…
, Create Docker Cluster
, Use cli:
and Delete docker Cluster
).
Markdown-lint (MD025/MD026) flags this as multiple H1s and trailing punctuation. Keeping a single top-level H1 improves SEO and stops most linters from failing CI.
-# Launch clp-package
+# ## Launch clp-package
…
-# Create Docker Cluster
+# ## Create Docker Cluster
…
-# Use cli:
+# ## Use CLI
…
-# Delete docker Cluster
+# ## Delete Docker Cluster
Also applies to: 35-41, 48-57, 65-69
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
7-7: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/README.md around lines 1 to 7 and also lines
35-41, 48-57, and 65-69, change all secondary titles currently using a single #
(H1) to use ## (H2) to ensure only one top-level H1 heading in the document.
Additionally, remove any trailing punctuation from these headings to comply with
markdown lint rules MD025 and MD026. This will improve SEO and prevent linter
failures in CI.
```sql | ||
SELECT * FROM default LIMIT 1; | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Minor: qualify SQL fenced block for syntax highlighting
Add sql
after the back-ticks.
-```sql
+```sql
SELECT * FROM default LIMIT 1;
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>
61-61: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
In tools/deployment/presto-clp/README.md around lines 61 to 63, the SQL code
block is missing the language specifier for syntax highlighting. Add "sql"
immediately after the opening triple backticks to enable proper SQL syntax
highlighting for the code block.
</details>
<!-- fingerprinting:phantom:triton:cougar -->
<!-- This is an auto-generated comment by CodeRabbit -->
# Substitute environemnt variables in config template | ||
find /configs -type f | while read -r f; do | ||
( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")" | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Template expansion executes arbitrary shell – switch to envsubst
Piping the template through sh
(cat <<EOF … | sh
) will execute every line, not just expand variables. If anyone edits a template and slips in rm -rf /
, the worker container obeys. Use envsubst
(in gettext-base
) or a safe substitution tool instead.
- ( echo "cat <<EOF"; cat "$f"; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
+ envsubst < "$f" > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
Also quote $f
to satisfy SC2086.
🧰 Tools
🪛 Shellcheck (0.10.0)
[info] 10-10: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/scripts/generate-configs.sh around lines 8
to 11, the current template expansion uses a shell execution that runs arbitrary
commands, which is unsafe. Replace the use of `cat <<EOF ... | sh` with
`envsubst` to safely substitute environment variables without executing
commands. Also, ensure to quote the variable `$f` in the `cat` command to
prevent word splitting and globbing issues.
… Python script; Split environment variable file into multiple files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
♻️ Duplicate comments (13)
tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json (1)
1-2
: Add minimal schema example or link to docsAn entirely empty JSON object gives no clue what keys are supported. Provide a commented exemplar or reference documentation to aid maintainers.
tools/deployment/presto-clp/coordinator/config-template/node.properties (1)
1-2
:node.data-dir
still missing andnode.id
is static
(See earlier review)Presto fails to boot without
node.data-dir
, and a constantnode.id
causes collisions when multiple containers start.node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} -node.id=${PRESTO_COORDINATOR_SERVICENAME} +# Unique per container; fall back to the host id +node.id=${PRESTO_COORDINATOR_NODE_ID:-${HOSTNAME}} +node.data-dir=/opt/presto-server/datatools/deployment/presto-clp/coordinator/config-template/jvm.config (1)
1-9
: Consider container-aware memory flags instead of fixed-Xmx
Java 11 already honours cgroup limits when given:
-XX:+UseContainerSupport -XX:MaxRAMPercentage=80
This avoids hard-coding
${…MAXHEAPSIZE}
and keeps the heap proportional to the container limit.tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (4)
8-8
: Typo: “environemnt” → “environment”
9-11
: Replace unsafe eval-based template substitution
The pipeline that funnels templates throughsh
executes their contents, creating a command-injection vector and breaking on filenames with spaces. Preferenvsubst
(orawk
/sed
) with proper quoting.
14-14
: Quote variable inrm
command
Unquoted expansion may misbehave on paths containing spaces.
17-17
: Quote variables inmv
command
Protects against word-splitting and globbing issues.tools/deployment/presto-clp/README.md (5)
1-8
: Multiple H1 headings (MD025) – change subsequent#
headings to##
.
20-29
: Ordered list numbering jumps (MD029) – restart numbering or use bullets.
41-44
: Add language hint & blank lines around fenced block (MD031/MD040)
46-46
: Heading ends with “:” and is H1 – use## Use CLI
without trailing punctuation.
59-61
: Surround SQL block with blank lines & keepsql
tag (MD031)tools/deployment/presto-clp/worker/scripts/generate-configs.sh (1)
8-11
: Template expansion executes arbitrary shell – switch to envsubstThis security vulnerability was previously flagged. Piping the template through
sh
will execute every line, not just expand variables. Useenvsubst
for safe variable substitution instead.-# Substitute environemnt variables in config template +# Substitute environment variables in config template find /configs -type f | while read -r f; do - ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")" + envsubst < "$f" > "${PRESTO_CONFIG_DIR}/$(basename "$f")" done
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (22)
taskfiles/lint.yaml
(2 hunks)tools/deployment/presto-clp/README.md
(1 hunks)tools/deployment/presto-clp/coordinator-common.env
(1 hunks)tools/deployment/presto-clp/coordinator.env
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/clp.properties
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/config.properties
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/jvm.config
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/log.properties
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json
(1 hunks)tools/deployment/presto-clp/coordinator/config-template/node.properties
(1 hunks)tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh
(1 hunks)tools/deployment/presto-clp/docker-compose.yaml
(1 hunks)tools/deployment/presto-clp/scripts/.gitignore
(1 hunks)tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py
(1 hunks)tools/deployment/presto-clp/scripts/requirements.txt
(1 hunks)tools/deployment/presto-clp/scripts/set-up-config.sh
(1 hunks)tools/deployment/presto-clp/worker.env
(1 hunks)tools/deployment/presto-clp/worker/config-template/clp.properties
(1 hunks)tools/deployment/presto-clp/worker/config-template/config.properties
(1 hunks)tools/deployment/presto-clp/worker/config-template/node.properties
(1 hunks)tools/deployment/presto-clp/worker/config-template/velox.properties
(1 hunks)tools/deployment/presto-clp/worker/scripts/generate-configs.sh
(1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.
tools/deployment/presto-clp/scripts/.gitignore (3)
Learnt from: Bill-hbrhbr
PR: #1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
tools/deployment/presto-clp/worker/config-template/clp.properties (1)
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json (4)
Learnt from: junhaoliao
PR: #596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In components/log-viewer-webui/client/src/api/query.js
, the ExtractJsonResp
type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.
Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function trivial_new_projected_schema_tree_node_callback
in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.
Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function trivial_new_projected_schema_tree_node_callback
in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.
Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the evaluate_wildcard_filter
function should return AstEvaluationResult::Pruned
when node_id_value_pairs
is empty, not AstEvaluationResult::False
. Empty node sets should be treated as "undetermined" rather than definitive non-matches.
tools/deployment/presto-clp/coordinator/config-template/clp.properties (1)
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
tools/deployment/presto-clp/scripts/set-up-config.sh (12)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu
and set -o pipefail
commands rather than combining them into set -euo pipefail
, to ensure uniform script structure across all platform build scripts.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like apt-get install -f
) when installing packages to give users more control over their system.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add
, apt update && apt install
, yum update && yum install
) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
tools/deployment/presto-clp/README.md (14)
Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu
and set -o pipefail
commands rather than combining them into set -euo pipefail
, to ensure uniform script structure across all platform build scripts.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add
, apt update && apt install
, yum update && yum install
) rather than platform-specific optimizations, to ensure uniform script structure and readability.
taskfiles/lint.yaml (1)
Learnt from: anlowee
PR: #925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.
tools/deployment/presto-clp/worker/scripts/generate-configs.sh (8)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu
and set -o pipefail
commands rather than combining them into set -euo pipefail
, to ensure uniform script structure across all platform build scripts.
Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/
directory, keep them simple and avoid adding additional validation code.
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (2)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (13)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/
directory, keep them simple and avoid adding additional validation code.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu
and set -o pipefail
commands rather than combining them into set -euo pipefail
, to ensure uniform script structure across all platform build scripts.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.
Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install
or apk update && apk add
), preserve this pattern for uniform script structure.
🪛 Shellcheck (0.10.0)
tools/deployment/presto-clp/scripts/set-up-config.sh
[info] 20-20: Not following: ./.venv/bin/activate was not specified as input (see shellcheck -x).
(SC1091)
tools/deployment/presto-clp/worker/scripts/generate-configs.sh
[info] 10-10: Double quote to prevent globbing and word splitting.
(SC2086)
[warning] 81-81: In POSIX sh, HOSTNAME is undefined.
(SC3028)
[warning] 82-82: In POSIX sh, HOSTNAME is undefined.
(SC3028)
tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh
[info] 10-10: Double quote to prevent globbing and word splitting.
(SC2086)
🪛 markdownlint-cli2 (0.17.2)
tools/deployment/presto-clp/README.md
7-7: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
20-20: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1
(MD029, ol-prefix)
27-27: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1
(MD029, ol-prefix)
33-33: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
42-42: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
42-42: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
46-46: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
46-46: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
59-59: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
63-63: Multiple top-level headings in the same document
(MD025, single-title, single-h1)
69-69: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
70-70: Multiple consecutive blank lines
Expected: 1; Actual: 3
(MD012, no-multiple-blanks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
🔇 Additional comments (10)
tools/deployment/presto-clp/scripts/.gitignore (1)
1-1
: Pattern will not match the intended directoryBecause this
.gitignore
resides two levels below the repo root, the leading “/” anchors the pattern at the repository root, sotools/deployment/presto-clp/scripts/.venv/
will still be tracked. Drop the leading slash:-/.venv/ +.venv/to correctly ignore the virtual-env folder created alongside these scripts.
⛔ Skipped due to learnings
Learnt from: Bill-hbrhbr PR: y-scope/clp#1126 File: .gitignore:5-5 Timestamp: 2025-07-25T21:29:48.947Z Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.
tools/deployment/presto-clp/worker/config-template/clp.properties (1)
1-1
: LGTM – property is correct
connector.name=clp
matches the coordinator side and Presto expectations.tools/deployment/presto-clp/worker/config-template/velox.properties (1)
1-1
: Confirm thatmutable-config=true
is intended for production clustersTurning on dynamic config reloading can be handy during local experiments, but it also lets any user with the right privilege change Velox settings at runtime. If this worker image is used beyond a demo, double-check that:
- The feature is really needed.
- Access is restricted (e.g., via
web-ui.authentication
).Otherwise consider omitting the flag or gating it behind an env var.
taskfiles/lint.yaml (1)
106-108
: YAML directory inclusion looks fine – confirm yamllint rules cover Docker-Compose syntax.Adding
tools/deployment
to the yamllint target broadens coverage, which is good. The directory now containsdocker-compose.yaml
; some yamllint default rules (e.g.,truthy
,reserved-words
) frequently complain about Compose keywords. Double-check that your.yamllint.yml
either disables those rules for Compose files or uses ayamllint
extends: docker
profile to avoid false positives.tools/deployment/presto-clp/coordinator/config-template/config.properties (1)
4-5
: Environment variables verified in coordinator.envThe placeholders
PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY
andPRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE
are defined in tools/deployment/presto-clp/coordinator.env (lines 6–7). No further action required.tools/deployment/presto-clp/docker-compose.yaml (1)
32-33
: Ensure${CLP_PACKAGE_ARCHIVES}
is an absolute host pathThis bind-mount maps the host path to the same path inside the container. If the variable is unset or relative,
docker compose
will error at runtime. Validate or normalise it when generating.env
, or mount it to a fixed target path inside the container.tools/deployment/presto-clp/worker/scripts/generate-configs.sh (1)
18-66
: Well-implemented coordination logicThe retry mechanism for coordinator readiness and version detection is well-structured with appropriate error handling and busybox-compatible commands.
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (3)
9-23
: Well-structured logging configurationThe logging setup follows Python best practices with proper formatting and console handling.
79-96
: Clean environment file generationThe environment variable generation is well-structured with consistent formatting and includes all necessary configuration parameters.
99-117
: Well-implemented configuration value extractionThe helper function properly handles nested dictionary traversal with dot notation and includes comprehensive documentation.
- "tools/deployment" | ||
- "tools/scripts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Black/Ruff may recurse through large non-code trees under tools/deployment.
Running black .
from tools/deployment
will walk every sub-directory (configs, templates, logs). That can add noticeable time and, in worst cases, choke on extremely large data files a user may drop under that tree. Consider narrowing the path to tools/deployment/**/*.py
or adding an --extend-exclude
for obvious non-code directories (e.g., config-template
, scripts/generated
).
🤖 Prompt for AI Agents
In taskfiles/lint.yaml around lines 776 to 777, the current lint paths include
"tools/deployment" which causes Black and Ruff to recursively process large
non-code directories, slowing down linting and potentially causing errors.
Modify the lint paths to only include Python files under "tools/deployment" by
changing the path to "tools/deployment/**/*.py" or add an --extend-exclude
option to exclude known non-code directories like "configs" or "templates" to
prevent unnecessary recursion.
clp.metadata-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE} | ||
clp.metadata-db-url=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL} | ||
clp.metadata-db-name=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME} | ||
clp.metadata-db-user=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER} | ||
clp.metadata-db-password=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD} | ||
clp.metadata-table-prefix=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX} | ||
clp.split-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER} | ||
clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several placeholders have no matching definitions – coordinator fails to start.
coordinator.env
currently defines only METADATA_PROVIDERTYPE
and SPLITPROVIDER
.
The following variables are missing and will survive as literal ${…}
after envsubst
, causing Presto to error at startup:
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL
…_DATABASE_NAME
…_DATABASE_USER
…_DATABASE_PASSWORD
…_METADATA_TABLEPREFIX
Add them to coordinator.env
(or remove the placeholders) before merging.
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/clp.properties lines
2 to 9, several environment variable placeholders used for database
configuration do not have corresponding definitions in coordinator.env, causing
the coordinator to fail at startup. To fix this, add the missing environment
variables PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL,
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME,
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER,
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD, and
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX to the
coordinator.env file with appropriate values, or alternatively remove these
placeholders from clp.properties if they are not needed.
@@ -0,0 +1 @@ | |||
com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid quoting the log level value in ${…}
Docker .env
files often wrap string values in quotes ("DEBUG"
). When the template engine substitutes the value here, the resulting line becomes:
com.facebook.presto="DEBUG"
The extra quotes are not a valid log level and Presto will fall back to the default (INFO
). Either:
-com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL}
+com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL:-INFO}
and remove surrounding quotes in the .env
file, or strip them inside the script that performs substitution.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL} | |
com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL:-INFO} |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/log.properties at
line 1, the log level value substituted from the environment variable may
include quotes, causing invalid log level syntax. To fix this, ensure that the
substitution process strips any surrounding quotes from the environment variable
value before inserting it into the config file, or update the .env file to
provide the log level without quotes.
clp_package_dir=$1 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Quote positional parameter when capturing it
Keeps paths with spaces intact.
-clp_package_dir=$1
+clp_package_dir="$1"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
clp_package_dir=$1 | |
clp_package_dir="$1" |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/set-up-config.sh at line 13, the
positional parameter $1 is assigned to clp_package_dir without quotes, which can
break paths containing spaces. Fix this by enclosing $1 in double quotes when
assigning it to clp_package_dir to preserve any spaces in the path.
PRESTO_WORKER_HTTPPORT="8080" | ||
|
||
# node.properties | ||
PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Rename variable for readability & future-proofing
Using PRESTO_WORKER_HTTPPORT
without a separator makes the variable name harder to scan and deviates from the common *_HTTP_PORT
pattern used elsewhere in the repo (e.g., PRESTO_COORDINATOR_HTTP_PORT
). Consider renaming:
-PRESTO_WORKER_HTTPPORT="8080"
+PRESTO_WORKER_HTTP_PORT="8080"
and update every reference in the generation scripts accordingly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
PRESTO_WORKER_HTTPPORT="8080" | |
# node.properties | |
PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location" | |
PRESTO_WORKER_HTTP_PORT="8080" | |
# node.properties | |
PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location" |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker.env lines 1 to 4, rename the environment
variable PRESTO_WORKER_HTTPPORT to PRESTO_WORKER_HTTP_PORT to improve
readability and maintain consistency with other variables like
PRESTO_COORDINATOR_HTTP_PORT. After renaming, update all references to this
variable in the generation scripts to match the new name.
@@ -0,0 +1,6 @@ | |||
discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT} | |||
presto.version=REPLACE_ME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Prefer an env placeholder over hard-coded REPLACE_ME
for Presto version.
Replacing REPLACE_ME
via sed in the bootstrap script works, but using an env var keeps the template declarative and avoids sed-ordering bugs:
-presto.version=REPLACE_ME
+presto.version=${PRESTO_VERSION}
The script can then simply export PRESTO_VERSION=$(curl …)
and rely on envsubst
.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
presto.version=REPLACE_ME | |
presto.version=${PRESTO_VERSION} |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/config-template/config.properties at line
2, replace the hard-coded "REPLACE_ME" for presto.version with an environment
variable placeholder like "${PRESTO_VERSION}". This change allows the bootstrap
script to export PRESTO_VERSION dynamically and use envsubst for substitution,
making the template declarative and avoiding sed ordering issues.
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | ||
node.internal-address=REPLACE_ME | ||
node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION} | ||
node.id=REPLACE_ME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environment variable name may be misleading for the worker template.
node.environment
expands ${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
, which is defined in the coordinator env file. That couples the worker to the coordinator implementation detail and will break if someone supplies only worker.env
.
Rename to a worker-scoped variable or alias it in worker.env
to avoid hidden dependency.
-node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
+node.environment=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT} | |
node.internal-address=REPLACE_ME | |
node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION} | |
node.id=REPLACE_ME | |
node.environment=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT} | |
node.internal-address=REPLACE_ME | |
node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION} | |
node.id=REPLACE_ME |
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/config-template/node.properties lines 1 to
4, the environment variable used for node.environment is referencing a
coordinator-specific variable, creating an unintended dependency. Change the
variable to a worker-specific environment variable by renaming it to something
like PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT or create an alias in the
worker.env file that maps to the coordinator variable to avoid coupling and
ensure the worker config is self-contained.
…don't exist. Remove placeholder properties from property files.
database_password = _get_config_value(credentials, "database.password") | ||
if not database_user or not database_password: | ||
logger.error( | ||
"database.user and database.password must be specified in '%s'.", credentials_file_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we tell user that to launch clp-package first? Because that will generate the credentials.
If you didn't specify a dataset when compressing your logs in CLP, your logs will have been stored | ||
in the `default` dataset. To query the logs in this dataset: | ||
|
||
```sql | ||
SELECT * FROM default LIMIT 1; | ||
``` | ||
|
||
All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs | ||
contain the field `foo.bar`, you can query it using: | ||
|
||
```sql | ||
SELECT foo.bar FROM default LIMIT 1; | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also mention that this only works when the user didn't setup any "required" metadata filters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. That should probably go in the section about configuring the metadata filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/src/user-guide/guides-using-presto.md (2)
60-64
: Fix grammar in scaling instructionMinor wording issue: missing “one”.
- * To use more than Presto worker, you can use the `--scale` option as follows: + * To use more than one Presto worker, use the `--scale` option:
34-36
: Rephrase to improve sentence flow and avoid repetitionThis mirrors the earlier suggestion that hasn’t been applied.
-Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. A -sample dataset that works well with Presto is [postgresql]. +Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. The [postgresql] dataset serves as an excellent sample for testing with Presto.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-guide/guides-using-presto.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
docs/src/user-guide/guides-using-presto.md (4)
Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (macos-latest)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-guide/guides-using-presto.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
docs/src/user-guide/guides-using-presto.md (5)
Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the .
character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .*
in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.
🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[uncategorized] ~77-~77: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...wing: * A catalog name * A fully-qualified schema name * A fully-qualified t...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
[uncategorized] ~78-~78: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...A fully-qualified schema name * A fully-qualified table name Filter configs under ...
(HYPHENATED_LY_ADVERB_ADJECTIVE)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-using-presto.md
54-54: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: lint-check (macos-15)
- GitHub Check: build-macos (macos-15, false)
* To use more than Presto worker, you can use the `--scale` option as follows: | ||
|
||
```bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Grammar: insert missing word in scaling instruction
- * To use more than Presto worker, you can use the `--scale` option as follows:
+ * To use more than one Presto worker, you can use the `--scale` option as follows:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
* To use more than Presto worker, you can use the `--scale` option as follows: | |
```bash | |
* To use more than one Presto worker, you can use the `--scale` option as follows: |
🤖 Prompt for AI Agents
In docs/src/user-guide/guides-using-presto.md around lines 111 to 113, the
scaling instruction is missing a word for correct grammar. Insert the missing
word "one" after "more than" in the sentence to read "To use more than one
Presto worker, you can use the `--scale` option as follows:" to fix the grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (4)
101-107
: Add early return after storage type validation error.The function logs an error when the storage type is not 'fs' but continues execution instead of returning early with failure status.
Apply this fix:
if "fs" != clp_archive_output_storage_type: logger.error( "Expected CLP's archive_output.storage.type to be fs but found '%s'. Presto" " currently only supports reading archives from the fs storage type.", clp_archive_output_storage_type, ) + return False
124-130
: Improve credential validation to handle whitespace.The current validation only checks if values are truthy, which could miss empty strings after whitespace stripping.
Consider more robust validation:
- database_user = _get_config_value(credentials, "database.user") - database_password = _get_config_value(credentials, "database.password") - if not database_user or not database_password: + database_user = _get_config_value(credentials, "database.user", "").strip() + database_password = _get_config_value(credentials, "database.password", "").strip() + if not database_user or not database_password:
152-158
: Enhance KeyError handling with clearer context.The current error handling doesn't clearly indicate which specific key is missing from the file.
Make the error message more informative:
except KeyError as e: logger.error( - "Missing required key '%s' in '%s'", - e, + "Missing required key %s in '%s'", + str(e), coordinator_common_env_file_path, )
10-23
: Consider simplifying logging configuration.The current manual logging setup could be more maintainable using a structured approach.
Consider using basicConfig for simpler setup:
-# Set up console logging -logging_console_handler = logging.StreamHandler() -logging_formatter = logging.Formatter( - "%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", datefmt="%Y-%m-%dT%H:%M:%S" -) -logging_console_handler.setFormatter(logging_formatter) - -# Set up root logger -root_logger = logging.getLogger() -root_logger.setLevel(logging.INFO) -root_logger.addHandler(logging_console_handler) - -# Create logger -logger = logging.getLogger(__name__) +# Set up logging +logging.basicConfig( + level=logging.INFO, + format="%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", + datefmt="%Y-%m-%dT%H:%M:%S" +) +logger = logging.getLogger(__name__)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
docs/src/user-guide/guides-using-presto.md
(1 hunks)tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (7)
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/
, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly
declarations.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In clp_package_utils/scripts/native/del_archives.py
, when deleting archives, the archive
variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
Learnt from: haiqi96
PR: #634
File: components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py:0-0
Timestamp: 2024-12-16T21:49:06.483Z
Learning: In fs_compression_task.py
, when refactoring the archive processing loop in Python, the src_archive_file.unlink()
operation should remain outside of the process_archive
function.
Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/
, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh
, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1
to handle potential tarball layout changes.
Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like apt-get install -f
) when installing packages to give users more control over their system.
docs/src/user-guide/guides-using-presto.md (5)
Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the .
character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .*
in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.
🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[typographical] ~58-~58: It appears that a comma is missing.
Context: ...e/metadata-filter.json`. * For each dataset you want to query, add a filter config ...
(DURING_THAT_TIME_COMMA)
[style] ~75-~75: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...aset>` with the name of the dataset you want to query. (If you didn't specify a ...
(REP_WANT_TO_VB)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-using-presto.md
74-74: Trailing spaces
Expected: 0 or 2; Actual: 6
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: lint-check (macos-15)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (5)
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (2)
26-56
: LGTM! Clean main function structure.The main function follows a clear structure with proper argument parsing, error handling, and file operations. The return values are consistent and the logic flow is straightforward.
163-181
: LGTM! Well-implemented config value retrieval.The function correctly handles dot notation key traversal with proper type checking and graceful fallback to default values.
docs/src/user-guide/guides-using-presto.md (3)
1-24
: LGTM! Clear introduction and comprehensive requirements.The introduction effectively explains Presto's role with CLP and the requirements section provides specific version information that users need.
105-155
: LGTM! Comprehensive querying guide and clear limitations.The querying section provides practical examples with SQL commands, and the limitations section appropriately sets user expectations about current constraints.
58-58
: Consider adding comma for clarity.The sentence structure could benefit from a comma for better readability.
- * For each dataset you want to query, add a filter config of the form: + * For each dataset you want to query, add a filter config of the form:Actually, upon review, the current comma placement is correct. The static analysis suggestion appears to be a false positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/src/user-guide/guides-using-presto.md (2)
87-93
: Add the missing word “one” in the scaling instructionThe sentence is still missing the determiner after more than.
-* To use more than Presto worker, you can use the `--scale` option as follows: +* To use more than one Presto worker, you can use the `--scale` option as follows:
34-36
: Vary sentence openings to improve flowTwo adjacent sentences begin with Follow / sample, which triggers the style checker. A quick tweak:
-Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. A -sample dataset that works well with Presto is [postgresql]. +Consult the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. +One sample dataset that works well with Presto is [postgresql].
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
docs/src/user-guide/guides-using-presto.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
docs/src/user-guide/guides-using-presto.md (4)
Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.
Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.
Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md
[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[typographical] ~58-~58: It appears that a comma is missing.
Context: ...e/metadata-filter.json`. * For each dataset you want to query, add a filter config ...
(DURING_THAT_TIME_COMMA)
[style] ~75-~75: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...aset>` with the name of the dataset you want to query. (If you didn't specify a ...
(REP_WANT_TO_VB)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs | ||
contain the field `foo.bar`, you can query it using: | ||
|
||
```sql | ||
SELECT foo.bar FROM default LIMIT 1; | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Define the abbreviation “kv-pairs” for clarity
Readers who are new to CLP may not immediately recognise the shorthand.
-All kv-pairs in each log event can be queried directly using dot-notation.
+All key-value (KV) pairs in each log event can be queried directly using dot notation.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs | |
contain the field `foo.bar`, you can query it using: | |
```sql | |
SELECT foo.bar FROM default LIMIT 1; | |
``` | |
All key-value (KV) pairs in each log event can be queried directly using dot notation. For example, if your logs | |
contain the field `foo.bar`, you can query it using: | |
🤖 Prompt for AI Agents
In docs/src/user-guide/guides-using-presto.md around lines 129 to 134, the term
"kv-pairs" is used without defining the abbreviation. Update the text to define
"kv-pairs" as "key-value pairs" the first time it appears to improve clarity for
readers unfamiliar with the shorthand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deferring to @kirkrodrigues 's approval
Description
This PR creates a docker-compose project that can quickly set up a docker stack with Presto coordinator and Presstimmo worker for CLP connector E2E demo.
Checklist
breaking change.
Validation performed
Did E2E testing with processed MongoDB 256MB dataset.
Summary by CodeRabbit
New Features
Documentation
Chores