Skip to content

Conversation

anlowee
Copy link
Member

@anlowee anlowee commented Jul 27, 2025

Description

This PR creates a docker-compose project that can quickly set up a docker stack with Presto coordinator and Presstimmo worker for CLP connector E2E demo.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Did E2E testing with processed MongoDB 256MB dataset.

Summary by CodeRabbit

New Features

  • Added deployment scripts, configuration templates, and environment files for setting up Presto with CLP integration using Docker Compose.
  • Introduced automated configuration generation for both coordinator and worker nodes.
  • Added environment variable support for flexible deployment.
  • Included JVM and logging configuration templates for the Presto coordinator.
  • Added scripts to generate user environment variable files and automate setup with Python virtual environments.

Documentation

  • Added a new user guide on using Presto with CLP, including setup instructions and usage examples.
  • Updated the documentation index and overview to include the new Presto guide.

Chores

  • Expanded linting to cover deployment files.
  • Added dependency requirements for deployment scripts.
  • Added .gitignore for deployment scripts directory.

@anlowee anlowee requested a review from a team as a code owner July 27, 2025 18:11
Copy link
Contributor

coderabbitai bot commented Jul 27, 2025

Walkthrough

This update introduces a comprehensive Presto deployment for CLP, including Docker Compose orchestration, configuration templates, environment variable management, and setup automation scripts. It also adds user documentation for using Presto with CLP, expands linting coverage, and provides environment and configuration files for both coordinator and worker nodes.

Changes

Cohort / File(s) Change Summary
Presto Coordinator Config Templates
tools/deployment/presto-clp/coordinator/config-template/clp.properties, .../config.properties, .../jvm.config, .../log.properties, .../metadata-filter.json, .../node.properties
Added configuration templates for Presto coordinator including connector, JVM, logging, metadata filter, and node properties with environment variable placeholders.
Presto Worker Config Templates
tools/deployment/presto-clp/worker/config-template/clp.properties, .../config.properties, .../node.properties, .../velox.properties
Added configuration templates for Presto worker with connector, config, node properties, and Velox settings using environment variables.
Coordinator Scripts
tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh
New shell script to generate Presto coordinator config files from templates, substitute environment variables, and organize catalog files.
Worker Scripts
tools/deployment/presto-clp/worker/scripts/generate-configs.sh
New shell script for Presto worker: generates configs, substitutes env vars, fetches coordinator version, updates node properties, and logs changes.
Presto Docker Compose Deployment
tools/deployment/presto-clp/docker-compose.yaml
Added Docker Compose file defining coordinator and worker services, health checks, environment files, volumes, and network for Presto cluster deployment.
Environment Files
tools/deployment/presto-clp/coordinator-common.env, .../coordinator.env, .../worker.env
Added environment variable files for coordinator and worker, specifying ports, service names, memory limits, and other configuration values.
Deployment Setup Scripts
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py, .../set-up-config.sh, .../requirements.txt
Added Python and Bash scripts for environment setup, env var file generation from CLP configs, and requirements for Python dependencies.
Scripts Directory Ignore
tools/deployment/presto-clp/scripts/.gitignore
Added .gitignore to exclude .venv/ in scripts directory.
Documentation - User Guide
docs/src/user-guide/guides-overview.md, .../guides-using-presto.md, .../index.md
Added a new user guide for Presto with CLP, updated overview and index to reference the new guide.
Linting Configuration
taskfiles/lint.yaml
Expanded linting scope to include tools/deployment for YAML and Python linting tasks.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SetupScript as set-up-config.sh
    participant PyScript as generate-user-env-vars-file.py
    participant DockerCompose
    participant Coordinator
    participant Worker

    User->>SetupScript: Run with CLP package dir
    SetupScript->>PyScript: Generate .env from CLP configs
    PyScript->>SetupScript: Write .env file
    User->>DockerCompose: docker-compose up
    DockerCompose->>Coordinator: Start with entrypoint (generate-configs.sh)
    DockerCompose->>Worker: Start with entrypoint (generate-configs.sh)
    Coordinator->>Coordinator: Generate configs from templates
    Worker->>Worker: Generate configs, fetch coordinator version, update properties
    Coordinator->>Coordinator: Start Presto server
    Worker->>Worker: Start Presto server
    User->>Coordinator: Query logs via Presto CLI
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🔭 Outside diff range comments (2)
tools/deployment/presto-clp/demo-assets/clp-config.yml (1)

39-40: Remove trailing blank lines.

Static analysis flagged excessive blank lines at the end of the file.

-
-
+
tools/deployment/presto-clp/docker-compose.yaml (1)

39-40: Remove trailing blank lines.

Static analysis flagged excessive blank lines at the end of the file.

-
-
+
♻️ Duplicate comments (2)
tools/deployment/presto-clp/coordinator/scripts/$ (2)

8-8: Fix typo in comment.

Same spelling error as in the other script.

-# Substitute environemnt variables in config template
+# Substitute environment variables in config template

8-11: Security risk: Avoid shell evaluation for template processing.

Same security issue as in the other generate-configs.sh script. The shell evaluation approach poses risks if template files contain malicious code.

Use envsubst for safer environment variable substitution as suggested in the other script review.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0a607f and 3a84840.

📒 Files selected for processing (18)
  • tools/deployment/presto-clp/.env (1 hunks)
  • tools/deployment/presto-clp/README.md (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/clp.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/config.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/jvm.config (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/log.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/node.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/scripts/$ (1 hunks)
  • tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (1 hunks)
  • tools/deployment/presto-clp/demo-assets/clp-config.yml (1 hunks)
  • tools/deployment/presto-clp/demo-assets/init.sh (1 hunks)
  • tools/deployment/presto-clp/docker-compose.yaml (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/clp.properties (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/config.properties (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/node.properties (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/velox.properties (1 hunks)
  • tools/deployment/presto-clp/worker/scripts/generate-configs.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
tools/deployment/presto-clp/worker/config-template/clp.properties (2)

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.

tools/deployment/presto-clp/.env (2)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

tools/deployment/presto-clp/coordinator/config-template/clp.properties (1)

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

tools/deployment/presto-clp/coordinator/scripts/$ (6)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu and set -o pipefail commands rather than combining them into set -euo pipefail, to ensure uniform script structure across all platform build scripts.

Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.

tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (5)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

tools/deployment/presto-clp/demo-assets/clp-config.yml (3)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

tools/deployment/presto-clp/worker/scripts/generate-configs.sh (7)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/ directory, keep them simple and avoid adding additional validation code.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

tools/deployment/presto-clp/demo-assets/init.sh (11)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add, apt update && apt install, yum update && yum install) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu and set -o pipefail commands rather than combining them into set -euo pipefail, to ensure uniform script structure across all platform build scripts.

tools/deployment/presto-clp/README.md (4)

Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.

🪛 dotenv-linter (3.3.0)
tools/deployment/presto-clp/.env

[warning] 2-2: [QuoteCharacter] The value has quote characters (', ")


[warning] 3-3: [QuoteCharacter] The value has quote characters (', ")


[warning] 6-6: [QuoteCharacter] The value has quote characters (', ")


[warning] 7-7: [QuoteCharacter] The value has quote characters (', ")


[warning] 7-7: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key


[warning] 8-8: [QuoteCharacter] The value has quote characters (', ")


[warning] 8-8: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key


[warning] 9-9: [QuoteCharacter] The value has quote characters (', ")


[warning] 9-9: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key


[warning] 10-10: [QuoteCharacter] The value has quote characters (', ")


[warning] 10-10: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key


[warning] 11-11: [QuoteCharacter] The value has quote characters (', ")


[warning] 12-12: [QuoteCharacter] The value has quote characters (', ")


[warning] 15-15: [QuoteCharacter] The value has quote characters (', ")


[warning] 16-16: [QuoteCharacter] The value has quote characters (', ")


[warning] 19-19: [QuoteCharacter] The value has quote characters (', ")


[warning] 20-20: [QuoteCharacter] The value has quote characters (', ")


[warning] 20-20: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE key should go before the PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE key


[warning] 23-23: [QuoteCharacter] The value has quote characters (', ")


[warning] 26-26: [QuoteCharacter] The value has quote characters (', ")


[warning] 29-29: [QuoteCharacter] The value has quote characters (', ")


[warning] 32-32: [QuoteCharacter] The value has quote characters (', ")

🪛 Shellcheck (0.10.0)
tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)

tools/deployment/presto-clp/worker/scripts/generate-configs.sh

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)


[warning] 81-81: In POSIX sh, HOSTNAME is undefined.

(SC3028)


[warning] 82-82: In POSIX sh, HOSTNAME is undefined.

(SC3028)

tools/deployment/presto-clp/demo-assets/init.sh

[warning] 6-6: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 20-20: Declare and assign separately to avoid masking return values.

(SC2155)


[info] 20-20: Double quote to prevent globbing and word splitting.

(SC2086)


[warning] 21-21: Declare and assign separately to avoid masking return values.

(SC2155)


[info] 21-21: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 39-39: Double quote to prevent globbing and word splitting.

(SC2086)

🪛 YAMLlint (1.37.1)
tools/deployment/presto-clp/docker-compose.yaml

[error] 39-39: too many blank lines (1 > 0)

(empty-lines)

tools/deployment/presto-clp/demo-assets/clp-config.yml

[error] 39-39: too many blank lines (1 > 0)

(empty-lines)

🪛 markdownlint-cli2 (0.17.2)
tools/deployment/presto-clp/README.md

7-7: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


22-22: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


29-29: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


35-35: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


44-44: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


44-44: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


48-48: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


48-48: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


61-61: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


65-65: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


71-71: Multiple consecutive blank lines
Expected: 1; Actual: 2

(MD012, no-multiple-blanks)


72-72: Multiple consecutive blank lines
Expected: 1; Actual: 3

(MD012, no-multiple-blanks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (16)
tools/deployment/presto-clp/worker/config-template/node.properties (1)

1-4: Verify REPLACE_ME placeholders are always substituted at container start-up

If generate-configs.sh fails, leaving node.internal-address or node.id as REPLACE_ME, the worker will refuse to start. Consider adding a defensive check in the script (e.g., grep -q REPLACE_ME with exit 1) so mis-substitution is caught early in CI rather than at runtime.

tools/deployment/presto-clp/worker/config-template/clp.properties (1)

1-1: LGTM – matches Presto catalogue naming conventions

tools/deployment/presto-clp/worker/config-template/velox.properties (1)

1-1: Confirm mutable config is desired inside containers

mutable-config=true allows live reloads but also permits runtime edits that diverge from the committed template. Double-check this aligns with your security/operational requirements.

tools/deployment/presto-clp/coordinator/config-template/log.properties (1)

1-1: LGTM – log level wired to env var

tools/deployment/presto-clp/coordinator/config-template/clp.properties (1)

9-9: Hard-wired path may break if etc is relocated

clp.metadata-filter-config is fixed to /opt/presto-server/etc/metadata-filter.json. If the image’s launcher-config overrides PRESTO_HOME or mounts configs elsewhere, Presto will fail to resolve the file. Please verify the path or parameterize it like the other fields.

tools/deployment/presto-clp/worker/config-template/config.properties (1)

1-6: Ensure presto.version placeholder is replaced at run time

A literal REPLACE_ME will cause workers to refuse to join the cluster. Confirm that scripts/generate-configs.sh always substitutes the value before Presto starts, even on restart scenarios.

tools/deployment/presto-clp/.env (1)

35-36: CLP_PACKAGE_ARCHIVES placeholder left

CLP_PACKAGE_ARCHIVES=REPLACE_ME must be set to a valid host path or the volume mount in docker-compose.yaml will fail.

tools/deployment/presto-clp/coordinator/config-template/config.properties (1)

1-14: Well-structured Presto coordinator configuration.

The configuration properly sets up the coordinator with appropriate settings for discovery, memory management, and feature flags. The environment variable placeholders follow a consistent naming convention and will be properly substituted by the generation scripts.

tools/deployment/presto-clp/demo-assets/clp-config.yml (2)

33-38: Clarify the localhost vs placeholder IP usage.

The web UI services use "localhost" while other services use "${REPLACE_IP}" placeholders. This might be intentional for security (keeping web UIs local-only), but please verify this is the intended behaviour.


1-39: Comprehensive CLP service configuration.

The YAML structure properly defines all necessary CLP services with appropriate connection details and configuration parameters. The placeholder pattern for IP addresses will be correctly processed by the associated scripts.

tools/deployment/presto-clp/docker-compose.yaml (4)

5-5: Verify the use of "dev" tagged images.

The services use images with "dev" tags from the GitHub Container Registry. Ensure these are the intended images for the deployment and consider whether versioned tags would be more appropriate for production use.

Also applies to: 17-17


6-6: Well-designed custom entrypoint approach.

The entrypoint configuration that runs config generation before starting services is a good pattern for dynamic configuration management in containerized deployments.

Also applies to: 21-21


28-28: Good use of environment variable for flexible volume mounting.

Using ${CLP_PACKAGE_ARCHIVES} for the volume mount path allows flexible configuration of the data directory location.


1-39: Well-structured Docker Compose deployment.

The configuration properly orchestrates a Presto cluster with appropriate service dependencies, networking, and volume management. The integration with configuration templates and generation scripts provides a flexible deployment solution.

tools/deployment/presto-clp/worker/scripts/generate-configs.sh (2)

67-83: Use hostname for node ID – $HOSTNAME is undefined in POSIX sh

BusyBox /bin/sh doesn’t set $HOSTNAME. Use $(hostname).

-if grep -q '^node.id=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
-    sed -i "s|^node.id=REPLACE_ME|node.id=$HOSTNAME|" "$NODE_PROPERTIES_FILE"
+if grep -q '^node.id=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
+    sed -i "s|^node.id=REPLACE_ME|node.id=$(hostname)|" "$NODE_PROPERTIES_FILE"

30-37: Unquoted DISCOVERY_URI may break on spaces; quote variable in wget

If the value in config.properties ever contains spaces, wget receives split tokens. Wrap it:

-if response=$(wget -qO- --timeout=10 "${DISCOVERY_URI}/v1/info" 2>/dev/null); then
+if response=$(wget -qO- --timeout=10 "${DISCOVERY_URI}/v1/info" 2>/dev/null); then

(no functional diff needed, but keep double quotes around variable)

Likely an incorrect or invalid review comment.

Comment on lines 2 to 3
PRESTO_COORDINATOR_HTTPPORT="8080"
PRESTO_COORDINATOR_SERVICENAME="presto-coordinator"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Quoted values in .env are passed verbatim – will include the quotes

Docker Compose does not strip " in .env, so PRESTO_COORDINATOR_HTTPPORT will become "8080" (quotes included).
That breaks integer expectations in config.properties and JDBC URLs.
Strip the quotes or switch to YAML-style environment: blocks.

-PRESTO_COORDINATOR_HTTPPORT="8080"
-PRESTO_COORDINATOR_SERVICENAME="presto-coordinator"
+PRESTO_COORDINATOR_HTTPPORT=8080
+PRESTO_COORDINATOR_SERVICENAME=presto-coordinator
@@
-PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE="mysql"
-
+PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE=mysql
+

(Apply similarly to the remaining keys.)

Also applies to: 6-33

🧰 Tools
🪛 dotenv-linter (3.3.0)

[warning] 2-2: [QuoteCharacter] The value has quote characters (', ")


[warning] 3-3: [QuoteCharacter] The value has quote characters (', ")

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/.env lines 2 to 33, the environment variable
values are enclosed in quotes, causing the quotes to be included literally in
the values. Remove the surrounding double quotes from all variable assignments
in this range to ensure the values are interpreted correctly by Docker Compose
and downstream configurations.

Comment on lines 1 to 10
-server
-Xmx${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE}
-XX:+UseG1GC
-XX:G1HeapRegionSize=${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE}
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add container-aware memory flags & optional G1 tuning

-Xmx hard-codes the ceiling and ignores the cgroup limit the container actually gets. Presto’s launcher (and Java ≥ 11) support:

-XX:+UseContainerSupport
-XX:MaxRAMPercentage=80

These keep the heap bounded to ~80 % of the container limit without manual tuning. Consider replacing -Xmx… or combining it with these flags.

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/jvm.config lines 1 to
10, the current JVM options use a fixed -Xmx value which does not respect
container memory limits. To fix this, remove or complement the -Xmx flag with
container-aware flags by adding -XX:+UseContainerSupport and
-XX:MaxRAMPercentage=80. This will allow the JVM heap size to adapt dynamically
to the container's memory limit, improving resource management without manual
tuning.

Comment on lines +1 to +2
{
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Provide a minimal example or document expected schema

An empty JSON object is syntactically valid, but future maintainers may be unsure what keys are supported. A commented exemplar or pointer to docs beside this file would improve clarity without affecting runtime.

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json
at lines 1 to 2, the JSON file is currently empty, which may confuse future
maintainers about the expected keys. Add a minimal example JSON object with
typical keys and values or include comments or a reference to documentation
explaining the expected schema to improve clarity without impacting runtime
behavior.

Comment on lines 1 to 3
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.id=${PRESTO_COORDINATOR_SERVICENAME}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

node.data-dir missing & static node.id risks collisions

Presto will refuse to start without node.data-dir, and using a constant value for node.id means every fresh container (or an additional coordinator) will register with the same ID, leading to “Node already exists” errors.

 node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
-node.id=${PRESTO_COORDINATOR_SERVICENAME}
+# A unique ID per container – fall back to HOSTNAME if not supplied
+node.id=${PRESTO_COORDINATOR_NODE_ID:-${HOSTNAME}}
+# Mandatory data directory
+node.data-dir=/opt/presto-server/data
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.id=${PRESTO_COORDINATOR_SERVICENAME}
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
# A unique ID per container – fall back to HOSTNAME if not supplied
node.id=${PRESTO_COORDINATOR_NODE_ID:-${HOSTNAME}}
# Mandatory data directory
node.data-dir=/opt/presto-server/data
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/node.properties at
lines 1 to 3, add the required property node.data-dir with an appropriate
environment variable or path to ensure Presto starts correctly. Also, replace
the static node.id value with a dynamic or unique identifier, such as an
environment variable or generated UUID, to prevent node ID collisions when
multiple coordinators or containers run simultaneously.

Comment on lines 14 to 15
rm -rf ${PRESTO_CONFIG_DIR}/catalog
mkdir -p ${PRESTO_CONFIG_DIR}/catalog
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider consistency with the other generate-configs.sh script.

This script uses rm -rf and mkdir -p for catalog directory management, while the other version uses rm -f. Consider standardizing the approach across both scripts.

Also fix the quoting issue:

-rm -rf ${PRESTO_CONFIG_DIR}/catalog
-mkdir -p ${PRESTO_CONFIG_DIR}/catalog
+rm -rf "${PRESTO_CONFIG_DIR}/catalog"
+mkdir -p "${PRESTO_CONFIG_DIR}/catalog"
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/scripts/$ around lines 14 to 15, the
script uses `rm -rf` to remove the catalog directory, which is inconsistent with
the other generate-configs.sh script that uses `rm -f`. Change the removal
command to use `rm -f` if only files are being removed or adjust both scripts to
use the same method for directory cleanup. Additionally, fix any quoting issues
by properly quoting variables like "${PRESTO_CONFIG_DIR}/catalog" to prevent
word splitting or globbing errors.

Comment on lines 1 to 7
# Setup local docker stack for presto + clp

## Install docker

Follow the guide here: [docker]

# Launch clp-package
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Convert secondary titles to H2 and fix punctuation for Markdown lint-cleanliness

The document currently contains five #-level headings (Setup…, Launch…, Create Docker Cluster, Use cli: and Delete docker Cluster).
Markdown-lint (MD025/MD026) flags this as multiple H1s and trailing punctuation. Keeping a single top-level H1 improves SEO and stops most linters from failing CI.

-# Launch clp-package
+# ## Launch clp-package-# Create Docker Cluster
+# ## Create Docker Cluster-# Use cli:
+# ## Use CLI-# Delete docker Cluster
+# ## Delete Docker Cluster

Also applies to: 35-41, 48-57, 65-69

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

7-7: Multiple top-level headings in the same document

(MD025, single-title, single-h1)

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/README.md around lines 1 to 7 and also lines
35-41, 48-57, and 65-69, change all secondary titles currently using a single #
(H1) to use ## (H2) to ensure only one top-level H1 heading in the document.
Additionally, remove any trailing punctuation from these headings to comply with
markdown lint rules MD025 and MD026. This will improve SEO and prevent linter
failures in CI.

Comment on lines 61 to 63
```sql
SELECT * FROM default LIMIT 1;
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Minor: qualify SQL fenced block for syntax highlighting

Add sql after the back-ticks.

-```sql
+```sql
 SELECT * FROM default LIMIT 1;

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

61-61: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In tools/deployment/presto-clp/README.md around lines 61 to 63, the SQL code
block is missing the language specifier for syntax highlighting. Add "sql"
immediately after the opening triple backticks to enable proper SQL syntax
highlighting for the code block.


</details>

<!-- fingerprinting:phantom:triton:cougar -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines 8 to 11
# Substitute environemnt variables in config template
find /configs -type f | while read -r f; do
( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
done
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Template expansion executes arbitrary shell – switch to envsubst

Piping the template through sh (cat <<EOF … | sh) will execute every line, not just expand variables. If anyone edits a template and slips in rm -rf /, the worker container obeys. Use envsubst (in gettext-base) or a safe substitution tool instead.

-  ( echo "cat <<EOF"; cat "$f"; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
+  envsubst < "$f" > "${PRESTO_CONFIG_DIR}/$(basename "$f")"

Also quote $f to satisfy SC2086.

🧰 Tools
🪛 Shellcheck (0.10.0)

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/scripts/generate-configs.sh around lines 8
to 11, the current template expansion uses a shell execution that runs arbitrary
commands, which is unsafe. Replace the use of `cat <<EOF ... | sh` with
`envsubst` to safely substitute environment variables without executing
commands. Also, ensure to quote the variable `$f` in the `cat` command to
prevent word splitting and globbing issues.

@kirkrodrigues kirkrodrigues changed the title tools: Add docker compose for Presto-Velox-CLP cluster. feat(presto-clp): Add Docker compose set up for Presto cluster that can connect to CLP package. Jul 27, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

♻️ Duplicate comments (13)
tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json (1)

1-2: Add minimal schema example or link to docs

An entirely empty JSON object gives no clue what keys are supported. Provide a commented exemplar or reference documentation to aid maintainers.

tools/deployment/presto-clp/coordinator/config-template/node.properties (1)

1-2: node.data-dir still missing and node.id is static
(See earlier review)

Presto fails to boot without node.data-dir, and a constant node.id causes collisions when multiple containers start.

 node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
-node.id=${PRESTO_COORDINATOR_SERVICENAME}
+# Unique per container; fall back to the host id
+node.id=${PRESTO_COORDINATOR_NODE_ID:-${HOSTNAME}}
+node.data-dir=/opt/presto-server/data
tools/deployment/presto-clp/coordinator/config-template/jvm.config (1)

1-9: Consider container-aware memory flags instead of fixed -Xmx

Java 11 already honours cgroup limits when given:

-XX:+UseContainerSupport
-XX:MaxRAMPercentage=80

This avoids hard-coding ${…MAXHEAPSIZE} and keeps the heap proportional to the container limit.

tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (4)

8-8: Typo: “environemnt” → “environment”


9-11: Replace unsafe eval-based template substitution
The pipeline that funnels templates through sh executes their contents, creating a command-injection vector and breaking on filenames with spaces. Prefer envsubst (or awk/sed) with proper quoting.


14-14: Quote variable in rm command
Unquoted expansion may misbehave on paths containing spaces.


17-17: Quote variables in mv command
Protects against word-splitting and globbing issues.

tools/deployment/presto-clp/README.md (5)

1-8: Multiple H1 headings (MD025) – change subsequent # headings to ##.


20-29: Ordered list numbering jumps (MD029) – restart numbering or use bullets.


41-44: Add language hint & blank lines around fenced block (MD031/MD040)


46-46: Heading ends with “:” and is H1 – use ## Use CLI without trailing punctuation.


59-61: Surround SQL block with blank lines & keep sql tag (MD031)

tools/deployment/presto-clp/worker/scripts/generate-configs.sh (1)

8-11: Template expansion executes arbitrary shell – switch to envsubst

This security vulnerability was previously flagged. Piping the template through sh will execute every line, not just expand variables. Use envsubst for safe variable substitution instead.

-# Substitute environemnt variables in config template
+# Substitute environment variables in config template
 find /configs -type f | while read -r f; do
-  ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
+  envsubst < "$f" > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
 done
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a84840 and 2015a4a.

📒 Files selected for processing (22)
  • taskfiles/lint.yaml (2 hunks)
  • tools/deployment/presto-clp/README.md (1 hunks)
  • tools/deployment/presto-clp/coordinator-common.env (1 hunks)
  • tools/deployment/presto-clp/coordinator.env (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/clp.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/config.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/jvm.config (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/log.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json (1 hunks)
  • tools/deployment/presto-clp/coordinator/config-template/node.properties (1 hunks)
  • tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (1 hunks)
  • tools/deployment/presto-clp/docker-compose.yaml (1 hunks)
  • tools/deployment/presto-clp/scripts/.gitignore (1 hunks)
  • tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (1 hunks)
  • tools/deployment/presto-clp/scripts/requirements.txt (1 hunks)
  • tools/deployment/presto-clp/scripts/set-up-config.sh (1 hunks)
  • tools/deployment/presto-clp/worker.env (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/clp.properties (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/config.properties (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/node.properties (1 hunks)
  • tools/deployment/presto-clp/worker/config-template/velox.properties (1 hunks)
  • tools/deployment/presto-clp/worker/scripts/generate-configs.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.
tools/deployment/presto-clp/scripts/.gitignore (3)

Learnt from: Bill-hbrhbr
PR: #1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

tools/deployment/presto-clp/worker/config-template/clp.properties (1)

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

tools/deployment/presto-clp/coordinator/config-template/metadata-filter.json (4)

Learnt from: junhaoliao
PR: #596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In components/log-viewer-webui/client/src/api/query.js, the ExtractJsonResp type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function trivial_new_projected_schema_tree_node_callback in the KV-IR codebase is documented in the header file to explain its purpose for testing, which addresses static analyzer warnings about unused functions.

Learnt from: LinZhihao-723
PR: #873
File: components/core/src/clp/ffi/ir_stream/search/test/utils.cpp:78-84
Timestamp: 2025-05-02T23:49:30.798Z
Learning: The function trivial_new_projected_schema_tree_node_callback in the KV-IR codebase is properly documented in the header file to indicate its purpose for testing, which addresses static analyzer warnings about unused functions.

Learnt from: LinZhihao-723
PR: #882
File: components/core/src/clp/ffi/ir_stream/search/QueryHandlerImpl.cpp:378-402
Timestamp: 2025-05-07T16:56:35.687Z
Learning: In the CLP search component, the evaluate_wildcard_filter function should return AstEvaluationResult::Pruned when node_id_value_pairs is empty, not AstEvaluationResult::False. Empty node sets should be treated as "undetermined" rather than definitive non-matches.

tools/deployment/presto-clp/coordinator/config-template/clp.properties (1)

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

tools/deployment/presto-clp/scripts/set-up-config.sh (12)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu and set -o pipefail commands rather than combining them into set -euo pipefail, to ensure uniform script structure across all platform build scripts.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like apt-get install -f) when installing packages to give users more control over their system.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add, apt update && apt install, yum update && yum install) rather than platform-specific optimizations, to ensure uniform script structure and readability.

Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.

Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.

tools/deployment/presto-clp/README.md (14)

Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Learnt from: Bill-hbrhbr
PR: #1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.

Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu and set -o pipefail commands rather than combining them into set -euo pipefail, to ensure uniform script structure across all platform build scripts.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like apk update && apk add, apt update && apt install, yum update && yum install) rather than platform-specific optimizations, to ensure uniform script structure and readability.

taskfiles/lint.yaml (1)

Learnt from: anlowee
PR: #925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.

tools/deployment/presto-clp/worker/scripts/generate-configs.sh (8)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu and set -o pipefail commands rather than combining them into set -euo pipefail, to ensure uniform script structure across all platform build scripts.

Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/ directory, keep them simple and avoid adding additional validation code.

tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (2)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh (13)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: haiqi96
PR: #594
File: components/package-template/src/sbin/del-archives.sh:6-9
Timestamp: 2024-11-15T16:28:08.644Z
Learning: For wrapper scripts in the components/package-template/src/sbin/ directory, keep them simple and avoid adding additional validation code.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate set -eu and set -o pipefail commands rather than combining them into set -euo pipefail, to ensure uniform script structure across all platform build scripts.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-x86/build.sh:18-24
Timestamp: 2025-07-01T14:52:02.418Z
Learning: In the CLP project, consistency across platform build scripts is prioritized over defensive programming when it comes to git remote handling. All build.sh files in docker-images directories should follow the same pattern for git metadata injection.

Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/macos/install-all.sh:11-12
Timestamp: 2025-05-06T10:07:04.654Z
Learning: In CLP installation scripts, temporary directories with downloaded files should not be automatically cleaned up on failure (e.g., with EXIT traps) to preserve artifacts for debugging purposes.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like apt-get update && apt-get install or apk update && apk add), preserve this pattern for uniform script structure.

🪛 Shellcheck (0.10.0)
tools/deployment/presto-clp/scripts/set-up-config.sh

[info] 20-20: Not following: ./.venv/bin/activate was not specified as input (see shellcheck -x).

(SC1091)

tools/deployment/presto-clp/worker/scripts/generate-configs.sh

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)


[warning] 81-81: In POSIX sh, HOSTNAME is undefined.

(SC3028)


[warning] 82-82: In POSIX sh, HOSTNAME is undefined.

(SC3028)

tools/deployment/presto-clp/coordinator/scripts/generate-configs.sh

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)

🪛 markdownlint-cli2 (0.17.2)
tools/deployment/presto-clp/README.md

7-7: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


20-20: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


27-27: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


33-33: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


42-42: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


42-42: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


46-46: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


46-46: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


59-59: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


63-63: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


69-69: Multiple consecutive blank lines
Expected: 1; Actual: 2

(MD012, no-multiple-blanks)


70-70: Multiple consecutive blank lines
Expected: 1; Actual: 3

(MD012, no-multiple-blanks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
🔇 Additional comments (10)
tools/deployment/presto-clp/scripts/.gitignore (1)

1-1: Pattern will not match the intended directory

Because this .gitignore resides two levels below the repo root, the leading “/” anchors the pattern at the repository root, so tools/deployment/presto-clp/scripts/.venv/ will still be tracked. Drop the leading slash:

-/.venv/
+.venv/

to correctly ignore the virtual-env folder created alongside these scripts.

⛔ Skipped due to learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.
tools/deployment/presto-clp/worker/config-template/clp.properties (1)

1-1: LGTM – property is correct

connector.name=clp matches the coordinator side and Presto expectations.

tools/deployment/presto-clp/worker/config-template/velox.properties (1)

1-1: Confirm that mutable-config=true is intended for production clusters

Turning on dynamic config reloading can be handy during local experiments, but it also lets any user with the right privilege change Velox settings at runtime. If this worker image is used beyond a demo, double-check that:

  1. The feature is really needed.
  2. Access is restricted (e.g., via web-ui.authentication).

Otherwise consider omitting the flag or gating it behind an env var.

taskfiles/lint.yaml (1)

106-108: YAML directory inclusion looks fine – confirm yamllint rules cover Docker-Compose syntax.

Adding tools/deployment to the yamllint target broadens coverage, which is good. The directory now contains docker-compose.yaml; some yamllint default rules (e.g., truthy, reserved-words) frequently complain about Compose keywords. Double-check that your .yamllint.yml either disables those rules for Compose files or uses a yamllint extends: docker profile to avoid false positives.

tools/deployment/presto-clp/coordinator/config-template/config.properties (1)

4-5: Environment variables verified in coordinator.env

The placeholders PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY and PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE are defined in tools/deployment/presto-clp/coordinator.env (lines 6–7). No further action required.

tools/deployment/presto-clp/docker-compose.yaml (1)

32-33: Ensure ${CLP_PACKAGE_ARCHIVES} is an absolute host path

This bind-mount maps the host path to the same path inside the container. If the variable is unset or relative, docker compose will error at runtime. Validate or normalise it when generating .env, or mount it to a fixed target path inside the container.

tools/deployment/presto-clp/worker/scripts/generate-configs.sh (1)

18-66: Well-implemented coordination logic

The retry mechanism for coordinator readiness and version detection is well-structured with appropriate error handling and busybox-compatible commands.

tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (3)

9-23: Well-structured logging configuration

The logging setup follows Python best practices with proper formatting and console handling.


79-96: Clean environment file generation

The environment variable generation is well-structured with consistent formatting and includes all necessary configuration parameters.


99-117: Well-implemented configuration value extraction

The helper function properly handles nested dictionary traversal with dot notation and includes comprehensive documentation.

Comment on lines +776 to 777
- "tools/deployment"
- "tools/scripts"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Black/Ruff may recurse through large non-code trees under tools/deployment.

Running black . from tools/deployment will walk every sub-directory (configs, templates, logs). That can add noticeable time and, in worst cases, choke on extremely large data files a user may drop under that tree. Consider narrowing the path to tools/deployment/**/*.py or adding an --extend-exclude for obvious non-code directories (e.g., config-template, scripts/generated).

🤖 Prompt for AI Agents
In taskfiles/lint.yaml around lines 776 to 777, the current lint paths include
"tools/deployment" which causes Black and Ruff to recursively process large
non-code directories, slowing down linting and potentially causing errors.
Modify the lint paths to only include Python files under "tools/deployment" by
changing the path to "tools/deployment/**/*.py" or add an --extend-exclude
option to exclude known non-code directories like "configs" or "templates" to
prevent unnecessary recursion.

Comment on lines 2 to 9
clp.metadata-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE}
clp.metadata-db-url=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL}
clp.metadata-db-name=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME}
clp.metadata-db-user=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER}
clp.metadata-db-password=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD}
clp.metadata-table-prefix=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX}
clp.split-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER}
clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Several placeholders have no matching definitions – coordinator fails to start.

coordinator.env currently defines only METADATA_PROVIDERTYPE and SPLITPROVIDER.
The following variables are missing and will survive as literal ${…} after envsubst, causing Presto to error at startup:

  • PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL
  • …_DATABASE_NAME
  • …_DATABASE_USER
  • …_DATABASE_PASSWORD
  • …_METADATA_TABLEPREFIX

Add them to coordinator.env (or remove the placeholders) before merging.

🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/clp.properties lines
2 to 9, several environment variable placeholders used for database
configuration do not have corresponding definitions in coordinator.env, causing
the coordinator to fail at startup. To fix this, add the missing environment
variables PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL,
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME,
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER,
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD, and
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX to the
coordinator.env file with appropriate values, or alternatively remove these
placeholders from clp.properties if they are not needed.

@@ -0,0 +1 @@
com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid quoting the log level value in ${…}

Docker .env files often wrap string values in quotes ("DEBUG"). When the template engine substitutes the value here, the resulting line becomes:

com.facebook.presto="DEBUG"

The extra quotes are not a valid log level and Presto will fall back to the default (INFO). Either:

-com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL}
+com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL:-INFO}

and remove surrounding quotes in the .env file, or strip them inside the script that performs substitution.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL}
com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL:-INFO}
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/coordinator/config-template/log.properties at
line 1, the log level value substituted from the environment variable may
include quotes, causing invalid log level syntax. To fix this, ensure that the
substitution process strips any surrounding quotes from the environment variable
value before inserting it into the config file, or update the .env file to
provide the log level without quotes.

Comment on lines +13 to +14
clp_package_dir=$1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Quote positional parameter when capturing it

Keeps paths with spaces intact.

-clp_package_dir=$1
+clp_package_dir="$1"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
clp_package_dir=$1
clp_package_dir="$1"
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/scripts/set-up-config.sh at line 13, the
positional parameter $1 is assigned to clp_package_dir without quotes, which can
break paths containing spaces. Fix this by enclosing $1 in double quotes when
assigning it to clp_package_dir to preserve any spaces in the path.

Comment on lines 1 to 4
PRESTO_WORKER_HTTPPORT="8080"

# node.properties
PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Rename variable for readability & future-proofing

Using PRESTO_WORKER_HTTPPORT without a separator makes the variable name harder to scan and deviates from the common *_HTTP_PORT pattern used elsewhere in the repo (e.g., PRESTO_COORDINATOR_HTTP_PORT). Consider renaming:

-PRESTO_WORKER_HTTPPORT="8080"
+PRESTO_WORKER_HTTP_PORT="8080"

and update every reference in the generation scripts accordingly.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
PRESTO_WORKER_HTTPPORT="8080"
# node.properties
PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location"
PRESTO_WORKER_HTTP_PORT="8080"
# node.properties
PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location"
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker.env lines 1 to 4, rename the environment
variable PRESTO_WORKER_HTTPPORT to PRESTO_WORKER_HTTP_PORT to improve
readability and maintain consistency with other variables like
PRESTO_COORDINATOR_HTTP_PORT. After renaming, update all references to this
variable in the generation scripts to match the new name.

@@ -0,0 +1,6 @@
discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT}
presto.version=REPLACE_ME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Prefer an env placeholder over hard-coded REPLACE_ME for Presto version.

Replacing REPLACE_ME via sed in the bootstrap script works, but using an env var keeps the template declarative and avoids sed-ordering bugs:

-presto.version=REPLACE_ME
+presto.version=${PRESTO_VERSION}

The script can then simply export PRESTO_VERSION=$(curl …) and rely on envsubst.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
presto.version=REPLACE_ME
presto.version=${PRESTO_VERSION}
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/config-template/config.properties at line
2, replace the hard-coded "REPLACE_ME" for presto.version with an environment
variable placeholder like "${PRESTO_VERSION}". This change allows the bootstrap
script to export PRESTO_VERSION dynamically and use envsubst for substitution,
making the template declarative and avoiding sed ordering issues.

Comment on lines 1 to 4
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.internal-address=REPLACE_ME
node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION}
node.id=REPLACE_ME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Environment variable name may be misleading for the worker template.

node.environment expands ${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}, which is defined in the coordinator env file. That couples the worker to the coordinator implementation detail and will break if someone supplies only worker.env.
Rename to a worker-scoped variable or alias it in worker.env to avoid hidden dependency.

-node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
+node.environment=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.internal-address=REPLACE_ME
node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION}
node.id=REPLACE_ME
node.environment=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.internal-address=REPLACE_ME
node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION}
node.id=REPLACE_ME
🤖 Prompt for AI Agents
In tools/deployment/presto-clp/worker/config-template/node.properties lines 1 to
4, the environment variable used for node.environment is referencing a
coordinator-specific variable, creating an unintended dependency. Change the
variable to a worker-specific environment variable by renaming it to something
like PRESTO_WORKER_CONFIG_NODEPROPERTIES_ENVIRONMENT or create an alias in the
worker.env file that maps to the coordinator variable to avoid coupling and
ensure the worker config is self-contained.

database_password = _get_config_value(credentials, "database.password")
if not database_user or not database_password:
logger.error(
"database.user and database.password must be specified in '%s'.", credentials_file_path
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we tell user that to launch clp-package first? Because that will generate the credentials.

Comment on lines +95 to +107
If you didn't specify a dataset when compressing your logs in CLP, your logs will have been stored
in the `default` dataset. To query the logs in this dataset:

```sql
SELECT * FROM default LIMIT 1;
```

All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs
contain the field `foo.bar`, you can query it using:

```sql
SELECT foo.bar FROM default LIMIT 1;
```
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also mention that this only works when the user didn't setup any "required" metadata filters

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. That should probably go in the section about configuring the metadata filter.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/src/user-guide/guides-using-presto.md (2)

60-64: Fix grammar in scaling instruction

Minor wording issue: missing “one”.

-    * To use more than Presto worker, you can use the `--scale` option as follows:
+    * To use more than one Presto worker, use the `--scale` option:

34-36: Rephrase to improve sentence flow and avoid repetition

This mirrors the earlier suggestion that hasn’t been applied.

-Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. A
-sample dataset that works well with Presto is [postgresql].
+Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. The [postgresql] dataset serves as an excellent sample for testing with Presto.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d2146b and ff06c62.

📒 Files selected for processing (1)
  • docs/src/user-guide/guides-using-presto.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
docs/src/user-guide/guides-using-presto.md (4)

Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: lint-check (macos-latest)

@kirkrodrigues kirkrodrigues changed the title feat(presto-clp): Add Docker compose set up for Presto cluster that can connect to clp-json. feat(presto-clp): Add Docker compose setup for Presto cluster that can connect to clp-json. Jul 28, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff06c62 and cf11694.

📒 Files selected for processing (1)
  • docs/src/user-guide/guides-using-presto.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
docs/src/user-guide/guides-using-presto.md (5)

Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the . character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .* in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[uncategorized] ~77-~77: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...wing: * A catalog name * A fully-qualified schema name * A fully-qualified t...

(HYPHENATED_LY_ADVERB_ADJECTIVE)


[uncategorized] ~78-~78: Although a hyphen is possible, it is not necessary in a compound modifier in which the first word is an adverb that ends in ‘ly’.
Context: ...A fully-qualified schema name * A fully-qualified table name Filter configs under ...

(HYPHENATED_LY_ADVERB_ADJECTIVE)

🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-using-presto.md

54-54: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: build-macos (macos-15, false)

Comment on lines +111 to +113
* To use more than Presto worker, you can use the `--scale` option as follows:

```bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Grammar: insert missing word in scaling instruction

-    * To use more than Presto worker, you can use the `--scale` option as follows:
+    * To use more than one Presto worker, you can use the `--scale` option as follows:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* To use more than Presto worker, you can use the `--scale` option as follows:
```bash
* To use more than one Presto worker, you can use the `--scale` option as follows:
🤖 Prompt for AI Agents
In docs/src/user-guide/guides-using-presto.md around lines 111 to 113, the
scaling instruction is missing a word for correct grammar. Insert the missing
word "one" after "more than" in the sentence to read "To use more than one
Presto worker, you can use the `--scale` option as follows:" to fix the grammar.

@anlowee anlowee requested a review from kirkrodrigues July 28, 2025 15:10
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (4)

101-107: Add early return after storage type validation error.

The function logs an error when the storage type is not 'fs' but continues execution instead of returning early with failure status.

Apply this fix:

 if "fs" != clp_archive_output_storage_type:
     logger.error(
         "Expected CLP's archive_output.storage.type to be fs but found '%s'. Presto"
         " currently only supports reading archives from the fs storage type.",
         clp_archive_output_storage_type,
     )
+    return False

124-130: Improve credential validation to handle whitespace.

The current validation only checks if values are truthy, which could miss empty strings after whitespace stripping.

Consider more robust validation:

-    database_user = _get_config_value(credentials, "database.user")
-    database_password = _get_config_value(credentials, "database.password")
-    if not database_user or not database_password:
+    database_user = _get_config_value(credentials, "database.user", "").strip()  
+    database_password = _get_config_value(credentials, "database.password", "").strip()
+    if not database_user or not database_password:

152-158: Enhance KeyError handling with clearer context.

The current error handling doesn't clearly indicate which specific key is missing from the file.

Make the error message more informative:

     except KeyError as e:
         logger.error(
-            "Missing required key '%s' in '%s'",
-            e,
+            "Missing required key %s in '%s'",
+            str(e),
             coordinator_common_env_file_path,
         )

10-23: Consider simplifying logging configuration.

The current manual logging setup could be more maintainable using a structured approach.

Consider using basicConfig for simpler setup:

-# Set up console logging
-logging_console_handler = logging.StreamHandler()
-logging_formatter = logging.Formatter(
-    "%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s", datefmt="%Y-%m-%dT%H:%M:%S"
-)
-logging_console_handler.setFormatter(logging_formatter)
-
-# Set up root logger
-root_logger = logging.getLogger()
-root_logger.setLevel(logging.INFO)
-root_logger.addHandler(logging_console_handler)
-
-# Create logger
-logger = logging.getLogger(__name__)
+# Set up logging
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s.%(msecs)03d %(levelname)s [%(module)s] %(message)s",
+    datefmt="%Y-%m-%dT%H:%M:%S"
+)
+logger = logging.getLogger(__name__)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cf11694 and 4d896e5.

📒 Files selected for processing (2)
  • docs/src/user-guide/guides-using-presto.md (1 hunks)
  • tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (7)

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within components/core/tools/scripts/lib_install/, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like readonly declarations.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In clp_package_utils/scripts/native/del_archives.py, when deleting archives, the archive variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.

Learnt from: haiqi96
PR: #634
File: components/job-orchestration/job_orchestration/executor/compress/fs_compression_task.py:0-0
Timestamp: 2024-12-16T21:49:06.483Z
Learning: In fs_compression_task.py, when refactoring the archive processing loop in Python, the src_archive_file.unlink() operation should remain outside of the process_archive function.

Learnt from: haiqi96
PR: #594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in components/clp-package-utils/clp_package_utils/scripts/, note that it's preferred to keep error handling simple without adding extra complexity.

Learnt from: jackluo923
PR: #1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in install-prebuilt-packages.sh, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like --strip-components=1 to handle potential tarball layout changes.

Learnt from: kirkrodrigues
PR: #881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like apt-get install -f) when installing packages to give users more control over their system.

docs/src/user-guide/guides-using-presto.md (5)

Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

Learnt from: SharafMohamed
PR: #1033
File: components/core/config/schemas.txt:96-97
Timestamp: 2025-07-17T16:02:57.688Z
Learning: In log-surgeon, the . character is treated as a literal non-delimiter character rather than a regex wildcard that matches any character. This means patterns like .* in log-surgeon schemas match sequences of non-delimiter characters, not arbitrary characters like in standard regex.

🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[typographical] ~58-~58: It appears that a comma is missing.
Context: ...e/metadata-filter.json`. * For each dataset you want to query, add a filter config ...

(DURING_THAT_TIME_COMMA)


[style] ~75-~75: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...aset>` with the name of the dataset you want to query. (If you didn't specify a ...

(REP_WANT_TO_VB)

🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-using-presto.md

74-74: Trailing spaces
Expected: 0 or 2; Actual: 6

(MD009, no-trailing-spaces)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (5)
tools/deployment/presto-clp/scripts/generate-user-env-vars-file.py (2)

26-56: LGTM! Clean main function structure.

The main function follows a clear structure with proper argument parsing, error handling, and file operations. The return values are consistent and the logic flow is straightforward.


163-181: LGTM! Well-implemented config value retrieval.

The function correctly handles dot notation key traversal with proper type checking and graceful fallback to default values.

docs/src/user-guide/guides-using-presto.md (3)

1-24: LGTM! Clear introduction and comprehensive requirements.

The introduction effectively explains Presto's role with CLP and the requirements section provides specific version information that users need.


105-155: LGTM! Comprehensive querying guide and clear limitations.

The querying section provides practical examples with SQL commands, and the limitations section appropriately sets user expectations about current constraints.


58-58: Consider adding comma for clarity.

The sentence structure could benefit from a comma for better readability.

-    * For each dataset you want to query, add a filter config of the form:
+    * For each dataset you want to query, add a filter config of the form:

Actually, upon review, the current comma placement is correct. The static analysis suggestion appears to be a false positive.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/src/user-guide/guides-using-presto.md (2)

87-93: Add the missing word “one” in the scaling instruction

The sentence is still missing the determiner after more than.

-* To use more than Presto worker, you can use the `--scale` option as follows:
+* To use more than one Presto worker, you can use the `--scale` option as follows:

34-36: Vary sentence openings to improve flow

Two adjacent sentences begin with Follow / sample, which triggers the style checker. A quick tweak:

-Follow the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs. A
-sample dataset that works well with Presto is [postgresql].
+Consult the [quick-start](./quick-start/index.md) guide to set up CLP and compress your logs.
+One sample dataset that works well with Presto is [postgresql].
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d896e5 and 713670a.

📒 Files selected for processing (1)
  • docs/src/user-guide/guides-using-presto.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: quinntaylormitchell
PR: y-scope/clp#968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
docs/src/user-guide/guides-using-presto.md (4)

Learnt from: quinntaylormitchell
PR: #968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Learnt from: quinntaylormitchell
PR: #1069
File: docs/src/user-guide/quick-start/clp-json.md:135-152
Timestamp: 2025-07-05T03:38:16.779Z
Learning: The CLP project team has decided to refrain from using include directives in their documentation at present, preferring to maintain duplicated content rather than using shared includes or partials for de-duplication.

Learnt from: quinntaylormitchell
PR: #961
File: docs/src/dev-guide/design-clp-structured/single-file-archive-format.md:216-219
Timestamp: 2025-06-18T14:35:20.485Z
Learning: In clp-s documentation, technical abbreviations like "MPT" (Merged Parse Tree) should be defined at first use to improve reader clarity and comprehension.

Learnt from: gibber9809
PR: #955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.

🪛 LanguageTool
docs/src/user-guide/guides-using-presto.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...'s metadata database and archives. ### Setting up CLP Follow the [quick-start](./quic...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[typographical] ~58-~58: It appears that a comma is missing.
Context: ...e/metadata-filter.json`. * For each dataset you want to query, add a filter config ...

(DURING_THAT_TIME_COMMA)


[style] ~75-~75: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...aset>` with the name of the dataset you want to query. (If you didn't specify a ...

(REP_WANT_TO_VB)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-macos (macos-15, false)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins

Comment on lines +129 to +134
All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs
contain the field `foo.bar`, you can query it using:

```sql
SELECT foo.bar FROM default LIMIT 1;
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Define the abbreviation “kv-pairs” for clarity

Readers who are new to CLP may not immediately recognise the shorthand.

-All kv-pairs in each log event can be queried directly using dot-notation.
+All key-value (KV) pairs in each log event can be queried directly using dot notation.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
All kv-pairs in each log event can be queried directly using dot-notation. For example, if your logs
contain the field `foo.bar`, you can query it using:
```sql
SELECT foo.bar FROM default LIMIT 1;
```
All key-value (KV) pairs in each log event can be queried directly using dot notation. For example, if your logs
contain the field `foo.bar`, you can query it using:
🤖 Prompt for AI Agents
In docs/src/user-guide/guides-using-presto.md around lines 129 to 134, the term
"kv-pairs" is used without defining the abbreviation. Update the text to define
"kv-pairs" as "key-value pairs" the first time it appears to improve clarity for
readers unfamiliar with the shorthand.

@anlowee anlowee requested a review from junhaoliao July 28, 2025 21:15
Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deferring to @kirkrodrigues 's approval

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants