Skip to content

Conversation

davemarco
Copy link
Contributor

@davemarco davemarco commented Jul 10, 2025

Description

Background:

The UI team is working on making the package compatible with the Presto connector.

This PR disables unnecessary components when using Presto and passes a corresponding flag to the Web UI.

Changes:

Introduced a new query_engine option in the package config, which can be set to either "native" or "presto".

"native" refers to the current behavior of the package.

"presto" disables internal query-related components, as querying will now be handled by Presto.

Note:
The stop-clp.sh script has not been modified, as it already gracefully handles the case where components are not running (i.e., it does not fail if it attempts to stop components that were never started).

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

No docs for this until fully implemented

Validation performed

Started and stopped the package with both presto and native options

Summary by CodeRabbit

  • New Features

    • Added configurable query engine options (CLP, CLP‑S, Presto) across the app, including config files and the web UI.
    • Startup now launches only components compatible with the selected query engine and provides a clear message when a requested target can’t run with the current configuration.
  • Chores

    • Updated packaging to set the query engine automatically, aligning it with the selected storage engine.
    • Refreshed configuration template to include an explicit query_engine setting.

Copy link
Contributor

coderabbitai bot commented Jul 10, 2025

Walkthrough

Adds query engine awareness and runtime component filtering. The config introduces QueryEngine options and component groupings, exposes a get_runnable_components API, and maps targets to components. The start script now intersects target components with runnable ones, exiting if none. UI exports query engine enum. Packaging sets query_engine, and template config enables it.

Changes

Cohort / File(s) Summary
Runtime component filtering in startup
components/clp-package-utils/clp_package_utils/scripts/start_clp.py
Loads config, gets runnable components, intersects with target-mapped components; exits early if empty. All component start decisions now check membership in this filtered set. Adds ClpQueryEngine to web UI settings and adjusts controller-specific initialization.
Config: query engine, component sets, target mapping
components/clp-py-utils/clp_py_utils/clp_config.py
Adds QueryEngine enum; defines component groups and TARGET_TO_COMPONENTS; exposes VALID_QUERY_ENGINES. Extends Package with query_engine, validators, and cross-field compatibility. Adds CLPConfig.get_runnable_components and get_components_for_target helper.
Template config default
components/package-template/src/etc/clp-config.yml
Uncomments package.query_engine: "clp".
Web UI settings/types
components/webui/client/src/config/index.ts
Adds and exports CLP_QUERY_ENGINES enum; types SETTINGS_QUERY_ENGINE using it.
Build/packaging adjustments
taskfile.yaml
package-tar now also sets query_engine in clp_config.py and clp-config.yml using STORAGE_ENGINE value via sed-based replace tasks.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant start_clp.py
  participant CLPConfig
  participant Components

  User->>start_clp.py: invoke with target
  start_clp.py->>CLPConfig: load config
  CLPConfig-->>start_clp.py: get_runnable_components()
  start_clp.py->>CLPConfig: get_components_for_target(target)
  CLPConfig-->>start_clp.py: component set
  start_clp.py->>start_clp.py: intersect sets
  alt empty intersection
    start_clp.py-->>User: exit -1 (target not startable)
  else components present
    start_clp.py->>Components: start only components in intersection
    Components-->>start_clp.py: started
    start_clp.py-->>User: done
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • haiqi96
  • kirkrodrigues
  • junhaoliao
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@davemarco davemarco marked this pull request as ready for review July 10, 2025 17:16
@davemarco davemarco requested a review from a team as a code owner July 10, 2025 17:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 448a4d6 and 5634b61.

📒 Files selected for processing (1)
  • components/clp-package-utils/clp_package_utils/scripts/start_clp.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: haiqi96
PR: y-scope/clp#569
File: components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py:380-392
Timestamp: 2024-11-15T20:07:22.256Z
Learning: The current implementation assumes single-threaded execution, so race conditions in functions like `is_target_extracted` in `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` are not currently a concern.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: lint-check (ubuntu-latest)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)

28-28: LGTM: Correct import for new functionality.

The import of QueryEngine is necessary and properly placed with other related imports.


884-884: LGTM: Proper integration with web UI settings.

The query engine configuration is correctly passed to the client settings, allowing the web UI to adjust its behaviour based on the configured query engine.

Comment on lines 1214 to 1215
if QueryEngine.PRESTO != clp_config.package.query_engine:
start_query_scheduler(instance_id, clp_config, container_clp_config, mounts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider using positive conditionals for better readability.

The negative conditional QueryEngine.PRESTO != clp_config.package.query_engine works but could be more explicit. Consider using QueryEngine.NATIVE == clp_config.package.query_engine for clarity, assuming native is the alternative.

-            if QueryEngine.PRESTO != clp_config.package.query_engine:
+            if QueryEngine.NATIVE == clp_config.package.query_engine:
                start_query_scheduler(instance_id, clp_config, container_clp_config, mounts)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if QueryEngine.PRESTO != clp_config.package.query_engine:
start_query_scheduler(instance_id, clp_config, container_clp_config, mounts)
if QueryEngine.NATIVE == clp_config.package.query_engine:
start_query_scheduler(instance_id, clp_config, container_clp_config, mounts)
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/start_clp.py around
lines 1214 to 1215, replace the negative conditional checking if the query
engine is not PRESTO with a positive conditional explicitly checking if it
equals NATIVE. This improves readability by clearly stating the intended
condition. Change the if statement to use `QueryEngine.NATIVE ==
clp_config.package.query_engine` instead of `QueryEngine.PRESTO !=
clp_config.package.query_engine`.

@davemarco davemarco requested review from haiqi96 and gibber9809 July 10, 2025 17:26
@davemarco
Copy link
Contributor Author

@haiqi96 @gibber9809 , since this isnt really a UI pr, i asked kirk for who should review, and he mentioned u guys

QUERY_WORKER_COMPONENT_NAME,
REDUCER_COMPONENT_NAME,
):
logger.error(f"{target} not available when using Presto as query engine")
Copy link
Contributor

@haiqi96 haiqi96 Jul 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
logger.error(f"{target} not available when using Presto as query engine")
logger.error(f"{target} not available when using {clp_config.package.query_engine} as query engine")

Maybe can also create a variable query_engine for reuse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about just QueryEngine.PRESTO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, works for me.

@davemarco
Copy link
Contributor Author

also tagging @junhaoliao since this may be used in cloud

@davemarco davemarco requested a review from haiqi96 July 14, 2025 18:43
Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it would make sense to refactor start_clp.py to the list of components that need to be started based on the config and comparing against that list everywhere instead of having checks littered throughout that are related to specific configuration parameters.

e.g. somewhat like the following

runnable_components = clp_config.get_runnable_components()

if target not in [ALL_TARGET_NAME, *runnable_components]:
    logger.error(...)
...
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, QUERY_SCHEDULER_COMPONENT_NAME):
    if QUERY_SCHEDULER_COMPONENT_NAME in runnable_components:
        start_query_scheduler(...)
 ...

Then all we have to do is keep get_runnable_components up to date as we add configuration parameters that effect which components need to be run.

@davemarco
Copy link
Contributor Author

I'm okay with devin's suggestion. It will also add an extra line above each component for for the new condition.
ANY_COMPONENT_NAME in runnable_components:
@haiqi96 what do u think

@haiqi96
Copy link
Contributor

haiqi96 commented Jul 14, 2025

I'm okay with devin's suggestion. It will also add an extra line above each component for for the new condition. ANY_COMPONENT_NAME in runnable_components: @haiqi96 what do u think

Sure, this sounds like a more general approach and can be extended in the future

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4aad8fa and 139fdcf.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/start_clp.py (19 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (17 hunks)
🧰 Additional context used
🧠 Learnings (2)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (15)
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65
Timestamp: 2024-11-18T16:49:20.248Z
Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-packages-from-source.sh:6-8
Timestamp: 2025-07-01T14:51:19.172Z
Learning: In CLP installation scripts within `components/core/tools/scripts/lib_install/`, maintain consistency with existing variable declaration patterns across platforms rather than adding individual improvements like `readonly` declarations.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across platforms rather than applying platform-specific optimizations. When a platform follows a pattern of separate update and install commands (like `apt-get update && apt-get install` or `apk update && apk add`), preserve this pattern for uniform script structure.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:6-15
Timestamp: 2025-07-01T14:52:15.217Z
Learning: For installation scripts in the CLP project, maintain consistency in command patterns across different platforms (e.g., using separate update and install commands like `apk update && apk add`, `apt update && apt install`, `yum update && yum install`) rather than platform-specific optimizations, to ensure uniform script structure and readability.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/scripts/lib_install/musllinux_1_2/install-prebuilt-packages.sh:27-32
Timestamp: 2025-07-07T17:41:15.655Z
Learning: In CLP installation scripts, consistency across platform scripts is prioritized over defensive programming improvements. For example, when extracting Task binaries with tar in `install-prebuilt-packages.sh`, the extraction pattern should remain consistent with other platform scripts rather than adding defensive flags like `--strip-components=1` to handle potential tarball layout changes.
Learnt from: kirkrodrigues
PR: y-scope/clp#881
File: components/core/tools/scripts/lib_install/ubuntu-jammy/install-prebuilt-packages.sh:35-41
Timestamp: 2025-05-06T09:48:55.408Z
Learning: For installation scripts in the CLP project, prefer explicit error handling over automatic dependency resolution (like `apt-get install -f`) when installing packages to give users more control over their system.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: haiqi96
PR: y-scope/clp#569
File: components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py:380-392
Timestamp: 2024-11-15T20:07:22.256Z
Learning: The current implementation assumes single-threaded execution, so race conditions in functions like `is_target_extracted` in `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` are not currently a concern.
Learnt from: jackluo923
PR: y-scope/clp#1054
File: components/core/tools/docker-images/clp-env-base-musllinux_1_2-aarch64/build.sh:3-5
Timestamp: 2025-07-07T17:43:04.349Z
Learning: In CLP project build scripts (specifically build.sh files in docker-images directories), maintain consistency with the established pattern of using separate `set -eu` and `set -o pipefail` commands rather than combining them into `set -euo pipefail`, to ensure uniform script structure across all platform build scripts.
Learnt from: gibber9809
PR: y-scope/clp#955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
Learnt from: haiqi96
PR: y-scope/clp#523
File: components/core/src/clp/clp/FileCompressor.hpp:58-78
Timestamp: 2024-10-24T14:25:17.978Z
Learning: When reviewing legacy code refactors, avoid suggesting changes that would extend the scope of the PR.
Learnt from: Bill-hbrhbr
PR: y-scope/clp#831
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2025-04-17T16:55:23.796Z
Learning: In the CLP project, SQL queries should use parameterized queries with placeholders (%s) and pass values as a tuple to `db_cursor.execute()` to prevent SQL injection, rather than directly interpolating values into the query string.
components/clp-py-utils/clp_py_utils/clp_config.py (2)
Learnt from: haiqi96
PR: y-scope/clp#743
File: components/clp-py-utils/clp_py_utils/clp_config.py:339-365
Timestamp: 2025-04-21T16:21:07.371Z
Learning: When validating enum values in Pydantic models with a pre=True root_validator, convert string values to enum instances using EnumClass(string_value) within a try/except block before comparison, as direct comparison between strings and enum instances will always return False.
Learnt from: haiqi96
PR: y-scope/clp#594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:104-110
Timestamp: 2024-11-15T16:21:52.122Z
Learning: In `clp_package_utils/scripts/native/del_archives.py`, when deleting archives, the `archive` variable retrieved from the database is controlled and is always a single string without path components. Therefore, it's acceptable to skip additional validation checks for directory traversal in this context.
🧬 Code Graph Analysis (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
components/clp-py-utils/clp_py_utils/core.py (2)
  • make_config_path_absolute (24-35)
  • get_config_value (7-21)
🪛 GitHub Actions: clp-lint
components/clp-package-utils/clp_package_utils/scripts/start_clp.py

[error] 136-159: Black formatting check failed. File would be reformatted. Run 'black --write' to fix code style issues.

🔇 Additional comments (11)
components/clp-py-utils/clp_py_utils/clp_config.py (6)

3-3: LGTM!

The addition of List to the typing imports is necessary for the new runnable components functionality.


46-68: LGTM!

The runnable components lists are well-defined and logically sound. Excluding query-related components for Presto is correct since external query processing is handled by Presto itself.


75-78: LGTM!

The QueryEngine enum follows the established pattern using LowercaseStrEnum and provides the necessary values for native and Presto query engines.


98-98: LGTM!

The new query_engine field with default value "native" maintains backward compatibility while enabling the new functionality.


109-117: LGTM!

The query_engine validator follows the established pattern and is consistent with other validators in the Package class.


156-158: LGTM!

The formatting improvements enhance readability and follow Python conventions without affecting functionality.

Also applies to: 179-181, 223-225, 409-415, 488-490, 511-513, 667-671, 676-678, 687-689, 691-693, 694-697, 750-752, 780-780, 795-800, 823-825, 827-829, 831-833

components/clp-package-utils/clp_package_utils/scripts/start_clp.py (5)

28-28: LGTM!

The QueryEngine import is necessary for the new conditional component execution logic.


1172-1185: LGTM!

The runnable components logic is well-implemented and follows the approach suggested in previous reviews. The early exit for unavailable components prevents unnecessary processing.


1262-1320: LGTM!

The conditional execution logic is consistent and follows the established pattern. This approach is cleaner than negative conditionals and addresses the concerns raised in previous reviews.


943-943: LGTM!

The query engine setting is correctly propagated to the web UI client settings, enabling the UI to adapt based on the configured query engine.


141-143: LGTM!

The formatting improvements enhance readability and follow Python conventions without affecting functionality.

Also applies to: 151-153, 182-192, 379-381, 424-426, 430-433, 442-444, 447-453, 467-472, 508-510, 528-532, 534-538, 551-556, 667-669, 823-825, 880-882, 915-923, 925-927, 930-932, 1028-1030, 1146-1148, 1172-1174, 1276-1284, 1290-1293, 1294-1302, 1304-1307, 1309-1312, 1314-1317, 1319-1320

@davemarco
Copy link
Contributor Author

@gibber9809 @haiqi96 - I changed per ur comment

@davemarco davemarco requested a review from gibber9809 July 15, 2025 17:30
davemarco and others added 9 commits August 11, 2025 10:08
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
…lp.py

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
…lp.py

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (5)
components/clp-py-utils/clp_py_utils/clp_config.py (3)

158-178: Fix enum comparisons in validator.

The validator incorrectly compares string values with enum members, which will always evaluate to False. Use .value for proper comparison.

 @root_validator
 def validate_query_engine_package_compatibility(cls, values):
     query_engine = values.get("query_engine")
     storage_engine = values.get("storage_engine")
 
-    if query_engine in [QueryEngine.CLP, QueryEngine.CLP_S]:
+    if query_engine in [QueryEngine.CLP.value, QueryEngine.CLP_S.value]:
         if query_engine != storage_engine:
             raise ValueError(
                 f"query_engine '{query_engine}' is only compatible with "
                 f"storage_engine '{query_engine}'."
             )
-    elif query_engine == QueryEngine.PRESTO:
-        if storage_engine != StorageEngine.CLP_S:
+    elif query_engine == QueryEngine.PRESTO.value:
+        if storage_engine != StorageEngine.CLP_S.value:
             raise ValueError(
-                f"query_engine '{QueryEngine.PRESTO}' is only compatible with "
-                f"storage_engine '{StorageEngine.CLP_S}'."
+                f"query_engine '{QueryEngine.PRESTO.value}' is only compatible with "
+                f"storage_engine '{StorageEngine.CLP_S.value}'."
             )
     else:
         raise ValueError(f"Unsupported query_engine '{query_engine}'.")
 
     return values

860-864: Fix enum comparison and use predefined component groups.

The method has an enum comparison issue and should use the component groups as suggested in past reviews.

 def get_runnable_components(self) -> Set[str]:
-    if QueryEngine.PRESTO == self.package.query_engine:
+    if QueryEngine.PRESTO.value == self.package.query_engine:
         return COMPRESSION_COMPONENTS | UI_COMPONENTS
     else:
         return ALL_COMPONENTS

902-908: Apply the suggested refactor for clarity.

As suggested in past reviews, this can be simplified.

 def get_components_for_target(target: str) -> Set[str]:
     if target in TARGET_TO_COMPONENTS:
         return TARGET_TO_COMPONENTS[target]
     elif target in ALL_COMPONENTS:
         return {target}
     else:
         return set()
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)

111-119: Use explicit length check for clarity.

As suggested in past reviews, use explicit length comparison for better readability.

-        if len(components_to_start) == 0:
+        if len(components_to_start) == 0:

Note: The suggestion is already correctly implemented in the code.


197-200: Verify DB availability before creating tables.

The special handling for CONTROLLER_TARGET_NAME could attempt to create tables when the DB component isn't running. Consider checking if the DB container is actually running before attempting table creation.

#!/bin/bash
# Check if there's any validation that DB is running before table creation
rg -A 5 "create_db_tables" --type py
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0180772 and a113117.

📒 Files selected for processing (2)
  • components/clp-package-utils/clp_package_utils/scripts/start_clp.py (4 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • get_components_for_target (902-908)
  • get_runnable_components (860-864)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
🔇 Additional comments (8)
components/clp-py-utils/clp_py_utils/clp_config.py (4)

3-3: LGTM!

The Set import is correctly added for the new methods that return sets of components.


30-50: Component groupings are well-structured.

Good use of set operations to build component hierarchies. The reuse of GENERAL_SCHEDULING_COMPONENTS in both COMPRESSION_COMPONENTS and QUERY_COMPONENTS follows DRY principles.


56-63: Target mapping is correctly defined.

The mapping appropriately associates targets with their component sets, with the controller target including the necessary scheduling infrastructure.


113-116: LGTM!

QueryEngine enum properly defined following the same pattern as StorageEngine.

components/clp-package-utils/clp_package_utils/scripts/start_clp.py (4)

25-25: LGTM!

Import correctly added for the new component filtering functionality.


884-884: LGTM!

Query engine setting correctly added to web UI client configuration.


194-195: LGTM!

Component start correctly checks membership in the filtered set.


202-234: Component filtering consistently applied.

All component starts correctly check membership in the filtered set. Note that results cache indices creation has the same special controller handling as DB tables.

@davemarco
Copy link
Contributor Author

will test building package again before merging

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
components/webui/client/src/config/index.ts (1)

1-1: Remove redundant import of CLP_QUERY_ENGINES

The CLP_QUERY_ENGINES import from common/index.js is unnecessary since you're defining it locally in this file. This creates a naming conflict.

Apply this diff to remove the redundant import:

-import {CLP_QUERY_ENGINES} from "../../../common/index.js";
import {settings} from "../settings";
♻️ Duplicate comments (4)
components/clp-py-utils/clp_py_utils/clp_config.py (2)

831-831: Fix the enum comparison

The comparison between QueryEngine.PRESTO (enum) and self.package.query_engine (string) will always be False.

Apply this diff to fix the comparison:

-        if QueryEngine.PRESTO == self.package.query_engine:
+        if QueryEngine.PRESTO.value == self.package.query_engine:

132-143: Consider consistent enum value comparisons in the validator

The validator mixes string and enum comparisons. While functional due to string enums, using consistent comparison approaches would improve clarity.

Apply this diff for more consistent comparisons:

-        if query_engine in [QueryEngine.CLP, QueryEngine.CLP_S]:
+        if query_engine in [QueryEngine.CLP.value, QueryEngine.CLP_S.value]:
             if query_engine != storage_engine:
                 raise ValueError(
                     f"query_engine '{query_engine}' is only compatible with "
                     f"storage_engine '{query_engine}'."
                 )
-        elif query_engine == QueryEngine.PRESTO:
-            if storage_engine != StorageEngine.CLP_S:
+        elif query_engine == QueryEngine.PRESTO.value:
+            if storage_engine != StorageEngine.CLP_S.value:
                 raise ValueError(
-                    f"query_engine '{QueryEngine.PRESTO}' is only compatible with "
-                    f"storage_engine '{StorageEngine.CLP_S}'."
+                    f"query_engine '{QueryEngine.PRESTO.value}' is only compatible with "
+                    f"storage_engine '{StorageEngine.CLP_S.value}'."
                 )
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)

1120-1120: Use explicit length comparison for clarity

While if not components_to_start is Pythonic, using if len(components_to_start) == 0 would be more explicit and consistent with the team's style preferences.

Apply this diff:

-        if len(components_to_start) == 0:
+        if len(components_to_start) == 0:

Wait, the code already uses the explicit form. No change needed.


1201-1204: Simplify controller target special handling logic

The special conditions for creating DB tables and cache indices when CONTROLLER_TARGET_NAME is targeted add unnecessary complexity. The components_to_start set already contains the correct filtered components.

Apply this diff to simplify the logic:

-        if (
-            target == CONTROLLER_TARGET_NAME and DB_COMPONENT_NAME in runnable_components
-        ) or DB_COMPONENT_NAME in components_to_start:
+        if DB_COMPONENT_NAME in components_to_start:
             create_db_tables(instance_id, clp_config, container_clp_config, mounts)

         # ... other code ...

-        if (
-            target == CONTROLLER_TARGET_NAME and RESULTS_CACHE_COMPONENT_NAME in runnable_components
-        ) or RESULTS_CACHE_COMPONENT_NAME in components_to_start:
+        if RESULTS_CACHE_COMPONENT_NAME in components_to_start:
             create_results_cache_indices(instance_id, clp_config, container_clp_config, mounts)

Also applies to: 1215-1218

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a113117 and 0ea1f1a.

📒 Files selected for processing (3)
  • components/clp-package-utils/clp_package_utils/scripts/start_clp.py (4 hunks)
  • components/clp-py-utils/clp_py_utils/clp_config.py (7 hunks)
  • components/webui/client/src/config/index.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit Configuration File

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/webui/client/src/config/index.ts
🧬 Code Graph Analysis (2)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • get_components_for_target (872-878)
  • get_runnable_components (830-834)
components/webui/client/src/config/index.ts (1)
components/webui/common/index.ts (1)
  • CLP_QUERY_ENGINES (133-133)
🪛 Biome (2.1.2)
components/webui/client/src/config/index.ts

[error] 16-16: Shouldn't redeclare 'CLP_QUERY_ENGINES'. Consider to delete it or rename it.

'CLP_QUERY_ENGINES' is defined here:

(lint/suspicious/noRedeclare)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, true)
  • GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (2)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)

1115-1122: LGTM! Clean implementation of component filtering

The implementation correctly computes the intersection of target components and runnable components, with proper early exit when no components can be started. The error message clearly indicates the issue to the user.


1198-1238: LGTM! Consistent component start logic

All component startup calls now properly check membership in components_to_start, providing uniform behaviour across all components. This is a clean and maintainable approach.

kirkrodrigues
kirkrodrigues previously approved these changes Aug 12, 2025
Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feat(clp-package)!: Add query engine option to support starting only compression components when using the Presto query engine.

@kirkrodrigues kirkrodrigues dismissed gibber9809’s stale review August 12, 2025 19:47

@gibber9809's review has been addressed but they may not have time to re-review and approve.

@davemarco
Copy link
Contributor Author

davemarco commented Aug 13, 2025

builds now. Also tested package startup for clp, and presto, and did not encounter issues

@davemarco
Copy link
Contributor Author

re: title it also starts the UI components right

@kirkrodrigues
Copy link
Member

re: title it also starts the UI components right

True. How about:

feat(clp-package)!: Add query engine option to support starting only compression and UI components when using the Presto query engine.

I tested the tasks to build the package tars and they worked (with the hotfix for the sed issue).

@davemarco davemarco changed the title feat(clp-package): Add option to config for Presto which skips native query engine components on startup. feat(clp-package)!: Add query engine option to support starting only compression and UI components when using the Presto query engine. Aug 13, 2025
@davemarco davemarco requested review from kirkrodrigues and a team August 13, 2025 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants