@Bill-hbrhbr Bill-hbrhbr commented Apr 30, 2025

Description

This PR updates the compression, decompression, and search workflows to support dataset-specific operations when using the CLP_S storage engine:

  • All metadata tables are now prefixed with clp_{dataset_name}_, with the exception of the following:
    • clp_datasets: the central table where information about all datasets is stored
    • compression_jobs
    • compression_tasks
    • query_jobs
    • query_tasks
  • Archive paths for fs and s3, and stream output directories are suffixed with the dataset name to ensure physical isolation.

This PR succeeds:

and covers step 3 of the dataset feature implementation plan.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

For clp-s, the initial DB table set:

(image)

CMD with interface changes taking effect:

```shell
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh fs ~/dataset/data/
2025-04-30T05:57:07.264 INFO [compress] Compression job 1 submitted.
2025-04-30T05:57:08.799 INFO [compress] Compression finished.
2025-04-30T05:57:08.799 INFO [compress] Compressed 1.77KB into 889.00B (2.04x). Speed: 2.39KB/s.
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/search.sh "level: INFO"
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/compress.sh fs ~/dataset/data/
2025-04-30T05:58:35.211 INFO [compress] Compression job 2 submitted.
2025-04-30T05:58:35.713 INFO [compress] Compressed 1.77KB into 889.00B (2.04x). Speed: 4.02KB/s.
2025-04-30T05:58:36.215 INFO [compress] Compression finished.
2025-04-30T05:58:36.215 INFO [compress] Compressed 1.77KB into 889.00B (2.04x). Speed: 2.92KB/s.
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$ ./sbin/search.sh "level: INFO"
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
: {"timestamp":"2025-04-10T23:10:45Z","level":"INFO","service":"auth-service","message":"User login successful","user_id":"abc123","ip":"192.168.1.10"}
bingranhu@baker22:/home/bingranhu/clp/build/clp-package$
```

Summary by CodeRabbit

  • New Features

    • Enhanced support for dataset-aware operations with the CLP_S storage engine, including dynamic table prefixes and path naming based on dataset context.
    • Added automatic dataset registration and caching in compression scheduling workflows.
    • Introduced dataset-specific handling in query scheduling and archive management.
    • Added new dataset metadata management capabilities, including dataset registration and schema updates.
  • Bug Fixes

    • Ensured correct handling of dataset-specific paths and table prefixes in compression, decompression, query, and archive management operations.
  • Chores

    • Improved documentation and code clarity for dataset management utilities.

coderabbitai bot commented Apr 30, 2025

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.
## Walkthrough

This set of changes introduces dataset-aware logic across several components, primarily by dynamically adjusting SQL table prefixes and archive storage paths based on the storage engine and dataset name. Function signatures and internal logic are updated to propagate `storage_engine` and `dataset` parameters, especially for the `CLP_S` storage engine, ensuring that database operations and archive management are correctly namespaced per dataset. New helper functions are added for dataset registration and caching, and command construction for compression and search tasks is adjusted to use dataset-specific paths and prefixes. No changes are made to error handling or overall control flow beyond these parameterizations.

## Changes

| File(s) | Change Summary |
|---------|----------------|
| `components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py` | Added `storage_engine` and `dataset` parameters to main, `_find_archives`, and `_delete_archives`. Table prefix is now conditionally suffixed with dataset name for `CLP_S` engine. |
| `components/clp-package-utils/clp_package_utils/scripts/native/decompress.py` | Updated `get_orig_file_id` to accept and use `storage_engine` and `dataset` for table prefixing. Adjusted `handle_extract_stream_cmd` to pass these parameters. |
| `components/clp-package-utils/clp_package_utils/scripts/start_clp.py` | Modified `start_webui` to dynamically adjust `table_prefix` with dataset name for `CLP_S` storage engine when constructing Meteor settings. |
| `components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py` | Added `insert_new_datasets_table_entry` for dataset registration. Adjusted table creation logic to use dataset-aware prefixes. Cleaned up imports and docstrings. |
| `components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py` | Changed logic so only one of `create_datasets_table` or `create_metadata_db_tables` is called based on storage engine, making them mutually exclusive. |
| `components/job-orchestration/job_orchestration/executor/compress/compression_task.py` | Updated `run_clp` to adjust archive output directory, table prefix, and S3 key prefix with dataset name for `CLP_S`. Passes dataset name to indexer command. |
| `components/job-orchestration/job_orchestration/executor/query/fs_search_task.py` | Modified `_make_core_clp_s_command_and_env_vars` to append dataset name to archive directory and S3 key prefix. |
| `components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py` | Added `_fetch_existing_datasets` for dataset caching. Updated `search_and_schedule_new_tasks` and `main` to handle datasets, adjust table prefixes, and register new datasets as needed for `CLP_S`. |
| `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` | Added `StorageEngine` parameter to key functions. Adjusted table prefixing in `handle_pending_query_jobs` to include dataset name for `CLP_S`. Propagated storage engine parameter through function calls. |

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler
    participant MetadataDB
    participant DatasetCache

    Scheduler->>MetadataDB: Fetch existing datasets
    MetadataDB-->>Scheduler: Return set of dataset names

    loop For each new compression job
        Scheduler->>DatasetCache: Check if dataset is registered
        alt Dataset not in cache
            Scheduler->>MetadataDB: Insert new dataset entry
            Scheduler->>MetadataDB: Create metadata tables for dataset
            MetadataDB-->>Scheduler: Acknowledge
            Scheduler->>DatasetCache: Add dataset to cache
        end
        Scheduler->>MetadataDB: Use dataset-aware table prefix for job metadata
    end
```

```mermaid
sequenceDiagram
    participant Executor
    participant Config
    participant ArchiveStorage
    participant DB
    participant Indexer

    Executor->>Config: Get storage_engine and dataset
    alt storage_engine == CLP_S
        Executor->>ArchiveStorage: Set archive dir and S3 key prefix with dataset name
        Executor->>DB: Use table_prefix with dataset name
    else
        Executor->>ArchiveStorage: Use default archive dir and key prefix
        Executor->>DB: Use default table_prefix
    end
    Executor->>Indexer: Run with dataset argument
```
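The register-on-first-use flow in the first sequence diagram could be sketched as below. `FakeMetadataDB` and `ensure_dataset_registered` are illustrative stand-ins; the real scheduler talks to the metadata DB via `_fetch_existing_datasets` and `insert_new_datasets_table_entry`.

```python
# Illustrative sketch of the dataset registration-and-caching flow.
# FakeMetadataDB is a hypothetical in-memory stand-in for the metadata DB.
class FakeMetadataDB:
    def __init__(self) -> None:
        self.datasets: set[str] = set()
        self.created_tables: list[str] = []

    def fetch_existing_datasets(self) -> set[str]:
        return set(self.datasets)

    def insert_dataset(self, name: str) -> None:
        self.datasets.add(name)

    def create_metadata_tables(self, name: str) -> None:
        # The real code creates several clp_{name}_-prefixed tables.
        self.created_tables.append(f"clp_{name}_archives")


def ensure_dataset_registered(
    dataset: str, cache: set, db: FakeMetadataDB
) -> None:
    """Register a dataset on first sight; later jobs hit the cache."""
    if dataset in cache:
        return
    db.insert_dataset(dataset)
    db.create_metadata_tables(dataset)
    cache.add(dataset)


db = FakeMetadataDB()
cache = db.fetch_existing_datasets()
for job_dataset in ["web_logs", "web_logs", "app_logs"]:
    ensure_dataset_registered(job_dataset, cache, db)
# Each dataset's tables are created exactly once, even across repeated jobs.
```

The cache avoids a round trip to the metadata DB on every compression job while still registering unseen datasets before any dataset-prefixed table is used.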

Possibly related PRs

  • y-scope/clp#831: The main PR extends archive management functions to handle dataset-specific table prefixes based on the storage engine. The retrieved PR refactors and centralizes metadata DB table creation logic and conditionally creates the datasets table for the CLP_S storage engine. Both directly relate to dataset-aware handling of metadata tables and database schema management in the CLP system.
  • y-scope/clp#864: The main PR extends the dynamic table prefix logic by adding dataset-aware suffixes based on the storage engine, modifying function signatures to pass `storage_engine` and `dataset` parameters, and conditionally adjusting table prefixes accordingly. The retrieved PR replaces hardcoded table name suffixes with constants and uses a dynamic `table_prefix` from connection params, but does not include dataset-specific logic or parameter changes. The main PR therefore builds on the retrieved PR's approach, adding dataset-level granularity.

Suggested reviewers

  • gibber9809
  • wraymo
  • kirkrodrigues

nK0vSrUjOKrpCUUg8cFCSmUKzonoCmIOJJHgKuiXUviCCNwrosODUMccgHrhUC0E0i4PSN/vwMFHwAAS0G6LxKAVloFOEMOHLGitJJbg8czs0NAcqtkfAfqmzIIjqmkLeJ4ugZgTEGakVLgWyJELUiUVAKKcUQ3hKp5mQT8MOCgUaYgSaWaGadgRaZam8NaoDCdHyIKHSJkYDmAEYMDo4dmvvLEVERPGWBDiqPyIKFUNGBwbPBkRjtkdjgXCHPkQTkUWSiUcigpJgWJA0fvvOkfkuqfmursELESvgFILQAHt3s9kkM4jjHWGGHMe5qsBAa7LfA3oKN0HlCQAGEwF4I4KmrZtofJgMpAMkI5E2DYBcOlmuckPRAAAwkSMQJocIobQBf50JXAJr6km4CZx5hj2C9CPRoDNgTkBhK6rBnB4hlCpHkA7BdTXm9l5JBhOaNmdBPEN4fmUDmjCjfkb5+AwgUD0AKLIGm43nRA9js7oa3S7G6JJIgpIUCask9j1qfErhIYND4JEUL6QDFA2AXxmzuiiBKwQoDAwLYXwAABeMQzJ4FJqV0407AMu52+sziqZQoGZ/50gYJIIu6DeyinZZMnBM5zAYGIl6Ziyk+yAxMrZneaStUDS1xr4oo2k9xBI3I95uQuIv4sSpumu2s5o3+5AZs6geUbyJSc+w5wKZw5JQqDOh4vK+Fu4Mo7ZzUaEc0iALSYcektAYAc05oCc/8plQlMQulH4s5fIsYaZAoLJA0OyyJ3gqJ+FQ5zZrZDkKqcBGqxpl50IhpupdRv4Rq3pYqvpFqeBAZBBtprm8xcETxEY3FlO35v5kkcY4lQ67wsgqh36ck74xhXl6u0oRhs2Te+U81G+DWfYQ+4yeYwa42C2K185G1ck8AphuG9lS5Wmd2SZrh48KlAoYlWZwRGOYRkZHcERXcMZ0Rsg8Zbh+8nhhaoU4gLkHFYA/c11aAYA7B2ZJ8uZuRBZ9g98dshOJZJO5R6pNpAxle+lvgkprQH5OlGxg2/yn0McqwCqCC9IVg/gIVRwNeaN8unYjOT4jAzSBUclwl6Vol7B4lU4/gRAzg+KUwYkzuPG3xDly+vOKm4VTNM4LNG8PZal/AGALSXSgtXxMqQ5u+/u58Sxf+ulZKmY0tm8bN6ZHNiyU49SktVIHoM50wUgit7J3easZQpIesmAqIqedtxFlIMqWsOs6tjYDIRw4QA0vtCpCt8gSoiVA1NQj6ctZtPg5Kz+w4v+fAlF/Sb0swLS4dXZDahtN1xtfUptkBktQVEVbwMQ0AKGZCKG/+tIyARK2guJUB1ueA4o5Q2sFcHORpyCqCyVRuKm/+lkLQxdK0TlpWl4RVPwtUfM6Nfd0lQImd8lINYlJtpIMVXeHUTQTtnaOQ4Qq0hs6xdePgGleAwyLS7onk1tq4oq8td0Lu/SJVsBaq5VnplV7pNVXpJqPpmEzVVpgZRgJ14Zj1UZL1um/00ViMIQupPc/0ze95MarMPc9FOk5IfcKOywKDgREIswdYBYENWRWO0NI5+O8NxZh0pZUlJZFRv8bUYBt02VIIfuHt0qHGMFWyBkSieWaR3A/ZRi9I9wmMLQzS3Yuw+w7tKdzDqAfuteCg8ubdFtiCcY69GAC0kdsYBi3d9DNwKK2CQxLsiIKNE9pxYY5xUJNu8JxuMFeylKtueJDuYxkFo8POdyeEHUtmMVyj7WTtxyZj+9Fj5uQswoDgAg7WDUowsgMurx5MLui5DxU0UjU0kAG48gFJDx0w1UequuFBSVG6YiM58uflg1rJFuhy1uxy9gn4iAsAQVqE6EYV9sTIuBI51QJ63KllM1Ky2leTeNmh+0k1ipswNACCDx49tA4Q/gciaU4qIIyIRkf+TTnmTs9IVCeIgjUj75SjeAKjuApIPafkXy/g6ke0bl1ToVLSfw8FcVM4syjRc6h+i6J+K6Z+Ka4keIEg2kPMPKhIVi9FFApIYcXQ0GHI5jXzsofArlaN9UjJuy
5urK7Ku6YT78j+FDaAzyjJOzfat5f1faRkbFPMiI/BPciTzUMojxTjDyr0uFDDOMZs/JtUwLf+fzrE7+jywSowCJciIpl4DpfyIZeTl4IBuKcpt0fTqKRWGKAYxUsgle6S5QaLXyiIDL3QFA14jyKNEMiB992pT9BqL9uqHpBqQC9VH9jVX9lp+BNptqX+nLFDlVpBuArqdRfAr9aBBrGBDV/ATVprrVqNx0Eoq0SgQED1EAT10ZvcoDowsDpMkMUDggMDSMkbkICDsoSDowPcYKPck4Q+jMsA2DOQuDD9eZN8wKRDj8z8iNpRhIdmG4nB/gnDQ+nBTgt6aiZAEgtwOzU4r5WAQ5+jrOHkowxqZMeFBTeyfRWUSiCMvQnBfWRAKalV68Ap8K5ISTgwsqtU8ThLFJvD/DcLqzFGHjr+DxsjiqGysJ3jiJMFgTwTYgoT9IAA6iIzfULdE6gLE0yaJIMbqR8WcIk8Y5SagDCaYw0Ebj41CxiR+JbvrgE+U5U9cpAWSiHi8mjeyEnmTh8+UPE9RVUpdB5Lk+Y5zO7eu0KiOFepcgboBwieJHO4XkFeCOaKh6CFvMXEhwybo6xM28+rMI8rK7ebFMJHXQZKSodG2aVY/Qgdq9qkidVc63Va60a+6ya/6daYQXaVa6Q46SCLa/a+/VgfJy1Yp0Gb66GQG6EUG0A+mq9aG4IGAxG5A9A+G3G5AyjPsL/Mlug3CK5zmFg453QKMc2gfOGVDTjjDcW4UaW6Q0jRQwIj/H/BvmAGgIzhKDQ6cIhclRg/CJAP9HeU594GlLwNpPzTbrzkxdBfsOkH/h2xYunhO7fkuIFJs1OCZeaDUr8qxP4HaxQHqep1JEAiB78l4PrKyrBTSOdP4Ahe7SDbRmUEFbOFm+omol585xvP1fmA2KTi7KgN26OehZBWLRhsxULucoGup7fQYXO2483e+6zPgozfeegXfrdDjLewqGw4Hh2xSve8rSRV7f8aCZrOaIcWHoO7lMO+o+UBl3iAtFo7+jo/TGVPsOZSgL+E5agM93XveeE85eaEsOwL97THgAY+UHO5RdRRfCKrSdfanuKRR0RWd+aJRc+6JMOJ2p2D2LgONIvGBeyBBdlNBSDIBbKtLegMsNpH2sCuNybeEGHOoG5aTOyEccroiGOW5SDJVS+GjwyJ8reck6gNSe5Vk+bLk//FeUO3450LDXuIUrhYE9IsEG8BN+UMzKJ+yJrDk15DVPkJ3hBZCAJR1MStw5DFI+Nw/gTdsd+E/vBMd27ty5V05/LW1MVD6Ew9tOYxl7lzbGlFiMzGbPk1HddyCMj19P9di/rrumlMotH+AtKHd63jjPwhE4w8LSFN98SBreLFFNBVhcLiu10zn6ryCNy7Pnr9N1sLN9YvN6MVqJOTm1OL/H5LdEkKiOoI7/SChr5M0lIw6ZukQCHDBdzRGrqfz9kAgLdMl+p/ec5zlx8DbDvW0vEPIIskS4UoLIhKy4FNTYiMzOINVBxWlQKE1GQiYmzhtjF5e+9+NEvlEAgwFNWjvJAjlUk7GkXWppY1jgQU6/0XMLYbyKt0hARgVEsYKrpXxq4nA6u41HSEklGAS8owi9DmuVBD5YN1CjGQUmIWMJNgMAsgHDBpnuCxhGB1hM6qgxTBj9ksv2FzNwhv6klMBqwDTv4AjDjts2XQIfGf3H5Ld6wptc7LwNzD8CfsLaIPCbHoAjtzgxA2YFCCoG9NjCV1HOmgDErUCiaYGejAUHXR6CqBBg+YGoWMHkCUyZgiwQYKsGTtos9AsQIwPCDMDWBrg9aiNn6R6EeMwJYIa43mh4AuATYetChhiFhAN8j6XvuNg4FRorCtgoQSIOQSdV7AzpNruNjOA+o/UEgWwXYR+iYMwM6g3zsUkyY9sC+WLdigT2hhNNHi2uCwntz2LLlOolAtQp4KMTeDf0CYTqNoX6SjCeyaQs4BeUvBD9EYI/OQVlzySKCCwU4Vhu7RT6X9fot0DPtwAA
YmdnqZnEBpZzs4QN4GtncBnAyja1DU2wQFwHcNLBLDvOAg/eOjkhr4NAuhDOGiWy0HE5y2H8VTpF0qKG9DggCX8MKABL6Exs8SFissQMhlBV+6KTqsgGSA94VsE8TcLVzBTlgAYow3vLiKLBVhfeCaREOWTNAdRASvGGEWyEcDe90Rm2JZFiMIE4j/oRYcEkYwpLFIyeA0IPtUnuGo16k09HXLSlyHW4r6kzSJkLTCEI8H2KtcRk3zuCPBVmbTKEjcnhIHhxw2fKSMD36JKIOBWwAAFKCBloU7P2kSU5RLhmWKLBRrAExQPoW60kXATHwGjSD0u6YKblSOBKwj9ufADwB8CyA81gUBID4NgCIDbIpK9eA6CIlQQlD66dghkXzUJGS4jw+IOYQSLI7Yj7hZYNkX7VFJJ1+6gBOTvy0gCJc/hvyAzv60gGqodSaBHVnAM9IIC3W5pb+mayU4dUByWACQd5gCSXpfwGY5ka3lZHph9Bn0CePtmCH9DJ2YGXYiLjIQMCdqmAIIbBQcDqpUhh/aHH5GWrD4dQClaIEMKWpCZwgGARwKtVax5h8AmbYfo6CHxFRcRcGU7OGiXCRpqgbKW0PgA0DQB6w1harMRgHFAchx2YwkaOJnETjGAFA9wQ4LHGzj/0C494CxGXHjxxhHGcIVNgMLgg8q64+8jPC3FHihMI+PcV4MPE7UTxo+CjAQCvELCbxgTe8VtEfERoXx90MRDME/Hfinm9hcztGwEBWd7OFwmNmcOuEJtbhabHEbUMEFEF8h3Y3Xs8XOCvixE0IfsRiLZEgTxxH4ScZBIGHQSEkc4uCcYSXGxpRm0gTCd+g3E4TDqeEncaPmnL7jIYWoH5MtTZHjifAdWNAAfBVL4BqAx408eRMvHSDMRFAW8RwX+gPjjqL0KrLJKYnYx/MBI5SVBNAlqTwJbg3/kbU0kzjtJsEvwYuJYEGTEpIQwdDKK9HoSjJa4kydhIOrbj3gBE6yURPsl4THJpoZyY4lclcAMUnkvKN5JCwUS/J1Eu8cFLomhSR0PA44ZmjDZXD42swLibG3OE3D5BLnESUBLEnBEJJnYgoc6hdLSSIwckqKd6nrr3p/UXAJ8ZkPEKnZKsBGCKW+IUkhk8Qe0gNHjAYlZDcMp0qxqBSZbsBEwUYZCWNngzjx6Me8KHBoE+mQIBg46VnImLyS5jGavIIaTphGmnCxpNnfifDPgbCTsxjw2sItKcwHCW4bcH2KxCAT+wCABDYFCXUEzlxYaBRYAXXCThqBG4acbGRnHo7qBlCj6CmrUitzrVrozcVuAzIYAAB2eILQAABMBEeIIxCFkEQCIAAVgEB8zKI+5diKREFlzh6I9ESoPEHogkApZgs+iLzLQBEQuZBgBmTuXoiJBBZ/MgiKoEllzg9ZtABgILIEC8zeZRENAILIYCMRGIDAOcGuBIg7lJZDAEiLbLnC0BJZBshmSRDQB7kBZJAXWerOVnZAI5rswWYLKIg7kCIqcvcoxCIgEQSIJAUiNnL9n0RQ53sKkCRF5kCAdytAHcgwHVnsRGIEsk2S7OKh7kU5jEMufHyIhERJZ6swWfEBIitB04xcgOZbJIgqyCIvMuuSRAEAkRJZBEQWTuUSC5yPZvs9grzJ3IkAp58QY2fRE7lFyIAkADuQIHdkkBjZQTb4K3NoDWydy48wWYxC9n0QLZos7OTLNbn0QGARERIAbMNnFzJelOVyCzJIBoRJydAIdPQH0BAA=== -->

<!-- internal state end -->

@Bill-hbrhbr Bill-hbrhbr changed the title feat(clp-package): Implement dataset functionality for clp-s. feat(clp-package): Use dataset-specific tables and archive paths for compression, decompression and search when using CLP_S. May 3, 2025
@Bill-hbrhbr Bill-hbrhbr marked this pull request as ready for review May 3, 2025 20:37
@Bill-hbrhbr Bill-hbrhbr requested a review from a team as a code owner May 3, 2025 20:37
@Bill-hbrhbr Bill-hbrhbr requested a review from wraymo May 3, 2025 20:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71f1b38 and c7fd28f.

📒 Files selected for processing (9)
  • components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (11 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (4 hunks)
  • components/clp-package-utils/clp_package_utils/scripts/start_clp.py (3 hunks)
  • components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (4 hunks)
  • components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (1 hunks)
  • components/job-orchestration/job_orchestration/executor/compress/compression_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1 hunks)
  • components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (4 hunks)
  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • StorageEngine (55-57)
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
  • CLPConfig (608-794)
  • Database (88-167)
  • StorageEngine (55-57)
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • StorageEngine (55-57)
  • StorageType (60-62)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
  • get_directory (541-542)
  • get_directory (563-564)
  • StorageType (60-62)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (1)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (1)
  • create_metadata_db_tables (133-154)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
  • Database (88-167)
  • StorageEngine (55-57)
🔇 Additional comments (26)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)

19-19: Added necessary imports for dataset-specific functionality

The addition of CLP_DEFAULT_DATASET_NAME and StorageEngine imports supports the new dataset-specific table prefixing logic.

Also applies to: 35-35


869-870: Correctly implements dataset-specific table prefixes for CLP_S

The conditional logic appropriately modifies the table_prefix by appending the default dataset name when using the CLP_S storage engine. This ensures all table references in the Meteor settings are properly namespaced per the PR objectives.
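The prefixing rule described here can be sketched roughly as follows. This is a minimal illustration, not CLP's actual code: the helper name `apply_dataset_prefix` and the `"CLP_S"` literal are assumptions for the sake of the example.

```python
# Illustrative sketch of the dataset-specific table-prefix rule discussed
# above; apply_dataset_prefix and the "CLP_S" literal are hypothetical.
def apply_dataset_prefix(table_prefix: str, storage_engine: str, dataset: str) -> str:
    """Namespace metadata tables per dataset when using the CLP_S engine."""
    if storage_engine == "CLP_S":
        return f"{table_prefix}{dataset}_"
    return table_prefix


# Under CLP_S, an "archives" table for the default dataset would be
# referenced as "clp_default_archives":
print(apply_dataset_prefix("clp_", "CLP_S", "default") + "archives")
```

Non-CLP_S engines fall through unchanged, which matches the comment's point that only CLP_S gets per-dataset namespacing.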

components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (2)

57-58: Correctly implements dataset-specific archive paths

The code now extracts the dataset from the search configuration and appends it to the archives directory path when constructing the search command. This ensures search operations use the correct dataset-specific archive location.


66-66: Properly modifies S3 key prefix for dataset isolation

The S3 key prefix is modified to include the dataset name, ensuring physical isolation of archives in S3 storage. This change maintains consistency with the filesystem-based approach and aligns with the PR objectives for dataset isolation.

components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (1)

57-59: Correctly implements mutually exclusive table creation logic

The conditional logic ensures that for the CLP_S storage engine, only the create_datasets_table function is called, while for other storage engines, only the create_metadata_db_tables function is called. This approach appropriately sets up the database structure based on the storage engine type.
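A minimal sketch of this mutually exclusive setup, with hypothetical stand-ins for `create_datasets_table` and `create_metadata_db_tables` (the concrete table names below are illustrative assumptions, not the package's exact schema):

```python
# Hypothetical stand-in for the initialize-clp-metadata-db branching:
# CLP_S creates only the central datasets registry up front, while other
# engines create the full fixed metadata table set immediately.
def tables_to_create(storage_engine: str) -> list:
    if storage_engine == "CLP_S":
        # Per-dataset tables are created later, when each dataset is added.
        return ["clp_datasets"]
    return ["clp_archives", "clp_files", "clp_archive_tags"]

print(tables_to_create("CLP_S"))
```

The key property is that the two branches never both run, so a CLP_S deployment starts with only the datasets registry and defers everything else.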

components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (4)

13-13: Added necessary imports for dataset-specific functionality

The imports of CLP_DEFAULT_DATASET_NAME and StorageEngine support the new dataset-specific table prefixing logic needed for decompress operations.

Also applies to: 17-17


44-46: Updated function signature to support dataset-specific tables

The get_orig_file_id function signature now includes storage_engine and dataset parameters, allowing it to work with dataset-specific table names. This change is necessary to support the dataset isolation features described in the PR objectives.


59-61: Correctly implements dataset-specific table prefixes

The conditional logic properly modifies the table_prefix for the CLP_S storage engine by appending the dataset name, ensuring queries target the correct dataset-specific tables.


139-144: Updated function call with required dataset parameters

The call to get_orig_file_id has been updated to include the storage engine and the default dataset name, ensuring that file lookups work correctly with the updated function signature. This maintains compatibility with the dataset-aware architecture.

components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (3)

3-3: Appropriate import added for Path object.

The addition of the Path import from pathlib is appropriate as it's used for type hinting in the newly added insert_new_datasets_table_entry function.


99-99: Corrected docstring to use plural form.

Minor improvement in docstring accuracy, changing "dataset information table" to "datasets information table" to better reflect the table's purpose.


143-143: Appropriate relocation of column metadata table creation.

Moving the _create_column_metadata_table call inside the dataset conditional block ensures dataset-specific column metadata tables are only created when a dataset is specified, which aligns with the dataset isolation approach.

components/job-orchestration/job_orchestration/executor/compress/compression_task.py (2)

283-292: Good implementation of dataset-specific handling for CLP_S storage engine.

This addition properly modifies both the table prefix and archive paths based on the dataset name when using the CLP_S storage engine. The code follows a consistent pattern:

  1. For database operations: appends the dataset name to the table prefix
  2. For filesystem operations: places dataset archives in dataset-specific subdirectories
  3. For S3 operations: adds dataset name to the key prefix path

The implementation maintains proper isolation between datasets, which aligns with the PR objective.
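The three-way pattern listed above can be sketched with hypothetical helpers. Names and path conventions here are assumptions for illustration, not CLP's actual API:

```python
from pathlib import Path


def dataset_table_prefix(table_prefix: str, dataset: str) -> str:
    # Database operations: append the dataset name to the table prefix.
    return f"{table_prefix}{dataset}_"


def dataset_archives_dir(archives_dir: Path, dataset: str) -> Path:
    # Filesystem storage: each dataset gets its own archive subdirectory.
    return archives_dir / dataset


def dataset_key_prefix(key_prefix: str, dataset: str) -> str:
    # S3 storage: extend the object-key prefix with the dataset name.
    return f"{key_prefix}{dataset}/"


print(dataset_archives_dir(Path("/var/data/archives"), "default"))
print(dataset_key_prefix("archives/", "default"))
```

Keeping all three derivations dataset-keyed is what gives the physical isolation the PR description calls for: two datasets never share a table, a directory, or an S3 key prefix.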


396-396: Correctly using dynamic dataset name for indexer.

Replacing the hardcoded CLP_DEFAULT_DATASET_NAME with the dynamically determined input_dataset ensures the indexer correctly associates archives with their specific datasets.

components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (7)

13-13: Appropriate imports added to support dataset-aware functionality.

The addition of CLP_DEFAULT_DATASET_NAME and StorageEngine imports enables the archive manager to incorporate dataset-specific logic.

Also applies to: 16-17


187-187: Correctly fetching storage engine from configuration.

Storage engine detection is properly implemented by extracting it from the configuration object.


198-200: Consistently passing dataset parameters to archive functions.

The code now correctly passes the storage engine and default dataset name to the archive management functions. This maintains consistency with the dataset-specific approach implemented throughout the codebase.

Also applies to: 210-212, 222-224


238-239: Properly updated function signature to support dataset-aware logic.

The _find_archives function signature now includes storage_engine and dataset parameters, making the function capable of handling dataset-specific archives.


262-264: Correct implementation of dataset-specific table prefix.

The conditional modification of the table prefix when using the CLP_S storage engine follows the established pattern and ensures archives are queried from the correct dataset-specific tables.


305-306: Properly updated function signature to support dataset-aware logic.

The _delete_archives function signature now includes storage_engine and dataset parameters, making the function capable of handling dataset-specific archive deletion.


330-332: Correct implementation of dataset-specific table prefix for deletion.

The conditional modification of the table prefix for deletion operations when using the CLP_S storage engine ensures that archives are deleted from the correct dataset-specific tables.

components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (5)

40-40: Appropriate import added for storage engine enum.

The addition of StorageEngine import enables dataset-aware functionality in the query scheduler.


618-618: Properly updated function signature to support dataset-aware logic.

The handle_pending_query_jobs function signature now includes the clp_storage_engine parameter, making the function capable of handling dataset-specific queries.


638-644: Good implementation of dataset-specific table prefix logic.

The code correctly extracts the dataset from the search configuration and modifies the table prefix to include the dataset name when using the CLP_S storage engine. This ensures that queries target the correct dataset-specific tables.


1057-1057: Correctly propagating storage engine through the function call chain.

The clp_storage_engine parameter is properly propagated through the function call chain, ensuring dataset-aware behavior is consistently applied throughout the query processing workflow.

Also applies to: 1072-1072


1157-1157: Properly passing storage engine from configuration.

The code correctly passes the storage engine from the configuration to the job handling function, ensuring dataset-aware behavior is implemented based on the system's configuration.

Member

@kirkrodrigues kirkrodrigues left a comment


Minor suggestion.

For the PR title, how about:

feat(clp-json): Use dataset-specific tables and archive directories for compression, decompression, and search.

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
@Bill-hbrhbr Bill-hbrhbr changed the title feat(clp-package): Use dataset-specific tables and archive paths for compression, decompression and search when using CLP_S. feat(clp-json): Use dataset-specific tables and archive directories for compression, decompression, and search. Jun 19, 2025
@Bill-hbrhbr Bill-hbrhbr merged commit bcb7f54 into y-scope:main Jun 19, 2025
8 checks passed
@Bill-hbrhbr Bill-hbrhbr deleted the enable-dataset branch June 19, 2025 18:38
@Bill-hbrhbr Bill-hbrhbr restored the enable-dataset branch June 19, 2025 18:38
@Bill-hbrhbr Bill-hbrhbr deleted the enable-dataset branch June 19, 2025 18:38