refactor(clp-py-utils): Move CLP metadata DB table-creation logic into utility module; Conditionally create the datasets table when using the CLP_S storage engine. #831
Conversation
Walkthrough
This update introduces a new module containing utility functions for creating metadata tables in a MariaDB or MySQL database. Several string constants representing table suffixes are added to the configuration file. The table-creation logic in an initialization script is refactored to use a new consolidated function from the utility module, replacing multiple explicit SQL statements with a single function call. The new module defines individual functions for creating each table and a coordinating function to create all required tables with optional dataset-specific behaviour. Additionally, command-line argument handling in related scripts is enhanced to include a required --storage-engine argument.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant StartScript as start_clp.py
    participant CreateScript as create-db-tables.py
    participant InitScript as initialize-clp-metadata-db.py
    participant Utils as clp_metadata_db_utils.py
    participant DB as Database

    StartScript->>CreateScript: Run with --storage-engine argument
    CreateScript->>InitScript: Run with --storage-engine argument
    InitScript->>Utils: create_metadata_db_tables(metadata_db_cursor, table_prefix, dataset?)
    Utils->>DB: CREATE TABLE IF NOT EXISTS datasets
    Utils->>DB: CREATE TABLE IF NOT EXISTS archives
    Utils->>DB: CREATE TABLE IF NOT EXISTS tags
    Utils->>DB: CREATE TABLE IF NOT EXISTS archive_tags
    Utils->>DB: CREATE TABLE IF NOT EXISTS files
    alt If dataset specified (e.g., for CLP_S storage engine)
        Utils->>DB: CREATE TABLE IF NOT EXISTS column_metadata
    end
    InitScript->>DB: Commit transaction
```
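For readers who prefer code, a minimal sketch of the same flow under the names shown in the diagram (argument parsing and connection setup elided):

```python
from clp_py_utils.clp_metadata_db_utils import create_metadata_db_tables

def initialize_metadata_db(db_conn, db_cursor, table_prefix: str, dataset=None) -> None:
    # One consolidated call replaces the previous inline CREATE TABLE statements;
    # passing a dataset name switches to dataset-specific table names.
    create_metadata_db_tables(db_cursor, table_prefix, dataset)
    # Matches the final "Commit transaction" step in the diagram.
    db_conn.commit()
```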
Add the --dataset option to compression scripts and create dataset-specific tables for clp-s compression jobs.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- components/clp-package-utils/clp_package_utils/scripts/compress.py (2 hunks)
- components/clp-package-utils/clp_package_utils/scripts/native/compress.py (3 hunks)
- components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
- components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (3 hunks)
- components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (1 hunks)
- components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (5 hunks)
- components/job-orchestration/job_orchestration/scheduler/job_config.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (3)
- components/clp-py-utils/clp_py_utils/clp_config.py (1)
  - Database (76-155)
- components/clp-py-utils/clp_py_utils/core.py (1)
  - read_yaml_config_file (38-44)
- components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (2)
  - create_archives_table (1-18)
  - create_column_metadata_table (21-30)
🔇 Additional comments (17)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
40-41: Constants for dataset-related table suffixes look good
The addition of these constants follows the established pattern for table naming and will provide consistency when constructing dataset-specific table names throughout the codebase.
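As an illustration of how such suffix constants get used, here is a sketch of the naming pattern (the constant values shown are assumptions; the composition pattern appears in the scheduler code later in this review):

```python
# Assumed values; only the naming pattern is taken from this PR.
DATASETS_TABLE_SUFFIX = "datasets"
ARCHIVES_TABLE_SUFFIX = "archives"

table_prefix = "clp_"
dataset = "default"
datasets_table = f"{table_prefix}{DATASETS_TABLE_SUFFIX}"            # "clp_datasets"
archives_table = f"{table_prefix}{dataset}_{ARCHIVES_TABLE_SUFFIX}"  # "clp_default_archives"
```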
components/job-orchestration/job_orchestration/scheduler/job_config.py (2)
28-28: Dataset field added with a sensible default
The addition of the dataset field with a default value of "default" maintains backward compatibility while enabling the new dataset feature.
34-34: Dataset field added with a sensible default
As above, the default value of "default" maintains backward compatibility while enabling the new dataset feature.
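A minimal sketch of such a field, assuming the job configs are pydantic models (the class and its other fields here are illustrative, not the PR's exact definitions):

```python
from pydantic import BaseModel

class InputConfig(BaseModel):  # illustrative stand-in for FsInputConfig/S3InputConfig
    path: str
    dataset: str = "default"  # new field; the default keeps existing jobs unchanged

config = InputConfig(path="/var/log/app.log")
assert config.dataset == "default"
```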
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (4)
9-16: Updated imports for new constants and utilities
The import statements have been updated to include the new constants and utility functions, which is appropriate.
44-44: Good refactoring to use the utility function
Replacing inline SQL with a call to the utility function improves maintainability and ensures consistency in table schemas.
90-93: Simplified primary key for the datasets table
The primary key for the datasets table has been simplified to use only the `name` column, which is appropriate since dataset names should be unique.
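A sketch of what that definition could look like (the column type and length are assumptions; only the single-column primary key is taken from the comment above):

```python
def create_datasets_table(db_cursor, datasets_table_name: str) -> None:
    # `name` alone serves as the primary key since dataset names must be unique.
    db_cursor.execute(
        f"""
        CREATE TABLE IF NOT EXISTS `{datasets_table_name}` (
            `name` VARCHAR(255) NOT NULL,
            PRIMARY KEY (`name`)
        )
        """
    )
```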
97-99: Good refactoring to use the utility function
Using the `create_column_metadata_table` utility function instead of inline SQL improves maintainability and ensures consistency.
components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (2)
1-18: Well-structured utility function for archives table creation
This utility function encapsulates the SQL for creating an archives table, making it reusable across the codebase. The table structure includes all necessary fields and proper indexing.
21-30: Well-structured utility function for column metadata table creation
This utility function encapsulates the SQL for creating a column metadata table, which will be reused when creating dataset-specific metadata tables. The composite primary key on `name` and `type` is appropriate.
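A sketch of the composite-key definition described above (column types are assumptions):

```python
def create_column_metadata_table(db_cursor, table_name: str) -> None:
    # Composite primary key: the same column name may legitimately appear
    # with more than one type, so neither column is unique on its own.
    db_cursor.execute(
        f"""
        CREATE TABLE IF NOT EXISTS `{table_name}` (
            `name` VARCHAR(512) NOT NULL,
            `type` TINYINT NOT NULL,
            PRIMARY KEY (`name`, `type`)
        )
        """
    )
```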
components/clp-package-utils/clp_package_utils/scripts/compress.py (2)
85-87: Dataset flag handling looks good!
The conditional check ensures the `--dataset` flag is only added when the dataset argument has a value, which is a good practice.
151-153: Implementation of the `--dataset` parameter looks good!
The default value "default" is appropriate for maintaining backward compatibility, and the help text clearly communicates the purpose of this parameter.
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (3)
143-143: Dataset parameter inclusion in FsInputConfig looks good!
The parameter is properly passed to the input configuration object, enabling dataset-specific compression for filesystem inputs.
156-156: Dataset parameter inclusion in S3InputConfig looks good!
The parameter is properly passed to the input configuration object, enabling dataset-specific compression for S3 inputs.
208-210: Implementation of the `--dataset` parameter looks good!
The default value "default" matches the implementation in compress.py, maintaining consistency across the codebase.
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (3)
156-158: Function parameter update looks good!
The `search_and_schedule_new_tasks` function is properly updated to accept the storage engine parameter, which is needed for the dataset-specific logic.
411-412: Storage engine extraction looks good!
The storage engine is properly extracted from the configuration to be passed to the task scheduling function.
418-418: Parameter passing looks good!
The storage engine is correctly passed to the `search_and_schedule_new_tasks` function.
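A rough sketch of that pass-through (the config attribute path and the stub signature are assumptions, not the PR's exact code):

```python
def search_and_schedule_new_tasks(db_conn, db_cursor, clp_storage_engine: str) -> None:
    """Stub standing in for the scheduler function updated in this PR."""

def main_loop(clp_config, db_conn, db_cursor) -> None:
    # Extract the storage engine once from the config, then thread it
    # through each scheduling pass.
    clp_storage_engine = clp_config.package.storage_engine
    search_and_schedule_new_tasks(db_conn, db_cursor, clp_storage_engine)
```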
```python
dataset = input_config.dataset
if StorageEngine.CLP_S == clp_storage_engine and dataset not in seen_datasets:
    seen_datasets.add(dataset)
    query = f'INSERT INTO {table_prefix}{DATASETS_TABLE_SUFFIX} (name) VALUES ("{dataset}")'
    db_cursor.execute(query)
    create_archives_table(db_cursor, f"{table_prefix}{dataset}_{ARCHIVES_TABLE_SUFFIX}")
    create_column_metadata_table(
        db_cursor, f"{table_prefix}{dataset}_{COLUMN_METADATA_TABLE_SUFFIX}"
    )
    db_conn.commit()
```
Fix SQL injection vulnerability and add error handling
There are several issues with the dataset registration implementation:
- SQL injection vulnerability: The dataset name is directly inserted into the query without parameter binding
- Missing error handling for table creation failures
- No validation of dataset names (length, special characters, etc.)
Implement parameter binding and add error handling:
```diff
 dataset = input_config.dataset
 if StorageEngine.CLP_S == clp_storage_engine and dataset not in seen_datasets:
     seen_datasets.add(dataset)
-    query = f'INSERT INTO {table_prefix}{DATASETS_TABLE_SUFFIX} (name) VALUES ("{dataset}")'
-    db_cursor.execute(query)
-    create_archives_table(db_cursor, f"{table_prefix}{dataset}_{ARCHIVES_TABLE_SUFFIX}")
-    create_column_metadata_table(
-        db_cursor, f"{table_prefix}{dataset}_{COLUMN_METADATA_TABLE_SUFFIX}"
-    )
-    db_conn.commit()
+    try:
+        # Validate dataset name (add appropriate validation logic)
+        if not re.match(r'^[a-zA-Z0-9_-]+$', dataset):
+            logger.error(f"Invalid dataset name: {dataset}. Using 'default' instead.")
+            dataset = "default"
+
+        # Use parameter binding to prevent SQL injection
+        query = f"INSERT INTO {table_prefix}{DATASETS_TABLE_SUFFIX} (name) VALUES (%s)"
+        db_cursor.execute(query, (dataset,))
+
+        # Create dataset-specific tables
+        create_archives_table(db_cursor, f"{table_prefix}{dataset}_{ARCHIVES_TABLE_SUFFIX}")
+        create_column_metadata_table(
+            db_cursor, f"{table_prefix}{dataset}_{COLUMN_METADATA_TABLE_SUFFIX}"
+        )
+        db_conn.commit()
+        logger.info(f"Created tables for new dataset: {dataset}")
+    except Exception as ex:
+        logger.error(f"Failed to create tables for dataset {dataset}: {ex}")
+        db_conn.rollback()
```
Don't forget to add the required import at the top of the file:

```python
import re
```
Added a TODO for checking that the dataset name is valid for SQL tables.
@Bill-hbrhbr Thank you for adding the TODO to check that dataset names are valid for SQL tables. That's a good step toward addressing the issue.
While the TODO will remind you to implement validation in the future, you might also want to consider addressing the SQL injection vulnerability sooner rather than later by using parameterized queries. This is a security best practice that prevents attackers from executing malicious SQL commands through user input.
A simple fix would be to change:

```python
query = f'INSERT INTO {table_prefix}{DATASETS_TABLE_SUFFIX} (name) VALUES ("{dataset}")'
db_cursor.execute(query)
```

To:

```python
query = f"INSERT INTO {table_prefix}{DATASETS_TABLE_SUFFIX} (name) VALUES (%s)"
db_cursor.execute(query, (dataset,))
```
Would you like me to show a more complete implementation that includes both validation and parameterized queries?
✏️ Learnings added
Learnt from: Bill-hbrhbr
PR: y-scope/clp#831
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2025-04-17T16:55:23.759Z
Learning: In the CLP project, SQL queries should use parameterized queries with placeholders (%s) and pass values as a tuple to `db_cursor.execute()` to prevent SQL injection, rather than directly interpolating values into the query string.
```python
scheduled_jobs = {}

seen_datasets = set()
```
🛠️ Refactor suggestion
Consider using database checks instead of in-memory tracking
The `seen_datasets` global set will be lost if the process restarts, potentially causing redundant table-creation attempts for datasets that were already processed.
Consider querying the database to check if a dataset exists instead of relying on in-memory tracking:

```diff
-seen_datasets = set()
```
Then modify the dataset check logic to query the database instead:

```diff
-if StorageEngine.CLP_S == clp_storage_engine and dataset not in seen_datasets:
-    seen_datasets.add(dataset)
+if StorageEngine.CLP_S == clp_storage_engine:
+    # Check if dataset exists in the database
+    query = f"SELECT name FROM {table_prefix}{DATASETS_TABLE_SUFFIX} WHERE name = %s"
+    db_cursor.execute(query, (dataset,))
+    if not db_cursor.fetchone():
```
Committable suggestion skipped: line range outside the PR's diff.
We've addressed the issue by updating the in-memory cache upon startup.
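A minimal sketch of that startup initialization, reusing the table-naming pattern from this PR (the tuple-cursor assumption is noted inline):

```python
DATASETS_TABLE_SUFFIX = "datasets"  # mirrors the constant added in clp_config.py

def load_seen_datasets(db_cursor, table_prefix: str) -> set:
    # Seed the in-memory cache from the datasets registry so a restart doesn't
    # re-attempt table creation for datasets that already have tables.
    db_cursor.execute(f"SELECT name FROM {table_prefix}{DATASETS_TABLE_SUFFIX}")
    return {row[0] for row in db_cursor.fetchall()}  # assumes a tuple cursor
```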
@Bill-hbrhbr That's a great approach! Initializing the cache at startup gives you the performance benefits of in-memory checks while ensuring you don't lose track of datasets between restarts. Thanks for explaining how you addressed it.
🐰👍
✏️ Learnings added
Learnt from: Bill-hbrhbr
PR: y-scope/clp#831
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:0-0
Timestamp: 2025-04-17T16:55:06.632Z
Learning: In the compression scheduler, the team prefers initializing in-memory caches from the database at startup rather than performing repeated database queries for efficiency reasons. This approach maintains both performance and reliability across process restarts.
…ocal cache generation.
Why are we exposing the …
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- components/clp-py-utils/clp_py_utils/clp_config.py (1 hunks)
- components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (2 hunks)
- components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/clp-py-utils/clp_py_utils/clp_config.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (3)
- components/clp-py-utils/clp_py_utils/clp_config.py (1)
  - Database (80-159)
- components/clp-py-utils/clp_py_utils/core.py (1)
  - read_yaml_config_file (38-44)
- components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (4)
  - create_archives_table (1-18)
  - create_tags_table (21-31)
  - create_archive_tags_table (34-45)
  - create_files_table (48-66)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build (macos-latest)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (outdated; review thread resolved)
fc6c126 to 45258ce (force-push)
Actionable comments posted: 1
🔭 Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (1)
105-123: 🛠️ Refactor suggestion
`create_metadata_db_tables` never records the dataset in `clp_datasets`
The PR description emphasises the new `clp_datasets` registry table, yet this helper never invokes `create_datasets_table`. Consequently, compression jobs that rely solely on this helper will create dataset-specific tables without inserting the dataset into the global registry, leaving the schema in an inconsistent state.
Consider:

```diff
@@
     db_cursor, table_prefix: str, dataset: typing.Optional[str] = None
 ) -> None:
     if dataset is not None:
         table_prefix = f"{table_prefix}{dataset}_"
     create_column_metadata_table(db_cursor, f"{table_prefix}{COLUMN_METADATA_TABLE_SUFFIX}")
+    # Ensure the dataset itself is recorded
+    create_datasets_table(db_cursor, f"{table_prefix}{DATASETS_TABLE_SUFFIX}")
```

(N.B. you will also need to import `DATASETS_TABLE_SUFFIX` alongside the earlier import fix.)
This change keeps the registry in sync with the per-dataset tables.
♻️ Duplicate comments (1)
components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (1)
60-62: ⚠️ Potential issue
Still using the wrong cursor variable – will raise `NameError`
The earlier review already pointed this out, but the fix has not been applied. `metadata_db_cursor` is undefined; use the function argument `db_cursor`.

```diff
-    metadata_db_cursor.execute(
+    db_cursor.execute(
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (2 hunks)
- components/clp-py-utils/clp_py_utils/sql_table_schema_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: lint-check (ubuntu-latest)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (outdated; 3 review threads resolved)
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
A few more suggestions.
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (outdated; review thread resolved)
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Actionable comments posted: 0
🔭 Outside diff range comments (1)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (1)
114-143: 🛠️ Refactor suggestion
Consider adding error handling to the table creation process
The function is well-documented and implements the functionality as described in the PR objectives. However, there's no error handling for database operations, which could lead to incomplete table creation if an operation fails.
Consider adding try-except blocks to handle potential database errors:

```diff
 def create_metadata_db_tables(db_cursor, table_prefix: str, dataset: str | None = None) -> None:
     """
     Creates the standard set of tables for CLP's metadata.
     :param db_cursor: The database cursor to execute the table creations.
     :param table_prefix: A string to prepend to all table names.
     :param dataset: If set, all tables will be named in a dataset-specific manner.
     """
     if dataset is not None:
         table_prefix = f"{table_prefix}{dataset}_"
     archives_table_name = f"{table_prefix}{ARCHIVES_TABLE_SUFFIX}"
     tags_table_name = f"{table_prefix}{TAGS_TABLE_SUFFIX}"
     archive_tags_table_name = f"{table_prefix}{ARCHIVE_TAGS_TABLE_SUFFIX}"
     # TODO: Update this to
     # {table_prefix}{CLP_DEFAULT_DATASET_NAME}_{COLUMN_METADATA_TABLE_SUFFIX} when we can also
     # change the indexer to match.
     column_metadata_table_name = (
         f"{table_prefix}{COLUMN_METADATA_TABLE_SUFFIX}_{CLP_DEFAULT_DATASET_NAME}"
     )
+    try:
         _create_archives_table(db_cursor, archives_table_name)
         _create_tags_table(db_cursor, tags_table_name)
         _create_archive_tags_table(
             db_cursor, archive_tags_table_name, archives_table_name, tags_table_name
         )
         _create_files_table(db_cursor, table_prefix)
         _create_column_metadata_table(db_cursor, column_metadata_table_name)
+    except Exception as e:
+        # Log the error or handle it appropriately
+        raise Exception(f"Error creating metadata tables: {str(e)}") from e
```
🧹 Nitpick comments (5)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (5)
14-32: Add a docstring to the helper function
The `_create_archives_table` function lacks documentation. Consider adding a docstring to explain its purpose, parameters, and return value.

```diff
 def _create_archives_table(db_cursor, archives_table_name: str) -> None:
+    """
+    Creates a table for storing archive metadata.
+
+    :param db_cursor: The database cursor to execute the table creation.
+    :param archives_table_name: The name of the table to create.
+    """
     db_cursor.execute(
         f"""
```
34-45: Add a docstring to the helper function
The `_create_tags_table` function lacks documentation. Consider adding a docstring to explain its purpose, parameters, and return value.

```diff
 def _create_tags_table(db_cursor, tags_table_name: str) -> None:
+    """
+    Creates a table for storing tag information.
+
+    :param db_cursor: The database cursor to execute the table creation.
+    :param tags_table_name: The name of the table to create.
+    """
     db_cursor.execute(
         f"""
```
47-61: Add a docstring to the helper function
The `_create_archive_tags_table` function lacks documentation. Consider adding a docstring to explain its purpose, parameters, and return value.

```diff
 def _create_archive_tags_table(
     db_cursor, archive_tags_table_name: str, archives_table_name: str, tags_table_name: str
 ) -> None:
+    """
+    Creates a table for storing relationships between archives and tags.
+
+    :param db_cursor: The database cursor to execute the table creation.
+    :param archive_tags_table_name: The name of the relation table to create.
+    :param archives_table_name: The name of the archives table to reference.
+    :param tags_table_name: The name of the tags table to reference.
+    """
     db_cursor.execute(
         f"""
```
63-82: Add a docstring and standardize parameters
The `_create_files_table` function lacks documentation and uses `table_prefix` instead of a full table name like the other helper functions, which may cause confusion.

```diff
 def _create_files_table(db_cursor, table_prefix: str) -> None:
+    """
+    Creates a table for storing file metadata.
+
+    :param db_cursor: The database cursor to execute the table creation.
+    :param table_prefix: A string to prepend to the files table suffix.
+    """
     db_cursor.execute(
         f"""
```

For consistency with the other helper methods, consider refactoring to accept the full table name:

```diff
-def _create_files_table(db_cursor, table_prefix: str) -> None:
+def _create_files_table(db_cursor, files_table_name: str) -> None:
     """
     Creates a table for storing file metadata.
     :param db_cursor: The database cursor to execute the table creation.
-    :param table_prefix: A string to prepend to the files table suffix.
+    :param files_table_name: The name of the table to create.
     """
     db_cursor.execute(
         f"""
-        CREATE TABLE IF NOT EXISTS `{table_prefix}{FILES_TABLE_SUFFIX}` (
+        CREATE TABLE IF NOT EXISTS `{files_table_name}` (
```

Then update the call in `create_metadata_db_tables`:

```diff
-    _create_files_table(db_cursor, table_prefix)
+    files_table_name = f"{table_prefix}{FILES_TABLE_SUFFIX}"
+    _create_files_table(db_cursor, files_table_name)
```
84-94: Add a docstring to the helper function
The `_create_column_metadata_table` function lacks documentation. Consider adding a docstring to explain its purpose, parameters, and return value.

```diff
 def _create_column_metadata_table(db_cursor, table_name: str) -> None:
+    """
+    Creates a table for storing column metadata.
+
+    :param db_cursor: The database cursor to execute the table creation.
+    :param table_name: The name of the table to create.
+    """
     db_cursor.execute(
         f"""
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (1 hunks)
- components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (3)
components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py (3)
1-11: Excellent use of future imports!
The inclusion of `from __future__ import annotations` is perfect for supporting Python 3.8 compatibility with the `|` syntax in type hints. This is aligned with a past review comment and ensures backward compatibility.
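To illustrate why that import matters: on Python 3.8 and 3.9, evaluating `str | None` in an annotation raises a TypeError unless annotation evaluation is deferred:

```python
from __future__ import annotations  # defers annotation evaluation (PEP 563)

# Without the import above, this def would raise
# "TypeError: unsupported operand type(s) for |" on Python 3.8/3.9.
def create_metadata_db_tables(db_cursor, table_prefix: str, dataset: str | None = None) -> None:
    ...
```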
96-112: Well-documented function implementation
The `create_datasets_table` function is well-documented and properly implements the functionality described in the PR objectives for creating the datasets table.
129-134: Ensure the TODO comment is tracked
The TODO comment indicates future work needed on the column metadata table naming. Make sure this is tracked in your issue-tracking system to avoid it being forgotten.
…o utility module; Conditionally create the datasets table when using the `CLP_S` storage engine. (y-scope#831)
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Description
- Create the `clp_datasets` table.
- Table-creation functions accept a `dataset` specification that defaults to `None`:
  - `clp_{dataset}_archives`
  - `clp_{dataset}_tags`
  - `clp_{dataset}_archive_tags`
  - `clp_{dataset}_files`
  - If `dataset` is specified, also create `clp_{dataset}_column_metadata`
  - Otherwise, create `clp_archives`, `clp_tags`, `clp_archive_tags`, `clp_files`
- Pass the `--storage-engine` flag from `start-clp.py` all the way to `initialize-clp-metadata-db.py` and create `clp_datasets` if the storage engine is `clp-s`
This PR succeeds #819, #839 and covers the following parts of the dataset feature implementation plan:
- Create the `clp_datasets` table.
- `dataset` …

Checklist
- … breaking change.
Validation performed
- … `CLP_S`, the `clp_datasets` table is also created.
- For `CLP_S`:
  