# Add interface for NoSQL storage #214
Conversation
**Walkthrough**

This pull request encompasses widespread refactoring and feature additions across the project. Major updates include enhancements to benchmark configurations (e.g., new "modules" keys and updated function signatures).
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant SeBS as SeBS Core
    participant SysRes as SystemResources
    participant Storage as Persistent Storage
    participant NoSQL as NoSQL Storage
    User->>SeBS: Provide storage configuration(s) via CLI
    SeBS->>SysRes: Load and merge multiple storage configs
    SysRes->>Storage: get_storage(replace_existing)
    Storage-->>SysRes: Return storage instance
    SysRes->>NoSQL: get_nosql_storage()
    NoSQL-->>SysRes: Return NoSQL storage instance
    SysRes->>SeBS: Supply complete resource instances
    SeBS->>Function: Update function configuration (env vars, storage settings)
    Function-->>SeBS: Confirm update
```
Actionable comments posted: 1
♻️ Duplicate comments (2)
sebs/storage/scylladb.py (2)
`293-296`: ⚠️ Potential issue

**Use the correct table name when waiting for creation.**

You are passing `name` instead of the actual `table_name` to `waiter.wait(TableName=...)`, which may cause the waiter to fail if the new table name differs from the logical name.

```diff
- waiter.wait(TableName=name)
+ waiter.wait(TableName=table_name)
```

`305-310`: ⚠️ Potential issue

**Return the actual table name for consistency.**

When the table already exists, the code logs the real `table_name` but returns the logical `name` instead. Returning the actual table name is more consistent and avoids downstream mismatches.

```diff
- return name
+ return table_name
```
🧹 Nitpick comments (22)
sebs/aws/resources.py (2)
`42-49`: **Remove outdated references to "benchmark" and "buckets" in the docstring.**

These parameters are no longer part of the method signature, causing confusion for anyone reading or using the code. Consider removing or updating these references to match the current implementation.

```diff
 """
-Create a client instance for cloud storage. When benchmark and buckets
-parameters are passed, then storage is initialized with required number
-of buckets. Buckets may be created or retrieved from cache.
+Create a client instance for cloud storage.

 :param replace_existing: replace existing files in cached buckets?
 :return: storage client
 """
```

`70-82`: **Set logging handlers for `DynamoDB` for better consistency.**

In `get_storage()`, the S3 storage uses `logging_handlers`; however, the `DynamoDB` instance currently does not. Consistency in logging is helpful for debugging and operation visibility.

```diff
 def get_nosql_storage(self) -> NoSQLStorage:
     if not self._nosql_storage:
         assert self._session is not None
         self.logging.info("Initialize DynamoDB NoSQL instance.")
         self._nosql_storage = DynamoDB(
             self._session,
             self._cache_client,
             self.config.resources,
             self.config.region,
             access_key=self.config.credentials.access_key,
             secret_key=self.config.credentials.secret_key,
         )
+        self._nosql_storage.logging_handlers = self._logging_handlers
     return self._nosql_storage
```
sebs/azure/system_resources.py (2)
`41-50`: **Remove or revise references to creating buckets in the docstring.**

The method no longer generates buckets per se; it initializes a BlobStorage instance. Unused references to "buckets" or related parameters can confuse readers.

```diff
 """
 Create wrapper object for Azure blob storage.
 First ensure that storage account is created and connection string
-is known. Then, create wrapper and create request number of buckets.
+is known. Then, create BlobStorage wrapper instance.

 Requires Azure CLI instance in Docker to obtain storage account details.

 :param replace_existing: when true, replace existing files in input buckets
 :return: Azure storage instance
 """
```

`76-105`: **Declare `_cli_instance_stop` in the constructor for clarity.**

Introducing `_cli_instance_stop` midway through the code makes its usage less discoverable. Consider initializing it in the constructor to keep class attributes organized and maintain a clear, readable structure.

```diff
 def __init__(
     self,
     system_config: SeBSConfig,
     config: AzureConfig,
     cache_client: Cache,
     docker_client: docker.client,
     logger_handlers: LoggingHandlers,
 ):
     super().__init__(config, cache_client, docker_client)
     self._logging_handlers = logger_handlers
     self._storage: Optional[BlobStorage] = None
     self._nosql_storage: Optional[CosmosDB] = None
     self._cli_instance: Optional[AzureCLI] = None
+    self._cli_instance_stop: bool = False
     self._system_config = system_config

 def cli_instance(self) -> AzureCLI:
     if self._cli_instance is None:
         self._cli_instance = AzureCLI(self._system_config, self._docker_client)
-        self._cli_instance_stop = True
+        self._cli_instance_stop = True  # reuse the declared attribute
```

sebs/faas/nosql.py (1)
`112-118`: **Clarify the return values of `clear_table` and `remove_table`.**

Both methods declare a string return type but do not specify what the string represents (e.g., a success message, the table name, or an error code). Providing a brief description in the docstring or code comments will improve readability.
sebs/azure/cloud_resources.py (3)
`6-10`: **Consider expanding the docstring for future Azure resources.**

Providing additional context here about other potential resources would address past feedback regarding this file's generic name.

`32-55`: **Add robust error handling for Azure CLI failures.**

If retrieving the URL or credentials fails, consider catching and converting the error into a more descriptive exception at this level for better troubleshooting.

`66-77`: **Ensure secure handling of primary keys.**

Make sure the primary key is never logged or stored in plaintext beyond this scope, to avoid security or compliance issues.

sebs/gcp/datastore.py (5)
`20-26`: **Validate config input in deserialize.**

Although straightforward, consider validating the presence of required keys in `config` to avoid potential KeyErrors.

`47-53`: **`get_tables` returns an empty dict.**

For debugging or feature completeness, consider enumerating `kinds` in the Datastore for more transparency.

`76-81`: **Ensure consistent cache client usage.**

Here, we see `self._cache_client` referenced. Other files use `self.cache_client`; maintaining a single reference pattern prevents confusion.

`85-116`: **`write_to_table` asserts secondary key.**

Currently, a missing secondary key is treated as impossible. Consider raising an explicit exception or supporting the no-secondary-key case, as sketched below.
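As a rough illustration of the explicit-exception option, a guard along these lines could replace the assertion (a sketch only; the helper name and message are made up and not part of this PR):

```python
from typing import Optional, Tuple


def require_secondary_key(secondary_key: Optional[Tuple[str, str]]) -> Tuple[str, str]:
    # Raise a descriptive error instead of `assert`, which is silently
    # stripped when Python runs with the -O flag.
    if secondary_key is None:
        raise ValueError(
            "Datastore writes require a secondary key (attribute name, value); "
            "extend write_to_table if a no-secondary-key case should be supported."
        )
    return secondary_key
```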
`176-181`: **Unimplemented table methods.**

Add docstrings or TODOs explaining how these methods will be implemented in the future.

sebs/azure/cosmosdb.py (5)
`22-28`: **Serialization.**

Simple and clear. Consider minimal input validation on `config` keys for extra safety.

`48-54`: **`get_tables` returning empty dict.**

Similar to other providers, consider enumerating containers for debugging or feature completeness.

`119-142`: **`write_to_table` requires secondary key.**

Asserting the presence of `secondary_key` could be replaced with a more descriptive exception or optional logic if the design eventually allows a single key.

`143-195`: **Creating database and container.**

Implementation checks existence by reading the container and conditionally creating it. This is a suitable approach, though parallel creation attempts may need synchronization.

`196-201`: **Stubs for `clear_table` and `remove_table`.**

Documenting intent or timeline for these features would help future contributors.

sebs/storage/scylladb.py (4)
`108-121`: **Consider using Docker health checks for container readiness.**

The manual loop with `nodetool status` works, but a native Docker health check would be more idiomatic and potentially more reliable. You can define a health check command in the image (or via run parameters) and poll the container's health status directly, as in the sketch below.
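A minimal sketch of that approach with docker-py, assuming the upstream `scylladb/scylla` image and reusing the `nodetool status` probe from this PR (the intervals and the `UN` readiness pattern are illustrative):

```python
import time

import docker

client = docker.from_env()
container = client.containers.run(
    "scylladb/scylla",  # image name assumed from the PR context
    detach=True,
    healthcheck={
        # Docker runs this probe inside the container; durations are in nanoseconds.
        "test": ["CMD-SHELL", "nodetool status | grep -q '^UN'"],
        "interval": 2 * 10**9,
        "timeout": 10 * 10**9,
        "retries": 30,
    },
)

# Poll Docker's own health state instead of re-running nodetool manually.
while True:
    container.reload()
    status = container.attrs["State"]["Health"]["Status"]
    if status == "healthy":
        break
    if status == "unhealthy":
        raise RuntimeError("ScyllaDB container failed its health check")
    time.sleep(1)
```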
`110-110`: **Fix typos in log messages.**

Several log messages include minor typographical errors such as "succesfully" and "ScyllaBD." Correcting these will ensure clearer logs.

```diff
- self.logging.info("Started ScyllaDB succesfully!")
+ self.logging.info("Started ScyllaDB successfully!")

- self.logging.error("Failed to launch ScyllaBD!")
+ self.logging.error("Failed to launch ScyllaDB!")

- raise RuntimeError("Starting ScyllaDB storage unsuccesful")
+ raise RuntimeError("Starting ScyllaDB storage unsuccessful")
```

Also applies to: 117-117, 128-128
`125-125`: **Raise exceptions using the `raise ... from e` pattern for clarity.**

When re-raising exceptions within `except` clauses, attach the original exception object (e.g., `raise RuntimeError("...") from e`) to clearly preserve the exception chain. This improvement addresses static analysis concerns and clarifies error origins.

```diff
 except docker.errors.APIError as e:
     ...
-    raise RuntimeError("Starting ScyllaDB storage unsuccesful")
+    raise RuntimeError("Starting ScyllaDB storage unsuccessful") from e
```

Also applies to: 128-128, 199-199, 312-312
🧰 Tools
🪛 Ruff (0.8.2)
125-125: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
`130-130`: **Remove code duplication with MinIO logic.**

The comment indicates a "FIXME" for duplicated code from your MinIO storage implementation. Refactoring and centralizing shared functionality would simplify maintenance and reduce potential inconsistencies.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- `benchmarks/100.webapps/130.crud-api/input.py` (1 hunks)
- `benchmarks/wrappers/gcp/python/handler.py` (1 hunks)
- `sebs/aws/resources.py` (1 hunks)
- `sebs/azure/cloud_resources.py` (1 hunks)
- `sebs/azure/cosmosdb.py` (1 hunks)
- `sebs/azure/system_resources.py` (1 hunks)
- `sebs/benchmark.py` (9 hunks)
- `sebs/faas/nosql.py` (1 hunks)
- `sebs/faas/resources.py` (1 hunks)
- `sebs/gcp/datastore.py` (1 hunks)
- `sebs/regression.py` (15 hunks)
- `sebs/storage/scylladb.py` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- benchmarks/wrappers/gcp/python/handler.py
- sebs/faas/resources.py
- benchmarks/100.webapps/130.crud-api/input.py
- sebs/benchmark.py
- sebs/regression.py
🧰 Additional context used
🪛 Ruff (0.8.2)
sebs/gcp/datastore.py
164-164: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
sebs/storage/scylladb.py
125-125: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

128-128: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

199-199: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

312-312: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
🔇 Additional comments (24)
sebs/faas/nosql.py (1)
`61-78`: **Consider concurrency-safety for table creation.**

If multiple threads attempt to create the same benchmark table concurrently, there may be a race condition. Recommend clarifying the concurrency model or adding synchronization to guarantee only one creation attempt; a sketch follows.
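One possible shape for that synchronization, sketched as a standalone helper (the class name, the callback parameter, and the double-checked registration pattern are illustrative, not part of this PR):

```python
import threading
from typing import Callable, Set


class TableRegistry:
    """Serializes table creation so concurrent callers create each table only once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._created: Set[str] = set()

    def ensure_table(
        self, benchmark: str, name: str, create: Callable[[str, str], None]
    ) -> None:
        key = f"{benchmark}/{name}"
        with self._lock:
            # Checked under the lock: the first caller creates the table,
            # later callers see the registered key and return immediately.
            if key in self._created:
                return
            create(benchmark, name)
            self._created.add(key)
```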
sebs/azure/cloud_resources.py (4)
`1-3`: **Imports are well-structured.**

No issues found; all imports appear necessary and relevant.

`13-31`: **Class properties look solid.**

The read-only properties and constructor are straightforward and clear.

`56-65`: **CosmosDB endpoint retrieval.**

Implementation for `query_url` is correct. The returned endpoint is properly extracted from CLI output.

`78-87`: **Serialization logic.**

The serialization and deserialization methods are concise and effective.

sebs/gcp/datastore.py (8)
`1-3`: **Core imports.**

Looks good; usage is consistent with the rest of the file.

`12-19`: **`BenchmarkResources` dataclass.**

The field definitions are clear and well-aligned with Datastore usage.

`28-46`: **Class `Datastore` initialization.**

Static methods and constructor strongly align with `NoSQLStorage`; no issues noted.

`54-63`: **`_get_table_name`.**

Straightforward logic verifying existence in `benchmark_resources`; no issues found.

`64-75`: **Cache retrieval flow.**

Implementation aligns well with the cache-based design patterns in this codebase.

`82-84`: **`benchmark_database`.**

Simple database name accessor; appears consistent with the rest of the code.

`117-161`: **Database creation logic.**

This section properly checks for Firestore existence and allocates a new database when needed. Error handling for "NOT_FOUND" is clear.

`165-175`: **Benchmark resource registration.**

Resources and database clients are memoized correctly here after creation; the flow is logical.

sebs/azure/cosmosdb.py (11)
`1-3`: **Imports.**

No issues identified; all are relevant to Cosmos DB usage.

`14-21`: **`BenchmarkResources` dataclass.**

Defines database and containers for Cosmos DB context. Straightforward approach.

`30-38`: **`CosmosDB` class definition.**

Follows the `NoSQLStorage` contract with matching method signatures.

`39-46`: **Constructor usage of cache client.**

Previous feedback on `cache_client` usage appears resolved; the references look consistent.

`55-64`: **`_get_table_name`.**

Method checks membership in `containers` to confirm existence. No issues found.

`65-76`: **`retrieve_cache`.**

Logic follows the established cache pattern; straightforward and effective.

`77-82`: **`update_cache`.**

Consistent usage of `self.cache_client` to update NoSQL configuration.

`83-96`: **Lazy initialization of Cosmos client.**

Prevents unnecessary connections until the client is needed. Implementation looks solid.

`97-99`: **`has_tables`.**

Minimal check for a benchmark's presence in `_benchmark_resources`. Fits the intended usage.

`100-102`: **`benchmark_database`.**

Retrieves database name from `_benchmark_resources`. No concerns noted.

`103-118`: **`credentials`.**

Ensures the `CosmosDBAccount` is fetched first; returning the triple (account name, URL, key) is straightforward.
```python
self.logging.error("Couldn't query Datastore instances!")
self.logging.error(e)
raise RuntimeError("Couldn't query Datastore instances!")
```
🛠️ Refactor suggestion
Use exception chaining for clarity
Per B904, attach the original error as context to preserve the traceback and clarify debugging:
```diff
- raise RuntimeError("Couldn't query Datastore instances!")
+ raise RuntimeError("Couldn't query Datastore instances!") from e
```
🧰 Tools
🪛 Ruff (0.8.2)
164-164: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
Actionable comments posted: 4
🧹 Nitpick comments (4)
sebs/aws/dynamodb.py (4)
`97-97`: **Consider adding error handling for the put_item operation.**

The `put_item` operation may fail for various reasons (e.g., provisioned throughput exceeded, conditional check failed). Consider adding error handling to catch and properly handle these exceptions.

```diff
- self.client.put_item(TableName=table_name, Item=serialized_data)
+ try:
+     self.client.put_item(TableName=table_name, Item=serialized_data)
+ except self.client.exceptions.ProvisionedThroughputExceededException as e:
+     self.logging.error(f"DynamoDB write capacity exceeded: {e}")
+     raise RuntimeError(f"Failed to write to DynamoDB table {table_name}: {e}") from e
+ except Exception as e:
+     self.logging.error(f"Error writing to DynamoDB table {table_name}: {e}")
+     raise RuntimeError(f"Failed to write to DynamoDB table {table_name}: {e}") from e
```
`123-124`: **Type ignores should be fixed with proper typing.**

Instead of using `# type: ignore`, consider properly typing these parameters or using a more specific type annotation.

```diff
- AttributeDefinitions=definitions,  # type: ignore
- KeySchema=key_schema,  # type: ignore
+ AttributeDefinitions=definitions,  # type: List[Dict[str, str]]
+ KeySchema=key_schema,  # type: List[Dict[str, str]]
```

You could also define proper type aliases at the top of the file to make the code more readable:

```python
from typing import Dict, List, Optional, Tuple

AttributeDefinition = Dict[str, str]
KeySchemaElement = Dict[str, str]
```
`169-169`: **Use exception chaining for better error traceability.**

When raising a new exception from a caught exception, use `raise ... from e` to maintain the exception chain for better debugging.

```diff
- raise RuntimeError(f"Creating DynamoDB failed, unknown reason! Error: {e}")
+ raise RuntimeError(f"Creating DynamoDB failed, unknown reason! Error: {e}") from e
```

🧰 Tools
🪛 Ruff (0.8.2)

169-169: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
`12-20`: **Missing docstring for class and static methods.**

Adding class and method docstrings would improve code readability and maintainability.

```diff
 class DynamoDB(NoSQLStorage):
+    """
+    AWS DynamoDB NoSQL storage implementation for SeBS.
+
+    This class provides methods to create, write to, and manage DynamoDB tables
+    for benchmark data storage.
+    """

     @staticmethod
     def typename() -> str:
+        """Return the type name for this storage provider."""
         return "AWS.DynamoDB"

     @staticmethod
     def deployment_name():
+        """Return the deployment name for this storage provider."""
         return "aws"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `sebs/aws/dynamodb.py` (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
sebs/aws/dynamodb.py
169-169: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
🔇 Additional comments (4)
sebs/aws/dynamodb.py (4)
`34-36`: **AWS credentials are provided but might not be necessary.**

The AWS credentials (access_key and secret_key) are passed explicitly to the client. Based on previous discussions in the PR, these might not be necessary if you're relying on the session's credentials. Consider whether explicitly passing them is required for your use case.

```bash
#!/bin/bash
# Check if boto3 client creation works without explicit credentials
# when using a session
python3 -c "
import boto3

# Test if creating a session and then a client works without explicit credentials
try:
    session = boto3.session.Session()
    # Creating a client from the session should inherit the session's credentials
    client = session.client('dynamodb', region_name='us-east-1')
    print('Client creation without explicit credentials works')
except Exception as e:
    print(f'Error: {e}')
"
```
`171-175`: **Implement the missing table operations methods.**

The `clear_table` and `remove_table` methods are not implemented. Based on previous discussions, these might be planned for a future PR (#242). Please ensure these are implemented, or add a more descriptive comment about the future implementation.

I could help implement these methods if needed. For example, `remove_table` would use the `delete_table` API, and `clear_table` would require scanning and deleting all items in the table.
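A rough sketch of that direction with boto3 (assuming the class's existing `self.client` attribute; pagination and reserved-word handling in the projection would need review):

```python
def remove_table(self, table_name: str) -> None:
    # delete_table is asynchronous; block until the table is actually gone.
    self.client.delete_table(TableName=table_name)
    waiter = self.client.get_waiter("table_not_exists")
    waiter.wait(TableName=table_name)

def clear_table(self, table_name: str, primary_key: str, secondary_key: str) -> None:
    # DynamoDB has no truncate operation: scan for key attributes,
    # then delete the matching items one by one.
    paginator = self.client.get_paginator("scan")
    pages = paginator.paginate(
        TableName=table_name,
        ProjectionExpression=f"{primary_key}, {secondary_key}",
    )
    for page in pages:
        for item in page["Items"]:
            self.client.delete_item(
                TableName=table_name,
                Key={
                    primary_key: item[primary_key],
                    secondary_key: item[secondary_key],
                },
            )
```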
`39-40`: **Good explanation for the table mapping structure.**

The comment explaining the mapping structure with an example is helpful for understanding the code. This is a good practice.

`98-103`: **Good documentation of AWS-specific behavior.**

The docstring explaining the difference in database object hierarchy between AWS, Azure, and GCP is valuable context that helps readers understand the implementation.
```python
    f"Using existing DynamoDB table {table_name} for benchmark {benchmark}"
)
self._tables[benchmark][name] = table_name
return name
```
**Return value inconsistency.**

You're returning `name` here when the function generally returns the AWS table name (`table_name`). This could lead to unexpected behavior.

```diff
- return name
+ return table_name
```
```python
def update_cache(self, benchmark: str):

    self._cache_client.update_nosql(
```
**Fix inconsistent cache client reference.**

You're using `self._cache_client` here, but it's defined as `self.cache_client` in the constructor and used as such elsewhere.

```diff
- self._cache_client.update_nosql(
+ self.cache_client.update_nosql(
```
```python
    f"Using existing DynamoDB table {table_name} for benchmark {benchmark}"
)
self._tables[benchmark][name] = table_name
return name
```
**Return value inconsistency.**

Similar to line 154, you're returning `name` here when the function should be returning `table_name`.

```diff
- return name
+ return table_name
```
```python
if "being created" in e.response["Error"]["Message"]:

    self.logging.info(f"Waiting for the existing table {table_name} to be created")
    waiter = self.client.get_waiter("table_exists")
    waiter.wait(TableName=table_name, WaiterConfig={"Delay": 1})
    ret = self.client.describe_table(TableName=table_name)

    self.logging.info(
        f"Using existing DynamoDB table {table_name} for benchmark {benchmark}"
    )
    self._tables[benchmark][name] = table_name
    return name
```
🛠️ Refactor suggestion
Eliminate code duplication in exception handling.
The code in this "being created" exception block is almost identical to the "already exists" block above. Consider consolidating these blocks to reduce duplication.
```diff
- if "being created" in e.response["Error"]["Message"]:
-
-     self.logging.info(f"Waiting for the existing table {table_name} to be created")
-     waiter = self.client.get_waiter("table_exists")
-     waiter.wait(TableName=table_name, WaiterConfig={"Delay": 1})
-     ret = self.client.describe_table(TableName=table_name)
-
-     self.logging.info(
-         f"Using existing DynamoDB table {table_name} for benchmark {benchmark}"
-     )
-     self._tables[benchmark][name] = table_name
-     return name
-
- raise RuntimeError(f"Creating DynamoDB failed, unknown reason! Error: {e}")
+ if "already exists" in e.response["Error"]["Message"] or "being created" in e.response["Error"]["Message"]:
+     self.logging.info(f"Waiting for the existing table {table_name} to be created")
+     waiter = self.client.get_waiter("table_exists")
+     waiter.wait(TableName=table_name, WaiterConfig={"Delay": 1})
+     ret = self.client.describe_table(TableName=table_name)
+
+     self.logging.info(
+         f"Using existing DynamoDB table {table_name} for benchmark {benchmark}"
+     )
+     self._tables[benchmark][name] = table_name
+     return table_name
+
+ raise RuntimeError(f"Creating DynamoDB failed, unknown reason! Error: {e}") from e
```
Actionable comments posted: 0
🧹 Nitpick comments (6)
docs/platforms.md (2)
`15-19`: **Enhance the "Supported platforms" header formatting.**

The new section clearly lists the supported platforms. For better visual hierarchy and consistency with the rest of the document (which uses markdown headings for other sections), consider converting this plain text line into a markdown header (e.g., using `## Supported platforms`).

Proposed diff:

```diff
- Supported platforms
+ ## Supported platforms
```
`144-145`: **Improve sentence clarity and punctuation for GCP permissions.**

The combined sentence reading "You should have at least write access to **Cloud Functions** (`Cloud Functions Admin`) and **Logging** Furthermore, SeBS needs the permissions to create Firestore databases through..." can be enhanced by adding proper punctuation to separate the requirements. Splitting it into two sentences, with a period after **Logging**, would improve readability.

Proposed diff:

```diff
- You should have at least write access to **Cloud Functions** (`Cloud Functions Admin`) and **Logging** Furthermore, SeBS needs the permissions to create Firestore databases through
+ You should have at least write access to **Cloud Functions** (`Cloud Functions Admin`) and **Logging**. Furthermore, SeBS needs the permissions to create Firestore databases through
```

sebs/config.py (2)
`29-35`: **Add error handling and documentation to the `deployment_module_packages` method.**

The method looks good functionally, but it's missing two important elements:

- Error handling for when the "module_packages" key doesn't exist in the configuration
- A docstring explaining the method's purpose, parameters, and return value

Consider updating the implementation to safely handle missing keys:

```diff
 def deployment_module_packages(
     self, deployment_name: str, language_name: str
 ) -> Dict[str, str]:
+    """
+    Retrieves module packages for a specific deployment and language.
+
+    Args:
+        deployment_name: The name of the deployment system
+        language_name: The programming language name
+
+    Returns:
+        Dictionary mapping module names to their package names
+    """
-    return self._system_config[deployment_name]["languages"][language_name]["deployment"][
-        "module_packages"
-    ]
+    try:
+        return self._system_config[deployment_name]["languages"][language_name]["deployment"][
+            "module_packages"
+        ]
+    except KeyError:
+        return {}
```
`70-72`: **Avoid duplication and add documentation in the `version` method.**

The method correctly uses `.get()` with a default value, but it duplicates logic already present in `benchmark_image_tag` (line 103). This could lead to maintenance issues if one implementation is updated but not the other.

Consider extracting the common logic to avoid duplication:

```diff
 def version(self) -> str:
+    """
+    Returns the SeBS version from the configuration.
+
+    Returns:
+        The version string or "unknown" if not specified
+    """
     return self._system_config["general"].get("SeBS_version", "unknown")
```

Then update `benchmark_image_tag` to use this method:

```diff
 def benchmark_image_tag(
     self,
     system: str,
     benchmark: str,
     language_name: str,
     language_version: str,
     architecture: str,
 ) -> str:
     tag = f"function.{system}.{benchmark}.{language_name}-{language_version}-{architecture}"
     if self.image_tag_prefix:
         tag = f"{tag}-{self.image_tag_prefix}"
-    sebs_version = self._system_config["general"].get("SeBS_version", "unknown")
+    sebs_version = self.version()
     tag = f"{tag}-{sebs_version}"
     return tag
```

sebs/benchmark.py (1)
`783-785`: **Consider documenting the NoSQL function parameters.**

The signature for the `nosql_func` callback is complex, and a brief comment explaining the parameters would improve maintainability.

```diff
 nosql_func: Optional[
-    Callable[[str, str, dict, Tuple[str, str], Optional[Tuple[str, str]]], None]
+    Callable[[str, str, dict, Tuple[str, str], Optional[Tuple[str, str]]], None]  # (table_name, item_id, item_data, primary_key, secondary_key)
 ]
```

tools/build_docker_images.py (1)
`68-68`: **Architecture-specific Docker builds introduced.**

The code now explicitly accesses the `x64` key in the base images dictionary, indicating a shift toward architecture-specific Docker builds. This change supports the NoSQL integration by ensuring the correct architecture-specific images are built. Consider the following recommendations:

- Ensure other architectures (if needed) are handled properly.
- Update documentation to reflect this architecture-specific build approach.
- Add error handling for when the "x64" key is missing.

```diff
- if "base_images" in language_config:
-     for version, base_image in language_config["base_images"]["x64"].items():
+ if "base_images" in language_config:
+     arch = "x64"  # Default architecture
+     if arch not in language_config["base_images"]:
+         print(f"Warning: Architecture {arch} not found in base_images. Skipping.")
+         return
+     for version, base_image in language_config["base_images"][arch].items():
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `docs/platforms.md` (3 hunks)
- `sebs/benchmark.py` (10 hunks)
- `sebs/config.py` (2 hunks)
- `sebs/regression.py` (15 hunks)
- `tools/build_docker_images.py` (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- sebs/regression.py
🔇 Additional comments (7)
sebs/benchmark.py (6)
`26-28`: **Well-structured implementation of module support.**

The addition of the `modules` parameter to `BenchmarkConfig` enables extension points for different storage types, aligning with the PR's NoSQL integration goals. The implementation properly handles the modules in initialization, property access, and deserialization.

Also applies to: 32-32, 54-56, 67-67
`153-164`: **Good addition of tracking properties.**

These properties provide a clear API for determining if a benchmark uses different storage types. This will be useful for conditional logic in deployment and test code.

`235-242`: **Well-organized initialization of storage-related state.**

The explicit initialization of storage state flags to `False` by default is good practice, making the behavior predictable and easy to understand.
`362-378`: **Good modularization of deployment packages.**

The refactored code now properly handles module-specific package installations, allowing dependencies to be included only when necessary for the specific storage types being used.

`629-697`: **Excellent refactoring of the input preparation.**

The `prepare_input` method refactoring shows good separation of concerns between object storage and NoSQL storage handling. The implementation:

- Takes a unified `SystemResources` parameter instead of a specific storage type
- Conditionally handles object storage and NoSQL based on module capabilities
- Updates tracking flags to reflect what storage types are actually used
- Properly calls cache update methods for both storage types

This design will scale well as more storage types are added in the future.
`769-772`: **Well-defined module interface for NoSQL support.**

Adding the abstract `allocate_nosql` method to the `BenchmarkModuleInterface` ensures that all implementing benchmarks will provide the necessary configuration for NoSQL tables in a consistent way.

tools/build_docker_images.py (1)
`41-41`: ❓ Verification inconclusive

**Docker tag format changed from dot to hyphen.**

The separator between the image tag and the SeBS version has been changed from a dot (.) to a hyphen (-). This changes the format of Docker image tags from `repository:type.system.language.version.sebs_version` to `repository:type.system.language.version-sebs_version`. Make sure this change is consistent with how images are referenced elsewhere in the codebase and in deployment scripts.

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any references to the old image tag format with dot-separated SeBS version
rg 'docker.*\..*SeBS_version|docker.*\.[^"]*unknown' --type py
```

Length of output: 64

**Docker Tag Format Update Consistency Check**

The update in `tools/build_docker_images.py`, changing the concatenation to use a hyphen (`target += "-" + sebs_version`), appears to reflect the new Docker tag format (`repository:type.system.language.version-sebs_version`). Our initial search in Python files did not reveal any references still using the old dot separator. However, because the initial verification did not cover non-Python files (such as deployment scripts), please verify that all image references across the entire codebase adhere to this updated format.

- File Updated: `tools/build_docker_images.py` now correctly uses the hyphen separator.
- Action Required: Manually confirm that deployment scripts and any other Docker image references (potentially in non-Python files) have been updated accordingly.
Sometimes new environment variables are not being propagated to functions. I don't know why and we don't know a better option to query for them.
Actionable comments posted: 1
♻️ Duplicate comments (1)
sebs/azure/azure.py (1)
`296-296`: 🛠️ Refactor suggestion

**Avoid mutable default arguments.**

Using `{}` as a default value for `env_variables` can lead to unexpected behavior if the same dictionary is modified across multiple calls.

```diff
- def update_envs(self, function: Function, code_package: Benchmark, env_variables: dict = {}):
+ def update_envs(self, function: Function, code_package: Benchmark, env_variables: Optional[dict] = None):
+     if env_variables is None:
+         env_variables = {}
```

🧰 Tools
🪛 Ruff (0.8.2)

296-296: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
🧹 Nitpick comments (6)
sebs/azure/system_resources.py (4)
`41-50`: **Fix docstring placement.**

The multi-line docstring for `get_storage` is incorrectly placed outside the method. According to PEP 257, docstrings should be placed immediately after the method definition.

```diff
-"""
-Create wrapper object for Azure blob storage.
-First ensure that storage account is created and connection string
-is known. Then, create wrapper and create request number of buckets.
-
-Requires Azure CLI instance in Docker to obtain storage account details.
-
-:param replace_existing: when true, replace existing files in input buckets
-:return: Azure storage instance
-"""
 def get_storage(self, replace_existing: Optional[bool] = None) -> BlobStorage:
+    """
+    Create wrapper object for Azure blob storage.
+    First ensure that storage account is created and connection string
+    is known. Then, create wrapper and create request number of buckets.
+
+    Requires Azure CLI instance in Docker to obtain storage account details.
+
+    :param replace_existing: when true, replace existing files in input buckets
+    :return: Azure storage instance
+    """
```
`66-71`: **Add docstring for the NoSQL storage method.**

The `get_nosql_storage` method is missing a docstring to explain its purpose and behavior.

```diff
 def get_nosql_storage(self) -> CosmosDB:
+    """
+    Create wrapper object for Azure Cosmos DB.
+    Initialize Cosmos DB instance with the CLI instance, cache client, and configuration.
+
+    :return: CosmosDB instance for NoSQL storage operations
+    """
     if self._nosql_storage is None:
         self._nosql_storage = CosmosDB(
             self.cli_instance, self._cache_client, self.config.resources, self.config.region
         )
     return self._nosql_storage
```
`83-90`: **Improve error messages with more specific guidance.**

The current error messages provide basic information but could be enhanced with more specific guidance for the user.

```diff
 subscriptions = json.loads(output)
 if len(subscriptions) == 0:
-    raise RuntimeError("Didn't find any valid subscription on Azure!")
+    raise RuntimeError("No valid Azure subscriptions found. Please verify your Azure credentials and ensure you have at least one active subscription.")
 if len(subscriptions) > 1:
-    raise RuntimeError("Found more than one valid subscription on Azure - not supported!")
+    raise RuntimeError("Multiple Azure subscriptions detected. Please set a default subscription using 'az account set --subscription <subscription_id>' before running this operation.")
 self.config.credentials.subscription_id = subscriptions[0]["id"]
```
`102-108`: **Document expected behavior of `initialize_cli`.**

The `initialize_cli` method lacks documentation to clarify its relationship with the auto-initialized CLI from the `cli_instance` property.

```diff
 def initialize_cli(self, cli: AzureCLI, login: bool = False):
+    """
+    Set an existing CLI instance instead of auto-creating one.
+    This is useful when sharing a CLI instance across multiple components.
+
+    :param cli: Existing AzureCLI instance to use
+    :param login: Whether to perform login after setting the CLI instance
+    """
     self._cli_instance = cli
     self._cli_instance_stop = False
     if login:
         self._login_cli()
```

sebs/azure/azure.py (2)
`297-313`: **Consider refactoring NoSQL environment variable setup.**

The current code has multiple environment variable sets with hardcoded prefixes. This could be refactored to be more maintainable.

```diff
- envs = {}
+ envs = env_variables.copy()
  if code_package.uses_nosql:
      nosql_storage = cast(CosmosDB, self._system_resources.get_nosql_storage())

      # If we use NoSQL, then the handle must be allocated
      _, url, creds = nosql_storage.credentials()
      db = nosql_storage.benchmark_database(code_package.benchmark)
-     envs["NOSQL_STORAGE_DATABASE"] = db
-     envs["NOSQL_STORAGE_URL"] = url
-     envs["NOSQL_STORAGE_CREDS"] = creds
+     # Group related NoSQL environment variables
+     nosql_env_vars = {
+         "NOSQL_STORAGE_DATABASE": db,
+         "NOSQL_STORAGE_URL": url,
+         "NOSQL_STORAGE_CREDS": creds,
+     }
+     envs.update(nosql_env_vars)
      for original_name, actual_name in nosql_storage.get_tables(
          code_package.benchmark
      ).items():
          envs[f"NOSQL_STORAGE_TABLE_{original_name}"] = actual_name
```
`369-374`: **Remove commented-out code.**

The commented-out code about sleeping should be removed if it's no longer needed, or converted to a clearly marked comment section if it's kept for reference.

```diff
- # if we don't do that, next invocation might still see old values
- # Disabled since we swapped the order - we first update envs, then we publish.
- # self.logging.info(
- #     "Sleeping for 10 seconds - Azure needs more time to propagate changes. "
- #     "Otherwise, functions might not see new variables and fail unexpectedly."
- # )
+ # Note: Environment variable propagation is now handled by updating envs first, then publishing
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `config/systems.json` (8 hunks)
- `sebs/azure/azure.py` (8 hunks)
- `sebs/azure/system_resources.py` (1 hunks)
- `sebs/regression.py` (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- config/systems.json
- sebs/regression.py
🧰 Additional context used
🪛 Ruff (0.8.2)
sebs/azure/azure.py
296-296: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
🔇 Additional comments (3)
sebs/azure/system_resources.py (1)
`35-40`: ❓ Verification inconclusive

**Add error handling for initialization.**

The properties are initialized without checking for potential initialization errors that might occur later. Consider adding validation or error handling.
🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any error handling patterns in similar classes
rg -A 3 -B 3 "def __init__.*SystemResources" --type py
```

Length of output: 56
**Action Required: Incorporate Error Handling in SystemResources Initializer**

The initialization of properties in `sebs/azure/system_resources.py` (lines 35–40) currently does not include any error handling or input validation. Given that similar classes in the repository don't appear to implement this pattern, it's not a consistency issue, but proactively validating or catching initialization errors can prevent unexpected runtime exceptions. Please consider:

- Wrapping assignments (e.g., setting `_logging_handlers` or `_system_config`) within try/except blocks to catch and log unexpected errors.
- Validating inputs (such as ensuring `system_config` is not `None` or malformed) before assignment.
- Aligning any new error handling logic with established patterns in other parts of the repository (manual verification is recommended to confirm consistency).
sebs/azure/azure.py (2)
`339-349`: ❓ Verification inconclusive

**Refine Azure system variable detection logic.**

The current approach to detecting Azure system variables relies on hardcoded prefixes, which could break if Azure changes their naming conventions.

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are any other places in the codebase that filter Azure environment variables
rg "FUNCTIONS_|WEBSITE_|APPINSIGHTS_|Azure" --type py
```

Length of output: 17040

**Improve Azure Environment Variable Detection Logic**

The current logic relies on a hardcoded list of prefixes (i.e., `"FUNCTIONS_"`, `"WEBSITE_"`, `"APPINSIGHTS_"`, `"Azure"`) to identify Azure system variables. This approach is brittle since any change in Azure's naming conventions could break it. Consider refactoring to mitigate future issues:

- Centralize or Externalize Prefixes: Move these hardcoded prefixes to a dedicated constants file or configuration setting to ease future updates.
- Consult Official Documentation: Explore any available Azure documentation or API endpoints that provide an authoritative list of system variable prefixes.
- Maintain Backward Compatibility: Ensure that any refinements do not inadvertently overwrite manually provided variable values.

Please review this refactoring approach, sketched below, for suitability against the broader usage across the codebase.
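A small sketch of the centralization idea (the prefix list mirrors the ones quoted above; the helper names are illustrative, not part of this PR):

```python
# Prefixes Azure injects into Function App settings, kept in one place
# so that a naming change only requires a single edit.
AZURE_SYSTEM_ENV_PREFIXES = ("FUNCTIONS_", "WEBSITE_", "APPINSIGHTS_", "Azure")


def is_azure_system_variable(name: str) -> bool:
    # str.startswith accepts a tuple, so no explicit loop is needed.
    return name.startswith(AZURE_SYSTEM_ENV_PREFIXES)


def user_envs(envs: dict) -> dict:
    # Keep only user-provided settings so Azure's own values are never overwritten.
    return {k: v for k, v in envs.items() if not is_azure_system_variable(k)}
```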
`589-592`: ❓ Verification inconclusive

**Clarify the cold start enforcement mechanism.**

The commented-out line and FIXME comment suggest uncertainty about whether the current approach is sufficient for enforcing cold starts.

🏁 Script executed:

```bash
#!/bin/bash
# Let's look for other examples of cold start enforcement in the codebase
rg -A 5 "enforce_cold_start" --type py
```

Length of output: 5816

**Action Required: Clarify Azure Cold Start Enforcement**

The current implementation in `_enforce_cold_start` updates the function's environment variables (via `update_envs`) using the `ForceColdStart` counter, then pauses execution for 20 seconds. However, the commented-out call to `update_function` and the accompanying FIXME note raise uncertainty about whether this approach fully enforces a cold start. For comparison, other providers (e.g., AWS and GCP) update function configurations explicitly to trigger cold starts. Could you please clarify whether updating the environment is sufficient in the Azure context, or whether the alternative update method should be reconsidered? This clarification or additional documentation will help ensure consistency in our cold start enforcement strategy.
```python
env_string = ""
for k, v in envs.items():
    env_string += f" {k}={v}"

self.logging.info(f"Exporting environment variables for function {function.name}")
self.cli_instance.execute(
    f"az functionapp config appsettings set --name {function.name} "
    f" --resource-group {resource_group} "
    f" --settings {env_string} "
)
```
🛠️ Refactor suggestion
Improve environment variable command construction.
The current approach of concatenating environment variables into a string can have issues with escaping special characters or spaces in values.
```diff
-        env_string = ""
-        for k, v in envs.items():
-            env_string += f" {k}={v}"
+        # Use Azure CLI's argument format for settings to handle escaping properly
+        settings_args = " ".join([f"{k}='{v}'" for k, v in envs.items()])

         self.logging.info(f"Exporting environment variables for function {function.name}")
         self.cli_instance.execute(
             f"az functionapp config appsettings set --name {function.name} "
             f" --resource-group {resource_group} "
-            f" --settings {env_string} "
+            f" --settings {settings_args} "
         )
```
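Quoting with `shlex.quote` would be a more robust variant of the same idea, since single quotes inside values would still break the suggested f-string approach. A sketch, assuming the settings string is passed through a POSIX shell (the function and resource group names here are placeholders):

```python
import shlex

envs = {"NOSQL_STORAGE_URL": "https://example.com", "GREETING": "hello world"}

# shlex.quote escapes spaces and quote characters, so each KEY=value
# pair survives the shell unchanged.
settings_args = " ".join(f"{k}={shlex.quote(str(v))}" for k, v in envs.items())
command = (
    "az functionapp config appsettings set "
    "--name my-function --resource-group my-group "
    f"--settings {settings_args}"
)
print(command)
```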
Actionable comments posted: 1
♻️ Duplicate comments (3)
sebs/gcp/gcp.py (1)
`450-450`: ⚠️ Potential issue

**Avoid using a mutable default argument for `env_variables`.**

Using a mutable default argument can lead to unexpected behavior if the function is called multiple times.

```diff
- def update_function_configuration(
-     self, function: Function, code_package: Benchmark, env_variables: dict = {}
- ):
+ def update_function_configuration(
+     self, function: Function, code_package: Benchmark, env_variables: Optional[dict] = None
+ ):
+     if env_variables is None:
+         env_variables = {}
```
🪛 Ruff (0.8.2)
450-450: Do not use mutable data structures for argument defaults
Replace with
None
; initialize within function(B006)
sebs/azure/azure.py (2)
296-296
:⚠️ Potential issueAvoid mutable default argument in
update_envs
.Using a mutable object like an empty dictionary as a default argument can lead to unexpected behavior.
-def update_envs(self, function: Function, code_package: Benchmark, env_variables: dict = {}): +def update_envs(self, function: Function, code_package: Benchmark, env_variables: Optional[dict] = None): + if env_variables is None: + env_variables = {}🧰 Tools
🪛 Ruff (0.8.2)
296-296: Do not use mutable data structures for argument defaults
Replace with
None
; initialize within function(B006)
357-366
: 🛠️ Refactor suggestionImprove environment variable command construction.
The current string concatenation approach for environment variables can have issues with escaping special characters or values containing spaces.
- env_string = "" - for k, v in envs.items(): - env_string += f" {k}={v}" + # Use Azure CLI's argument format for settings to handle escaping properly + settings_args = " ".join([f"{k}='{v}'" for k, v in envs.items()]) self.logging.info(f"Exporting environment variables for function {function.name}") self.cli_instance.execute( f"az functionapp config appsettings set --name {function.name} " f" --resource-group {resource_group} " - f" --settings {env_string} " + f" --settings {settings_args} " )
🧹 Nitpick comments (8)
sebs/benchmark.py (5)
`67-67`: **Consider validating module values.**

The deserialization directly converts JSON strings to BenchmarkModule objects without validation. Consider adding validation to ensure that only supported module types are accepted.
`363-365`: **Remove extraneous `f` prefix.**

The f-string doesn't contain any variables, so the `f` prefix is unnecessary.

```diff
- destination_file = f"requirements.txt.{self._language_version}"
+ destination_file = "requirements.txt.{}".format(self._language_version)
- destination_file = f"requirements.txt"
+ destination_file = "requirements.txt"
```
🪛 Ruff (0.8.2)
365-365: f-string without any placeholders
Remove extraneous
f
prefix(F541)
391-393
: Remove extraneousf
prefixSimilar to the previous comment, this f-string doesn't contain any interpolation.
- package_config = os.path.join(output_dir, f"package.json.{self._language_version}") + package_config = os.path.join(output_dir, "package.json.{}".format(self._language_version)) - package_config = os.path.join(output_dir, f"package.json") + package_config = os.path.join(output_dir, "package.json")🧰 Tools
🪛 Ruff (0.8.2)
393-393: f-string without any placeholders
Remove extraneous
f
prefix(F541)
777-780
: Clarify if allocate_nosql is optionalSince this is a new abstract method in the
BenchmarkModuleInterface
, it's not clear if all existing benchmark modules need to implement this method. Consider adding a docstring to explain the purpose and usage.@staticmethod @abstractmethod def allocate_nosql() -> dict: - pass + """ + Optional method to define NoSQL tables needed by the benchmark. + + Returns: + dict: A dictionary where keys are table names and values are dictionaries + with 'primary_key' and optional 'secondary_key' fields. + """ + pass
783-794
: Consider simplifying the type annotation for nosql_funcThe type annotation for the
nosql_func
parameter is complex and might be hard for developers to understand. Consider creating a TypeAlias for this function type.# Add at the top of the file with other imports + from typing import TypeAlias + + # Define a type alias for the NoSQL function type + NoSQLWriteFunction: TypeAlias = Callable[[str, str, dict, Tuple[str, str], Optional[Tuple[str, str]]], None] # Then update the method signature @staticmethod @abstractmethod def generate_input( data_dir: str, size: str, benchmarks_bucket: Optional[str], input_paths: List[str], output_paths: List[str], upload_func: Optional[Callable[[int, str, str], None]], - nosql_func: Optional[ - Callable[[str, str, dict, Tuple[str, str], Optional[Tuple[str, str]]], None] - ], + nosql_func: Optional[NoSQLWriteFunction],sebs/gcp/gcp.py (2)
435-448
: Consider designing for future environment variable needs.The implementation effectively handles NoSQL database variables, but as mentioned in previous discussions, other services like Redis might need similar handling in the future.
Consider extending this method to accommodate future environment variable types by using a more generic approach or adding in a comment about future extensions:
def _generate_function_envs(self, code_package: Benchmark) -> dict: envs = {} + # Handle NoSQL-related environment variables if code_package.uses_nosql: db = ( cast(GCPSystemResources, self._system_resources) .get_nosql_storage() .benchmark_database(code_package.benchmark) ) envs["NOSQL_STORAGE_DATABASE"] = db + # TODO: Add support for Redis and other future services here return envs
`266-285`: **Improve error handling in the IAM policy retry mechanism.**

The current implementation catches all HttpError instances without examining specific error types, which might mask unexpected errors. Consider enhancing the error handling to log the specific error and only retry for specific error codes:

```diff
 while counter < MAX_RETRIES:
     try:
         allow_unauthenticated_req.execute()
         break
-    except HttpError:
+    except HttpError as e:
+        # Only retry for specific error codes (e.g., 429 Too Many Requests, 503 Service Unavailable)
+        if e.resp.status in [429, 503]:
+            self.logging.info(
+                f"Sleeping for 5 seconds because of HTTP error {e.resp.status}: {str(e)}"
+            )
+            time.sleep(5)
+            counter += 1
+        else:
+            # For other errors, we should not retry
+            raise RuntimeError(
+                f"Failed to configure function {full_func_name} "
+                f"for unauthenticated invocations: {str(e)}"
+            )
         self.logging.info(
             "Sleeping for 5 seconds because the created functions is not yet available!"
         )
         time.sleep(5)
         counter += 1
```
while counter < MAX_RETRIES: try: allow_unauthenticated_req.execute() break - except HttpError: + except HttpError as e: + # Only retry for specific error codes (e.g., 429 Too Many Requests, 503 Service Unavailable) + if e.resp.status in [429, 503]: + self.logging.info( + f"Sleeping for 5 seconds because of HTTP error {e.resp.status}: {str(e)}" + ) + time.sleep(5) + counter += 1 + else: + # For other errors, we should not retry + raise RuntimeError( + f"Failed to configure function {full_func_name} " + f"for unauthenticated invocations: {str(e)}" + ) self.logging.info( "Sleeping for 5 seconds because the created functions is not yet available!" ) time.sleep(5) counter += 1sebs/azure/azure.py (1)
`370-374`: **Remove commented-out code.**

The commented sleep code is no longer necessary since you've changed the order of operations (updating environment variables before publishing). Keeping commented code can lead to confusion.

```diff
- # if we don't do that, next invocation might still see old values
- # Disabled since we swapped the order - we first update envs, then we publish.
- # self.logging.info(
- #     "Sleeping for 10 seconds - Azure needs more time to propagate changes. "
- #     "Otherwise, functions might not see new variables and fail unexpectedly."
- # )
+ # Note: Environment variable propagation is now handled by updating envs first, then publishing
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `sebs/azure/azure.py` (8 hunks)
- `sebs/benchmark.py` (11 hunks)
- `sebs/gcp/gcp.py` (13 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
sebs/azure/azure.py
296-296: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
sebs/gcp/gcp.py
450-450: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
sebs/benchmark.py
365-365: f-string without any placeholders. Remove extraneous `f` prefix (F541)
393-393: f-string without any placeholders. Remove extraneous `f` prefix (F541)
🔇 Additional comments (6)
sebs/benchmark.py (5)
26-32
: Great enhancement to support NoSQL modulesThe addition of the
modules
parameter toBenchmarkConfig
constructor aligns well with the PR objective of adding NoSQL storage capabilities.
`54-56`: **LGTM - Clean property implementation.**

The property implementation for `modules` follows the same pattern as the other properties in this class.
`153-164`: **Good addition of state tracking properties.**

These properties provide clear access to the benchmark's state regarding input processing and storage usage, which is essential for the NoSQL functionality.
`637-705`: **Well-structured implementation of prepare_input for NoSQL.**

The refactored `prepare_input` method now handles both object storage and NoSQL storage in a clean and structured way. The conditional checks for the presence of methods in the input module are a good approach to maintain backward compatibility. However, consider adding more error handling for edge cases:

```diff
 """
 Handle key-value storage.
 This part is optional - only selected benchmarks implement this.
 """
 if hasattr(self._benchmark_input_module, "allocate_nosql"):
-    nosql_storage = system_resources.get_nosql_storage()
+    try:
+        nosql_storage = system_resources.get_nosql_storage()
+    except Exception as e:
+        self.logging.error(f"Failed to initialize NoSQL storage: {str(e)}")
+        raise RuntimeError(f"NoSQL storage initialization failed: {str(e)}")
```
376-382: 🛠️ Refactor suggestion: Add safeguard for missing module packages

The code assumes that `bench_module.value` exists in `module_packages`, but there's no check to confirm this. This could lead to a KeyError exception.

```diff
 for bench_module in self._benchmark_config.modules:
-    if bench_module.value in module_packages:
-        for package in module_packages[bench_module.value]:
-            out.write(package)
+    if bench_module.value in module_packages:
+        for package in module_packages[bench_module.value]:
+            out.write(package)
+    else:
+        self.logging.debug("No packages found for module %s", bench_module.value)
```

Likely an incorrect or invalid review comment.
sebs/azure/azure.py (1)
589-593: ❓ Verification inconclusive

Consider adding a comment about cold start implementation.
The commented line suggests uncertainty about whether the current implementation is sufficient for enforcing cold starts.

```diff
         self.update_envs(function, code_package, {"ForceColdStart": str(self.cold_start_counter)})
-        # FIXME: is this sufficient to enforce cold starts?
-        # self.update_function(function, code_package, False, "")
+        # We enforce cold starts by updating an environment variable that changes on each cold start request.
+        # This causes the function container to be redeployed, triggering a cold start.
```

It would be beneficial to verify if this approach consistently forces cold starts on Azure:
🏁 Script executed:
```bash
#!/bin/bash
# Verify the cold start enforcement on Azure functions

# Check the function's environment variables before and after enforcing cold start
echo "Fetching environment variables for the function before cold start..."
az functionapp config appsettings list --name YOUR_FUNCTION_APP --resource-group YOUR_RESOURCE_GROUP

# Measure initial response time
echo "Invoking function to measure warm response time..."
time curl -s YOUR_FUNCTION_URL

# Force a cold start
echo "Enforcing cold start..."
# (Script would execute the enforce_cold_start function here)

# Measure response time after cold start
echo "Invoking function to verify cold start..."
time curl -s YOUR_FUNCTION_URL
```
Review Update: Clarify Cold Start Mechanism and Recommend Manual Verification
The update to the comment now clearly explains that cold starts are triggered by updating an environment variable that changes with each request. However, the test script output was inconclusive because the environment lacked the Azure CLI ("az" command not found). Please ensure that manual verification is performed in an environment with the necessary tooling (i.e., Azure CLI configured) to confirm that this approach consistently forces cold starts.
Required actions:
- Update the comment at
sebs/azure/azure.py
(lines 589–593) as follows:self.update_envs(function, code_package, {"ForceColdStart": str(self.cold_start_counter)}) - # FIXME: is this sufficient to enforce cold starts? - # self.update_function(function, code_package, False, "") + # We enforce cold starts by updating an environment variable that changes on each cold start request. + # This causes the function container to be redeployed, triggering a cold start.- Manually test the cold start behavior on an environment where the Azure CLI is available to ensure the mechanism works as expected.
```python
self._benchmark_data_path = find_benchmark(self._benchmark, "benchmarks-data")
self._benchmark_input_module = load_benchmark_input(self._benchmark_path)

# Check if input has been processed
self._input_processed: bool = False
self._uses_storage: bool = False
self._uses_nosql: bool = False
```
🛠️ Refactor suggestion
Add error handling for benchmark input loading
The code loads the benchmark input module but doesn't have error handling if the module is not found or fails to load.
```diff
-        self._benchmark_input_module = load_benchmark_input(self._benchmark_path)
+        try:
+            self._benchmark_input_module = load_benchmark_input(self._benchmark_path)
+        except (ImportError, FileNotFoundError) as e:
+            raise RuntimeError(f"Failed to load input module for benchmark {self._benchmark}: {str(e)}")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
self._benchmark_data_path = find_benchmark(self._benchmark, "benchmarks-data")
try:
    self._benchmark_input_module = load_benchmark_input(self._benchmark_path)
except (ImportError, FileNotFoundError) as e:
    raise RuntimeError(f"Failed to load input module for benchmark {self._benchmark}: {str(e)}")

# Check if input has been processed
self._input_processed: bool = False
self._uses_storage: bool = False
self._uses_nosql: bool = False
```
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/benchmarks.md (1)
45-49: Improve Grammatical Clarity in CRUD API Description

In the CRUD API description, the phrase "add new item (`PUT`)" would read more clearly if it were "add a new item (`PUT`)". This minor adjustment will enhance the readability of the documentation.

```diff
-It offers three basic methods: add new item (`PUT`), get an item (`GET`), and query all items in a cart.
+It offers three basic methods: add a new item (`PUT`), get an item (`GET`), and query all items in a cart.
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~47-~47: You might be missing the article "a" here.
Context: ...art. It offers three basic methods: add new item (`PUT`), get an item (`GET`), and ... (AI_EN_LECTOR_MISSING_DETERMINER_A)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- docs/benchmarks.md (2 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/benchmarks.md
[uncategorized] ~47-~47: You might be missing the article “a” here.
Context: ...art. It offers three basic methods: add new item (`PUT`), get an item (`GET`), and ... (AI_EN_LECTOR_MISSING_DETERMINER_A)
🔇 Additional comments (1)
docs/benchmarks.md (1)
8-8: New Benchmark Row Added Successfully. The new table row for 130.crud-api is well formatted and clearly conveys the intended benchmark details, including the type, supported language, architectures, and a concise description. This aligns nicely with the PR's objective of integrating NoSQL storage capabilities.
Actionable comments posted: 0
🧹 Nitpick comments (4)
docs/storage.md (1)
81-82: Consider using "inaccessible" for better readability. The phrases "not accessible" could be replaced with "inaccessible" to improve readability.

```diff
-By default, the container runs on the Docker bridge network with an address (e.g., `172.17.0.2`) that is not accessible from outside the host.
-Even when deploying both OpenWhisk and storage on the same system, the local bridge network is not accessible from the Kubernetes cluster.
+By default, the container runs on the Docker bridge network with an address (e.g., `172.17.0.2`) that is inaccessible from outside the host.
+Even when deploying both OpenWhisk and storage on the same system, the local bridge network is inaccessible from the Kubernetes cluster.
```

🧰 Tools
🪛 LanguageTool
[style] ~81-~81: Consider using “inaccessible” to avoid wordiness.
Context: ...an address (e.g., `172.17.0.2`) that is not accessible from outside the host. Even when deploy... (NOT_ABLE_PREMIUM)
[style] ~82-~82: Consider using “inaccessible” to avoid wordiness.
Context: ...ame system, the local bridge network is not accessible from the Kubernetes cluster. To make i... (NOT_ABLE_PREMIUM)
sebs/storage/config.py (1)
83-87: Dictionary key lookup can be optimized. The code is using `.keys()` when checking for keys in a dictionary, which is unnecessary.

```diff
-        for key in ScyllaDBConfig.__dataclass_fields__.keys():
+        for key in ScyllaDBConfig.__dataclass_fields__:
             cache.update_config(val=getattr(self, key), keys=[*path, key])
```

🧰 Tools
🪛 Ruff (0.8.2)
85-85: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()`. (SIM118)
sebs.py (2)
163-169: Use context manager for file operations. When opening files, it's better to use context managers to ensure proper resource cleanup.

```diff
-    if storage_configuration is not None:
-
-        for cfg_f in storage_configuration:
-            sebs_client.logging.info(f"Loading storage configuration from {cfg_f}")
-
-            cfg = json.load(open(cfg_f, 'r'))
+    if storage_configuration is not None:
+
+        for cfg_f in storage_configuration:
+            sebs_client.logging.info(f"Loading storage configuration from {cfg_f}")
+
+            with open(cfg_f, 'r') as f:
+                cfg = json.load(f)
+            append_nested_dict(config_obj, ["deployment", deployment, "storage"], cfg)
```

🧰 Tools
🪛 Ruff (0.8.2)
168-168: Use a context manager for opening files. (SIM115)
397-399: Use context manager for file operations. When opening files, it's better to use context managers to ensure proper resource cleanup.

```diff
-    user_storage_config = json.load(open(config, 'r'))
+    with open(config, 'r') as f:
+        user_storage_config = json.load(f)
```

🧰 Tools
🪛 Ruff (0.8.2)
398-398: Use a context manager for opening files. (SIM115)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- docs/platforms.md (5 hunks)
- docs/storage.md (1 hunks)
- sebs.py (12 hunks)
- sebs/storage/config.py (2 hunks)
- sebs/storage/minio.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- sebs/storage/minio.py
🧰 Additional context used
🪛 LanguageTool
docs/storage.md
[style] ~81-~81: Consider using “inaccessible” to avoid wordiness.
Context: ...an address (e.g., `172.17.0.2`) that is not accessible from outside the host. Even when deploy... (NOT_ABLE_PREMIUM)
[style] ~82-~82: Consider using “inaccessible” to avoid wordiness.
Context: ...ame system, the local bridge network is not accessible from the Kubernetes cluster. To make i... (NOT_ABLE_PREMIUM)
🪛 Ruff (0.8.2)
sebs.py
168-168: Use a context manager for opening files. (SIM115)
398-398: Use a context manager for opening files. (SIM115)
sebs/storage/config.py
85-85: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()`. (SIM118)
🔇 Additional comments (25)
docs/storage.md (5)
1-15: Well-structured introduction to storage configuration. The introduction clearly explains the purpose of storage in SeBS benchmarks and introduces Minio and ScyllaDB as the solutions for object storage and NoSQL database, respectively. This helps users understand the overall storage architecture.
16-32: Clear CLI command examples for starting storage services. The documentation provides explicit examples for starting object storage, NoSQL storage, or both, which makes it easy for users to understand the different options available.
34-68: Comprehensive configuration example. The JSON configuration example is detailed and includes all necessary parameters for both storage types. This helps users understand the expected configuration format.
93-108: Good practice: HTTP validation example. Including a concrete example with curl to validate the Minio configuration is excellent, as it provides users with a practical way to verify their setup is working correctly.
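A check along these lines could look as follows; the host and port are placeholders to be taken from the storage configuration, and `/minio/health/live` is MinIO's standard liveness endpoint:

```bash
# Expect HTTP 200 if the Minio container is up and reachable.
curl -i http://172.17.0.2:9000/minio/health/live
```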
111-123: Clear lifecycle management instructions. The section on lifecycle management clearly explains how storage containers are retained and how to stop them when needed, completing the operational understanding of the storage services.
sebs/storage/config.py (6)
10-19: Well-defined abstract base class with clear contract. The `PersistentStorageConfig` abstract base class defines a clear contract with the `serialize` and `envs` abstract methods that concrete implementations must provide.
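A minimal sketch of that contract, assuming only what the review states; the docstrings and bodies are illustrative, not the project's actual code:

```python
from abc import ABC, abstractmethod
from typing import Dict


class PersistentStorageConfig(ABC):
    @abstractmethod
    def serialize(self) -> dict:
        """Serialize the configuration, e.g., for caching."""
        ...

    @abstractmethod
    def envs(self) -> Dict[str, str]:
        """Environment variables to inject into deployed functions."""
        ...
```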
22-32: Updated MinioConfig with additional attributes. The `MinioConfig` class now properly inherits from `PersistentStorageConfig` and includes the new `version` and `data_volume` attributes, which are necessary for managing storage instances.
51-59: Implementation of abstract methods in MinioConfig. The `serialize` method now simply returns `self.__dict__`, and the `envs` method provides the necessary environment variables for Minio.
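Continuing the abstract-base sketch above, a concrete implementation could look as follows; the dataclass fields match those named in this review, but the environment-variable names are assumptions for illustration only:

```python
from dataclasses import dataclass


@dataclass
class MinioConfig(PersistentStorageConfig):  # base class sketched earlier
    address: str = ""
    access_key: str = ""
    secret_key: str = ""
    version: str = ""
    data_volume: str = ""

    def serialize(self) -> dict:
        # As the review notes: simply expose the instance dictionary.
        return self.__dict__

    def envs(self) -> dict:
        # Hypothetical variable names, not the actual SeBS keys.
        return {
            "MINIO_ADDRESS": self.address,
            "MINIO_ACCESS_KEY": self.access_key,
            "MINIO_SECRET_KEY": self.secret_key,
        }
```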
62-67: NoSQL abstract class definition. The abstract class for NoSQL storage configurations defines the contract for serialization, providing a consistent approach across different types of storage.
69-82: Comprehensive ScyllaDB configuration class. The `ScyllaDBConfig` class includes all necessary attributes for ScyllaDB management, including address, ports, credentials, and resource settings.
88-99: Well-implemented ScyllaDB serialization methods. The `deserialize` and `serialize` methods in the `ScyllaDBConfig` class follow the same consistent pattern as `MinioConfig`, making the code more maintainable.

docs/platforms.md (4)
15-20: Helpful platform summary with navigation links. Adding a concise list of supported platforms with links to their respective sections improves navigation within the document.
21-23: Clear reference to storage documentation. The new "Storage Configuration" section provides a clear reference to the separate storage documentation, ensuring users are aware of the importance of storage configuration for their deployments.
148-149: Updated GCP permissions requirements. The enhanced documentation now clearly specifies the required permissions for Google Cloud Functions, including the `Firestore Service Agent` role needed for creating Firestore databases.
185-185: Helpful cross-reference to storage documentation. Adding a reference to the storage documentation in the OpenWhisk section ensures users understand the specific storage requirements for OpenWhisk deployments.
sebs.py (10)
9-9: Updated import to include NoSQL storage types. The import has been updated to include the `NoSQLStorage` types, which is necessary for the new NoSQL storage functionality.
138-138: Updated type annotation for storage configuration. The type annotation for `storage_configuration` has been changed from `Optional[str]` to `Optional[List[str]]` to support multiple storage configuration files.
233-233: Updated CLI option to support multiple storage configurations. The `--storage-configuration` CLI option now supports multiple values, allowing users to specify multiple storage configuration files.
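In practice, this allows repeating the flag, e.g. one file per storage backend; the subcommand shape and file names below are illustrative assumptions, not verified CLI syntax:

```bash
./sebs.py benchmark invoke 130.crud-api test --config config/example.json \
    --storage-configuration minio-config.json \
    --storage-configuration scylladb-config.json
```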
390-392: Enhanced storage command with more options. The `storage start` command now accepts different storage types ("object", "nosql", or "all") and reads configuration from a JSON file, providing more flexibility.
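A plausible invocation based on this description; the flag names and output-file convention are assumptions:

```bash
# Start object storage only, NoSQL only, or both, persisting the
# resulting instance description to a JSON file for a later 'storage stop'.
./sebs.py storage start object config/storage.json --output-json out_storage.json
./sebs.py storage start nosql config/storage.json --output-json out_storage.json
./sebs.py storage start all config/storage.json --output-json out_storage.json
```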
400-416: Well-structured handling of object storage. The code for starting object storage is well-structured and includes proper initialization, configuration, and serialization of the storage instance.
418-434: Well-structured handling of NoSQL storage. The code for starting NoSQL storage follows the same pattern as object storage, maintaining a consistent approach that makes the code more maintainable.
446-447: Consistent interface for stopping storage. The `storage stop` command follows the same pattern as `storage start`, accepting the storage type and input JSON path.
454-464: Proper shutdown of object storage. The code for stopping object storage properly deserializes the configuration, shuts down the storage instance, and logs the process.
466-476: Proper shutdown of NoSQL storage. The code for stopping NoSQL storage follows the same pattern as object storage, maintaining a consistent approach.
707-708: Improved error handling for unsupported resources. The code now raises a `NotImplementedError` for unsupported resource types, providing better error feedback to users.
We want to add NoSQL storage and a test benchmark implementing a simple CRUD API that emulates a shopping cart, like on Amazon.
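As a rough sketch of what such a benchmark handler could look like (the event shape, table layout, and in-memory stand-in are hypothetical, not this PR's actual code):

```python
# In-memory stand-in for a NoSQL table, keyed by (cart_id, product_id);
# a real deployment would back this with DynamoDB, ScyllaDB, etc.
shopping_cart = {}


def handler(event: dict) -> dict:
    op = event["operation"]
    cart_id = event["cart_id"]

    if op == "put":      # add a new item to the cart
        shopping_cart[(cart_id, event["product_id"])] = {
            "name": event["name"],
            "quantity": event["quantity"],
        }
        return {"status": "added"}
    elif op == "get":    # fetch a single item
        return shopping_cart[(cart_id, event["product_id"])]
    else:                # query: list all items in the cart
        return {"items": [v for (c, _), v in shopping_cart.items() if c == cart_id]}


# Example invocation of the hypothetical handler:
handler({"operation": "put", "cart_id": "c1", "product_id": "p1",
         "name": "book", "quantity": 2})
print(handler({"operation": "query", "cart_id": "c1"}))
```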
Summary by CodeRabbit
Updated Release Notes
- New Features
- Refactor
- Documentation
- Chores