Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ibis): persistent query cache #1088

Merged
merged 14 commits into from
Apr 2, 2025
Merged

Conversation

douenergy
Copy link
Contributor

@douenergy douenergy commented Mar 12, 2025

  • Implemented a persistent query cache system that saves results to disk as Parquet files (/tmp/wren-engine/)

  • Added an enable_cache parameter to the query endpoint to allow clients to opt-in to cache functionality.

  • Integrated cache hit/miss tracking via an X-Cache-Hit response header for monitoring and debugging.
    I prefer header instead of a json body field. Because clean separation of metadata from data, follows HTTP convention for metadata, doesn't modify the existing response structure.

Summary by CodeRabbit

  • New Features

    • Introduced query caching to improve performance by reducing redundant executions.
    • Endpoints now support an optional caching parameter and include response headers that indicate cache usage.
  • Tests

    • Added asynchronous tests to verify the caching behavior by comparing results on initial and subsequent requests for both v2 and v3 endpoints.

Copy link

coderabbitai bot commented Mar 12, 2025

Walkthrough

The update integrates a query caching feature into the ibis-server application. A new QueryCacheManager class is introduced to handle caching of query results with methods for generating keys, retrieving, and storing cached data. The application state is extended to include this manager, and the query endpoints in both v2 and v3 routers are modified to check for and utilize cached results. New tests have been added to verify cache functionality, ensuring that the caching mechanism operates as intended.

Changes

File(s) Change Summary
ibis-server/app/main.py Added new dependency on QueryCacheManager, introduced a new field query_cache_manager in the State class, and initialized it in the lifespan context manager.
ibis-server/app/query_cache/__init__.py Introduced the QueryCacheManager class with methods: get, set, _generate_cache_key, _get_cache_file_name, _get_full_path, and _get_dal_operator. OpenTelemetry tracing has been integrated into these methods.
ibis-server/app/routers/v2/connector.py
ibis-server/app/routers/v3/connector.py
Modified the query endpoint to support caching via a new cache_enable flag, retrieval of the cache manager using get_query_cache_manager, and inclusion of cache hit information in response headers.
ibis-server/tests/routers/v2/connector/test_postgres.py
ibis-server/tests/routers/v3/connector/postgres/test_query.py
ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py
Introduced new asynchronous test functions (test_query_with_cache) to validate the caching mechanism, ensuring that responses reflect cache hits and misses.

Sequence Diagram(s)

Query Endpoint Flow

sequenceDiagram
    participant Client
    participant Router
    participant CacheManager
    participant DataSource

    Client->>Router: Send query request (cache_enable=true)
    Router->>CacheManager: get(data_source, sql, info)
    alt Cache Hit
        CacheManager-->>Router: Cached result
        Router->>Client: Return cached result (header: X-Cache-Hit: true)
    else Cache Miss
        CacheManager-->>Router: No cache found
        Router->>DataSource: Execute rewritten query
        DataSource-->>Router: Query result
        Router->>CacheManager: set(data_source, sql, result, info)
        Router->>Client: Return new result (header: X-Cache-Hit: false)
    end
Loading

Application Initialization Flow

sequenceDiagram
    participant Lifespan
    participant State
    participant CacheManager

    Lifespan->>State: Start application lifecycle
    State->>CacheManager: Instantiate QueryCacheManager
    CacheManager-->>State: Return manager instance
    State-->>Lifespan: Yield state with QueryCacheManager and JavaEngineConnector
Loading

Poem

Hop along, my dear coder friend,
I’ve bounced new caching magic to no end.
Query results now rest in cache so fine,
Like nuts stored safely, all in a line.
I hop with joy at every tested leap,
Celebrating code that’s solid and deep.
— A frolicsome rabbit in digital sleep.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eedbd81 and e42310b.

📒 Files selected for processing (3)
  • ibis-server/app/routers/v2/connector.py (3 hunks)
  • ibis-server/app/routers/v3/connector.py (3 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/app/routers/v3/connector.py
🧰 Additional context used
🧬 Code Definitions (2)
ibis-server/app/routers/v2/connector.py (6)
ibis-server/app/query_cache/__init__.py (2)
  • QueryCacheManager (12-75)
  • get (17-35)
ibis-server/app/model/connector.py (5)
  • query (56-57)
  • query (72-73)
  • query (112-115)
  • query (143-181)
  • query (197-203)
ibis-server/app/model/__init__.py (1)
  • QueryDTO (17-20)
ibis-server/app/mdl/java_engine.py (1)
  • JavaEngineConnector (11-51)
ibis-server/app/util.py (1)
  • build_context (98-101)
ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
  • connection_info (24-29)
ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (1)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (2)
  • test_query_with_cache (139-170)
  • manifest_str (96-97)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (10)
ibis-server/app/routers/v2/connector.py (9)

23-23: LGTM for the addition of QueryCacheManager.

The import for the QueryCacheManager class is correctly added, which will be used to implement the persistent query cache functionality.


34-36: Good implementation of the dependency injection pattern.

The get_query_cache_manager function follows the same pattern as the existing get_java_engine_connector function, making the code consistent and maintainable.


45-45: LGTM: Parameter name aligns with previous review comments.

The cache_enable parameter was implemented as a query parameter rather than a header field, which aligns with the reviewer's suggestion in the previous comments. Good job using the Query annotation with an alias for consistent naming conventions.


48-48: LGTM: Proper dependency injection for the cache manager.

The query_cache_manager is correctly injected using the Depends function with the previously defined helper function.


51-56: Good addition of detailed span names for better observability.

The span name now includes information about whether it's a dry run and whether caching is enabled, which will improve tracing and monitoring capabilities. This addresses the reviewer's suggestion to add more tracing context.


73-79: LGTM: Correct handling of dry runs.

The code explicitly separates the handling of dry runs and does not check the cache for dry runs. This aligns with the reviewer's comment that dry runs should not use caching as they are only validating SQL syntax, not executing the query.


85-90: LGTM: Proper cache key generation and retrieval.

The cache key is appropriately generated based on the data source, original SQL query (not the rewritten one), and connection information. This ensures that different data sources or connection details result in different cache entries even for the same SQL query.


91-95: LGTM: Added tracing event for cache hits as requested.

Good addition of the tracing event for cache hits and proper response header for cache hit indication. This directly addresses the reviewer's suggestion to add trace events for cache hits.


96-105: LGTM: Proper handling of cache misses and result caching.

The implementation correctly executes the query when a cache miss occurs, stores the result in the cache if caching is enabled, and sets the appropriate cache hit header. The code maintains a clean separation between execution flow and caching logic.

ibis-server/tests/routers/v3/connector/postgres/test_fallback_v2.py (1)

48-80: LGTM: Comprehensive test for the caching functionality.

The test correctly verifies both the cache miss on the first request and the cache hit on the second request by checking the X-Cache-Hit header. It also confirms data integrity by comparing the results from both requests. This matches the similar test in the v3 connector test file.

One small observation: while this test covers the basic caching functionality, consider adding tests for edge cases in future PRs:

  1. Different SQL queries producing different cache entries
  2. Different connection information producing different cache entries
  3. Verifying behavior when cache_enable is set to false
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added ibis dependencies Pull requests that update a dependency file python Pull requests that update Python code labels Mar 12, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (9)
ibis-server/app/routers/v2/connector.py (2)

57-93: Consider adding cache invalidation mechanisms.

While the current implementation provides a solid foundation for query caching, it might be beneficial to add mechanisms for cache invalidation, such as a time-to-live (TTL) parameter or explicit invalidation endpoints. This would help manage cache freshness, especially for frequently changing data.


57-93: Consider adding cache control headers for more granular caching control.

For enhanced client control over caching behavior, you might want to implement support for standard HTTP cache control headers. This would allow clients to specify caching preferences like max-age or no-cache on a per-request basis.

ibis-server/app/query_cache/__init__.py (5)

9-11: Constructor could be more robust with configuration options

The __init__ method is currently empty. Consider adding configuration options such as base cache directory path and cache expiration policy to make the cache manager more flexible.

-def __init__(self):
-    pass
+def __init__(self, cache_dir: str = "/tmp/wren-engine/", cache_expiry_seconds: Optional[int] = None):
+    self.cache_dir = cache_dir
+    self.cache_expiry_seconds = cache_expiry_seconds
+    os.makedirs(self.cache_dir, exist_ok=True)

13-27: Add cache expiry check and more specific return type

The cache retrieval logic doesn't check for stale cache entries. Additionally, the return type could be more specific.

-def get(self, data_source: str, sql: str, info) -> Optional[Any]:
+def get(self, data_source: str, sql: str, info) -> Optional[pd.DataFrame]:
     cache_key = self._generate_cache_key(data_source, sql, info)
     cache_path = self._get_cache_path(cache_key)
 
     # Check if cache file exists
     if os.path.exists(cache_path):
+        # Check cache expiry if configured
+        if self.cache_expiry_seconds:
+            file_age = time.time() - os.path.getmtime(cache_path)
+            if file_age > self.cache_expiry_seconds:
+                logger.debug(f"Cache expired: {cache_path}")
+                return None
+
         try:
             cache = ibis.read_parquet(cache_path)
             df = cache.execute()
             return df
         except Exception as e:
             logger.debug(f"Failed to read query cache {e}")
             return None

37-37: Use consistent log levels for caching operations

You're using logger.info for successful writes but logger.debug for failed operations. For consistency, either use debug level for all cache operations or info level for all.

-logger.info(f"\nWriting query cache to {cache_path}\n")
+logger.debug(f"Writing query cache to {cache_path}")

43-53: Consider security implications of including sensitive information in cache keys

The cache key includes sensitive connection information. While this ensures cache entries are specific to different connections, it could potentially expose sensitive information in logs or error messages.

Consider using non-sensitive identifiers or applying additional hashing to sensitive values:

def _generate_cache_key(self, data_source: str, sql: str, info) -> str:
    key_parts = [
        data_source,
        sql,
-       info.host.get_secret_value(),
-       info.port.get_secret_value(),
-       info.user.get_secret_value(),
+       hashlib.sha256(info.host.get_secret_value().encode()).hexdigest(),
+       hashlib.sha256(str(info.port.get_secret_value()).encode()).hexdigest(),
+       hashlib.sha256(info.user.get_secret_value().encode()).hexdigest(),
    ]
    key_string = "|".join(key_parts)

    return hashlib.sha256(key_string.encode()).hexdigest()

1-7: Add missing imports for suggested improvements

If implementing the suggested improvements, you'll need these additional imports.

 import hashlib
 import os
+import time
 from typing import Any, Optional
 
 import ibis
+import pandas as pd
 from loguru import logger
ibis-server/tests/routers/v2/query_cache/test_cache.py (2)

124-131: Use a more robust approach for test cache directory

The current approach creates a directory in /tmp, which may not work on all platforms. Consider using the built-in pytest tmp_path fixture instead.

@pytest.fixture(scope="function")
-def cache_dir():
-    temp_dir = "/tmp/wren-engine-test"
-    os.makedirs(temp_dir, exist_ok=True)
-    yield temp_dir
-    # Clean up after the test
-    shutil.rmtree(temp_dir, ignore_errors=True)
+def cache_dir(tmp_path):
+    cache_dir = tmp_path / "wren-engine-test"
+    cache_dir.mkdir(exist_ok=True)
+    yield str(cache_dir)
+    # Clean up happens automatically with tmp_path

1-14: Add a test for the enable_cache parameter

The PR summary mentions an enable_cache parameter to opt-in to caching, but there's no test to verify that caching is disabled when this parameter is not set to True.

Consider adding a test function that verifies caching behavior when the enable_cache flag is set to False:

async def test_query_without_cache(
    client, manifest_str, postgres: PostgresContainer, cache_dir, monkeypatch
):
    # Override the cache path to use our test directory
    monkeypatch.setattr(
        QueryCacheManager,
        "_get_cache_path",
        lambda self, key: f"{cache_dir}/{key}.cache",
    )

    connection_info = _to_connection_info(postgres)
    test_sql = 'SELECT * FROM "Orders" LIMIT 10'

    # First request with enable_cache=False
    response1 = await client.post(
        url=f"{base_url}/query",
        json={
            "connectionInfo": connection_info,
            "manifestStr": manifest_str,
            "sql": test_sql,
            "enable_cache": False,
        },
    )
    assert response1.status_code == 200
    
    # Second request with same SQL - should still miss cache
    response2 = await client.post(
        url=f"{base_url}/query",
        json={
            "connectionInfo": connection_info,
            "manifestStr": manifest_str,
            "sql": test_sql,
            "enable_cache": False,
        },
    )
    assert response2.status_code == 200
    
    # Verify cache wasn't used
    assert response2.headers.get("X-Cache-Hit") == "false", "Cache should not be used when enable_cache is False"
    
    # No cache files should be created
    assert len(os.listdir(cache_dir)) == 0, "No cache files should be created when enable_cache is False"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1453e4 and b598698.

📒 Files selected for processing (6)
  • ibis-server/app/main.py (2 hunks)
  • ibis-server/app/model/__init__.py (1 hunks)
  • ibis-server/app/query_cache/__init__.py (1 hunks)
  • ibis-server/app/routers/v2/connector.py (3 hunks)
  • ibis-server/pyproject.toml (1 hunks)
  • ibis-server/tests/routers/v2/query_cache/test_cache.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (14)
ibis-server/app/model/__init__.py (1)

21-21: LGTM! Clean implementation of the cache opt-in flag.

The addition of enable_cache as a boolean field with a default value of False is a good implementation of the opt-in caching approach mentioned in the PR objectives. This ensures backward compatibility as existing clients won't have their queries cached unless they explicitly set this flag.

ibis-server/pyproject.toml (1)

78-78: Good test organization with the new cache marker.

Adding a dedicated pytest marker for cache tests facilitates better test organization and allows selectively running only cache-related tests during development or debugging.

ibis-server/app/main.py (3)

16-16: LGTM! Appropriate import for the new cache manager.

The import statement for the QueryCacheManager is correctly placed with other application imports.


25-26: LGTM! Properly extending the application state.

Adding the QueryCacheManager to the State class ensures it's accessible throughout the application lifecycle.


31-37: LGTM! Proper initialization and state management for the cache manager.

The changes correctly initialize the QueryCacheManager and include it in the yielded state dictionary. This approach properly integrates the cache manager into the application's lifecycle.

ibis-server/app/routers/v2/connector.py (8)

22-22: LGTM! Appropriate import for the QueryCacheManager.

The import is correctly placed with other application imports.


33-35: LGTM! Well-structured dependency injection for the cache manager.

The helper function follows the same pattern as get_java_engine_connector, which maintains consistency in the codebase.


44-44: LGTM! Properly injecting the cache manager dependency.

Correctly added the QueryCacheManager as a dependency to the query endpoint.


53-56: LGTM! Good initialization of cache-related variables.

Variables are clearly named and initialized with appropriate default values.


57-62: LGTM! Clean implementation of cache retrieval.

The code properly checks if caching is enabled before attempting to retrieve cached results. The cache key appears to be constructed from the data source, SQL query, and connection info, which should provide good cache specificity.


64-68: LGTM! Clean cache hit handling.

The implementation correctly returns the cached result and sets the X-Cache-Hit header to indicate a cache hit, as specified in the PR objectives.


78-83: LGTM! Consistent header handling for dry runs.

The X-Cache-Hit header is consistently applied even for dry run responses, which helps maintain a uniform API behavior.


84-93: LGTM! Complete cache miss and result storage implementation.

The code properly executes the query when there's a cache miss, and conditionally stores the result in the cache if caching is enabled. The X-Cache-Hit header is consistently set to indicate the cache status.

ibis-server/tests/routers/v2/query_cache/test_cache.py (1)

18-101: LGTM! Comprehensive test manifest

The test manifest provides a thorough representation of the database schema for testing, including various column types and relationships.

Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @douenergy. Overall, it looks great, but something is missing.

  • This PR only implements the cache for the v2 API. However, the v3 API also requires it.
  • As per our offline discussion yesterday, we want to make the cache storing be async. However, I think we don't need to do that in this PR. We can have another follow-up PR for the async thing. If you want to do that here, it's also ok for me.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b598698 and d8eaf42.

📒 Files selected for processing (2)
  • ibis-server/app/routers/v2/connector.py (4 hunks)
  • ibis-server/pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/pyproject.toml
🧰 Additional context used
🧬 Code Definitions (1)
ibis-server/app/routers/v2/connector.py (1)
ibis-server/app/query_cache/__init__.py (2)
  • QueryCacheManager (9-56)
  • get (13-27)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (8)
ibis-server/app/routers/v2/connector.py (8)

23-23: Import of QueryCacheManager added correctly

The new import for QueryCacheManager has been properly added to support the caching functionality.


34-36: LGTM - Clean implementation of cache manager dependency

This function follows the same pattern as get_java_engine_connector, providing a consistent approach to dependency injection in FastAPI.


47-47: Cache manager dependency properly injected

Good use of FastAPI's dependency injection system to pass the query cache manager to the query endpoint.


78-89: Consider adding tracing for cache-related events

Adding tracing events for cache operations would improve observability, especially for debugging and monitoring cache effectiveness.

if enable_cache:
    cached_result = query_cache_manager.get(
        str(data_source), dto.sql, dto.connection_info
    )
    cache_hit = cached_result is not None
+    if cache_hit:
+        tracer.get_current_span().add_event("cache_hit")
+    else:
+        tracer.get_current_span().add_event("cache_miss")

90-94: LGTM - Clean implementation of cache hit response path

The code properly handles the case when a cached result is found, returning it with the appropriate cache hit header.


94-103: LGTM - Query execution and caching logic looks good

The implementation correctly executes the query when no cached result is found, caches the result if caching is enabled, and sets the appropriate cache header.


80-82: Clarify dry run behavior with caching

It's unclear what the expected behavior should be when both dry_run and enable_cache are true. Should we skip the cache check for dry runs entirely (as implemented), or should we check the cache first and skip the dry run if we get a hit?

Based on a previous review comment, there are two potential approaches:

  1. Skip cache for dry runs (current implementation) since dry runs are used to validate SQL which is already validated if in cache
  2. Always perform dry runs regardless of cache since they're inexpensive

Could you clarify the intended behavior for this scenario?


92-92: Consider using a query parameter instead of a header for cache hit information

As suggested in a previous review comment, you might want to consider using a query parameter instead of a header for indicating cache hits, similar to the dry-run parameter pattern. This would provide a more consistent API design.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
ibis-server/app/routers/v3/connector.py (1)

63-65: X-Cache-Hit header in dry run response.

Setting X-Cache-Hit to "true" for dry run responses may be misleading since no actual caching occurs during a dry run.

Consider setting this to a more appropriate value like "not_applicable" or removing it for dry runs to avoid confusion:

- dry_response.headers["X-Cache-Hit"] = "true"
+ dry_response.headers["X-Cache-Hit"] = "not_applicable"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c93665 and cb10f96.

📒 Files selected for processing (2)
  • ibis-server/app/routers/v3/connector.py (3 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (3 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
ibis-server/tests/routers/v2/connector/test_postgres.py (3)
  • cache_dir (131-136)
  • test_query_with_cache (179-221)
  • manifest_str (109-110)
ibis-server/app/routers/v3/connector.py (2)
ibis-server/app/routers/v2/connector.py (3)
  • get_java_engine_connector (30-31)
  • get_query_cache_manager (34-35)
  • query (41-108)
ibis-server/app/model/connector.py (10)
  • dry_run (59-63)
  • dry_run (76-77)
  • dry_run (84-92)
  • dry_run (118-120)
  • dry_run (206-212)
  • query (56-57)
  • query (72-73)
  • query (112-115)
  • query (143-181)
  • query (197-203)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (12)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (4)

2-3: Appropriate imports added for caching functionality.

These imports are necessary for managing the temporary cache directory used in testing. os is used for directory operations, and shutil for cleanup.


8-8: Imports QueryCacheManager correctly.

The import of QueryCacheManager is properly added to support the new cache testing functionality.


103-109: Well-implemented cache directory fixture.

This fixture correctly creates a temporary directory for testing cache functionality and ensures proper cleanup after tests. Using exist_ok=True is a good practice to prevent errors if the directory already exists.


151-191: Comprehensive test case for caching functionality.

This test thoroughly validates the cache mechanism by:

  1. Correctly overriding the cache path using monkeypatch
  2. Testing both cache miss (first request) and cache hit (second request) scenarios
  3. Verifying the cache hit/miss is reported in the response headers
  4. Confirming that the data returned is consistent between cached and non-cached responses

The test structure follows best practices by making assertions for both the cache hit status and data consistency.

ibis-server/app/routers/v3/connector.py (8)

23-23: Appropriate import for query cache functionality.

The QueryCacheManager import is correctly added to support the new caching feature.


25-25: Updated import for cache manager dependency.

The import has been properly updated to include the get_query_cache_manager function.


40-40: New cache_enable parameter correctly implemented.

The cache_enable parameter is properly added with a default value of False, making caching opt-in as described in the PR objectives.


44-44: Query cache manager dependency correctly added.

The QueryCacheManager is properly injected as a dependency using the Depends function.


46-51: Improved span naming for observability.

The span naming logic has been enhanced to include cache status, which is helpful for observability and tracing performance differences between cached and non-cached requests.


67-78: Cache lookup implementation.

The cache lookup logic is properly implemented. It checks for cache hits only when caching is enabled and uses the correct parameters (data source, SQL, and connection info) to generate the cache key.


79-84: Cache hit handling.

The cache hit logic is correctly implemented:

  1. Adding a span event for monitoring cache hits
  2. Returning the cached result with the appropriate header
  3. Converting the boolean cache_hit to lowercase string

85-92: Cache miss handling.

The cache miss and update logic is properly implemented. It executes the query, stores the result in the cache if enabled, and includes the X-Cache-Hit header in the response.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

139-171: Well-structured test for verifying cache hits and misses.

Your test correctly checks that the response headers reflect a cache miss on the first request and a hit on the second, verifying that the data, columns, and data types are identical. As an enhancement, consider adding additional test cases, such as varying the SQL query slightly or testing different parameters to ensure that only truly identical queries produce cache hits.

ibis-server/app/query_cache/__init__.py (1)

13-14: Consider externalizing the cache root path further.

Although the constructor parameterizes root, defaulting to "/tmp/wren-engine/", you might consider reading it from a config file or an environment variable for flexibility across different platforms and deployment environments (e.g., Windows vs. Linux).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb10f96 and 3a7111b.

📒 Files selected for processing (3)
  • ibis-server/app/query_cache/__init__.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/tests/routers/v2/connector/test_postgres.py
🧰 Additional context used
🧬 Code Definitions (1)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)
  • test_query_with_cache (167-200)
  • manifest_str (106-107)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/query_cache/__init__.py (1)

55-66: Review potential PII within cache keys.

The _generate_cache_key method includes user credentials such as host, port, and username in the hash input. While the resulting file name is a SHA-256 hash, ensure no logs or error messages leak these credential strings. Restricting logs to the hash or redacting sensitive info helps maintain data privacy.

@douenergy
Copy link
Contributor Author

@goldmedal I have addressed all the comments including

  1. Modified the tracking span name (appended cache_enable)
  2. Added tracing for cache hit
  3. Added v3 cache
  4. Used OpenDAL to unify the FS interface
  5. Moved test cases from separate files to Postgres test
  6. Used HTTP query params instead of body params
  7. For dry run queries, we don't care about the cache now - we just do what dry runs do because dry runs are just rewrites compared to execution, which is cheaper (I think this is a good idea)

Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @douenergy. 👍

@goldmedal goldmedal merged commit 7e90148 into Canner:main Apr 2, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file ibis python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants