Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry pick2 before #11

Open
wants to merge 5 commits into
base: feature/vector-index
Choose a base branch
from
Open

Conversation

CalvinNeo
Copy link
Owner

@CalvinNeo CalvinNeo commented Aug 14, 2024

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features

    • Introduced granular control over error simulation with a new fail point for file cache downloads.
    • Added a method to check key existence in the cache without altering its state.
    • Enhanced vector index handling with comprehensive updates to related structures and functions for better performance and clarity.
  • Bug Fixes

    • Improved error handling in file cache management methods to prevent failures during downloads and cache retrieval.
  • Documentation

    • Updated comments and documentation to reflect changes in vector index structures and their functionalities.
  • Refactor

    • Streamlined the code architecture by separating vector index functionalities into distinct builder and viewer classes, enhancing clarity and modularity.
  • Tests

    • Expanded testing coverage for vector index interactions with S3, validating performance and caching behaviors under various scenarios.

breezewish and others added 5 commits August 12, 2024 15:54
Signed-off-by: Wish <breezewish@outlook.com>
Signed-off-by: Wish <breezewish@outlook.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Copy link

coderabbitai bot commented Aug 14, 2024

Walkthrough

This update introduces significant improvements to the handling of vector indices within the database management system. Key changes include the restructuring of vector index management into distinct builder and viewer classes, enhancing clarity and efficiency. Additionally, several files have been modified to improve cache management, performance tracking, and S3 integration, ensuring robust error handling and better resource management. Overall, the modifications streamline interactions with vector indices and improve the system's performance and maintainability.

Changes

File(s) Change Summary
contrib/tipb Updated subproject commit identifier.
dbms/src/Common/FailPoint.cpp, dbms/src/Common/LRUCache.h Introduced a new fail point for file cache downloads and added a method to check key existence in LRUCache.
dbms/src/Interpreters/Context.cpp, dbms/src/Interpreters/Context.h Modified vector index cache parameter names for clarity, reflecting a shift from size to entity count.
dbms/src/Storages/DeltaMerge/DMContext.h Updated DMContext constructor to accept an optional scan context parameter.
dbms/src/Storages/DeltaMerge/DeltaMergeDefines.h Changed vector index member type in ColumnDefine from VectorIndexInfoPtr to VectorIndexDefinitionPtr.
dbms/src/Storages/DeltaMerge/File/ColumnStat.h, dbms/src/Storages/DeltaMerge/File/DMFileWriter.cpp Updated vector index types to improve handling within ColumnStat and DMFileWriter.
dbms/src/Storages/DeltaMerge/File/DMFileWithVectorIndexBlockInputStream.cpp Implemented a new input stream for reading data with vector indexing capabilities.
dbms/src/Storages/DeltaMerge/File/DMFileWriter.h, dbms/src/Storages/DeltaMerge/Index/VectorIndex.h Introduced vector index builder and viewer classes for improved separation of concerns and handling of vector indices.
dbms/src/Storages/S3/FileCache.cpp, dbms/src/Storages/S3/FileCache.h Enhanced file cache management with new methods for file downloading and retrieval, improving synchronization and error handling.
dbms/src/Storages/DeltaMerge/tests/gtest_dm_vector_index.cpp Added tests for vector index interactions with S3 and improved vector index definitions in tests.
dbms/src/TiDB/Schema/TiDB.cpp, dbms/src/TiDB/Schema/TiDB.h Updated vector index handling to utilize the tipb namespace for improved type safety and consistency.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Cache
    participant S3
    participant VectorIndexBuilder
    participant VectorIndexViewer

    User->>Cache: Request file
    Cache->>S3: Check file existence
    S3-->>Cache: File status
    Cache->>VectorIndexBuilder: Prepare index
    VectorIndexBuilder-->>Cache: Index ready
    Cache-->>User: Return file with index
Loading

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@CalvinNeo
Copy link
Owner Author

@coderabbitai review

Copy link

coderabbitai bot commented Aug 14, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Outside diff range, codebase verification and nitpick comments (1)
dbms/src/Storages/S3/FileCache.cpp (1)

68-107: Consider defining a constant for the timeout value.

The use of a magic number (300 seconds) for the timeout can be improved by defining a named constant for better readability and maintainability.

constexpr double WAIT_TIMEOUT_SECONDS = 300.0;

// Replace 300 with WAIT_TIMEOUT_SECONDS in the code.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 32d911b and e9a6e68.

Files selected for processing (36)
  • contrib/tipb (1 hunks)
  • dbms/src/Common/FailPoint.cpp (1 hunks)
  • dbms/src/Common/LRUCache.h (1 hunks)
  • dbms/src/Interpreters/Context.cpp (3 hunks)
  • dbms/src/Interpreters/Context.h (1 hunks)
  • dbms/src/Server/Server.cpp (1 hunks)
  • dbms/src/Storages/DeltaMerge/DMContext.h (2 hunks)
  • dbms/src/Storages/DeltaMerge/DeltaMergeDefines.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/File/ColumnStat.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/File/DMFileWithVectorIndexBlockInputStream.cpp (1 hunks)
  • dbms/src/Storages/DeltaMerge/File/DMFileWithVectorIndexBlockInputStream.h (5 hunks)
  • dbms/src/Storages/DeltaMerge/File/DMFileWriter.cpp (4 hunks)
  • dbms/src/Storages/DeltaMerge/File/DMFileWriter.h (4 hunks)
  • dbms/src/Storages/DeltaMerge/File/VectorColumnFromIndexReader.cpp (5 hunks)
  • dbms/src/Storages/DeltaMerge/File/VectorColumnFromIndexReader.h (4 hunks)
  • dbms/src/Storages/DeltaMerge/File/dtpb/dmfile.proto (3 hunks)
  • dbms/src/Storages/DeltaMerge/Index/RSIndex.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorIndex.cpp (2 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorIndex.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorIndexCache.cpp (1 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorIndexCache.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorIndexHNSW/Index.cpp (5 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorIndexHNSW/Index.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorIndex_fwd.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorSearchPerf.cpp (1 hunks)
  • dbms/src/Storages/DeltaMerge/Index/VectorSearchPerf.h (1 hunks)
  • dbms/src/Storages/DeltaMerge/Remote/Serializer.h (2 hunks)
  • dbms/src/Storages/DeltaMerge/ScanContext.h (6 hunks)
  • dbms/src/Storages/DeltaMerge/tests/gtest_dm_vector_index.cpp (26 hunks)
  • dbms/src/Storages/S3/FileCache.cpp (8 hunks)
  • dbms/src/Storages/S3/FileCache.h (7 hunks)
  • dbms/src/Storages/S3/FileCachePerf.cpp (1 hunks)
  • dbms/src/Storages/S3/FileCachePerf.h (1 hunks)
  • dbms/src/TiDB/Schema/TiDB.cpp (3 hunks)
  • dbms/src/TiDB/Schema/TiDB.h (1 hunks)
  • dbms/src/TiDB/Schema/VectorIndex.h (1 hunks)
Files skipped from review due to trivial changes (1)
  • contrib/tipb
Additional comments not posted (85)
dbms/src/Storages/S3/FileCachePerf.cpp (1)

1-22: File Setup Looks Good.

The licensing information and namespace declaration are correctly formatted and follow standard conventions.

dbms/src/Storages/DeltaMerge/Index/VectorSearchPerf.cpp (1)

1-22: File Setup Looks Good.

The licensing information and namespace declaration are correctly formatted and follow standard conventions.

dbms/src/Storages/DeltaMerge/Index/RSIndex.h (1)

Line range hint 13-23:
Simplification of RSIndex Structure Looks Good.

The removal of VectorIndexPtr and its constructor simplifies the RSIndex structure, likely improving clarity and reducing complexity.

Ensure that any dependencies on the removed VectorIndexPtr are updated accordingly in the codebase.

Verification successful

Removal of VectorIndexPtr is Complete.

The search did not find any remaining references to VectorIndexPtr, indicating that its removal from the codebase is thorough and no dependencies remain.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify dependencies on the removed `VectorIndexPtr`.

# Test: Search for any remaining usage of `VectorIndexPtr` in the codebase.
rg --type cpp 'VectorIndexPtr'

Length of output: 30

dbms/src/Storages/DeltaMerge/Index/VectorIndex_fwd.h (1)

24-28: Restructuring of vector index handling.

The introduction of VectorIndexBuilder and VectorIndexViewer classes, along with their respective smart pointer types, enhances the modularity of the vector index system. This separation of concerns can improve maintainability and potentially introduce new capabilities or optimizations in the indexing process.

dbms/src/Storages/S3/FileCachePerf.h (1)

27-35: Introduction of performance tracking for S3 file caching.

The FileCachePerfContext structure and the thread-local instance file_cache provide a mechanism to track performance metrics related to S3 file caching. This can be beneficial for identifying bottlenecks and optimizing performance in multi-threaded environments.

dbms/src/Storages/DeltaMerge/Index/VectorSearchPerf.h (1)

27-35: Introduction of performance tracking for vector search operations.

The VectorSearchPerfContext structure and the thread-local instance vector_search provide a mechanism to track performance metrics related to vector search operations. This can aid in performance optimization and debugging in multi-threaded environments.

dbms/src/Storages/DeltaMerge/Index/VectorIndexHNSW/Index.h (3)

24-25: Update in USearchImplType usage.

The change from VectorIndex::Key to VectorIndexBuilder::Key suggests a shift in the design pattern, possibly to decouple responsibilities. Ensure that this change is consistent across the codebase and that all dependencies are updated accordingly.

Verification successful

Consistent usage of USearchImplType with VectorIndexBuilder::Key.

The change from VectorIndex::Key to VectorIndexBuilder::Key is consistently reflected across the codebase. No issues found with the implementation.

  • USearchImplType is used consistently with VectorIndexBuilder::Key in both Index.h and Index.cpp.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `USearchImplType` with `VectorIndexBuilder::Key` across the codebase.

# Test: Search for the usage of `USearchImplType`. Expect: Consistent usage with `VectorIndexBuilder::Key`.
rg --type cpp -A 3 $'USearchImplType'

Length of output: 2454


41-55: Addition of VectorIndexHNSWViewer.

The new class VectorIndexHNSWViewer provides functionality for viewing vector indices. The view method and search method are key components. Ensure that these methods are correctly implemented and handle all edge cases, especially regarding file properties and query information.

Verification successful

Verification of VectorIndexHNSWViewer methods successful.

The view and search methods in VectorIndexHNSWViewer are implemented correctly, handling file properties and query information appropriately. The view method validates the index kind and distance metric, while the search method ensures the query vector size matches the index dimensions. No issues found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of `view` and `search` methods in `VectorIndexHNSWViewer`.

# Test: Search for the implementation of `view` and `search` methods. Expect: Correct handling of file properties and query information.
ast-grep --lang cpp --pattern $'class VectorIndexHNSWViewer {
  $$$
  static $_ view($_, $_) {
    $$$
  }
  $$$
  std::vector<Key> search($_, $_) const {
    $$$
  }
  $$$
}'

Length of output: 521


Script:

#!/bin/bash
# Search for the implementation of the `view` method in the `VectorIndexHNSWViewer` class.
rg -A 10 'VectorIndexHNSWViewer::view' --type=cpp

# Search for the implementation of the `search` method in the `VectorIndexHNSWViewer` class.
rg -A 10 'VectorIndexHNSWViewer::search' --type=cpp

Length of output: 3154


27-39: Introduction of VectorIndexHNSWBuilder.

The new class VectorIndexHNSWBuilder inherits from VectorIndexBuilder, indicating a refactoring effort to separate concerns. The constructor now takes a VectorIndexDefinitionPtr, which aligns with the new structure changes. Ensure that the logic within addBlock and save methods aligns with the intended functionality and that all necessary validations are in place.

dbms/src/Storages/DeltaMerge/Index/VectorIndexCache.h (1)

30-77: Introduction of VectorIndexCache.

The VectorIndexCache class uses an LRU cache to manage vector index viewers. The implementation includes thread management for cleaning outdated cache entries. Ensure that the cache eviction logic is correctly implemented and that the thread management handles shutdown scenarios gracefully.

Verification successful

Cache Eviction and Thread Management Verified in VectorIndexCache

The VectorIndexCache class correctly implements cache eviction and thread management. The cleanOutdatedCacheEntries method efficiently removes outdated cache entries, and the cleanOutdatedLoop method manages thread execution and shutdown gracefully.

  • cleanOutdatedCacheEntries: Properly checks and removes outdated entries.
  • cleanOutdatedLoop: Manages the cleaning loop with shutdown handling.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of cache eviction and thread management in `VectorIndexCache`.

# Test: Search for the implementation of `cleanOutdatedCacheEntries` and `cleanOutdatedLoop` methods. Expect: Proper eviction logic and thread management.
ast-grep --lang cpp --pattern $'class VectorIndexCache {
  $$$
  size_t cleanOutdatedCacheEntries() {
    $$$
  }
  $$$
  void cleanOutdatedLoop() {
    $$$
  }
  $$$
}'

Length of output: 170


Script:

#!/bin/bash
# Search for the definition of `cleanOutdatedCacheEntries` in the codebase.
rg 'size_t cleanOutdatedCacheEntries\(\)' -A 10

# Search for the definition of `cleanOutdatedLoop` in the codebase.
rg 'void cleanOutdatedLoop\(\)' -A 10

Length of output: 1731


Script:

#!/bin/bash
# Search for the implementation of `cleanOutdatedCacheEntries` in the codebase.
rg 'size_t VectorIndexCache::cleanOutdatedCacheEntries\(\)' -A 20

# Search for the implementation of `cleanOutdatedLoop` in the codebase.
rg 'void VectorIndexCache::cleanOutdatedLoop\(\)' -A 20

Length of output: 3498

dbms/src/TiDB/Schema/VectorIndex.h (2)

27-40: Refactoring to VectorIndexDefinition.

The refactoring consolidates vector index information into VectorIndexDefinition, aligning with TiDB schema. Ensure that this change is consistently applied across the codebase and that all dependencies are updated.

Verification successful

Refactoring to VectorIndexDefinition is consistent across the codebase.

The refactoring of VectorIndexDefinition has been applied consistently across the codebase. There are no remaining references to VectorIndexInfo, indicating a thorough update. The changes align the structure with TiDB's schema definitions as intended.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `VectorIndexDefinition` across the codebase.

# Test: Search for the usage of `VectorIndexDefinition`. Expect: Consistent usage and removal of old `VectorIndexInfo` references.
rg --type cpp -A 3 $'VectorIndexDefinition'

Length of output: 9363


45-66: Formatter specialization for VectorIndexDefinition.

The formatter specialization has been updated to reflect the new structure. Ensure that the formatting logic correctly represents the VectorIndexDefinition and handles null pointers appropriately.

Verification successful

Formatter specialization for VectorIndexDefinition is correctly implemented.

The formatter specialization for VectorIndexDefinition is present and, along with VectorIndexDefinitionPtr, appears to handle null pointers appropriately through the use of smart pointers. The code is consistent with the expected design.

  • dbms/src/TiDB/Schema/VectorIndex.h: Contains the formatter specializations for both VectorIndexDefinition and VectorIndexDefinitionPtr.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the formatter specialization for `VectorIndexDefinition`.

# Test: Search for the implementation of formatter specialization. Expect: Correct handling of `VectorIndexDefinition` and null pointers.
ast-grep --lang cpp --pattern $'template <>
struct fmt::formatter<TiDB::VectorIndexDefinition> {
  $$$
  auto format(const TiDB::VectorIndexDefinition & vi, $_) const {
    $$$
  }
  $$$
}'

Length of output: 190


Script:

#!/bin/bash
# Search for any formatter specializations related to `VectorIndexDefinition`.
rg --context 5 'formatter<.*VectorIndexDefinition.*>' --type cpp

# Search for usages of `VectorIndexDefinition` to check for null pointer handling.
rg 'VectorIndexDefinition' --type cpp

Length of output: 4892

dbms/src/Storages/DeltaMerge/Index/VectorIndex.cpp (3)

Line range hint 32-39:
LGTM!

The implementation correctly checks for nullable types and ensures they are DataTypeFloat32.


41-54: LGTM!

The function is well-structured with appropriate runtime checks and error handling for unsupported vector index kinds.


58-71: LGTM!

The function is correctly implemented with necessary checks and error handling for unsupported index kinds.

dbms/src/Storages/DeltaMerge/Index/VectorIndex.h (2)

31-55: LGTM!

The VectorIndexBuilder class is well-structured, encapsulating the logic for building vector indices and ensuring type support.


58-90: LGTM!

The VectorIndexViewer class is well-designed, focusing on viewing functionalities and maintaining consistency with the builder class.

dbms/src/Storages/DeltaMerge/File/VectorColumnFromIndexReader.h (1)

Line range hint 47-71:
LGTM!

The changes improve the design by aligning with the viewer-based approach, enhancing data retrieval efficiency.

dbms/src/Storages/DeltaMerge/Index/VectorIndexCache.cpp (4)

23-61: LGTM! Efficient cache cleaning logic.

The function cleanOutdatedCacheEntries is well-implemented with appropriate use of locks and logging. The logic for checking file existence and removing outdated entries is sound.


63-83: LGTM! Proper loop for periodic cache cleaning.

The cleanOutdatedLoop function effectively manages periodic cache cleaning with a proper shutdown mechanism using a condition variable.


86-91: LGTM! Proper initialization and thread management.

The constructor for VectorIndexCache correctly initializes the cache and starts a thread for cleaning outdated entries.


93-98: LGTM! Safe shutdown procedure.

The destructor for VectorIndexCache ensures a safe shutdown by setting the shutdown flag and joining the cleaner thread.

dbms/src/Storages/DeltaMerge/File/dtpb/dmfile.proto (2)

66-66: LGTM! Updated ColumnStat with new vector index properties.

The change to use VectorIndexFileProps in ColumnStat enhances clarity by separating construction and retrieval details of vector indices.


83-93: LGTM! New VectorIndexFileProps message.

The introduction of VectorIndexFileProps provides a well-structured representation of vector index properties, facilitating efficient index retrieval.

dbms/src/Storages/DeltaMerge/Remote/Serializer.h (2)

96-96: LGTM! Maintained encapsulation for serializeColumnFileSet.

The function serializeColumnFileSet remains private, indicating it is intended for internal use only, which is appropriate.


96-96: Verify the change in access level for serializeSegment.

The access level for serializeSegment has been changed from private to public. Ensure this change is intentional and necessary for broader usage, and verify that it doesn't expose sensitive functionality.

dbms/src/Storages/DeltaMerge/File/VectorColumnFromIndexReader.cpp (1)

37-48: Ensure the runtime check is necessary and efficient.

The addition of a runtime check using std::is_sorted ensures that sorted_results is in ascending order. This is a good practice for validating input assumptions. However, consider the performance implications if this check is run frequently in a production environment. If the input is guaranteed to be sorted by design, you might consider removing this check in release builds.

dbms/src/Storages/DeltaMerge/DMContext.h (1)

136-137: Verify the integration of the new parameter scan_context.

The addition of scan_context as a parameter to the constructor enhances flexibility. Ensure that this change is consistently applied throughout the codebase where DMContext is instantiated. The default value of nullptr maintains backward compatibility.

Verification successful

Integration of scan_context is consistent and backward compatible.

The scan_context parameter is integrated into the DMContext constructor with a default value of nullptr, ensuring backward compatibility. Instantiations across the codebase either use this default or specify the parameter explicitly, confirming consistent integration. No issues found.

  • Instances in DeltaMergeStore.cpp and DeltaMergeStore.h explicitly use scan_context.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all instantiations of `DMContext` use the new `scan_context` parameter.

# Test: Search for instantiations of `DMContext`. Expect: Consistent use of `scan_context`.
rg --type cpp -A 5 $'DMContext('

Length of output: 104


Script:

#!/bin/bash
# Correcting the regex pattern to search for instantiations of `DMContext`.
rg --type cpp -A 5 'DMContext\('

Length of output: 64207

dbms/src/Storages/DeltaMerge/File/ColumnStat.h (1)

44-44: Assess the impact of changing vector_index type.

Changing the type of vector_index to std::optional<dtpb::VectorIndexFileProps> may affect serialization and deserialization processes. Ensure that all related methods, such as toProto and mergeFromProto, correctly handle the new type.

Verification successful

Change to std::optional for vector_index is correctly handled in serialization and deserialization.

The toProto and mergeFromProto methods in ColumnStat.h appropriately handle the std::optional<dtpb::VectorIndexFileProps> type by checking for its presence and using the necessary protobuf methods. No further issues found.

  • File: dbms/src/Storages/DeltaMerge/File/ColumnStat.h
  • Methods: toProto, mergeFromProto
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify correct handling of `vector_index` type change in serialization and deserialization methods.

# Test: Search for usage of `vector_index` in related methods. Expect: Correct handling of new type.
rg --type cpp -A 5 $'vector_index'

Length of output: 30639

dbms/src/Storages/DeltaMerge/File/DMFileWriter.h (3)

57-57: LGTM: Updated parameter type in Stream constructor.

The change from TiDB::VectorIndexInfoPtr to TiDB::VectorIndexDefinitionPtr appears to align with a more structured approach to vector indexing.


75-75: LGTM: Updated member variable type in Stream.

Changing the vector_index member variable to VectorIndexBuilderPtr aligns with the use of a builder pattern, enhancing modularity and maintainability.


163-163: LGTM: Updated parameter type in addStreams method.

The change from TiDB::VectorIndexInfoPtr to TiDB::VectorIndexDefinitionPtr is consistent with the overall shift towards a more structured vector indexing approach.

dbms/src/Storages/DeltaMerge/DeltaMergeDefines.h (2)

95-95: LGTM: Updated member variable type in ColumnDefine.

Changing the vector_index member type to TiDB::VectorIndexDefinitionPtr indicates a refinement in vector index handling, potentially enhancing its definition and utilization.


102-102: LGTM: Updated constructor parameter type in ColumnDefine.

The update to TiDB::VectorIndexDefinitionPtr for the vector_index_ parameter ensures consistency in the ColumnDefine structure's handling of vector indices.

dbms/src/Storages/DeltaMerge/File/DMFileWithVectorIndexBlockInputStream.h (7)

94-94: LGTM: Constructor declaration only.

The removal of the constructor implementation from the header file suggests a move towards better modularity by defining it in a separate source file.


96-96: LGTM: Destructor declaration only.

The removal of the destructor implementation from the header file aligns with the approach of separating interface and implementation for improved modularity.


108-108: LGTM: read method declaration only.

The change to only declare the read method in the header file is consistent with the refactoring for better modularity and reduced compilation dependencies.


113-113: LGTM: readImpl method declaration only.

The change to only declare the readImpl method in the header file aligns with the strategy to separate interface and implementation for better modularity.


144-144: LGTM: readByIndexReader method declaration only.

The change to only declare the readByIndexReader method in the header file is consistent with the refactoring to separate interface and implementation.


148-148: LGTM: readByFollowingOtherColumns method declaration only.

The change to only declare the readByFollowingOtherColumns method in the header file aligns with the strategy to separate interface and implementation for better modularity.


182-182: LGTM: Updated member variable type.

Changing the vec_index member variable to VectorIndexViewerPtr suggests a more efficient or flexible handling of vector indices, aligning with the design improvements.

dbms/src/Storages/DeltaMerge/Index/VectorIndexHNSW/Index.cpp (7)

38-54: LGTM: Metric kind mapping is clear and robust.

The getUSearchMetricKind function effectively maps distance metrics and includes a runtime check for unsupported metrics.


58-64: LGTM: Constructor ensures correct initialization.

The VectorIndexHNSWBuilder constructor correctly initializes the index with the specified metric kind and dimensions.


Line range hint 67-106: LGTM: Data addition to index is well-handled.

The addBlock method effectively manages data addition to the index, including checks for dimensions and nullable columns.


110-113: LGTM: Index saving is robust.

The save method correctly handles saving the index and includes a runtime check for errors.


116-129: LGTM: Viewer initialization is thorough.

The view method properly initializes the index viewer and includes checks for index kind and distance metric.


Line range hint 134-199: LGTM: Search functionality is well-implemented.

The search method includes comprehensive checks and exception handling, ensuring robust search operations.


202-205: LGTM: Data retrieval is straightforward.

The get method correctly retrieves data from the index, ensuring the output vector is appropriately sized.

dbms/src/Common/LRUCache.h (1)

74-80: LGTM: Key existence check is efficient and thread-safe.

The contains method provides a non-intrusive way to verify key presence in the cache, ensuring thread safety.

dbms/src/Storages/S3/FileCache.h (5)

75-76: LGTM: Synchronization for file readiness is well-managed.

The waitForNotEmpty method effectively uses a condition variable to manage synchronization for file segment readiness.


228-234: LGTM: File download and eviction prevention are well-handled.

The downloadFileForLocalRead method ensures files are downloaded and guarded against eviction, enhancing cache reliability.


249-253: LGTM: Blocking file retrieval is effectively implemented.

The getOrWait method enhances cache functionality by providing a blocking mechanism for file retrieval.


256-256: LGTM: Foreground download control is well-implemented.

The fgDownload method provides controlled and synchronous file downloading, enhancing cache management.


273-273: LGTM: Enhanced locking control for file removal.

The remove method's use of a unique lock provides more granular control over file removal operations.

dbms/src/TiDB/Schema/TiDB.h (1)

209-209: Verify compatibility of VectorIndexDefinitionPtr.

The change from VectorIndexInfoPtr to VectorIndexDefinitionPtr indicates a shift in how vector indices are represented. Ensure that this change is compatible with other parts of the code that use ColumnInfo.

Verification successful

Change from VectorIndexInfoPtr to VectorIndexDefinitionPtr is compatible.

The vector_index member in ColumnInfo has been updated across various parts of the codebase, including runtime checks and JSON handling, ensuring compatibility with the new type. No issues were found in its integration.

  • Files affected:
    • dbms/src/TiDB/Schema/TiDB.cpp
    • dbms/src/Storages/StorageDeltaMerge.cpp
    • dbms/src/Storages/DeltaMerge/DeltaMergeDefines.h
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `ColumnInfo` struct to ensure compatibility with the new `vector_index` type.

# Test: Search for all usages of `ColumnInfo` to check if they are compatible with `VectorIndexDefinitionPtr`.
rg --type cpp 'ColumnInfo'

Length of output: 52672


Script:

#!/bin/bash
# Description: Search for interactions with the `vector_index` member in `ColumnInfo` to verify compatibility with `VectorIndexDefinitionPtr`.

# Test: Look for usages of `vector_index` within the `ColumnInfo` struct.
rg --type cpp 'vector_index'

Length of output: 8746

dbms/src/Storages/DeltaMerge/File/DMFileWriter.cpp (2)

104-108: Verify logic with VectorIndexDefinitionPtr.

The change in the parameter type from VectorIndexInfoPtr to VectorIndexDefinitionPtr suggests a more structured approach to managing vector index definitions. Ensure that the logic within this method correctly handles this new type.

Verification successful

Verified: Logic correctly handles VectorIndexDefinitionPtr.

The transition from VectorIndexInfoPtr to VectorIndexDefinitionPtr in the addStreams method is correctly handled. The logic aligns with the new parameter type, ensuring proper functionality.

  • The vector_index is defined and used as a VectorIndexDefinitionPtr.
  • The addStreams method uses this type appropriately in its logic.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the logic in `addStreams` method to ensure it correctly handles `VectorIndexDefinitionPtr`.

# Test: Search for all usages of `addStreams` to check if they are correctly updated with the new parameter type.
rg --type cpp 'addStreams'

Length of output: 387


Script:

#!/bin/bash
# Search for the definition and usage of `cd.vector_index` to ensure it aligns with `VectorIndexDefinitionPtr`.
rg --type cpp 'vector_index' -A 5

Length of output: 30639


342-355: Verify vector index metadata handling.

The logic for handling vector indices has been streamlined, improving clarity and efficiency. Ensure that the new logic correctly sets vector index metadata, especially with the use of emplace.

dbms/src/Storages/DeltaMerge/ScanContext.h (1)

93-93: Verify initialization of total_vector_idx_load_from_s3.

The atomic variable total_vector_idx_load_from_s3 is initialized to zero. Ensure that this initialization aligns with its intended usage and that the change does not affect the logic.

Verification successful

Initialization of total_vector_idx_load_from_s3 is appropriate.

The atomic variable total_vector_idx_load_from_s3 is correctly initialized to zero, aligning with its usage in increment operations and test assertions. This initialization does not adversely affect the logic. The tests confirm the expected behavior.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the initialization and usage of `total_vector_idx_load_from_s3`.

# Test: Search for all usages of `total_vector_idx_load_from_s3` to ensure the initialization aligns with its intended usage.
rg --type cpp 'total_vector_idx_load_from_s3'

Length of output: 2852

dbms/src/Storages/DeltaMerge/File/DMFileWithVectorIndexBlockInputStream.cpp (10)

34-81: Constructor is well-implemented.

The constructor efficiently initializes member variables using move semantics and performs necessary runtime checks to ensure correct initialization.


83-103: Destructor is well-implemented.

The destructor correctly logs performance metrics and updates the scan context only if vec_column_reader is initialized.


106-123: Method read is well-implemented.

The method efficiently handles filtering logic based on the return_filter flag, ensuring data is processed correctly.


125-168: Method readImpl is well-implemented.

The method effectively uses filtering logic to optimize data retrieval, ensuring only relevant data is processed and returned.


170-215: Method readByIndexReader is well-implemented.

The method efficiently utilizes the pack filter to skip unnecessary data packs and reads data using the vector index, ensuring performance metrics are updated accurately.


217-255: Method readByFollowingOtherColumns is well-implemented.

The method ensures consistent data reading across columns and vector indices, using the header layout to assemble the block correctly.


258-271: Method load is well-implemented.

The method efficiently manages the loading of vector index and search result data, ensuring it is only performed once and measuring the duration accurately.


273-384: Method loadVectorIndex is well-implemented.

The method handles vector index loading from both local and S3 storage with comprehensive error handling and performance tracking, ensuring robustness.


386-475: Method loadVectorSearchResult is well-implemented.

The method efficiently processes vector search results, updating performance metrics and optimizing data retrieval through the pack filter.


477-485: Method getPackIdFromBlock is well-implemented.

The method correctly retrieves the pack ID by ensuring alignment with the block's start offset, using a map lookup efficiently.

dbms/src/Common/FailPoint.cpp (1)

109-109: Addition of fail point file_cache_fg_download_fail is well-implemented.

The new fail point is added consistently with the existing pattern, enhancing error simulation capabilities.

dbms/src/Interpreters/Context.h (1)

395-395: Verify the impact of parameter name change in setVectorIndexCache.

The parameter name change from cache_size_in_bytes to cache_entities suggests a shift in semantics. Ensure that the method's usage throughout the codebase aligns with this change.

Verification successful

Parameter Name Change Verified: The change in parameter name from cache_size_in_bytes to cache_entities in the setVectorIndexCache method is consistently reflected across the codebase. The usage in Server.cpp and the test file gtest_dm_vector_index.cpp aligns with the new semantics. No further issues were found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `setVectorIndexCache` to ensure it aligns with the new parameter semantics.

# Test: Search for the method usage. Expect: Occurrences of the method with updated semantics.
rg --type cpp --type hpp -A 2 $'setVectorIndexCache'

Length of output: 79


Script:

#!/bin/bash
# Description: Verify the usage of `setVectorIndexCache` to ensure it aligns with the new parameter semantics.

# Test: Search for the method usage. Expect: Occurrences of the method with updated semantics.
rg --type cpp --type h -A 2 $'setVectorIndexCache'

Length of output: 1032

dbms/src/Storages/S3/FileCache.cpp (3)

160-176: LGTM!

The logic for handling local file existence and cache inconsistencies is well implemented.


339-353: LGTM!

The overloaded remove method with a unique lock parameter enhances thread safety effectively.


683-711: LGTM!

The use of fail points for testing and the handling of exceptions are well implemented.

dbms/src/TiDB/Schema/TiDB.cpp (2)

410-416: LGTM!

The use of RUNTIME_CHECK and tipb namespace functions for vector index serialization is well implemented.


474-496: LGTM!

The parsing logic for vector index kinds and distance metrics is robust, and the dimension checks are appropriate.

dbms/src/Interpreters/Context.cpp (3)

57-57: Updated header inclusion to VectorIndexCache.h.

The inclusion of VectorIndex.h has been replaced with VectorIndexCache.h, indicating a shift towards cache management for vector indices.


1409-1410: Simplified reset call in dropVectorIndexCache.

The reset call on the shared pointer has been simplified, enhancing code readability and consistency.


1391-1397: Parameter change in setVectorIndexCache.

The parameter has been changed from cache_size_in_bytes to cache_entities, indicating a shift in focus from cache size to the number of entities. Ensure that this change aligns with the intended cache management strategy.

dbms/src/Server/Server.cpp (3)

1442-1442: Switch to entity-based cache configuration.

The configuration now uses vec_index_cache_entities with a default of 1000, replacing the previous size-based configuration. This change might impact how memory is managed, potentially simplifying cache management by focusing on the number of entities rather than size.


1443-1443: Conditional check for cache configuration.

The cache is configured only if vec_index_cache_entities is non-zero, preventing unnecessary allocation. This is a standard practice to ensure efficient resource usage.


1444-1444: Set vector index cache based on entities.

The setVectorIndexCache method is called with vec_index_cache_entities, aligning with the new entity-based cache configuration. This change reflects the shift from size-based to entity-based cache management.

dbms/src/Storages/DeltaMerge/tests/gtest_dm_vector_index.cpp (4)

1368-1385: Verify ANN Query Parameter Handling.

The computeNodeANNQuery method constructs an ANN query with provided parameters. Ensure that the ref_vec and top_k parameters are validated to prevent incorrect usage.


1427-1450: Verify Exception Handling for Disabled Cache.

The FileCacheNotEnabled test should ensure that the system raises the correct exception when the file cache is disabled. Verify that the test covers all relevant scenarios.

Verification successful

Verification Successful: Exception Handling in FileCacheNotEnabled Test is Correct.

The test correctly simulates the scenario where the file cache is disabled and verifies that the appropriate exception is raised. The exception message is checked to ensure it matches the expected output. No issues found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify exception handling in `FileCacheNotEnabled` test.

# Test: Search for exception handling in the test to ensure correctness.
rg --type cpp 'TEST_F\(VectorIndexSegmentOnS3Test, FileCacheNotEnabled\)' -A 30

Length of output: 2595


1818-1881: Ensure Correct Concurrency Simulation.

The ConcurrentDownloadFromS3 test should accurately simulate concurrent downloads and verify correct behavior. Ensure that the sync points are correctly used to manage concurrency.

Verification successful

Concurrency Simulation in ConcurrentDownloadFromS3 Test is Correct.

The test accurately simulates concurrent S3 downloads using sync points to manage thread execution order, ensuring expected behavior is verified. The sync points "FileCache::fgDownload" and "before_FileSegment::waitForNotEmpty_wait" are correctly used to simulate concurrency. No issues found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify concurrency simulation in `ConcurrentDownloadFromS3` test.

# Test: Search for sync point usage to ensure concurrency is correctly simulated.
rg --type cpp 'SyncPointCtl::enableInScope' -A 5

Length of output: 32670


1207-1274: Ensure S3 Configuration and Cache Initialization.

The SetUp method initializes S3 configurations and sets up the file cache. Ensure that these configurations are correctly reset in the TearDown method to avoid side effects in other tests.

Verification successful

S3 Configuration and Cache Cleanup Verified

The TearDown method in gtest_dm_vector_index.cpp correctly resets configurations and cleans up resources initialized in the SetUp method, ensuring no side effects on other tests.

  • Disables fail points and shuts down caches.
  • Resets global context configurations and deletes S3 resources.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the `TearDown` method correctly resets configurations.

# Test: Search for the `TearDown` method to ensure configurations are reset.
rg --type cpp 'void TearDown\(\)' -A 30

Length of output: 115864

Comment on lines +61 to +64
&& sorted_results[results_it] >= pack_start //
&& sorted_results[results_it] < pack_end)
{
offsets_in_pack.push_back(results[results_it] - pack_start);
offsets_in_pack.push_back(sorted_results[results_it] - pack_start);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider optimizing the loop logic.

The loop that processes sorted_results could potentially be optimized by using a binary search to find the starting index for each pack, rather than iterating through all elements. This would improve performance, especially for large datasets.

Comment on lines +117 to +118
// TODO: We could fill multiple rows if rowid is continuous.
VectorIndexViewer::Key rowid = pack_start_rowid[pack_id] + offset_in_pack;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimize row filling logic.

The TODO comment suggests optimizing the filling of multiple rows when rowid is continuous. Implementing this optimization can significantly improve performance by reducing the number of insert operations.

Would you like assistance in implementing this optimization?

Comment on lines +1341 to +1349
static void removeAllFileCache()
{
auto * file_cache = FileCache::instance();
auto file_segments = file_cache->getAll();
for (const auto & file_seg : file_cache->getAll())
file_cache->remove(file_cache->toS3Key(file_seg->getLocalFileName()), true);

RUNTIME_CHECK(file_cache->getAll().empty());
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure Robust Cache Clearing.

The removeAllFileCache method should handle potential errors during cache clearing. Consider adding error handling to ensure robustness.

void removeAllFileCache()
{
    auto * file_cache = FileCache::instance();
    try
    {
        for (const auto & file_seg : file_cache->getAll())
            file_cache->remove(file_cache->toS3Key(file_seg->getLocalFileName()), true);
    }
    catch (const std::exception & e)
    {
        LOG_ERROR("Error clearing file cache: {}", e.what());
    }
    RUNTIME_CHECK(file_cache->getAll().empty());
}

Comment on lines +236 to +299
FileSegmentPtr FileCache::getOrWait(const S3::S3FilenameView & s3_fname, const std::optional<UInt64> & filesize)
{
auto s3_key = s3_fname.toFullKey();
auto file_type = getFileType(s3_key);
auto & table = tables[static_cast<UInt64>(file_type)];

std::unique_lock lock(mtx);

auto f = table.get(s3_key);
if (f != nullptr)
{
lock.unlock();
f->setLastAccessTime(std::chrono::system_clock::now());
auto status = f->waitForNotEmpty();
if (status == FileSegment::Status::Complete)
{
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_hit).Increment();
return f;
}
else
{
// On-going download failed, let the caller retry.
return nullptr;
}
}

GET_METRIC(tiflash_storage_remote_cache, type_dtfile_miss).Increment();

auto estimzted_size = filesize ? *filesize : getEstimatedSizeOfFileType(file_type);
if (!reserveSpaceImpl(file_type, estimzted_size, /*try_evict*/ true))
{
// Space not enough.
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_full).Increment();
LOG_DEBUG(
log,
"s3_key={} space not enough(capacity={} used={} estimzted_size={}), skip cache",
s3_key,
cache_capacity,
cache_used,
estimzted_size);

// Just throw, no need to let the caller retry.
throw Exception( //
ErrorCodes::S3_ERROR,
"Cannot reserve {} space for object {}",
estimzted_size,
s3_key);
return nullptr;
}

auto file_seg
= std::make_shared<FileSegment>(toLocalFilename(s3_key), FileSegment::Status::Empty, estimzted_size, file_type);
table.set(s3_key, file_seg);
lock.unlock();

PerfContext::file_cache.fg_download_from_s3++;
fgDownload(lock, s3_key, file_seg);
if (!file_seg || !file_seg->isReadyToRead())
throw Exception( //
ErrorCodes::S3_ERROR,
"Download object {} failed",
s3_key);

return file_seg;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the typo in the variable name.

The variable name estimzted_size should be corrected to estimated_size.

- auto estimzted_size = filesize ? *filesize : getEstimatedSizeOfFileType(file_type);
+ auto estimated_size = filesize ? *filesize : getEstimatedSizeOfFileType(file_type);
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
FileSegmentPtr FileCache::getOrWait(const S3::S3FilenameView & s3_fname, const std::optional<UInt64> & filesize)
{
auto s3_key = s3_fname.toFullKey();
auto file_type = getFileType(s3_key);
auto & table = tables[static_cast<UInt64>(file_type)];
std::unique_lock lock(mtx);
auto f = table.get(s3_key);
if (f != nullptr)
{
lock.unlock();
f->setLastAccessTime(std::chrono::system_clock::now());
auto status = f->waitForNotEmpty();
if (status == FileSegment::Status::Complete)
{
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_hit).Increment();
return f;
}
else
{
// On-going download failed, let the caller retry.
return nullptr;
}
}
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_miss).Increment();
auto estimzted_size = filesize ? *filesize : getEstimatedSizeOfFileType(file_type);
if (!reserveSpaceImpl(file_type, estimzted_size, /*try_evict*/ true))
{
// Space not enough.
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_full).Increment();
LOG_DEBUG(
log,
"s3_key={} space not enough(capacity={} used={} estimzted_size={}), skip cache",
s3_key,
cache_capacity,
cache_used,
estimzted_size);
// Just throw, no need to let the caller retry.
throw Exception( //
ErrorCodes::S3_ERROR,
"Cannot reserve {} space for object {}",
estimzted_size,
s3_key);
return nullptr;
}
auto file_seg
= std::make_shared<FileSegment>(toLocalFilename(s3_key), FileSegment::Status::Empty, estimzted_size, file_type);
table.set(s3_key, file_seg);
lock.unlock();
PerfContext::file_cache.fg_download_from_s3++;
fgDownload(lock, s3_key, file_seg);
if (!file_seg || !file_seg->isReadyToRead())
throw Exception( //
ErrorCodes::S3_ERROR,
"Download object {} failed",
s3_key);
return file_seg;
FileSegmentPtr FileCache::getOrWait(const S3::S3FilenameView & s3_fname, const std::optional<UInt64> & filesize)
{
auto s3_key = s3_fname.toFullKey();
auto file_type = getFileType(s3_key);
auto & table = tables[static_cast<UInt64>(file_type)];
std::unique_lock lock(mtx);
auto f = table.get(s3_key);
if (f != nullptr)
{
lock.unlock();
f->setLastAccessTime(std::chrono::system_clock::now());
auto status = f->waitForNotEmpty();
if (status == FileSegment::Status::Complete)
{
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_hit).Increment();
return f;
}
else
{
// On-going download failed, let the caller retry.
return nullptr;
}
}
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_miss).Increment();
auto estimated_size = filesize ? *filesize : getEstimatedSizeOfFileType(file_type);
if (!reserveSpaceImpl(file_type, estimated_size, /*try_evict*/ true))
{
// Space not enough.
GET_METRIC(tiflash_storage_remote_cache, type_dtfile_full).Increment();
LOG_DEBUG(
log,
"s3_key={} space not enough(capacity={} used={} estimated_size={}), skip cache",
s3_key,
cache_capacity,
cache_used,
estimated_size);
// Just throw, no need to let the caller retry.
throw Exception( //
ErrorCodes::S3_ERROR,
"Cannot reserve {} space for object {}",
estimated_size,
s3_key);
return nullptr;
}
auto file_seg
= std::make_shared<FileSegment>(toLocalFilename(s3_key), FileSegment::Status::Empty, estimated_size, file_type);
table.set(s3_key, file_seg);
lock.unlock();
PerfContext::file_cache.fg_download_from_s3++;
fgDownload(lock, s3_key, file_seg);
if (!file_seg || !file_seg->isReadyToRead())
throw Exception( //
ErrorCodes::S3_ERROR,
"Download object {} failed",
s3_key);
return file_seg;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants