Skip to content

Conversation

meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Aug 28, 2025

Purpose

This PR decouples the shrinking and growing logic in vector index algorithms to avoid scenarios where updates can result in oscillating allocation and deallocation of large containers such as label to id hash tables, or id to metadata vectors.

Problem Statement

The previous implementation performed both growing and shrinking operations when index_size % blockSize == 0, causing frequent allocation/deallocation cycles during index updates at block boundaries. This was particularly problematic for large containers like:

  • Vector containers at index size
  • Hash tables with index-size buckets
  • Metadata containers that scale with index capacity

Solution Overview

The changes implement a more conservative resize strategy that decouples shrinking and growing logic to prevent oscillating allocation/deallocation cycles. The key improvements include:

Resize Strategy (Both Algorithms)

  • Growing: Triggered when indexSize() == indexCapacity() (capacity is full)
  • Shrinking: Only when indexCapacity() >= (indexSize() + 2 * blockSize) (2+ free blocks), shrinking by one block
  • Buffer Zone: Maintains at least 1 block buffer to prevent oscillation between growing and shrinking

HNSW-Specific Container Architecture Changes

Updated maxElements Definition:

  • Previously: maxElements represented capacity for both data containers and metadata containers
  • Now: maxElements only tracks capacity for data containers (graphDataBlocks, vectors)
  • Metadata containers (idToMetaData, visitedNodesHandlerPool, labelLookup) now resize independently based on the 2-block buffer strategy

HNSW Container Type Distinction:

  • Data Containers: Block-based containers that resize by adding/removing single blocks

    • graphDataBlocks - stores graph connectivity data in blocks
    • vectors (DataBlocksContainer) - stores vector data in blocks
    • Resize operation: emplace_back() or pop_back() single blocks
  • Metadata Containers: Index-sized containers that require full reallocation

    • idToMetaData - maps internal IDs to metadata (labels, flags)
    • visitedNodesHandlerPool - tracks visited nodes during search
    • labelLookup - hash table mapping labels to internal IDs
    • Resize operation: resize() + shrink_to_fit() entire container

BruteForce-Specific Container Changes

BruteForce Container Architecture:

  • Data Container: vectors (DataBlocksContainer) - block-based vector storage
  • Metadata Containers: Index-sized containers requiring full reallocation
    • idToLabelMapping - maps internal IDs to labels
    • labelLookup - hash table mapping labels to internal IDs (single/multi variants)
    • Resize operation: resize() + shrink_to_fit() entire container

Memory Management

  • Data Containers: Resize immediately when blocks are added/removed (both algorithms)
  • Metadata Containers: Resize only when there's significant over-allocation (2+ blocks) (both algorithms)
  • Hash Tables: Benefit from reduced rehashing due to more stable capacity (both algorithms)

Key Changes

BruteForce Algorithm (src/VecSim/algorithms/brute_force/brute_force.h)

  • New resizeIndexCommon() method: Centralizes resize logic with proper assertions and logging
  • Enhanced shrinkByBlock(): Only shrinks when there are 2+ free blocks (indexCapacity() >= (indexSize() + 2 * blockSize))
  • Improved growByBlock(): Simplified growing logic with better assertions
  • Fixed indexCapacity(): Now returns idToLabelMapping.capacity() instead of size()

HNSW Algorithm (src/VecSim/algorithms/hnsw/hnsw.h)

  • New isCapacityFull() method: Encapsulates capacity check logic (indexSize() == maxElements)
  • Refactored resize logic: Separates data container resizing from metadata container resizing
  • Improved storeNewElement(): Proactive capacity checking before element insertion
  • Enhanced shrinkByBlock(): Implements the same 2-block buffer strategy as BruteForce
  • Better memory management: Distinguishes between heavy containers (data) and lightweight containers (metadata)

Tiered Index (src/VecSim/algorithms/hnsw/hnsw_tiered.h)

  • Improved resize detection: Updated insertVectorToHNSW() to use the new isCapacityFull() method for cleaner capacity checking
    • Encapsulate lock management: Added helper methods lockMainIndexGuard() and unlockMainIndexGuard() to centralize lock acquisition/release
    • Test instrumentation: Added mainIndexGuard_write_lock_count (guarded by BUILD_TESTS macro) counter for testing resize operations in tiered indexes

Container Improvements (src/VecSim/containers/data_blocks_container.h)

  • Exposed numBlocks() method: Moved from test-only to public API for better introspection
  • Enhanced documentation: Clearer distinction between size (elements) and capacity (blocks)

Test Updates

  • Updated allocator tests: Reflect new resize behavior in memory allocation expectations
    • IndexAllocatorTest.test_bf_index_block_size_1: Enhanced with detailed container size verification and memory delta calculations
    • IndexAllocatorTest.test_hnsw_reclaim_memory: Updated memory allocation expectations for new resize behavior
  • Enhanced HNSW tests: Added comprehensive container size verification
    • HNSWTieredIndexTestBasic.HNSWResize_Test: New test for tiered index resize behavior and lock management verification
  • Enhanced BruteForce tests: Added comprehensive resize behavior verification
    • BruteForceTest.resize_and_align_index: Enhanced with detailed container size verification functions
    • BruteForceTest.brute_force_no_oscillation_test: New test to verify the 2-block buffer strategy prevents allocation/deallocation cycles
    • BruteForceMultiTest.resize_and_align_index: Enhanced multi-index resize testing with comprehensive verification
  • Improved logging tests: Updated expected log messages for new resize patterns
    • CommonAPITest.testlogTieredIndex: Updated to expect additional resize log messages from both FLAT and HNSW indexes
  • Enhanced test utilities: Added detailed container size verification functions and better error messages across multiple test files

Copy link

codecov bot commented Aug 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.86%. Comparing base (fd08b55) to head (b4f316a).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #753      +/-   ##
==========================================
+ Coverage   96.82%   96.86%   +0.04%     
==========================================
  Files         122      122              
  Lines        7496     7533      +37     
==========================================
+ Hits         7258     7297      +39     
+ Misses        238      236       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@meiravgri meiravgri changed the title brute force 75% capacity imp Decouple the shrinking and growing logic of large containers in Flat and HNSW Aug 28, 2025
@meiravgri meiravgri changed the title Decouple the shrinking and growing logic of large containers in Flat and HNSW [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW Aug 28, 2025
@meiravgri meiravgri requested review from alonre24 and Copilot and removed request for alonre24 August 31, 2025 13:50
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR decouples the shrinking and growing logic in vector index algorithms (HNSW and BruteForce) to prevent oscillating allocation/deallocation cycles at block boundaries. The changes implement a conservative resize strategy with a 2-block buffer zone to avoid frequent reallocations of large containers.

  • Implements separate growing and shrinking conditions to prevent oscillation between allocation/deallocation
  • Distinguishes between data containers (block-based) and metadata containers (index-sized) with different resize strategies
  • Adds comprehensive test coverage to verify the new resize behavior and prevent oscillation

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/unit/test_hnsw_tiered.cpp Adds comprehensive test for HNSW tiered index resize behavior and lock management verification
tests/unit/test_common.cpp Updates log message expectations to reflect new resize patterns with additional FLAT and HNSW resize messages
tests/unit/test_bruteforce_multi.cpp Enhances multi-index resize testing with detailed container size verification and 2-block buffer strategy validation
tests/unit/test_bruteforce.cpp Adds comprehensive resize behavior verification and new oscillation prevention test
tests/unit/test_allocator.cpp Updates memory allocation tests with detailed container size verification and new resize behavior expectations
src/VecSim/vec_sim_tiered_index.h Adds helper methods for lock management and test instrumentation for resize operations
src/VecSim/vec_sim_index.h Fixes spelling error in comment (Stroage -> Storage)
src/VecSim/vec_sim_common.h Removes deprecated comment for initialCapacity field
src/VecSim/containers/data_blocks_container.h Exposes numBlocks() method from test-only to public API for better introspection
src/VecSim/containers/data_blocks_container.cpp Moves numBlocks() implementation outside BUILD_TESTS guard
src/VecSim/algorithms/hnsw/hnsw_tiered_tests_friends.h Adds friend class declaration for new resize test
src/VecSim/algorithms/hnsw/hnsw_tiered.h Updates to use new lock management helper methods and capacity checking
src/VecSim/algorithms/hnsw/hnsw.h Implements decoupled resize logic with isCapacityFull() method and 2-block buffer strategy
src/VecSim/algorithms/brute_force/brute_force.h Centralizes resize logic with new resizeIndexCommon() method and implements 2-block buffer strategy
Comments suppressed due to low confidence (1)

tests/unit/test_bruteforce_multi.cpp:1

  • There's a typo in the test comment 'Index size = 1Verifying allocation delta...'. It should be 'Index size = 1. Verifying allocation delta...' with proper spacing.
/*

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Collaborator

@alonre24 alonre24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Few small questions
Also - let'd add a short micro benchmark for the oscillation use case (and validate the improvement)

alonre24
alonre24 previously approved these changes Sep 11, 2025
@meiravgri meiravgri added this pull request to the merge queue Sep 11, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 11, 2025
@meiravgri meiravgri requested a review from alonre24 September 13, 2025 15:32
@meiravgri meiravgri enabled auto-merge September 13, 2025 15:44
@meiravgri meiravgri added this pull request to the merge queue Sep 13, 2025
Merged via the queue into main with commit df0664c Sep 13, 2025
40 of 48 checks passed
@meiravgri meiravgri deleted the meiravg_relax_resize branch September 13, 2025 18:55
Copy link

Backport failed for 0.6, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.6
git worktree add -d .worktree/backport-753-to-0.6 origin/0.6
cd .worktree/backport-753-to-0.6
git switch --create backport-753-to-0.6
git cherry-pick -x df0664cb1d45a46dc81c81d9eb235f08cffe1de3

Copy link

Backport failed for 0.7, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.7
git worktree add -d .worktree/backport-753-to-0.7 origin/0.7
cd .worktree/backport-753-to-0.7
git switch --create backport-753-to-0.7
git cherry-pick -x df0664cb1d45a46dc81c81d9eb235f08cffe1de3

Copy link

Backport failed for 0.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.8
git worktree add -d .worktree/backport-753-to-0.8 origin/0.8
cd .worktree/backport-753-to-0.8
git switch --create backport-753-to-0.8
git cherry-pick -x df0664cb1d45a46dc81c81d9eb235f08cffe1de3

github-actions bot pushed a commit that referenced this pull request Sep 13, 2025
…ers in Flat and HNSW (#753)

* brute force 75% capacity imp

* same solution brute force and hnsw

* align flat growby block with hnsw

* resisng tests for BF

* allocator tests that accounts for one block buffer

* tests

* remove comment

* remove

* fix resize test for macos

* change to estimation

* generelize test

(cherry picked from commit df0664c)
Copy link

github-merge-queue bot pushed a commit that referenced this pull request Sep 13, 2025
…ontainers in Flat and HNSW (#775)

[MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW (#753)

* brute force 75% capacity imp

* same solution brute force and hnsw

* align flat growby block with hnsw

* resisng tests for BF

* allocator tests that accounts for one block buffer

* tests

* remove comment

* remove

* fix resize test for macos

* change to estimation

* generelize test

(cherry picked from commit df0664c)

Co-authored-by: meiravgri <109056284+meiravgri@users.noreply.github.com>
meiravgri added a commit that referenced this pull request Sep 15, 2025
shrink by blocksize
shrink to zero only if capcity is 1 blocksize.
github-merge-queue bot pushed a commit that referenced this pull request Sep 15, 2025
…ontainers in Flat and HNSW (#777)

* backport #753

shrink by blocksize
shrink to zero only if capcity is 1 blocksize.

* move public outside

* revert size
meiravgri added a commit that referenced this pull request Sep 16, 2025
…ontainers in Flat and HNSW (#777)

* backport #753

shrink by blocksize
shrink to zero only if capcity is 1 blocksize.

* move public outside

* revert size

(cherry picked from commit bcc4d67)
github-merge-queue bot pushed a commit that referenced this pull request Sep 16, 2025
…ontainers in Flat and HNSW (#780)

[0.8] [MOD-10559] Decouple the shrinking and growing logic of large containers in Flat and HNSW (#777)

* backport #753

shrink by blocksize
shrink to zero only if capcity is 1 blocksize.

* move public outside

* revert size

(cherry picked from commit bcc4d67)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants