[ENH]: Load HNSW index without disk intermediary #5159
base: main
Conversation
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
- Testing, Bugs, Errors, Logs, Documentation
- System Compatibility
- Quality
Load HNSW Index Directly from S3 Memory, Removing Disk Intermediary

This PR overhauls the way HNSW indices are loaded from remote storage (S3) into the service, removing the former step of always copying index files to a temporary disk directory before constructing the in-memory index. The new flow initializes the HNSW index directly from S3-fetched memory buffers in all relevant paths. This change involves new APIs, significant modifications to the Rust HNSW index provider code, updates to the hnswlib dependency, and adjustments to the loading and forking logic for the index lifecycle.

Key Changes
• Introduces in-memory loading: fetches index segment files into memory buffers and loads the

Affected Areas
• rust/index/src/

This summary was automatically generated by @propel-code-bot
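The core idea of the new flow — constructing the index from buffers that are already in memory instead of round-tripping through files on disk — can be sketched in plain Rust. The `FetchedSegment` type and field names below are illustrative stand-ins for the PR's `HnswData` buffers, not the actual hnswlib API:

```rust
use std::io::{Cursor, Read};

// Hypothetical stand-in for the buffers fetched from S3; in the PR these
// come from an `HnswData` object (header, data_level0, length, link_list).
struct FetchedSegment {
    header: Vec<u8>,
    data_level0: Vec<u8>,
}

// New flow in miniature: read the index state straight out of the
// in-memory buffers via `Cursor`, with no temporary directory in between.
fn load_from_buffers(seg: &FetchedSegment) -> std::io::Result<usize> {
    let mut bytes_read = 0;
    for buf in [&seg.header, &seg.data_level0] {
        let mut reader = Cursor::new(buf.as_slice());
        let mut scratch = Vec::new();
        bytes_read += reader.read_to_end(&mut scratch)?;
    }
    Ok(bytes_read)
}

fn main() -> std::io::Result<()> {
    let seg = FetchedSegment {
        header: vec![0u8; 16],
        data_level0: vec![0u8; 64],
    };
    println!("{}", load_from_buffers(&seg)?); // 80
    Ok(())
}
```

Any reader-based deserializer can consume a `Cursor` over a fetched buffer exactly as it would a `File`, which is what makes the disk intermediary removable.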
9 Jobs Failed:
- PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/distributed/test_repair_collection_log_offset.py)
- PR checks / Lint
- PR checks / Rust tests / test (blacksmith-8vcpu-ubuntu-2204)
- PR checks / Rust tests / Integration test ci_k8s_integration 2
- PR checks / Rust tests / test-long
- PR checks / Rust tests / test-benches (blacksmith-16vcpu-ubuntu-2204, --bench get)
- PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_add.py)
- PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_embeddings.py)
- PR checks / all-required-pr-checks-passed

Summary: 1 successful workflow, 1 failed workflow
Last updated: 2025-08-12 23:00:48 UTC
None => match HnswIndex::load_from_hnsw_data(
    self.fetch_hnsw_segment(&new_id, prefix_path)
        .await
        .map_err(|e| Box::new(HnswIndexProviderForkError::FileError(*e)))?,
    &index_config,
    ef_search,
    new_id,
) {
[CriticalError]
The logic for loading the index within the `fork` method appears to be incorrect. It attempts to fetch the segment from remote storage using `new_id`, but the index for `new_id` doesn't exist in storage yet. The files for `source_id` have just been copied to a local directory. The previous implementation using `HnswIndex::load(storage_path_str, ...)` correctly loaded the index from this new local directory. Since `fork` is intended to create a mutable, file-backed copy of an index, it seems the original approach of loading from the local path should be restored.
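The fork flow the reviewer describes can be sketched with plain `std::fs` operations. The directory layout, file names, and `fork_locally` helper below are illustrative, not Chroma's actual code: the source index files are copied into a fresh local directory for the fork, and the load must then read from that local path rather than re-fetching from remote storage under the new id (which has nothing stored for it yet):

```rust
use std::fs;
use std::path::Path;

// Illustrative sketch of the fork-then-load-locally flow; not the actual
// Chroma implementation.
fn fork_locally(src_dir: &Path, dst_dir: &Path, files: &[&str]) -> std::io::Result<()> {
    // Copy the source index files into a fresh local directory for the fork.
    fs::create_dir_all(dst_dir)?;
    for f in files {
        fs::copy(src_dir.join(f), dst_dir.join(f))?;
    }
    // The previous implementation then loaded from this local copy, e.g.
    // HnswIndex::load(dst_dir, ...) -- not by fetching the new id remotely.
    Ok(())
}

fn main() -> std::io::Result<()> {
    let base = std::env::temp_dir().join("hnsw_fork_sketch");
    let (src, dst) = (base.join("src"), base.join("dst"));
    fs::create_dir_all(&src)?;
    fs::write(src.join("header.bin"), b"hdr")?;
    fork_locally(&src, &dst, &["header.bin"])?;
    println!("{}", dst.join("header.bin").exists()); // true
    Ok(())
}
```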
let hnsw_data = self.fetch_hnsw_segment(source_id, prefix_path).await?;
let getters = [
    |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.header_buffer())),
    |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.data_level0_buffer())),
    |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.length_buffer())),
    |hnsw_data: &hnswlib::HnswData| Arc::new(Vec::from(hnsw_data.link_list_buffer())),
];

for (file, getter) in FILES.iter().zip(getters) {
    let file_path = index_storage_path.join(file);
    self.copy_bytes_to_local_file(&file_path, getter(&hnsw_data))
        .await?;
}
Ok(())
}
[PerformanceOptimization]
This function now fetches the entire HNSW segment into an in-memory `HnswData` object before writing the individual files to disk. The previous implementation streamed each file directly. For large indexes, this change could significantly increase memory usage during the `fork` operation. Was this change intentional? If `fork` still needs to write to disk, perhaps restoring the previous file-by-file download logic for this function would be more memory-efficient.
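A memory-bounded alternative along the lines the reviewer suggests — streaming each file in fixed-size chunks rather than materializing the whole segment — can be sketched with `std::io`. This is a generic illustration under the assumption that a per-file reader is available; it is not Chroma's actual download code:

```rust
use std::io::{self, Read, Write};

// Generic streaming copy: peak memory is one 8 KiB chunk regardless of how
// large the source file is. Illustrative, not Chroma's download path.
fn stream_copy<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<u64> {
    let mut chunk = [0u8; 8192]; // reused scratch buffer
    let mut total = 0u64;
    loop {
        let n = src.read(&mut chunk)?;
        if n == 0 {
            return Ok(total);
        }
        dst.write_all(&chunk[..n])?;
        total += n as u64;
    }
}

fn main() -> io::Result<()> {
    let src = vec![7u8; 20_000]; // pretend this is one index file from S3
    let mut dst = Vec::new();
    let n = stream_copy(src.as_slice(), &mut dst)?;
    println!("{}", n); // 20000
    Ok(())
}
```

The trade-off the reviewer raises is exactly this: buffering the whole segment costs O(index size) memory, while chunked streaming keeps the peak at one chunk per in-flight file.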
None => match HnswIndex::load_from_hnsw_data(
    self.fetch_hnsw_segment(id, prefix_path)
        .await
        .map_err(|e| Box::new(HnswIndexProviderOpenError::FileError(*e)))?,
    &index_config,
    ef_search,
    *id,
) {
[PerformanceOptimization]
This change successfully loads the index from memory, which aligns with the PR's goal. However, the `open` function still contains calls that write the index to a temporary directory on disk (`create_dir_all`, `load_hnsw_segment_into_directory`) before this memory-based loading occurs. These disk operations now seem redundant. To fully load without a disk intermediary and improve efficiency, you could remove the calls to `create_dir_all`, `load_hnsw_segment_into_directory`, and `purge_one_id` from this function.
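The simplified `open` path the reviewer proposes — check the cache, otherwise fetch into memory and construct the index directly, with no temp-directory step — can be sketched generically. The `Provider` type and the `String` stand-in for an index below are hypothetical, used only to show the cache-or-construct shape:

```rust
use std::collections::HashMap;

// Hypothetical provider: the String stands in for a loaded HNSW index.
struct Provider {
    cache: HashMap<u64, String>, // id -> loaded index
}

impl Provider {
    fn open(&mut self, id: u64) -> &String {
        self.cache.entry(id).or_insert_with(|| {
            // Proposed flow, with no disk intermediary:
            // 1. fetch_hnsw_segment(id) -> in-memory buffers
            // 2. HnswIndex::load_from_hnsw_data(buffers, ...) -> index
            format!("index-{}", id)
        })
    }
}

fn main() {
    let mut p = Provider { cache: HashMap::new() };
    println!("{}", p.open(42)); // constructs once, then served from cache
}
```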
e9564cf to 7558379
Description of changes
(WIP)

Test plan
How are these changes tested?
- pytest for python
- yarn test for js
- cargo test for rust

Migration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan
What is the plan to instrument and monitor this change?

Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?