Conversation

@jairad26
Contributor

@jairad26 jairad26 commented Oct 23, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • This PR adds support for discovering new metadata keys during log flushing for local Chroma
  • New functionality
    • ...

Test plan

How are these changes tested?

Added e2e test support to run against distributed, local, and single-node Chroma to ensure metadata discovery and related tests run.

  • [x] Tests pass locally with pytest for Python, yarn test for JS, cargo test for Rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Contributor Author

jairad26 commented Oct 23, 2025

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from b139cb0 to 6162b25 Compare October 23, 2025 18:15
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 9ff8b57 to f2c2971 Compare October 23, 2025 18:15
@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Oct 23, 2025
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from f2c2971 to edc07ff Compare October 23, 2025 20:08
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from 6162b25 to 25cc1e2 Compare October 23, 2025 20:08
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from edc07ff to c2e5338 Compare October 23, 2025 20:11
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from 25cc1e2 to 5f6b983 Compare October 23, 2025 20:11
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from c2e5338 to 67ee627 Compare October 23, 2025 20:43
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch 3 times, most recently from fee4f7e to c8d1659 Compare October 23, 2025 22:33
@jairad26 jairad26 force-pushed the jai/local-schema-support branch 2 times, most recently from 5e2d880 to a0e7c9c Compare October 23, 2025 23:17
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch 2 times, most recently from 9fdfe9a to b1db218 Compare October 23, 2025 23:25
@jairad26 jairad26 force-pushed the jai/local-schema-support branch 2 times, most recently from 9c47af4 to 7e9e084 Compare October 23, 2025 23:46
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch 2 times, most recently from 3ff589e to ac13295 Compare October 23, 2025 23:58
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 7e9e084 to 387c681 Compare October 23, 2025 23:58
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from ac13295 to 038041f Compare October 24, 2025 00:22
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 387c681 to 17e078b Compare October 24, 2025 00:22
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from 038041f to d2242b5 Compare October 24, 2025 06:37
@jairad26 jairad26 force-pushed the jai/local-schema-support branch 2 times, most recently from 32d4b95 to 1ce0bbb Compare October 24, 2025 06:48
@jairad26 jairad26 marked this pull request as ready for review October 24, 2025 18:54
@propel-code-bot
Contributor

propel-code-bot bot commented Oct 24, 2025

Dynamic Schema Discovery & Persistence During Local Compaction

Adds end-to-end support for discovering previously unseen metadata keys when processing local log compaction. rust/segment now detects new keys while flushing logs, mutates an in-memory Schema, and returns an ApplyLogsOutcome that signals if the schema changed. rust/log persists that new schema (when the collection already had a schema) and keeps metadata/segment updates in sync. The PR also hardens schema rules (blocks keys starting with CHROMA_), refactors error types, bumps deps (adds serde_json), and updates the large Python E2E suite to cope with conditional SPANN behaviour.

Key Changes

• Extended the SqliteMetadataWriter.apply_logs() signature: it now takes a schema: Option<Schema> parameter and returns ApplyLogsOutcome { schema_update }
• Introduced the ApplyLogsOutcome struct and the helper ensure_schema_for_update_value(), which auto-populates the Schema on new-key discovery while filtering system keys (CHROMA_KEY); see the sketch after this list
• Added update_collection_schema() method in SqliteMetadataWriter for in-place schema persistence (includes JSON serialization error handling)
• Updated LocalCompactionManager logic: tracks whether a schema was already persisted, writes logs, updates schema inside same backfill flow
• Strengthened Schema.ensure_key_from_metadata() guard against special keys; added CHROMA_KEY constant
• Cargo: added serde_json to chroma-segment
• Large test-suite rewrite (chromadb/test/api/test_schema_e2e.py) to adapt to the new conditional schema-persistence paths; removes the SPANN skips in favor of runtime checks
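
The following is an illustrative sketch of the discovery path these bullets describe, pieced together from the code excerpts quoted in the review comments below; the exact signatures, the UpdateMetadataValue parameter type, and the free-function form are assumptions, not the merged implementation:

```rust
// Sketch only: returns true when the key was added to the schema, i.e. the
// schema changed and must be persisted after the flush. Keys carrying the
// CHROMA_ system prefix (the CHROMA_KEY constant) are never added.
fn ensure_schema_for_update_value(
    schema: &mut Option<Schema>,
    key: &str,
    value: &UpdateMetadataValue, // assumed parameter type
) -> bool {
    if key.starts_with(CHROMA_KEY) {
        return false;
    }
    if let Some(schema_mut) = schema.as_mut() {
        if let Ok(metadata_value) = MetadataValue::try_from(value) {
            return schema_mut.ensure_key_from_metadata(key, metadata_value.value_type());
        }
    }
    false
}
```

apply_logs() reports any such modification back to its caller through ApplyLogsOutcome { schema_update } (an Option, per the excerpt quoted below), which is what lets LocalCompactionManager decide whether to persist the updated schema (see the sketch near the end of this thread).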

Affected Areas

• rust/segment/src/sqlite_metadata.rs
• rust/log/src/local_compaction_manager.rs
• rust/types/src/collection_schema.rs
• Python e2e schema tests (chromadb/test/api/test_schema_e2e.py)
• rust/segment/Cargo.toml dependency list

This summary was automatically generated by @propel-code-bot

@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from 38172c2 to 09a866f Compare October 24, 2025 23:19
@jairad26 jairad26 force-pushed the jai/local-schema-support branch from bee55b4 to 296af1b Compare October 24, 2025 23:19
Comment on lines +318 to +317
```rust
if let Some(schema_mut) = schema.as_mut() {
    if let Ok(metadata_value) = MetadataValue::try_from(value) {
```

[BestPractice]

Error handling gap: MetadataValue::try_from(value) failure is silently ignored in schema discovery logic. If the conversion fails, the key won't be added to the schema even though it might be valid metadata that should be discoverable.

```rust
if let Ok(metadata_value) = MetadataValue::try_from(value) {
    return schema_mut
        .ensure_key_from_metadata(key, metadata_value.value_type());
} else {
    // Log the conversion failure for debugging - using standard tracing pattern
    tracing::warn!(key = %key, "Failed to convert metadata value during schema discovery");
}
```

This could lead to inconsistent schema discovery where some valid metadata keys are missed due to conversion issues. The enhanced logging follows tracing best practices with structured fields.

File: rust/segment/src/sqlite_metadata.rs, line 319

@jairad26 jairad26 force-pushed the jai/local-schema-support branch from 296af1b to dd3f346 Compare October 27, 2025 17:44
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from 09a866f to 67f3870 Compare October 27, 2025 17:44
Comment on lines +989 to +991
```rust
runtime.block_on(sqlite_seg_writer.apply_logs(
    data,
    metadata_seg_id,
    test_data
        .collection_and_segments
        .collection
        .schema
        .clone(),
    &mut *tx,
))
.expect("Should be able to apply logs");
```

[BestPractice]

The apply_logs method signature change adds a new schema parameter, but in the test cases you're passing the collection's schema from test_data. However, the production code in local_compaction_manager.rs passes either the collection's schema or None based on schema_previously_persisted. This inconsistency could lead to different behavior between tests and production.

Consider aligning the test calls with the production logic:

```rust
runtime.block_on(sqlite_seg_writer.apply_logs(
    data,
    metadata_seg_id,
    if schema_previously_persisted {
        test_data.collection_and_segments.collection.schema.clone()
    } else {
        None
    },
    &mut *tx,
))
```

File: rust/segment/src/sqlite_metadata.rs, line 999

Comment on lines 228 to 232
```rust
metadata_writer
    .update_collection_schema(
        collection_and_segments.collection.collection_id,
        &updated_schema,
    )
    .await
    .map_err(|_| CompactionManagerError::MetadataApplyLogsFailed)?;
```

[BestPractice]

Missing error handling for schema update: If update_collection_schema() fails (lines 228-234), the error is converted to MetadataApplyLogsFailed which loses important context about what specifically went wrong during schema persistence. This could make debugging schema-related issues very difficult.

```rust
// Consider more specific error handling
metadata_writer
    .update_collection_schema(
        collection_and_segments.collection.collection_id,
        &updated_schema,
    )
    .await
    .map_err(|e| CompactionManagerError::SchemaUpdateFailed(e))?;
```

File: rust/log/src/local_compaction_manager.rs, line 234

@jairad26 jairad26 force-pushed the jai/local-schema-support branch from dd3f346 to df8c8c8 Compare October 27, 2025 19:47
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from 67f3870 to a43db01 Compare October 27, 2025 19:47
@jairad26 jairad26 changed the base branch from jai/local-schema-support to graphite-base/5728 October 27, 2025 22:49
@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from a43db01 to 2dd65c5 Compare October 27, 2025 22:49
@jairad26 jairad26 changed the base branch from graphite-base/5728 to main October 27, 2025 22:50
Comment on lines +382 to +381
```rust
for (key, value) in meta.iter() {
    if Self::ensure_schema_for_update_value(&mut schema, key, value) {
        schema_modified = true;
    }
}
```

[PerformanceOptimization]

Potential performance issue: The ensure_schema_for_update_value function is called for every key-value pair in every metadata update within the log processing loop. For large batches with many metadata fields, this could create significant overhead.

Consider batching schema updates or using a HashSet to track which keys have already been processed to avoid redundant schema checks. For better performance, pre-size the HashSet if you can estimate the number of unique keys:

```rust
// Pre-size for better performance
let mut processed_keys = HashSet::with_capacity(estimated_unique_keys);
for (key, value) in meta.iter() {
    if !processed_keys.contains(key) {
        if Self::ensure_schema_for_update_value(&mut schema, key, value) {
            schema_modified = true;
            processed_keys.insert(key.clone());
        }
    }
}
```

Note: HashSet operations are O(1) average case, but the overhead of hashing and set operations may not be beneficial for very small key sets. Consider the trade-off based on typical batch sizes in your use case.

File: rust/segment/src/sqlite_metadata.rs, line 386

@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch 3 times, most recently from ed52a09 to f9c8123 Compare October 28, 2025 17:56
Comment on lines +334 to +342
```diff
+    schema: Option<Schema>,
     tx: &mut C,
-) -> Result<(), SqliteMetadataError>
+) -> Result<ApplyLogsOutcome, SqliteMetadataError>
 where
     for<'connection> &'connection mut C: sqlx::Executor<'connection, Database = sqlx::Sqlite>,
 {
     if logs.is_empty() {
-        return Ok(());
+        return Ok(ApplyLogsOutcome {
+            schema_update: None,
+        });
     }
+    let mut schema = schema;
```

[CriticalError]

The changed apply_logs method signature adds a schema parameter but doesn't handle errors when the schema is None and metadata processing tries to access it. In ensure_schema_for_update_value, if schema.as_mut() returns None, the function silently returns false, but the caller expects schema discovery to work.

Consider adding error handling or initialization:

```rust
let mut schema = schema.unwrap_or_else(|| Schema::new());
```

This ensures schema discovery can always function properly during log flushing.

File: rust/segment/src/sqlite_metadata.rs, line 345

@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch 2 times, most recently from 8f4f55e to 7826dad Compare October 28, 2025 18:22
```rust
    .sysdb
    .get_collection_with_segments(message.collection_id)
    .await?;
let schema_previously_persisted = collection_and_segments.collection.schema.is_some();
```

[CriticalError]

Potential Race Condition in Schema State Check: The schema_previously_persisted boolean is determined before schema reconciliation, but the schema could be modified during reconcile_schema_with_config(). This creates a potential inconsistency where:

  1. Schema starts as None (schema_previously_persisted = false)
  2. reconcile_schema_with_config() creates/modifies schema
  3. Later logic uses the stale schema_previously_persisted value

This could lead to schema updates being skipped when they should be persisted.

```rust
// Current approach - schema state checked before reconciliation
let schema_previously_persisted = collection_and_segments.collection.schema.is_some();
collection_and_segments
    .collection
    .reconcile_schema_with_config(KnnIndex::Hnsw)?;

// Later usage may be inconsistent with actual schema state
if schema_previously_persisted { /* ... */ }
```

Consider checking schema state after reconciliation or using a different approach to determine when schema updates should be persisted.

File: rust/log/src/local_compaction_manager.rs, line 143

jairad26 (Contributor, Author) replied:

This is intentional: if there's no schema_str found in the DB, we shouldn't use and modify it in compaction.
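
For readers following the thread, here is a minimal sketch of the gating being described, reconstructed from the excerpts quoted earlier; the surrounding transaction handling, the outcome binding, and the exact error mapping are assumptions rather than the merged code:

```rust
// Check whether a schema row already existed for this collection before compaction.
let schema_previously_persisted = collection_and_segments.collection.schema.is_some();

// Only hand an existing schema to apply_logs; otherwise pass None so compaction
// never creates or modifies one (the intentional behaviour described above).
let schema_arg = if schema_previously_persisted {
    collection_and_segments.collection.schema.clone()
} else {
    None
};

let outcome = metadata_writer
    .apply_logs(data, metadata_seg_id, schema_arg, &mut *tx)
    .await
    .map_err(|_| CompactionManagerError::MetadataApplyLogsFailed)?;

// Persist newly discovered keys only when a schema was already stored.
if schema_previously_persisted {
    if let Some(updated_schema) = outcome.schema_update {
        metadata_writer
            .update_collection_schema(
                collection_and_segments.collection.collection_id,
                &updated_schema,
            )
            .await
            .map_err(|_| CompactionManagerError::MetadataApplyLogsFailed)?;
    }
}
```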

@jairad26 jairad26 force-pushed the jai/add-new-schema-keys-local-compaction branch from 7826dad to 5bc00ac Compare October 28, 2025 18:44
@jairad26 jairad26 enabled auto-merge (squash) October 28, 2025 19:00
@jairad26 jairad26 merged commit a1ea81a into main Oct 28, 2025
62 checks passed