
Conversation


@yorickvP commented Nov 5, 2025

Problem

Running a profiler on my Pinecone-using application showed it was CPU-bottlenecked on json_format.MessageToDict, handling only about 100 vectors per second in query and fetch responses.
It turns out that converting the embeddings to a dict this way is very slow. It is much faster to convert them to a list without going through MessageToDict.

Solution

Changed parse_fetch_response and parse_query_response to read the protobuf structure directly instead of going through MessageToDict.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan


  • Ran make test-grpc-unit.

jhamon added a commit that referenced this pull request Nov 18, 2025
## Problem

The current implementation uses `json_format.MessageToDict` to convert
entire protobuf messages to dictionaries when parsing gRPC responses.
This is a significant CPU bottleneck when processing large numbers of
vectors, as reported in PR #537 where users experienced ~100
vectors/second throughput.

The `MessageToDict` conversion is expensive because it:
1. Serializes the entire protobuf message to JSON
2. Deserializes it back into a Python dictionary
3. Does this for every field, even when we only need specific fields
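As a rough sketch of why direct field access wins, compare the two strategies on plain-Python stand-ins (the `Mock*` classes below are illustrative mocks, not the real generated protobuf types):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MockVector:
    # Stand-in for a generated protobuf vector message (names assumed).
    id: str
    values: List[float] = field(default_factory=list)

@dataclass
class MockResponse:
    namespace: str = ""
    vectors: List[MockVector] = field(default_factory=list)

def parse_via_full_conversion(resp: MockResponse) -> List[List[float]]:
    # Analogous to MessageToDict: materialize the whole message as dicts,
    # then read the embeddings back out of the intermediate structure.
    as_dict = {
        "namespace": resp.namespace,
        "vectors": [{"id": v.id, "values": list(v.values)} for v in resp.vectors],
    }
    return [v["values"] for v in as_dict["vectors"]]

def parse_direct(resp: MockResponse) -> List[List[float]]:
    # Direct field access: touch only the fields we need and skip the
    # per-vector dict allocation entirely.
    return [list(v.values) for v in resp.vectors]
```

Both return the same data, but the direct version never builds the throwaway per-vector dicts that dominate the cost for large embedding lists.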

Additionally, several other performance issues were identified:
- Metadata conversion using `MessageToDict` on `Struct` messages
- Inefficient list construction (append vs pre-allocation)
- Unnecessary dict creation for `SparseValues` parsing
- Response header processing overhead

## Solution

Optimized all gRPC response parsing functions in
`pinecone/grpc/utils.py` to directly access protobuf fields instead of
converting entire messages to dictionaries. This approach:

1. **Directly accesses protobuf fields**: Uses `response.vectors`,
`response.matches`, `response.namespace`, etc. directly
2. **Optimized metadata conversion**: Created `_struct_to_dict()` helper
that directly accesses `Struct` fields (~1.5-2x faster than
`MessageToDict`)
3. **Pre-allocates lists**: Uses `[None] * len()` for known-size lists
(~6.5% improvement)
4. **Direct SparseValues creation**: Creates `SparseValues` objects
directly instead of going through dict conversion (~410x faster)
5. **Caches protobuf attributes**: Stores repeated attribute accesses in
local variables
6. **Optimized response info extraction**: Improved
`extract_response_info()` performance with module-level constants and
early returns
7. **Maintains backward compatibility**: Output format remains identical
to the previous implementation

## Performance Impact

Performance testing of the response parsing functions shows significant
improvements across all optimized functions.

## Changes

### Modified Files
- `pinecone/grpc/utils.py`: Optimized 9 response parsing functions with
direct protobuf field access
  - Added `_struct_to_dict()` helper for optimized metadata conversion (~1.5-2x faster)
  - Pre-allocated lists where size is known (~6.5% improvement)
  - Direct `SparseValues` creation (removed dict conversion overhead)
  - Cached protobuf message attributes
  - Removed dead code paths (dict fallback in `parse_usage`)
- `pinecone/grpc/index_grpc.py`: Updated to pass protobuf messages
directly to parse functions
- `pinecone/grpc/resources/vector_grpc.py`: Updated to pass protobuf
messages directly to parse functions
- `pinecone/utils/response_info.py`: Optimized `extract_response_info()`
with module-level constants and early returns
- `tests/perf/test_fetch_response_optimization.py`: New performance
tests for fetch response parsing
- `tests/perf/test_query_response_optimization.py`: New performance
tests for query response parsing
- `tests/perf/test_other_parse_methods.py`: New performance tests for
all other parse methods
- `tests/perf/test_grpc_parsing_perf.py`: Extended with additional
benchmarks
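For illustration, the module-level-constant plus early-return pattern used in `extract_response_info()` might be sketched like this (the header names and return shape here are assumptions, not the actual implementation):

```python
from typing import Dict, Mapping, Optional

# Hoisted to module level so the set is built once, not on every call
# (the header names are hypothetical).
_INTERESTING_HEADERS = frozenset({"x-request-id", "grpc-status"})

def extract_response_info(headers: Optional[Mapping[str, str]]) -> Dict[str, str]:
    # Early return: skip all per-call work when there is nothing to inspect.
    if not headers:
        return {}
    return {k: v for k, v in headers.items() if k in _INTERESTING_HEADERS}
```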

### Technical Details

**Core Optimizations**:

1. **`_struct_to_dict()` Helper Function**:
   - Directly accesses protobuf `Struct` and `Value` fields
   - Handles all value types (null, number, string, bool, struct, list)
   - Recursively processes nested structures
   - ~1.5-2x faster than `json_format.MessageToDict` for metadata conversion
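A minimal sketch of what such a helper can look like, using stand-in classes that mimic protobuf's `Struct`/`Value` oneof layout (the real helper operates on the generated protobuf classes):

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class ValueLike:
    # Mimics protobuf Value's oneof: one of "null_value", "number_value",
    # "string_value", "bool_value", "struct_value", "list_value".
    kind: str
    payload: Any = None

@dataclass
class StructLike:
    # Mimics protobuf Struct: a map of field name -> Value.
    fields: Dict[str, ValueLike]

def struct_to_dict(struct: StructLike) -> Dict[str, Any]:
    return {k: _value_to_py(v) for k, v in struct.fields.items()}

def _value_to_py(value: ValueLike) -> Any:
    kind = value.kind
    if kind == "null_value":
        return None
    if kind == "struct_value":
        return struct_to_dict(value.payload)  # recurse into nested structs
    if kind == "list_value":
        return [_value_to_py(v) for v in value.payload]
    return value.payload  # number, string, or bool pass through directly
```

Reading the fields directly like this avoids the JSON round-trip that `json_format.MessageToDict` performs.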

2. **List Pre-allocation**:
   - `parse_query_response`: Pre-allocates `matches` list with `[None] * len(matches_proto)`
   - `parse_list_namespaces_response`: Pre-allocates `namespaces` list
   - ~6.5% performance improvement over append-based construction
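The pre-allocation pattern itself is simple; a generic illustration (not the actual parsing code):

```python
def build_appending(items):
    # Grow the list one element at a time; each append may trigger a resize.
    out = []
    for item in items:
        out.append(item * 2)
    return out

def build_preallocated(items):
    # Size is known up front (e.g. len(matches_proto)), so allocate once
    # and assign by index.
    out = [None] * len(items)
    for i, item in enumerate(items):
        out[i] = item * 2
    return out
```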

3. **Direct SparseValues Creation**:
   - Replaced `parse_sparse_values(dict)` with direct `SparseValues(indices=..., values=...)` creation
   - ~410x faster (avoids dict creation and conversion overhead)
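A sketch of the old versus new path, with a stand-in dataclass for the real `SparseValues` model (names assumed, not the actual pinecone API):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SparseValues:
    indices: List[int]
    values: List[float]

def parse_sparse_values_via_dict(sv_proto) -> SparseValues:
    # Old path: build an intermediate dict, then construct the model from it.
    d = {"indices": list(sv_proto.indices), "values": list(sv_proto.values)}
    return SparseValues(indices=d["indices"], values=d["values"])

def parse_sparse_values_direct(sv_proto) -> SparseValues:
    # New path: construct the model directly from the protobuf fields,
    # skipping the dict allocation entirely.
    return SparseValues(indices=list(sv_proto.indices), values=list(sv_proto.values))
```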

## Testing

- All existing unit tests pass (224 tests in `tests/unit_grpc`)
- Comprehensive pytest benchmark tests added for all optimized functions:
  - `test_fetch_response_optimization.py`: Tests for fetch response with varying metadata sizes
  - `test_query_response_optimization.py`: Tests for query response with varying match counts, dimensions, metadata sizes, and sparse vectors
  - `test_other_parse_methods.py`: Tests for all other parse methods (fetch_by_metadata, list_namespaces, stats, upsert, update, namespace_description)
- Mypy type checking passes both with and without the grpc extras (types extras installed)
- No breaking changes - output format remains identical

## Related

This addresses the performance issue reported in PR #537, implementing a
similar optimization approach but adapted for the current codebase
structure. All parse methods have been optimized with comprehensive
performance testing to verify improvements.
@yorickvP closed this Nov 18, 2025