Optimize gRPC Response Parsing Performance #553

jhamon · 2025-11-17T20:06:50Z

Problem

The current implementation uses json_format.MessageToDict to convert entire protobuf messages to dictionaries when parsing gRPC responses. This is a significant CPU bottleneck when processing large numbers of vectors, as reported in PR #537 where users experienced ~100 vectors/second throughput.

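The MessageToDict conversion is expensive because it:

- Walks the entire protobuf message through descriptor-based reflection, applying the proto3 JSON mapping to each field
- Builds a new Python dictionary (with nested dicts and lists) for the whole message
- Does this for every field, even when we only need specific fields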

Additionally, several other performance issues were identified:

- Metadata conversion using MessageToDict on Struct messages
- Inefficient list construction (append vs pre-allocation)
- Unnecessary dict creation for SparseValues parsing
- Response header processing overhead

Solution

Optimized all gRPC response parsing functions in pinecone/grpc/utils.py to directly access protobuf fields instead of converting entire messages to dictionaries. This approach:

- Directly accesses protobuf fields: Reads response.vectors, response.matches, response.namespace, etc. from the message instead of from a converted dictionary
- Optimized metadata conversion: Created _struct_to_dict() helper that directly accesses Struct fields (~1.5-2x faster than MessageToDict)
- Pre-allocates lists: Uses [None] * len(...) for known-size lists (~6.5% improvement)
- Direct SparseValues creation: Creates SparseValues objects directly instead of going through dict conversion (~410x faster)
- Caches protobuf attributes: Stores repeated attribute accesses in local variables
- Optimized response info extraction: Improved extract_response_info() performance with module-level constants and early returns
- Maintains backward compatibility: Output format remains identical to the previous implementation
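
The list pre-allocation and attribute caching points can be illustrated with a minimal sketch. It does not use the SDK's real code: `FakeMatch`, `FakeQueryResponse`, `parse_with_append`, and `parse_preallocated` are hypothetical names, and plain dataclasses stand in for protobuf messages.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMatch:
    """Hypothetical stand-in for one protobuf match message."""
    id: str
    score: float


@dataclass
class FakeQueryResponse:
    """Hypothetical stand-in for a protobuf query response."""
    namespace: str
    matches: list = field(default_factory=list)


def parse_with_append(response):
    """Baseline: re-read response.matches each time and grow the list with append."""
    matches = []
    for i in range(len(response.matches)):
        m = response.matches[i]
        matches.append({"id": m.id, "score": m.score})
    return {"namespace": response.namespace, "matches": matches}


def parse_preallocated(response):
    """Cache the repeated field once, size the list up front, fill it by index."""
    matches_proto = response.matches       # one attribute lookup, reused below
    matches = [None] * len(matches_proto)  # size is known, so the list never regrows
    for i, m in enumerate(matches_proto):
        matches[i] = {"id": m.id, "score": m.score}
    return {"namespace": response.namespace, "matches": matches}


response = FakeQueryResponse(
    namespace="my_namespace",
    matches=[FakeMatch(id=str(i), score=i / 10) for i in range(3)],
)
# Both strategies must produce the same output; only the construction differs.
assert parse_with_append(response) == parse_preallocated(response)
```

Both functions return the same structure, which is the property the backward-compatibility point depends on. Any speed difference is expected to be small and should be measured on real payloads.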

Performance Impact

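Performance testing of the response parsing functions shows improvements across all optimized functions: metadata conversion is ~1.5-2x faster, list construction improves by ~6.5%, and SparseValues creation is ~410x faster.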

Changes

Modified Files

- pinecone/grpc/utils.py: Optimized 9 response parsing functions with direct protobuf field access
  - Added _struct_to_dict() helper for optimized metadata conversion (~1.5-2x faster)
  - Pre-allocated lists where size is known (~6.5% improvement)
  - Direct SparseValues creation (removed dict conversion overhead)
  - Cached protobuf message attributes
  - Removed dead code paths (dict fallback in parse_usage)
- pinecone/grpc/index_grpc.py: Updated to pass protobuf messages directly to parse functions
- pinecone/grpc/resources/vector_grpc.py: Updated to pass protobuf messages directly to parse functions
- pinecone/utils/response_info.py: Optimized extract_response_info() with module-level constants and early returns
- tests/perf/test_fetch_response_optimization.py: New performance tests for fetch response parsing
- tests/perf/test_query_response_optimization.py: New performance tests for query response parsing
- tests/perf/test_other_parse_methods.py: New performance tests for all other parse methods
- tests/perf/test_grpc_parsing_perf.py: Extended with additional benchmarks
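
The "module-level constants and early returns" pattern mentioned for extract_response_info() might look roughly like the sketch below. This is not the SDK's implementation: the function name `extract_response_info_sketch` and the set of skipped header names are assumptions made for illustration. Only the `raw_headers` key comes from the `_response_info` usage shown in the release notes.

```python
from typing import Any, Mapping, Optional

# Built once at import time instead of on every call.
# The header names listed here are illustrative assumptions, not the SDK's list.
_SKIPPED_HEADERS = frozenset({"date", "content-length", "server"})
_RAW_HEADERS_KEY = "raw_headers"


def extract_response_info_sketch(headers: Optional[Mapping[str, Any]]) -> dict:
    """Collect response headers into a {"raw_headers": {...}} dictionary."""
    # Early return: skip all per-header work when there is nothing to process.
    if not headers:
        return {_RAW_HEADERS_KEY: {}}

    raw = {}
    for name, value in headers.items():
        if name.lower() in _SKIPPED_HEADERS:
            continue
        raw[name] = value  # keep names and values exactly as received
    return {_RAW_HEADERS_KEY: raw}


info = extract_response_info_sketch({"Date": "today", "x-request-id": "abc"})
print(info)
```

Hoisting the constant set to module scope avoids rebuilding it per response, and the early return keeps the empty-header case to a single truthiness check.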

Technical Details

Core Optimizations:

- _struct_to_dict() Helper Function:
  - Directly accesses protobuf Struct and Value fields
  - Handles all value types (null, number, string, bool, struct, list)
  - Recursively processes nested structures
  - ~1.5-2x faster than json_format.MessageToDict for metadata conversion
- List Pre-allocation:
  - parse_query_response: Pre-allocates matches list with [None] * len(matches_proto)
  - parse_list_namespaces_response: Pre-allocates namespaces list
  - ~6.5% performance improvement over append-based construction
- Direct SparseValues Creation:
  - Replaced parse_sparse_values(dict) with direct SparseValues(indices=..., values=...) creation
  - ~410x faster (avoids dict creation and conversion overhead)
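
A helper in the spirit of _struct_to_dict() can be sketched as follows. This is an assumption about the general approach, not the code merged in this PR: the names `struct_to_dict_sketch` and `value_to_python` are hypothetical, and the example requires the `protobuf` package.

```python
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct, Value


def value_to_python(value: Value):
    """Convert one protobuf Value to a plain Python object by reading its oneof."""
    kind = value.WhichOneof("kind")
    if kind is None or kind == "null_value":
        return None
    if kind == "number_value":
        return value.number_value  # Struct numbers are always doubles
    if kind == "string_value":
        return value.string_value
    if kind == "bool_value":
        return value.bool_value
    if kind == "struct_value":
        return struct_to_dict_sketch(value.struct_value)
    if kind == "list_value":
        return [value_to_python(item) for item in value.list_value.values]
    raise ValueError(f"Unexpected Value kind: {kind}")


def struct_to_dict_sketch(struct: Struct) -> dict:
    """Recursively convert a protobuf Struct to a dict without json_format."""
    return {key: value_to_python(val) for key, val in struct.fields.items()}


metadata = Struct()
metadata.update({
    "genre": "drama",
    "year": 2019,
    "tags": ["a", "b"],
    "nested": {"ok": True, "missing": None},
})

# The direct conversion should agree with the generic json_format path.
assert struct_to_dict_sketch(metadata) == json_format.MessageToDict(metadata)
```

The equality check against `json_format.MessageToDict` is the kind of assertion that supports the "output format remains identical" claim. The speedup figures quoted above come from the PR's own benchmarks and are not reproduced by this sketch.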

Testing

- All existing unit tests pass (224 tests in tests/unit_grpc)
- Comprehensive pytest benchmark tests added for all optimized functions:
  - test_fetch_response_optimization.py: Tests for fetch response with varying metadata sizes
  - test_query_response_optimization.py: Tests for query response with varying match counts, dimensions, metadata sizes, and sparse vectors
  - test_other_parse_methods.py: Tests for all other parse methods (fetch_by_metadata, list_namespaces, stats, upsert, update, namespace_description)
- Mypy type checking passes with and without grpc extras (with types extras)
- No breaking changes - output format remains identical
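
For readers who want to reproduce this style of comparison without the SDK's test suite, a minimal harness can be sketched with the standard library's `timeit`. It is not one of the pytest benchmarks listed above: `FakeVector`, `parse_via_full_conversion`, `parse_direct`, and `compare` are hypothetical names, and `dataclasses.asdict` merely stands in for a convert-everything step.

```python
import timeit
from dataclasses import asdict, dataclass, field


@dataclass
class FakeVector:
    """Hypothetical stand-in for a protobuf vector message."""
    id: str
    values: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def parse_via_full_conversion(vec):
    """Convert every field recursively, then pick out the two we need."""
    as_dict = asdict(vec)
    return (as_dict["id"], as_dict["values"])


def parse_direct(vec):
    """Read only the fields we need."""
    return (vec.id, list(vec.values))


def compare(baseline, candidate, payload, repeat=3, number=100):
    """Check both functions agree, then report the best per-call time of each."""
    if baseline(payload) != candidate(payload):
        raise ValueError("baseline and candidate disagree on the payload")
    base = min(timeit.repeat(lambda: baseline(payload), repeat=repeat, number=number)) / number
    cand = min(timeit.repeat(lambda: candidate(payload), repeat=repeat, number=number)) / number
    return {"baseline_s": base, "candidate_s": cand, "speedup": base / cand}


vector = FakeVector(id="v1", values=[0.1] * 256, metadata={"genre": "drama"})
print(compare(parse_via_full_conversion, parse_direct, vector))
```

Taking the minimum over several repeats reduces noise from other processes. Checking output equality before timing guards against benchmarking a faster but incorrect parser.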

⚠️ **Python 3.9 is no longer supported.** The SDK now requires Python 3.10 or later. Python 3.9 reached end-of-life on October 2, 2025. Users must upgrade to Python 3.10+ to continue using the SDK.

⚠️ **Namespace parameter default behavior changed.** The SDK no longer applies default values for the `namespace` parameter in gRPC methods. When `namespace=None`, the parameter is omitted from requests, allowing the API to handle namespace defaults appropriately. This change affects `upsert_from_dataframe` methods in gRPC clients. The API is moving toward `"__default__"` as the default namespace value, and this change ensures the SDK doesn't override API defaults.

Note: The official SDK package was renamed last year from `pinecone-client` to `pinecone` beginning in version 5.1.0. Please remove `pinecone-client` from your project dependencies and add `pinecone` instead to get the latest updates if upgrading from earlier versions.

You can now configure dedicated read nodes for your serverless indexes, giving you more control over query performance and capacity planning. By default, serverless indexes use OnDemand read capacity, which automatically scales based on demand. With dedicated read capacity, you can allocate specific read nodes with manual scaling control.

**Create an index with dedicated read capacity:**

```python
from pinecone import (
    Pinecone,
    ServerlessSpec,
    CloudProvider,
    AwsRegion,
    Metric
)

pc = Pinecone()

pc.create_index(
    name='my-index',
    dimension=1536,
    metric=Metric.COSINE,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1,
        read_capacity={
            "mode": "Dedicated",
            "dedicated": {
                "node_type": "t1",
                "scaling": "Manual",
                "manual": {
                    "shards": 2,
                    "replicas": 2
                }
            }
        }
    )
)
```

**Configure read capacity on an existing index:**

You can switch between OnDemand and Dedicated modes, or adjust the number of shards and replicas for dedicated read capacity:

```python
from pinecone import Pinecone

pc = Pinecone()

# Switch to OnDemand read capacity
pc.configure_index(
    name='my-index',
    read_capacity={"mode": "OnDemand"}
)

# Switch to Dedicated read capacity with 3 shards and 2 replicas
pc.configure_index(
    name='my-index',
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "t1",
            "scaling": "Manual",
            "manual": {
                "shards": 3,
                "replicas": 2
            }
        }
    }
)

# Adjust an index in Dedicated mode to 4 shards and 3 replicas
pc.configure_index(
    name='my-index',
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "t1",
            "scaling": "Manual",
            "manual": {
                "shards": 4,
                "replicas": 3
            }
        }
    }
)
```

When you change read capacity configuration, the index will transition to the new configuration. You can use `describe_index` to check the status of the transition.

See [PR #528](#528) for details.

You can now fetch vectors using metadata filters instead of vector IDs. This is especially useful when you need to retrieve vectors based on their metadata properties.

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

# Fetch up to 50 vectors whose metadata matches the filter
response = index.fetch_by_metadata(
    filter={'genre': {'$in': ['comedy', 'drama']}, 'year': {'$eq': 2019}},
    namespace='my_namespace',
    limit=50
)
print(f"Found {len(response.vectors)} vectors")

for vec_id, vector in response.vectors.items():
    print(f"ID: {vec_id}, Metadata: {vector.metadata}")
```

**Pagination support:**

When fetching large numbers of vectors, you can use pagination tokens to retrieve results in batches:

```python
# Fetch the first page
response = index.fetch_by_metadata(
    filter={'status': 'active'},
    limit=100
)

# Fetch the next page if a pagination token was returned
if response.pagination and response.pagination.next:
    next_response = index.fetch_by_metadata(
        filter={'status': 'active'},
        pagination_token=response.pagination.next,
        limit=100
    )
```

The update method used to require a vector id to be passed, but now you have the option to pass a metadata filter instead. This is useful for bulk metadata updates across many vectors. There is also a `dry_run` option that lets you preview the number of vectors that would be changed by the update before performing the operation.

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

# Preview how many vectors the update would touch
response = index.update(
    set_metadata={'status': 'active'},
    filter={'genre': {'$eq': 'drama'}},
    dry_run=True
)
print(f"Would update {response.matched_records} vectors")

# Apply the update
response = index.update(
    set_metadata={'status': 'active'},
    filter={'genre': {'$eq': 'drama'}}
)
```

A new `FilterBuilder` utility class provides a type-safe, fluent interface for constructing metadata filters. While perhaps a bit verbose, it can help prevent common errors like misspelled operator names and provides better IDE support. When you chain `.build()` onto the `FilterBuilder`, it emits a Python dictionary representing the filter. Methods that take metadata filters as arguments will continue to accept dictionaries as before.

```python
from pinecone import Pinecone, FilterBuilder

pc = Pinecone()
index = pc.Index(host="your-index-host")

# Simple equality filter
filter1 = FilterBuilder().eq("genre", "drama").build()

# Two conditions combined with and (&)
filter2 = (FilterBuilder().eq("genre", "drama") & FilterBuilder().gt("year", 2020)).build()

# Two conditions combined with or (|)
filter3 = (FilterBuilder().eq("genre", "comedy") | FilterBuilder().eq("genre", "drama")).build()

# Nested combination of and/or
filter4 = (
    (FilterBuilder().eq("genre", "drama") & FilterBuilder().gte("year", 2020))
    | (FilterBuilder().eq("genre", "comedy") & FilterBuilder().lt("year", 2000))
).build()

# Built filters are plain dictionaries and can be passed wherever a filter is accepted
response = index.fetch_by_metadata(filter=filter2, limit=50)

index.update(
    set_metadata={'status': 'archived'},
    filter=filter3
)
```

The FilterBuilder supports all Pinecone filter operators: `eq`, `ne`, `gt`, `gte`, `lt`, `lte`, `in_`, `nin`, and `exists`. Compound expressions are built with and (`&`) and or (`|`).

See [PR #529](#529) for `fetch_by_metadata`, [PR #544](#544) for `update()` with filter, and [PR #531](#531) for FilterBuilder.

You can now create namespaces in serverless indexes directly from the SDK:

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

# Create a namespace
namespace = index.create_namespace(name="my-namespace")
print(f"Created namespace: {namespace.name}, Vector count: {namespace.vector_count}")

# Create a namespace with a schema of filterable fields
namespace = index.create_namespace(
    name="my-namespace",
    schema={
        "fields": {
            "genre": {"filterable": True},
            "year": {"filterable": True}
        }
    }
)
```

**Note:** This operation is not supported for pod-based indexes.

See [PR #532](#532) for details.

For sparse indexes with integrated embedding configured to use the `pinecone-sparse-english-v0` model, you can now specify which terms must be present in search results:

```python
from pinecone import Pinecone, SearchQuery

pc = Pinecone()
index = pc.Index(host="your-index-host")

response = index.search(
    namespace="my-namespace",
    query=SearchQuery(
        inputs={"text": "Apple corporation"},
        top_k=10,
        match_terms={
            "strategy": "all",
            "terms": ["apple", "corporation"]
        }
    )
)
```

The `match_terms` parameter ensures that all specified terms must be present in the text of each search hit. Terms are normalized and tokenized before matching, and order does not matter.

See [PR #530](#530) for details.

**Update API keys, projects, and organizations:**

```python
from pinecone import Admin

admin = Admin()  # Auth with PINECONE_CLIENT_ID and PINECONE_CLIENT_SECRET

api_key = admin.api_key.update(
    api_key_id='my-api-key-id',
    name='updated-api-key-name',
    roles=['ProjectEditor', 'DataPlaneEditor']
)

project = admin.project.update(
    project_id='my-project-id',
    name='updated-project-name',
    max_pods=10,
    force_encryption_with_cmek=True
)

organization = admin.organization.update(
    organization_id='my-org-id',
    name='updated-organization-name'
)
```

**Delete organizations:**

```python
from pinecone import Admin

admin = Admin()

admin.organization.delete(organization_id='my-org-id')
```

See [PR #527](#527) and [PR #543](#543) for details.

You can now configure which metadata fields are filterable when creating serverless indexes.
This helps optimize performance by only indexing metadata fields that you plan to use for filtering:

```python
from pinecone import (
    Pinecone,
    ServerlessSpec,
    CloudProvider,
    AwsRegion,
    Metric
)

pc = Pinecone()

pc.create_index(
    name='my-index',
    dimension=1536,
    metric=Metric.COSINE,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1,
        schema={
            "genre": {"filterable": True},
            "year": {"filterable": True},
            "rating": {"filterable": True}
        }
    )
)
```

When using schemas, only fields marked as `filterable: True` in the schema can be used in metadata filters.

See [PR #528](#528) for details.

The SDK now exposes header information from API responses. This information is available in response objects via the `_response_info` attribute and can be useful for debugging and monitoring.

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

response = index.query(
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    namespace='my_namespace'
)

for k, v in response._response_info.get('raw_headers').items():
    print(f"{k}: {v}")
```

See [PR #539](#539) for details.

We've replaced Python's standard library `json` module with `orjson`, a fast JSON library written in Rust. This provides significant performance improvements for both serialization and deserialization of request payloads:

- **Serialization (dumps)**: 10-23x faster depending on payload size
- **Deserialization (loads)**: 4-7x faster depending on payload size

These improvements are especially beneficial for:

- High-throughput applications making many API calls
- Applications handling large vector payloads
- Real-time applications where latency matters

No code changes are required - the API remains the same, and you'll automatically benefit from these performance improvements.

See [PR #556](#556) for details.

We've optimized gRPC response parsing by replacing `json_format.MessageToDict` with direct protobuf field access.
This optimization provides approximately 2x faster response parsing for gRPC operations.

Special thanks to [@yorickvP](https://github.com/yorickvP) for surfacing the `json_format.MessageToDict` refactor opportunity. While we didn't merge the specific PR, yorick's insight led us to implement a similar optimization that significantly improves gRPC performance.

See [PR #553](#553) for details.

- **Type hints and IDE support**: Comprehensive type hints throughout the SDK improve IDE autocomplete and type checking. The SDK now uses Python 3.10+ type syntax throughout.
- **Documentation**: Updated docstrings with RST formatting and code examples for better developer experience.
- **Dependency updates**: Updated protobuf to 5.29.5 to address security vulnerabilities. Updated `pinecone-plugin-assistant` to version 3.0.1.
- **Build system**: Migrated from poetry to uv for faster dependency management.

- [@yorickvP](https://github.com/yorickvP) - Thanks for surfacing the gRPC response parsing optimization opportunity!

jhamon added 8 commits November 17, 2025 14:59

Improve parsing of fetch response

9b038aa

Refactor message parsing for more actions

889534d

More refactoring

390ed24

Iterate

b3ca7ed

Leave headers exactly as they are coming back from the server

4a8bd46

More speed

c25c04e

More speeed

653b5fd

More speeeed

4fbcde2

jhamon changed the title ~~Improve parsing of fetch response~~ Optimize gRPC Response Parsing Performance Nov 18, 2025

jhamon marked this pull request as ready for review November 18, 2025 07:01

jhamon merged commit 27e751c into release-candidate/2025-10 Nov 18, 2025
27 checks passed

jhamon deleted the jhamon/grpc-parse-perf branch November 18, 2025 07:02

jhamon mentioned this pull request Nov 18, 2025

Merge release-candidate/2025-10 #562

Merged
