.Net: Support for configuring dimensions in Google AI embeddings generation#10489
rogerbarreto merged 11 commits into microsoft:main
Conversation
Hi team,

@microsoft-github-policy-service agree
dotnet/src/SemanticKernel.Abstractions/Services/AIServiceExtensions.cs (outdated, resolved)
...nnectors/Connectors.Google.UnitTests/Services/GoogleAITextEmbeddingGenerationServiceTests.cs (resolved)
Hi @ArieSLV, thanks for your contributions; most of it is looking good so far.
…ration (PR comments)
Hi @rogerbarreto, thanks for the thorough review. I've addressed all your feedback in the latest changes.
Regarding the spell check errors, I believe those were resolved during the merge, as they weren't directly related to my code changes.
dotnet/src/Connectors/Connectors.Google/Core/GoogleAI/GoogleAIEmbeddingRequest.cs (resolved)
…mbedding requests
…ration (microsoft#10489)

### Motivation and Context

This change addresses a limitation in the current implementation of the Google AI embeddings generation service in Semantic Kernel. Currently, users cannot configure the output dimensionality of the embeddings, even though the underlying Google AI API supports specifying the number of dimensions via the `output_dimensionality` parameter.

**Why is this change required?**
Allowing configuration of the dimensions provides greater flexibility for users to tailor the embeddings to their specific use cases—whether for optimizing memory usage, improving performance, or ensuring compatibility with downstream systems that expect a particular embedding size.

**What problem does it solve?**
It solves the issue of inflexibility by exposing the `dimensions` parameter in the service constructors, builder methods, and API request payloads. This ensures that developers can leverage the full capabilities of the Google API without being limited to the default embedding size.

**What scenario does it contribute to?**
This feature is particularly useful in scenarios where:
- Users need to optimize storage or computational resources.
- Downstream tasks or integrations require embeddings of a specific dimensionality.
- Fine-tuning the model output is essential for performance or compatibility reasons.

Relevant issue link: microsoft#10488

### Description

This PR introduces support for specifying the output dimensionality in the Google AI embeddings generation workflow. The main changes include:

- **Service Constructor Update:** The `GoogleAITextEmbeddingGenerationService` constructor now accepts an optional `dimensions` parameter, which is forwarded to the lower-level client implementations.
- **Builder and Extension Methods:** Extension methods such as `AddGoogleAIEmbeddingGeneration` have been updated to accept a `dimensions` parameter. This allows developers to configure the embedding dimensions using the builder pattern.
- **Request Payload Enhancement:** The `GoogleAIEmbeddingRequest` class now includes a new optional property `Dimensions` (serialized as `output_dimensionality`). When provided, this value is included in the JSON payload sent to the Google AI API.
- **Metadata and Attributes Update:** The service's metadata now reflects the provided dimensions, ensuring consistency in configuration tracking.
- **Unit Testing:** New unit tests have been added to confirm that:
  - When a `dimensions` value is provided, it is correctly included in the JSON request.
  - When not provided, the default behavior remains unchanged.

This enhancement maintains backward compatibility since the new parameter is optional. Existing implementations that do not specify a dimension will continue to work as before.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings.
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations.
- [x] All unit tests pass, and I have added new tests where possible.
- [x] I didn't break anyone 😄

Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>
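As a rough usage sketch of the builder-pattern configuration described above (the model id and API key below are placeholders, and the exact named-argument signature is an assumption based on this PR's description, not verified against the released package):

```csharp
using Microsoft.SemanticKernel;

var builder = Kernel.CreateBuilder();

// Hypothetical values: model id and API key are placeholders.
// 'dimensions' is the new optional parameter introduced by this PR;
// omit it to keep the model's default embedding size.
builder.AddGoogleAIEmbeddingGeneration(
    modelId: "text-embedding-004",
    apiKey: "YOUR_API_KEY",
    dimensions: 128);

var kernel = builder.Build();
```

Because the parameter is optional and defaults to null, existing call sites that omit it are unaffected.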
#13612)

### Motivation and Context

**Why is this change required?**
PR #10489 added support for configuring embedding dimensions (`outputDimensionality`) for the Google AI connector, but the equivalent Vertex AI connector was not updated. This means specifying `Dimensions` in `EmbeddingGenerationOptions` or via the constructor has no effect when using Vertex AI—the API always returns the model's default dimensionality.

**What problem does it solve?**
When using `VertexAIEmbeddingGenerator` or `VertexAITextEmbeddingGenerationService` with a `dimensions` value (e.g. 128), the output embedding length is the model default (e.g. 3072) instead of the requested size.

**What scenario does it contribute to?**
Users who need to control embedding dimensionality for storage optimization, performance, or compatibility with downstream systems when using the Vertex AI endpoint.

Fixes: #12988

### Description

This PR adds `outputDimensionality` support to the Vertex AI embedding connector, mirroring the existing Google AI implementation from PR #10489.

The Google connector has two parallel embedding paths—Google AI (uses an API key, calls generativelanguage.googleapis.com) and Vertex AI (uses a bearer token, calls `{location}-aiplatform.googleapis.com`). PR #10489 only wired up dimensions for the Google AI path. This PR applies the same pattern to every layer of the Vertex AI path.

The key structural difference between the two APIs is where `outputDimensionality` goes in the request JSON:

- Google AI puts it per content item: `{ "requests": [{ "content": {...}, "outputDimensionality": 128 }] }`
- Vertex AI puts it in the shared parameters block: `{ "instances": [...], "parameters": { "autoTruncate": false, "outputDimensionality": 128 } }`

The implementation follows this difference: in `VertexAIEmbeddingRequest`, `outputDimensionality` is added to the existing `RequestParameters` class (alongside `autoTruncate`), rather than to each instance item.

Dimensions flow through the same chain as Google AI:

1. Extension methods accept `int? dimensions = null` and pass it to the generator/service constructor.
2. `VertexAIEmbeddingGenerator` passes it to `VertexAITextEmbeddingGenerationService`.
3. The service passes it to `VertexAIEmbeddingClient`, which stores it as a default.
4. At request time, the client resolves the final value as `options?.Dimensions ?? this._dimensions`—runtime `EmbeddingGenerationOptions` take priority over the constructor default.
5. `VertexAIEmbeddingRequest.FromData()` sets it on the parameters block, and `[JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]` ensures it is omitted when not specified.

All new parameters default to null, preserving full backward compatibility.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: abbottdev <3226335+abbottdev@users.noreply.github.com>
Co-authored-by: westey <164392973+westey-m@users.noreply.github.com>
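The resolution rule in step 4 can be sketched as a one-liner (an illustrative fragment built around the `options?.Dimensions ?? this._dimensions` expression quoted above, not the actual connector source; the parameter names are hypothetical):

```csharp
// Illustrative sketch: the per-request option wins over the constructor
// default, and null means "omit the field and use the model's default size".
static int? ResolveDimensions(int? optionsDimensions, int? constructorDefault)
    => optionsDimensions ?? constructorDefault;

// ResolveDimensions(128, 256)   -> 128  (runtime option takes priority)
// ResolveDimensions(null, 256)  -> 256  (falls back to constructor default)
// ResolveDimensions(null, null) -> null (field omitted from the request)
```

This is the standard null-coalescing layering for optional settings: each layer only fills in what the layer above left unspecified.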
Motivation and Context
This change addresses a limitation in the current implementation of the Google AI embeddings generation service in Semantic Kernel. Currently, users cannot configure the output dimensionality of the embeddings, even though the underlying Google AI API supports specifying the number of dimensions via the `output_dimensionality` parameter.

Why is this change required?
Allowing configuration of the dimensions provides greater flexibility for users to tailor the embeddings to their specific use cases—whether for optimizing memory usage, improving performance, or ensuring compatibility with downstream systems that expect a particular embedding size.
What problem does it solve?
It solves the issue of inflexibility by exposing the `dimensions` parameter in the service constructors, builder methods, and API request payloads. This ensures that developers can leverage the full capabilities of the Google API without being limited to the default embedding size.

What scenario does it contribute to?
This feature is particularly useful in scenarios where:
- Users need to optimize storage or computational resources.
- Downstream tasks or integrations require embeddings of a specific dimensionality.
- Fine-tuning the model output is essential for performance or compatibility reasons.
Resolves .Net: New Feature: Support for configuring dimensions in Google AI embeddings generation #10488
Description
This PR introduces support for specifying the output dimensionality in the Google AI embeddings generation workflow. The main changes include:
Service Constructor Update:
The `GoogleAITextEmbeddingGenerationService` constructor now accepts an optional `dimensions` parameter, which is then forwarded to the lower-level client implementations.

Builder and Extension Methods:
Extension methods such as `AddGoogleAIEmbeddingGeneration` have been updated to accept a `dimensions` parameter. This allows developers to configure the embedding dimensions using the builder pattern.

Request Payload Enhancement:
The `GoogleAIEmbeddingRequest` class now includes a new optional property `Dimensions` (serialized as `output_dimensionality`). When provided, this value is included in the JSON payload sent to the Google AI API.

Metadata and Attributes Update:
The service’s metadata now reflects the provided dimensions, ensuring consistency in configuration tracking.
Unit Testing:
New unit tests have been added to confirm that:

- When a `dimensions` value is provided, it is correctly included in the JSON request.
- When not provided, the default behavior remains unchanged.

This enhancement maintains backward compatibility since the new parameter is optional. Existing implementations that do not specify a dimension will continue to work as before.
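For illustration, the serialized request the unit tests inspect might look roughly like this when a `dimensions` value of 128 is provided (field names follow this PR's description; the model id and content are placeholders, and the surrounding request shape is abbreviated):

```json
{
  "requests": [
    {
      "model": "models/text-embedding-004",
      "content": { "parts": [ { "text": "Hello world" } ] },
      "output_dimensionality": 128
    }
  ]
}
```

When no dimensions value is set, the `output_dimensionality` field is simply absent from the payload, so the API falls back to the model's default embedding size.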
Contribution Checklist