@shyamnamboodiripad shyamnamboodiripad commented May 1, 2025

This was an unfortunate regression that was introduced during a recent refactoring.

The metrics returned from the Azure AI Foundry Evaluation service have different names than the ones we use in the Safety library. We translate the EvaluationMetric.Name of each metric returned by the service to the more display-friendly name used in the library before returning the metrics to the caller.

While the returned metrics were correctly patched up, the EvaluationResult.Metrics dictionary still stored these metrics under the original names returned from the service. Unfortunately, this meant EvaluationResult.Get would throw an exception when trying to fetch the metric named ViolenceEvaluator.ViolenceMetricName.

This PR fixes the keys in the dictionary as well, and updates the tests to cover the case being fixed.
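For readers who want the failure mode in miniature, here is a rough, language-neutral sketch in Python. It is not the library's code: the `Metric` class, the `DISPLAY_NAMES` mapping, and the service-side name `"violence"` are hypothetical placeholders chosen only to illustrate renaming the values without re-keying the dictionary.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Metric:
    # Hypothetical stand-in for a metric object that carries its own name.
    name: str
    value: float


# Assumed mapping from service-side names to display-friendly names.
DISPLAY_NAMES = {"violence": "Violence"}


def rename_only(metrics: Dict[str, Metric]) -> Dict[str, Metric]:
    """Regressed behavior: patch each metric's name but keep the stale keys."""
    for metric in metrics.values():
        metric.name = DISPLAY_NAMES.get(metric.name, metric.name)
    return metrics


def rename_and_rekey(metrics: Dict[str, Metric]) -> Dict[str, Metric]:
    """Fixed behavior: patch each metric's name and store it under that name."""
    rekeyed: Dict[str, Metric] = {}
    for metric in metrics.values():
        metric.name = DISPLAY_NAMES.get(metric.name, metric.name)
        rekeyed[metric.name] = metric
    return rekeyed


def from_service() -> Dict[str, Metric]:
    # Simulated service response, keyed by the service-side name.
    return {"violence": Metric("violence", 0.0)}


buggy = rename_only(from_service())
fixed = rename_and_rekey(from_service())
```

In the regressed variant, a lookup by the display name fails even though the metric object itself already reports that name. Rebuilding the dictionary keeps the key and the metric's name in sync.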

Fixes #6360

…ct metric names for Safety evaluators

This is an unfortunate regression that was introduced during a recent refactoring.

The metrics returned from the Azure AI Foundry Evaluation service have different names than the ones we use in the Safety library. We translate the EvaluationMetric.Name of each metric returned by the service to the more display-friendly name before returning the metrics to the caller.

While the returned metrics were correctly patched up, the EvaluationResult.Metrics dictionary still stored metrics under the original names returned by the service. Unfortunately, this means EvaluationResult.Get now throws an exception when trying to fetch the metric named ViolenceEvaluator.ViolenceMetricName. The fix is to patch up the keys in the dictionary as well.
@shyamnamboodiripad shyamnamboodiripad requested a review from a team as a code owner May 1, 2025 22:18
@github-actions github-actions bot added the area-ai-eval Microsoft.Extensions.AI.Evaluation and related label May 1, 2025
@shyamnamboodiripad shyamnamboodiripad changed the title from "Fix keys for EvaluationResult.Metrics dictionary to reflect the corre…" to "Fix keys for EvaluationResult.Metrics dictionary to reflect the correct metric names for Safety evaluators" May 1, 2025
@shyamnamboodiripad shyamnamboodiripad enabled auto-merge (squash) May 1, 2025 22:30
@shyamnamboodiripad shyamnamboodiripad merged commit 6204b0b into dotnet:main May 2, 2025
7 checks passed
jeffhandley pushed a commit that referenced this pull request May 9, 2025
…ct metric names for Safety evaluators (#6361)

jeffhandley added a commit that referenced this pull request May 9, 2025
* Update CHANGELOGS for M.E.AI libs (#6359)

* Add project template build tests + CG reporting (#6355)

Co-authored-by: Steve Sanderson <SteveSandersonMS@users.noreply.github.com>

* Use ConversationId instead of ChatThreadId (#6367)

* Update OpenTelemetryChatClient/EmbeddingGenerator to 1.33 (#6366)

Also adds the EnableSensitiveData property to OpenTelemetryEmbeddingGenerator. This was missed when adding it to OpenTelemetryChatClient.

* Add AsIEmbeddingGenerator for Azure.AI.Inference ImageEmbeddingsClient (#6363)

* Add DataContent.Base64Data (#6365)

* Fix AIFunctionFactory handling of default struct arguments (#6381)

* Fix AIFunctionFactory handling of default struct arguments

* Extend testing to custom structs

* Use GetUninitializedObject instead of Activator

* Add JS dependency update instructions to chat template README (#6376)

* Add see also links to conceptual docs (#6368)

* Translate OpenAI refusals to ErrorContent (#6393)

Refusals in OpenAI are errors reported when the service can't generate an output that matches the requested schema. Translate refusals to ErrorContent now that we have it.

* Rename useJsonSchema parameter (#6394)

* Rename useJsonSchema parameter

The `GetResponseAsync<T>` methods accept a `bool?` parameter currently called `useJsonSchema`. This is confusing, because the whole point of the method is to create and use a JSON schema from the `T`. The parameter actually controls _how_ that schema is used, whether it's included as part of the messages (false), as part of a ChatResponseFormat in the ChatOptions (true), or up to the system to decide.

I've clarified it by renaming it from `useJsonSchema` to `useJsonSchemaResponseFormat`. It's wordier, but it disambiguates the intent.

* Update src/Libraries/Microsoft.Extensions.AI/ChatCompletion/ChatClientStructuredOutputExtensions.cs

Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>

---------

Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>

* Add JSON schema transformation functionality to `AIJsonUtilities` (#6383)

* Add initial schema transformation functionality and incorporate into the OpenAI leaf client.

* Update all leaf client implementations, improve naming, add testing.

* Remove redundant suppressions

* Address feedback.

* Add ChatOptions.RawRepresentationFactory (#6319)

* Look for OpenAI.ChatCompletionOptions in top-level additional properties and stop looking for individually specific additional properties

* Add RawRepresentation to ChatOptions and use it in OpenAI and AzureAIInference

* Remove now unused locals

* Add [JsonIgnore] and update roundtrip tests

* Overwrite properties only if the underlying model doesn't specify them already

* Clone RawRepresentation

* Reflection workaround for ToolChoice not being cloned

* Style changes

* AI.Inference: Bring back propagation of additional properties

* Don't use 0.1f, it doesn't roundtrip properly in .NET Framework

* Add RawRepresentationFactory instead of object? property

* Augment remarks to discourage returning shared instances

* Documentation feedback

* AI.Inference: keep passing TopK as AdditionalProperty if not already there

* Avoid caching in CachingChatClient when ConversationId is set (#6400)

* Add comment about LoggingChatClient et al. trace-level logging (#6391)

Also fixed the name of the LoggingSpeechToTextClientBuilderExtensions type to conform to patterns used elsewhere in the library.

* Fix test validation of aggregate usage counts (#6401)

* Add BinaryEmbedding (#6398)

* Add BinaryEmbedding

Also:
- Renames the polymorphic discriminators to conform with typical lingo for these types.
- Adds an Embedding.Dimensions virtual property.

* Fix keys for EvaluationResult.Metrics dictionary to reflect the correct metric names for Safety evaluators (#6361)

* Update branding for Azure service used by Safety evaluators (#6362)

`Azure AI Content Safety service` -> `Azure AI Foundry Evaluation service`

* Remove CacheOptions from DiskBasedResponseCache (#6395)

Details for this change are available in #6387.

Fixes #6387

* Add back net9.0 version of the aieval dotnet tool (#6396)

In #6148, we disabled net9.0 TFM for the aieval tool to work around the race described in dotnet/sdk#47696. The underlying issue was subsequently fixed in the SDK (via dotnet/sdk#47788). However, this fix has not been backported to the dotnet 9 SDK yet.

The SDK team is working on backporting the fix (see discussion in dotnet/sdk#47788 (comment)). In the meantime, we can add back the net9.0 TFM and continue to work around the race by disabling parallel build.

This would help users of the aieval tool sidestep errors such as the ones described in #6388 when they don't have dotnet 8 installed on the machine.

We can remove this workaround once the backported fix is available in the dotnet 9 SDK.

Fixes #6388

* Add some additional documentation around usage of cache, and CSP properties on report (#6377)

* Add documentation around proper usage of IDistributedCache

* Add Content-Security-Policy to prevent page from calling into other sites.

* Remove remark about IDistributedCache usage

* Fix package-lock.json

* Remove <remarks> start tag.

* Some API related fixes for the evaluation libraries (#6402)

* Rename IResultStore and IResponseCacheProvider

IResultStore -> IEvaluationResultStore and
IResponseCacheProvider -> IEvaluationResponseCacheProvider

* Include missing EvaluationContextConverter in AzureStorageJsonUtilities

Also use linked files to avoid the need to duplicate code.

* Reorder enum members

The new order goes from least desirable rating to most desirable.

* Refactor extension method overloads

Implement overloads that take ChatMessage by calling corresponding overloads that take ChatResponse.

* Refactor AddTurnDetails to support adding details for multiple turns

Adding single turns continues to be supported via a params array overload.

* Add missing parameter for timeToLiveForCacheEntries to DiskBasedReportingConfiguration

This was missed in an earlier PR that introduced the timeToLiveForCacheEntries on the constructor of DiskBasedResponseCacheProvider.

Also reorder constructor parameters for AzureStorageReportingConfiguration so that the parameters for caching appear next to each other and so that the parameter ordering is aligned with DiskBasedReportingConfiguration.

* Minor formatting changes

* Bump vite from 6.2.6 to 6.3.4 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript (#6354)

* Bump vite

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.2.6 to 6.3.4.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.3.4/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 6.3.4
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shyam Namboodiripad <gnamboo@microsoft.com>

* Allow image rendering in evaluation report (#6407)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Mackinnon Buck <mackinnon.buck@gmail.com>
Co-authored-by: Steve Sanderson <SteveSandersonMS@users.noreply.github.com>
Co-authored-by: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>
Co-authored-by: David Cantú <dacantu@microsoft.com>
Co-authored-by: Shyam N <shyamnamboodiripad@users.noreply.github.com>
Co-authored-by: Peter Waldschmidt <pewaldsc@microsoft.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shyam Namboodiripad <gnamboo@microsoft.com>
@shyamnamboodiripad shyamnamboodiripad deleted the safetyfix branch May 14, 2025 12:49
@github-actions github-actions bot locked and limited conversation to collaborators Jun 14, 2025
Development

Successfully merging this pull request may close these issues.

[AI Evaluation] [Regression] EvaluationResult.Get throws when trying to fetch metric with name ViolenceEvaluator.ViolenceMetricName.
