
Introduce AzureOpenAI transcription support #902

Closed

Conversation

piotrooo
Contributor

@piotrooo piotrooo commented Jun 20, 2024

Motivation

Right now, Microsoft (via Azure) is the biggest investor in OpenAI and puts a lot of effort into providing OpenAI
services. I think it is reasonable to support as many models as possible.

Description

Caution

This PR introduces breaking changes.
Classes from the org.springframework.ai.openai.metadata.audio.transcription package have been moved to the org.springframework.ai.audio.transcription package.

The duplicated transcription code was moved to the core package.

The AzureOpenAiAudioTranscriptionModel has been added to the auto-configuration.
A new spring.ai.azure.openai.audio.transcription property prefix was introduced, along with options properties that cover all transcription options (see AzureOpenAiAudioTranscriptionOptions).
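For illustration, configuration under the new prefix might look like the sketch below. The exact keys are derived from the prefix named above and are illustrative only; see AzureOpenAiAudioTranscriptionOptions for the authoritative list.

```properties
# Illustrative sketch - key names below the prefix are assumptions,
# not verified against the final AzureOpenAiAudioTranscriptionOptions.
spring.ai.azure.openai.audio.transcription.options.deployment-name=whisper
spring.ai.azure.openai.audio.transcription.options.language=en
spring.ai.azure.openai.audio.transcription.options.response-format=json
```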

The Azure SDK has been bumped to version 1.0.0-beta.9. This upgrade renamed a JSON field: prompt_annotations became prompt_filter_results (ref: Azure/azure-rest-api-specs#25880).
Another significant change in the SDK was the replacement of jackson-databind with azure-json (ref: Azure/azure-sdk-for-java#39825). As a result, the AzureOpenAiEmbeddingModel class can no longer use ModelOptionsUtils, which relies on the @JsonProperty annotation. I replaced that mechanism with manual field assignment.
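The manual-assignment pattern can be illustrated with a small self-contained sketch: copy the defaults, then overwrite each field for which the runtime options supply a non-null value. The class and field names here are hypothetical, not the actual Spring AI types.

```java
// Illustrative sketch of merging runtime options over defaults by explicit
// field assignment, replacing an annotation-driven merge like ModelOptionsUtils.
// EmbeddingOptions is a hypothetical stand-in, not a Spring AI class.
public class ManualMergeSketch {

    static class EmbeddingOptions {
        String deploymentName;
        Integer dimensions;

        EmbeddingOptions(String deploymentName, Integer dimensions) {
            this.deploymentName = deploymentName;
            this.dimensions = dimensions;
        }
    }

    // Start from the defaults, then copy each non-null runtime field over them.
    static EmbeddingOptions merge(EmbeddingOptions defaults, EmbeddingOptions runtime) {
        EmbeddingOptions merged = new EmbeddingOptions(defaults.deploymentName, defaults.dimensions);
        if (runtime != null) {
            if (runtime.deploymentName != null) {
                merged.deploymentName = runtime.deploymentName;
            }
            if (runtime.dimensions != null) {
                merged.dimensions = runtime.dimensions;
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        EmbeddingOptions defaults = new EmbeddingOptions("text-embedding-3-small", null);
        EmbeddingOptions runtime = new EmbeddingOptions(null, 512);
        EmbeddingOptions merged = merge(defaults, runtime);
        System.out.println(merged.deploymentName + " " + merged.dimensions);
    }
}
```

The trade-off of this approach is that every new option field must be added to the merge by hand, which is the price of dropping the reflection/annotation-based merge.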

TODO

There are a couple of remaining tasks. To keep this PR reviewable, I'd prefer to tackle them after it is merged.

  • Add docs in the Audio Model API ➡ Transcription API ➡ Azure OpenAI section
  • Add AudioTranscriptionMetadata for both OpenAI and Azure OpenAI with information from the VERBOSE_JSON response format (@tzolov what do you think?)
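For context on the second item: the VERBOSE_JSON response format returns timing and confidence details alongside the transcribed text, which is what the proposed AudioTranscriptionMetadata would expose. A trimmed, illustrative example of the shape (all values made up):

```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 2.95,
  "text": "Hello world.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello world.",
      "avg_logprob": -0.28,
      "no_speech_prob": 0.01
    }
  ]
}
```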

@tzolov
Contributor

tzolov commented Jun 21, 2024

Hi @piotrooo ,
Good job so far. Moving the transcription package to the core is a good step as well.

FYI, the other day I tried to migrate to 1.0.0-beta.9. I fixed some compilation and other issues but got stuck on strange test failures (structured output stopped working), so I set it aside for when I have time. If you're interested, here is the branch: https://github.com/tzolov/spring-ai/tree/update_azure_openai_client_version

@piotrooo
Contributor Author

Sure @tzolov 👍

I'll look at this on Monday. Could you give me a hint about the failing tests?

@tzolov
Contributor

tzolov commented Jun 24, 2024

@piotrooo, it seems like the structured output converters (e.g. AzureOpenAiChatModelIT's listOutputConverter, mapOutputConverter, beanOutputConverter) are not working after the upgrade.

@piotrooo
Contributor Author

piotrooo commented Jun 24, 2024

@tzolov, it seems to be a bug in the newly introduced JSON serializer in the Azure SDK. Here is a referenced PR with a fix.

It looks like a problem with content serialization and deserialization in Azure. I ran a test:

  • At the Azure OpenAI Chat playground:
    Screenshot from 2024-06-24 19-01-54

  • Using cURL:

19:03 $ curl "https://<url>.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-05-01-preview"   -H "Content-Type: application/json"   -H "api-key: <key>"   -d '{ 
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Generate the filmography of 5 movies for Tom Hanks.\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n  \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n  \"type\" : \"object\",\n  \"properties\" : {\n    \"actor\" : {\n      \"type\" : \"string\"\n    },\n    \"movies\" : {\n      \"type\" : \"array\",\n      \"items\" : {\n        \"type\" : \"string\"\n      }\n    }\n  }\n}```\n\n"
        }
      ]
    }
  ],
  "max_tokens": 200,
  "stream": false,
  "model": "gpt-4o"
}'  | jq .

Response:

{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{\n  \"actor\": \"Tom Hanks\",\n  \"movies\": [\n    \"Forrest Gump\",\n    \"Saving Private Ryan\",\n    \"Cast Away\",\n    \"The Green Mile\",\n    \"Toy Story\"\n  ]\n}",
        "role": "assistant"
      }
    }
  ],
  "created": 1719248590,
  "id": "chatcmpl-9dhNuNcvUDiOpEgc0fuObQJPzDoMx",
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "system_fingerprint": "fp_abc28019ad",
  "usage": {
    "completion_tokens": 47,
    "prompt_tokens": 169,
    "total_tokens": 216
  }
}

Everything works correctly. Unfortunately, we need to wait for the fix.

@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from 90a6204 to dc0eb97 Compare July 4, 2024 15:49
@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from dc0eb97 to d479c36 Compare July 16, 2024 13:19
@piotrooo
Contributor Author

@tzolov, it seems that everything works as expected. I've checked it in our Azure subscription against our internal models. Could you confirm it is working for you as well?

@tzolov
Contributor

tzolov commented Jul 16, 2024

@piotrooo thanks for the update.
It looks like 1.0.0-beta.10 has another issue with JSON handling during streaming. I'm considering rolling it back to beta.8.
Will this affect your PR?

@piotrooo
Contributor Author

I'm considering rolling it back to beta8.
Will this affect your PR?

Yes, for sure. As far as I remember, beta.8 doesn't support the AudioTranscriptionTimestampGranularity functionality.

@tzolov tzolov self-assigned this Jul 17, 2024
@tzolov tzolov added this to the 1.0.0-M2 milestone Jul 17, 2024
@tzolov
Contributor

tzolov commented Jul 17, 2024

Hi @piotrooo ,
Can you please add the related documentation?
I guess it should be under the https://docs.spring.io/spring-ai/reference/api/audio/transcriptions.html section.

The Antora docs source is under: https://github.com/spring-projects/spring-ai/tree/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions
and you can use the https://github.com/spring-projects/spring-ai/blob/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions/openai-transcriptions.adoc as an example.

Finally, you can add the page to the catalog in front of the OpenAI transcription entry:

**** xref:api/audio/transcriptions/openai-transcriptions.adoc[OpenAI]
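Concretely, the resulting nav entries might look like this. The Azure page filename below is an assumption following the existing naming convention, not a confirmed path:

```asciidoc
**** xref:api/audio/transcriptions/azure-openai-transcriptions.adoc[Azure OpenAI]
**** xref:api/audio/transcriptions/openai-transcriptions.adoc[OpenAI]
```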

@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from 360a779 to 4400048 Compare July 19, 2024 14:23
@piotrooo
Contributor Author

@tzolov I've just added docs.

@tzolov
Contributor

tzolov commented Jul 20, 2024

Thanks @piotrooo ,
After rebasing, I see a compilation error, likely due to the recent improvements in the response metadata: #1070
Let me know if you have time to resolve it. Hopefully it will be a trivial change.

@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from 44c2ca6 to 713657f Compare July 21, 2024 07:20
@piotrooo
Contributor Author

@tzolov I think I did everything. The code compiles for me.

@tzolov
Contributor

tzolov commented Jul 21, 2024

@piotrooo
Contributor Author

piotrooo commented Jul 22, 2024

@tzolov I've added the missing test you mentioned. I've also tested it on our internal models.

AzureOpenAiAutoConfigurationIT.transcribe()

Screenshot from 2024-07-22 08-30-19

AzureOpenAiAudioTranscriptionModelIT

Screenshot from 2024-07-22 08-29-04

@tzolov
Contributor

tzolov commented Jul 22, 2024

Thank you @piotrooo ! Great stuff
Thanks for contributing the Azure transcription support and for improving the API consistency!
Looking forward to the next contribution ;)

@tzolov
Contributor

tzolov commented Jul 22, 2024

Small test adjustments, rebased, squashed and merged at 0e97f9c

@tzolov tzolov closed this Jul 22, 2024
@piotrooo piotrooo deleted the introduce-azure-transcription branch July 22, 2024 13:06