
Introduce AzureOpenAI transcription support #902

Closed

Conversation

piotrooo
Contributor

@piotrooo piotrooo commented Jun 20, 2024

Motivation

Right now, Microsoft (via Azure) is the biggest investor in OpenAI and puts a lot of effort into providing OpenAI
services. I think it is reasonable to support as many models as possible.

Description

Caution

This PR introduces breaking changes.
Classes from the org.springframework.ai.openai.metadata.audio.transcription package have been moved to the org.springframework.ai.audio.transcription package.

The duplicated transcription code was moved to the core package.

The AzureOpenAiAudioTranscriptionModel has been added to the auto-configuration.
A new spring.ai.azure.openai.audio.transcription property prefix was introduced, along with options properties that cover all transcription options (see AzureOpenAiAudioTranscriptionOptions).
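For illustration, configuration under the new prefix might look like the sketch below. The exact keys are derived from the prefix named above and are illustrative only; see AzureOpenAiAudioTranscriptionOptions for the authoritative list.

```properties
# Illustrative sketch - key names below the prefix are assumptions,
# not verified against the final AzureOpenAiAudioTranscriptionOptions.
spring.ai.azure.openai.audio.transcription.options.deployment-name=whisper
spring.ai.azure.openai.audio.transcription.options.language=en
spring.ai.azure.openai.audio.transcription.options.response-format=json
```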

The Azure SDK has been bumped to version 1.0.0-beta.9. This upgrade renamed a JSON field: prompt_annotations became prompt_filter_results (ref: Azure/azure-rest-api-specs#25880).
Another significant change in the SDK was the replacement of jackson-databind with azure-json (ref: Azure/azure-sdk-for-java#39825). As a result, the AzureOpenAiEmbeddingModel class can no longer use ModelOptionsUtils, which relies on the @JsonProperty annotation. I replaced that mechanism with manual field assignment.
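The manual-assignment pattern can be illustrated with a small self-contained sketch: copy the defaults, then overwrite each field for which the runtime options supply a non-null value. The class and field names here are hypothetical, not the actual Spring AI types.

```java
// Illustrative sketch of merging runtime options over defaults by explicit
// field assignment, replacing an annotation-driven merge like ModelOptionsUtils.
// EmbeddingOptions is a hypothetical stand-in, not a Spring AI class.
public class ManualMergeSketch {

    static class EmbeddingOptions {
        String deploymentName;
        Integer dimensions;

        EmbeddingOptions(String deploymentName, Integer dimensions) {
            this.deploymentName = deploymentName;
            this.dimensions = dimensions;
        }
    }

    // Start from the defaults, then copy each non-null runtime field over them.
    static EmbeddingOptions merge(EmbeddingOptions defaults, EmbeddingOptions runtime) {
        EmbeddingOptions merged = new EmbeddingOptions(defaults.deploymentName, defaults.dimensions);
        if (runtime != null) {
            if (runtime.deploymentName != null) {
                merged.deploymentName = runtime.deploymentName;
            }
            if (runtime.dimensions != null) {
                merged.dimensions = runtime.dimensions;
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        EmbeddingOptions defaults = new EmbeddingOptions("text-embedding-3-small", null);
        EmbeddingOptions runtime = new EmbeddingOptions(null, 512);
        EmbeddingOptions merged = merge(defaults, runtime);
        System.out.println(merged.deploymentName + " " + merged.dimensions);
    }
}
```

The trade-off of this approach is that every new option field must be added to the merge by hand, which is the price of dropping the reflection/annotation-based merge.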

TODO

There are a couple of remaining tasks. To keep this PR reviewable, I'd prefer to tackle them after it is merged.

  • Add docs in the Audio Model API ➡ Transcription API ➡ Azure OpenAI section
  • Add AudioTranscriptionMetadata for both OpenAI and Azure OpenAI with information from the VERBOSE_JSON response format (@tzolov what do you think?)
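For context on the second item: the VERBOSE_JSON response format returns timing and confidence details alongside the transcribed text, which is what the proposed AudioTranscriptionMetadata would expose. A trimmed, illustrative example of the shape (all values made up):

```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 2.95,
  "text": "Hello world.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.95,
      "text": "Hello world.",
      "avg_logprob": -0.28,
      "no_speech_prob": 0.01
    }
  ]
}
```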

@tzolov
Contributor

tzolov commented Jun 21, 2024

Hi @piotrooo ,
Good job so far. Moving the transcription package to the core is a good step as well.

FYI, the other day I tried to migrate to 1.0.0-beta.9. I fixed some compilation and other issues but got stuck on strange test failures (structured output stopped working), so I set it aside for when I have time. If you're interested, here is the branch: https://github.com/tzolov/spring-ai/tree/update_azure_openai_client_version

@piotrooo
Contributor Author

Sure @tzolov 👍

I'll look at this on Monday. Could you give me a hint about the failing tests?

@tzolov
Contributor

tzolov commented Jun 24, 2024

@piotrooo, it seems like the structured output converters (e.g. AzureOpenAiChatModelIT's listOutputConverter, mapOutputConverter, beanOutputConverter) are not working after the upgrade.

@piotrooo
Contributor Author

piotrooo commented Jun 24, 2024

@tzolov, it seems to be a bug in the newly introduced JSON serializer in the Azure SDK. Here is a referenced PR with a fix.

It looks like a problem with content serialization and deserialization in Azure. I ran a test:

  • At the Azure OpenAI Chat playground:
    Screenshot from 2024-06-24 19-01-54

  • Using cURL:

19:03 $ curl "https://<url>.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-05-01-preview"   -H "Content-Type: application/json"   -H "api-key: <key>"   -d '{ 
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Generate the filmography of 5 movies for Tom Hanks.\nYour response should be in JSON format.\nDo not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.\nDo not include markdown code blocks in your response.\nRemove the ```json markdown from the output.\nHere is the JSON Schema instance your output must adhere to:\n```{\n  \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",\n  \"type\" : \"object\",\n  \"properties\" : {\n    \"actor\" : {\n      \"type\" : \"string\"\n    },\n    \"movies\" : {\n      \"type\" : \"array\",\n      \"items\" : {\n        \"type\" : \"string\"\n      }\n    }\n  }\n}```\n\n"
        }
      ]
    }
  ],
  "max_tokens": 200,
  "stream": false,
  "model": "gpt-4o"
}'  | jq .

Response:

{
  "choices": [
    {
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "{\n  \"actor\": \"Tom Hanks\",\n  \"movies\": [\n    \"Forrest Gump\",\n    \"Saving Private Ryan\",\n    \"Cast Away\",\n    \"The Green Mile\",\n    \"Toy Story\"\n  ]\n}",
        "role": "assistant"
      }
    }
  ],
  "created": 1719248590,
  "id": "chatcmpl-9dhNuNcvUDiOpEgc0fuObQJPzDoMx",
  "model": "gpt-4o-2024-05-13",
  "object": "chat.completion",
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "system_fingerprint": "fp_abc28019ad",
  "usage": {
    "completion_tokens": 47,
    "prompt_tokens": 169,
    "total_tokens": 216
  }
}

Everything works correctly. Unfortunately, we need to wait for the fix.

@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from 90a6204 to dc0eb97 Compare July 4, 2024 15:49
@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from dc0eb97 to d479c36 Compare July 16, 2024 13:19
@piotrooo
Contributor Author

@tzolov, it seems that everything works as expected. I've checked it in our Azure subscription against our internal models. Could you confirm it is working for you as well?

@tzolov
Contributor

tzolov commented Jul 16, 2024

@piotrooo thanks for the update.
It looks like 1.0.0-beta.10 has another issue with JSON handling during streaming. I'm considering rolling it back to beta.8.
Will this affect your PR?

@piotrooo
Contributor Author

I'm considering rolling it back to beta8.
Will this affect your PR?

Yes, for sure. As far as I remember, beta.8 doesn't support the AudioTranscriptionTimestampGranularity functionality.

@tzolov tzolov self-assigned this Jul 17, 2024
@tzolov tzolov added this to the 1.0.0-M2 milestone Jul 17, 2024
@tzolov
Contributor

tzolov commented Jul 17, 2024

Hi @piotrooo ,
Can you please add the related documentation?
I guess it should be under the https://docs.spring.io/spring-ai/reference/api/audio/transcriptions.html section.

The Antora docs source is under: https://github.com/spring-projects/spring-ai/tree/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions
and you can use the https://github.com/spring-projects/spring-ai/blob/main/spring-ai-docs/src/main/antora/modules/ROOT/pages/api/transcriptions/openai-transcriptions.adoc as an example.

Finally, you can add the page to the catalog in front of the OpenAI transcription entry:

**** xref:api/audio/transcriptions/openai-transcriptions.adoc[OpenAI]
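Concretely, the resulting nav entries might look like this. The Azure page filename below is an assumption following the existing naming convention, not a confirmed path:

```asciidoc
**** xref:api/audio/transcriptions/azure-openai-transcriptions.adoc[Azure OpenAI]
**** xref:api/audio/transcriptions/openai-transcriptions.adoc[OpenAI]
```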

@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from 360a779 to 4400048 Compare July 19, 2024 14:23
@piotrooo
Contributor Author

@tzolov I've just added docs.

@tzolov
Contributor

tzolov commented Jul 20, 2024

Thanks @piotrooo ,
After rebasing, I see a compilation error, likely due to the recent improvements in the response metadata: #1070
Let me know if you have time to resolve it. Hopefully it will be a trivial change.

@piotrooo piotrooo force-pushed the introduce-azure-transcription branch from 44c2ca6 to 713657f Compare July 21, 2024 07:20
@piotrooo
Contributor Author

@tzolov I think I did everything. The code compiles for me.

@tzolov
Contributor

tzolov commented Jul 21, 2024

@piotrooo
Contributor Author

piotrooo commented Jul 22, 2024

@tzolov I've added the missing test you mentioned. I've also tested it on our internal models.

AzureOpenAiAutoConfigurationIT.transcribe()

Screenshot from 2024-07-22 08-30-19

AzureOpenAiAudioTranscriptionModelIT

Screenshot from 2024-07-22 08-29-04

@tzolov
Contributor

tzolov commented Jul 22, 2024

Thank you @piotrooo ! Great stuff
Thanks for contributing the Azure transcription support and for improving the API consistency!
Looking forward to the next contribution ;)

@tzolov
Contributor

tzolov commented Jul 22, 2024

Small test adjustments, rebased, squashed and merged at 0e97f9c

@tzolov tzolov closed this Jul 22, 2024
@piotrooo piotrooo deleted the introduce-azure-transcription branch July 22, 2024 13:06