feat(openai): support for gpt-4o-transcribe-diarize #12408
Background
Currently there is no reasonable way to use this model through the AI SDK.
Summary
- Use `diarized_json` as the response format for this model, adhering to the conventions established in fix(provider/openai): do not set `response_format` to `verbose_json` if model is `gpt-4o-transcribe` #8246 (comment).
- Handle the `TranscriptionDiarized` type returned by the OpenAI API: https://developers.openai.com/api/reference/resources/audio/subresources/transcriptions/methods/create
These two changes let users use the model in a meaningful way by allowing `diarized_json` responses.
- Pass `chunking_strategy` through to the API.
This change lets users run the model on audio longer than 30 seconds.
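The format selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `defaultResponseFormat` and `buildRequestFields` are hypothetical names, the model list is an assumption, and the OpenAI API spells the diarized format value `diarized_json`.

```typescript
// Hypothetical sketch; names and structure are illustrative only.
type ResponseFormat = 'json' | 'verbose_json' | 'diarized_json';

// Assumption: chunking_strategy is either 'auto' or a server-side VAD config.
type ChunkingStrategy = 'auto' | { type: 'server_vad' };

interface TranscriptionRequestFields {
  model: string;
  response_format: ResponseFormat;
  chunking_strategy?: ChunkingStrategy;
}

// Pick a response format the given model is assumed to accept.
function defaultResponseFormat(modelId: string): ResponseFormat {
  if (modelId === 'gpt-4o-transcribe-diarize') {
    return 'diarized_json'; // needed to receive speaker annotations
  }
  if (modelId === 'gpt-4o-transcribe' || modelId === 'gpt-4o-mini-transcribe') {
    return 'json'; // these models reject verbose_json
  }
  return 'verbose_json'; // previous default for all other models
}

// Build the request fields, forwarding chunking_strategy only when provided.
function buildRequestFields(
  modelId: string,
  chunkingStrategy?: ChunkingStrategy,
): TranscriptionRequestFields {
  const fields: TranscriptionRequestFields = {
    model: modelId,
    response_format: defaultResponseFormat(modelId),
  };
  if (chunkingStrategy !== undefined) {
    fields.chunking_strategy = chunkingStrategy;
  }
  return fields;
}
```

Keeping the default in one function means a model that rejects `verbose_json` never receives it, while callers can still opt in to `chunking_strategy` for long audio.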
Manual Verification
I confirmed that on main, a transcription call using this model returns a response_format error, because the AI SDK defaults response_format to verbose_json, which this model does not support.
With this PR, the same call passes chunking_strategy through and returns fully diarized JSON in the response.
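To illustrate what a caller might do with a diarized response, here is a small sketch. The `DiarizedSegment` shape (speaker, start, end, text) is an assumption based on the OpenAI diarized transcription format, and `formatDiarized` is a hypothetical helper, not something this PR adds.

```typescript
// Assumed shape of one segment in a diarized transcription response.
interface DiarizedSegment {
  speaker: string; // speaker label assigned by the model
  start: number;   // segment start, in seconds
  end: number;     // segment end, in seconds
  text: string;    // transcribed text for this segment
}

// Render each segment as "[start-end] speaker: text".
function formatDiarized(segments: DiarizedSegment[]): string[] {
  return segments.map(
    (s) =>
      `[${s.start.toFixed(1)}s-${s.end.toFixed(1)}s] ${s.speaker}: ${s.text.trim()}`,
  );
}

// Example with made-up data.
const lines = formatDiarized([
  { speaker: 'A', start: 0, end: 1.5, text: ' Hello there. ' },
  { speaker: 'B', start: 1.5, end: 3, text: 'Hi!' },
]);
console.log(lines.join('\n'));
```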
Checklist
- A changeset has been added (run `pnpm changeset` in the project root)
Future Work
The examples could be a lot better if `body` were added to TranscriptionModelResponseMetadata.
In general, there is less parity between transcription models and providers than between regular text generation models.
Related Issues
Fixes #11679
Related to #12409