@psinha40898 commented Feb 10, 2026

Background

Currently there is no reasonable way to use the gpt-4o-transcribe-diarize model through the AI SDK.

Summary

  1. Set the providerOptions default responseFormat to diarized_json for gpt-4o-transcribe-diarize, following the convention established in "fix(provider/openai): do not set response_format to verbose_json if model is gpt-4o-transcribe" #8246 (comment).
  2. Extend the zod validation to accept the TranscriptionDiarized type returned by the OpenAI API:
    https://developers.openai.com/api/reference/resources/audio/subresources/transcriptions/methods/create

Together, these two changes let users receive diarized_json responses, which is what makes the model useful.

  3. Allow users to pass chunking_strategy so that the model can be used on audio longer than 30 seconds.
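The default selection in point 1 can be pictured with a minimal sketch. This is not the provider code from this PR: the helper name `defaultResponseFormat` and the exact list of model ids are assumptions for illustration, and the value `diarized_json` follows the spelling in the OpenAI API reference as far as I can tell. The only behavior taken from this PR and #8246 is that verbose_json is not used for the gpt-4o transcription models and that the diarize model defaults to a diarized format.

```typescript
// Hypothetical helper (not part of the AI SDK): choose a default
// response_format based on the transcription model id.
type ResponseFormat = 'json' | 'verbose_json' | 'diarized_json';

function defaultResponseFormat(modelId: string): ResponseFormat {
  // The diarize model needs a diarized format to return speaker labels.
  if (modelId === 'gpt-4o-transcribe-diarize') {
    return 'diarized_json';
  }
  // Assumed list of models that do not support verbose_json (see #8246).
  if (modelId === 'gpt-4o-transcribe' || modelId === 'gpt-4o-mini-transcribe') {
    return 'json';
  }
  // Assumed fallback for every other model id.
  return 'verbose_json';
}

console.log(defaultResponseFormat('gpt-4o-transcribe-diarize'));
```

Keeping the choice in one small function makes it easy to see that an explicit user-supplied responseFormat would simply take precedence over this default.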

This change allows users to use the model on audio that is greater than 30 seconds in length.
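As a rough sketch of what "passing chunking_strategy through" means, the snippet below copies the option into the request fields only when the caller supplied it. The names `OpenAITranscriptionOptions` and `withChunkingStrategy` are hypothetical, and only the "auto" value is modeled because it is the only one that appears in this PR; the real provider builds multipart form data rather than a plain object.

```typescript
// Hypothetical option shape; only "auto" is taken from this PR.
interface OpenAITranscriptionOptions {
  chunking_strategy?: 'auto';
}

// Returns a copy of the request fields with chunking_strategy added
// when (and only when) the caller provided it.
function withChunkingStrategy(
  fields: Record<string, string>,
  options?: OpenAITranscriptionOptions,
): Record<string, string> {
  const result: Record<string, string> = { ...fields };
  if (options?.chunking_strategy !== undefined) {
    result.chunking_strategy = options.chunking_strategy;
  }
  return result;
}

const requestFields = withChunkingStrategy(
  { model: 'gpt-4o-transcribe-diarize' },
  { chunking_strategy: 'auto' },
);
console.log(requestFields);
```

Leaving the field out entirely when it is not set keeps requests for other models unchanged.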

Manual Verification

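I confirmed that on main, code like the following:

import { experimental_transcribe as transcribe } from 'ai';
import { openai } from '@ai-sdk/openai';
import { readFile } from 'fs/promises';

const transcript = await transcribe({
  model: openai.transcription('gpt-4o-transcribe-diarize'),
  audio: await readFile('audio.mp3'),
  providerOptions: {
    openai: {
      // Required for audio longer than 30 seconds.
      chunking_strategy: 'auto',
    },
  },
});

returns a response_format error, because the AI SDK defaults response_format to verbose_json for this model, which does not support that format.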

With this PR, the same call passes chunking_strategy to the API and returns a fully diarized JSON response.

console.log("Full Diarized Response:", transcript.responses); // this is now useful for this model
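To show what a consumer might do with a diarized response, here is a small sketch that validates and formats speaker segments. The segment shape (speaker, start, end, text) is an assumption based on my reading of the TranscriptionDiarized type in the OpenAI API reference, and `formatSegments` is a hypothetical helper. The sketch takes an already parsed body as `unknown` and deliberately leaves open where that body comes from in the SDK result. It uses a hand-written type guard instead of zod only to stay dependency-free.

```typescript
// Assumed shape of one diarized segment; check the API reference before relying on it.
interface DiarizedSegment {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Type guard: true only when every assumed field is present with the right type.
function isDiarizedSegment(value: unknown): value is DiarizedSegment {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.speaker === 'string' &&
    typeof v.start === 'number' &&
    typeof v.end === 'number' &&
    typeof v.text === 'string'
  );
}

// Hypothetical helper: turn a parsed response body into readable lines,
// skipping anything that does not look like a diarized segment.
function formatSegments(body: unknown): string[] {
  if (typeof body !== 'object' || body === null) return [];
  const segments = (body as Record<string, unknown>).segments;
  if (!Array.isArray(segments)) return [];
  return segments
    .filter(isDiarizedSegment)
    .map(s => `${s.speaker} [${s.start.toFixed(1)}s-${s.end.toFixed(1)}s]: ${s.text}`);
}

console.log(
  formatSegments({
    segments: [{ speaker: 'A', start: 0, end: 1.5, text: 'Hello' }],
  }),
);
```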

Checklist

  • Tests have been added / updated (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • I have reviewed this pull request (self-review)

Future Work

The examples could be much better if body were added to TranscriptionModelResponseMetaData.
In general, there is less parity across providers for transcription models than for regular text generation models.

Related Issues

Fixes #11679

Related to #12409

@psinha40898 marked this pull request as ready for review February 10, 2026 21:36
@psinha40898 changed the title from "feat(openai): support for gpt-4o-transcribe" to "feat(openai): support for gpt-4o-transcribe-diarize" Feb 10, 2026

@vercel vercel bot left a comment

Additional Suggestion:

The response_format is only appended to form data when providerOptions.openai is provided, so transcription requests without provider options send no response_format to the API.

@psinha40898

> Additional Suggestion:
>
> The response_format is only appended to form data when providerOptions.openai is provided, so transcription requests without provider options send no response_format to the API.

This is because the scope of this PR is not to change the opinions established in the codebase, but to extend those opinions to improve DX and increase feature coverage.

Development

Successfully merging this pull request may close these issues.

OpenAI gpt-4o-transcribe-diarize model not accessible
