feat: Add audio parameter support to gemini tts models #11287

Merged: 4 commits merged into BerriAI:main on May 31, 2025

AyrennC
Contributor

@AyrennC AyrennC commented May 31, 2025

Title

Add 'audio' parameter support to all Gemini TTS models

Relevant issues

Fixes #11250
Fixes #11118

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

[screenshot]

Type

🆕 New Feature
🐛 Bug Fix
✅ Test

Changes

  • Add is_model_gemini_audio_model() method to detect TTS models
  • Include 'audio' parameter in supported params for TTS models
  • Map OpenAI audio parameter to Gemini speechConfig format
  • Add assistant message transformation for Gemini audio output
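
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names `is_gemini_tts_model`, `map_audio_to_speech_config`, and `extract_audio_from_parts` are hypothetical, the "tts in the model name" check is an assumption, and the returned audio dict is a simplified stand-in for the OpenAI-style audio object. The nested `voiceConfig` / `prebuiltVoiceConfig` / `voiceName` and `inlineData` keys follow the Gemini REST API.

```python
from typing import Any, Dict, List, Optional


def is_gemini_tts_model(model: str) -> bool:
    # Assumption: TTS variants can be recognised by "tts" in the model name.
    return "tts" in model.lower()


def map_audio_to_speech_config(audio: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # OpenAI-style input looks like {"voice": "Kore", "format": "pcm16"}.
    # Only the voice is mapped here; other keys are ignored in this sketch.
    if not audio or not isinstance(audio.get("voice"), str):
        return {}
    return {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": audio["voice"]}}}


def extract_audio_from_parts(parts: List[Dict[str, Any]]) -> Optional[Dict[str, str]]:
    # Return the first audio part's base64 payload, or None if there is none.
    for part in parts:
        inline = part.get("inlineData") or {}
        if str(inline.get("mimeType", "")).startswith("audio/"):
            return {"data": inline.get("data", "")}
    return None
```

With this shape, a request for a non-TTS model would simply skip the mapping, and a text-only response would yield `None` from the extraction step.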

@AyrennC AyrennC changed the title from "feat: Add Gemini TTS audio parameter support" to "feat: Add audio parameter support to gemini tts models" May 31, 2025
@AyrennC AyrennC marked this pull request as draft May 31, 2025 12:45
- Add is_model_gemini_audio_model() method to detect TTS models
- Include 'audio' parameter in supported params for TTS models
- Map OpenAI audio parameter to Gemini speechConfig format
- Add _extract_audio_response_from_parts() method to transform audio
  output to openai format
@AyrennC AyrennC force-pushed the gemini-tts-audio branch from b801117 to eb0b111 Compare May 31, 2025 13:18
@AyrennC AyrennC marked this pull request as ready for review May 31, 2025 13:24
@AyrennC AyrennC marked this pull request as draft May 31, 2025 13:36
Contributor Author

AyrennC commented May 31, 2025

  • Squashed commits for a cleaner history
  • Tested the Gemini TTS models locally and confirmed they work. Gemini currently supports only the pcm16 audio format:
    [screenshot]
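
Since only pcm16 is returned, a caller who wants a playable file has to wrap the raw samples in a container. A minimal sketch using only the standard library is below. The helper names are hypothetical, and the 24 kHz mono defaults are an assumption about the Gemini TTS output that should be checked against the actual response.

```python
import base64
import io
import wave


def decode_audio_payload(b64_data: str) -> bytes:
    # The audio payload arrives base64-encoded; decode it to raw PCM bytes.
    return base64.b64decode(b64_data)


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    # Wrap raw 16-bit PCM samples in a WAV header.
    # sample_rate and channels are assumed defaults, not values from this PR.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The resulting bytes can be written to a `.wav` file with a plain `open(path, "wb")`.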

Contributor Author

AyrennC commented May 31, 2025

The LiteLLM Mock Tests job timed out after 8 minutes; all tests had passed up to the timeout.

@AyrennC AyrennC marked this pull request as ready for review May 31, 2025 13:58
)

# Map OpenAI audio parameter to Gemini speech config
speech_config = {}
Contributor

Can we have these be TypedDicts inside types/llms/vertex_ai.py, so any future updates are also tracked correctly?

Contributor Author

Added TypedDicts for SpeechConfig and its child types in types/llms/vertex_ai.py.
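
For readers unfamiliar with the pattern, typed dictionaries for a speech config might look roughly like the sketch below. The field names mirror the Gemini REST API (`voiceConfig`, `prebuiltVoiceConfig`, `voiceName`); the actual definitions in types/llms/vertex_ai.py may differ, and `build_speech_config` is a hypothetical helper added only to show usage.

```python
from typing import TypedDict


class PrebuiltVoiceConfig(TypedDict):
    voiceName: str


class VoiceConfig(TypedDict):
    prebuiltVoiceConfig: PrebuiltVoiceConfig


class SpeechConfig(TypedDict, total=False):
    # total=False so the config can be built incrementally or left empty.
    voiceConfig: VoiceConfig


def build_speech_config(voice: str) -> SpeechConfig:
    # Hypothetical helper: wrap a voice name in the nested structure.
    return {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}}
```

A type checker can then flag a misspelled key such as `voice_name` at the call site, which an untyped `speech_config = {}` would not catch.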

Contributor

@krrishdholakia krrishdholakia left a comment

migrate test to test_litellm, and simplify tts model check

rest looks great. thank you for your work on this

- simplified gemini tts model detection
- moved gemini_tts test to test_litellm
Contributor Author

AyrennC commented May 31, 2025

migrate test to test_litellm, and simplify tts model check

rest looks great. thank you for your work on this

Thanks! I went ahead and made the suggested changes:

  • created a TypedDict for SpeechConfig
  • simplified TTS model detection
  • moved the test to test_litellm
    [screenshot]

@AyrennC AyrennC requested a review from krrishdholakia May 31, 2025 22:36
@krrishdholakia krrishdholakia merged commit 8ae7917 into BerriAI:main May 31, 2025
6 checks passed
@krrishdholakia
Contributor

Thanks @AyrennC, would you mind contributing docs for the change, so people know how to use this?

For VertexAI - here
For Google AI Studio - here

Contributing guide - https://docs.litellm.ai/docs/extras/contributing (although it's just an .md change, so I'm sure you can just do it on GitHub as well)

@AyrennC AyrennC deleted the gemini-tts-audio branch June 1, 2025 05:37