
[Bug]: LiteLLM Proxy Mapping openai-gpt-4o-mini-transcribe to openai/whisper-1 for Audio Transcription #11263

Open
@arnoulddw

Description

What happened?

TLDR: The LiteLLM Proxy is incorrectly mapping the requested OpenAI audio transcription models openai-gpt-4o-mini-transcribe and openai-gpt-4o-transcribe to the older openai/whisper-1 model, preventing use of the specified GPT-4o models even though they are valid options in my LiteLLM config and according to the OpenAI documentation.

I am using the LiteLLM proxy with the following config.yaml:

model_list:
  # ... other models ...
  - model_name: openai-whisper-1
    litellm_params:
      model: openai/whisper-1
      api_key: "..."
    model_info:
      mode: audio_transcription
  - model_name: openai-gpt-4o-mini-transcribe
    litellm_params:
      model: openai/gpt-4o-mini-transcribe
      api_key: "..."
    model_info:
      mode: audio_transcription
  - model_name: openai-gpt-4o-transcribe
    litellm_params:
      model: openai/gpt-4o-transcribe
      api_key: "..."
    model_info:
      mode: audio_transcription
# ... other models ...

According to the OpenAI documentation, gpt-4o-mini-transcribe and gpt-4o-transcribe are valid model identifiers for the audio transcription endpoint (/v1/audio/transcriptions).

However, when making an audio transcription request to the LiteLLM proxy using the model name openai-gpt-4o-mini-transcribe or openai-gpt-4o-transcribe, the LiteLLM logs indicate that the request is being routed to the openai/whisper-1 model instead.
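For concreteness, such a request can be issued against the proxy with only the Python standard library. This is a sketch: the proxy address localhost:4000, the API key, and the file name are placeholders, not values from this report.

```python
import http.client
import uuid


def build_multipart(model: str, filename: str, audio: bytes) -> tuple[bytes, str]:
    """Encode the model field and the audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + audio + tail, boundary


def transcribe(host: str, api_key: str, model: str, audio_path: str) -> str:
    """POST the file to the proxy's /v1/audio/transcriptions endpoint."""
    with open(audio_path, "rb") as f:
        body, boundary = build_multipart(model, audio_path, f.read())
    conn = http.client.HTTPConnection(host)  # e.g. "localhost:4000"
    conn.request(
        "POST",
        "/v1/audio/transcriptions",
        body,
        {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    return conn.getresponse().read().decode()


# Usage (all arguments are placeholders):
# transcribe("localhost:4000", "sk-...", "openai-gpt-4o-mini-transcribe", "sample.wav")
```

The model argument is the model_name from the config.yaml entry; the proxy is expected to translate it to the litellm_params.model value before calling OpenAI.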

LiteLLM Log Snippets Demonstrating the Issue:

Initial Request Log showing model_map_information:

{
  "model_name": "openai-gpt-4o-mini-transcribe",
  "litellm_params": {
    "use_in_pass_through": false,
    "use_litellm_proxy": false,
    "merge_reasoning_content_in_choices": false,
    "model": "openai/gpt-4o-mini-transcribe"
  },
  "model_info": {
    "id": "bc1c1175a643fd93d6011a841b1950d6b9fd71b91b648b6f39046a83d5b16f7d",
    "db_model": false,
    "mode": "audio_transcription",
    "key": "gpt-4o-mini-transcribe",
    "max_tokens": null,
    "max_input_tokens": 16000,
    "max_output_tokens": 2000,
    "input_cost_per_token": 0.00000125,
    "cache_creation_input_token_cost": null,
    "cache_read_input_token_cost": null,
    "input_cost_per_character": null,
    "input_cost_per_token_above_128k_tokens": null,
    "input_cost_per_token_above_200k_tokens": null,
    "input_cost_per_query": null,
    "input_cost_per_second": null,
    "input_cost_per_audio_token": 0.000003,
    "input_cost_per_token_batches": null,
    "output_cost_per_token_batches": null,
    "output_cost_per_token": 0.000005,
    "output_cost_per_audio_token": null,
    "output_cost_per_character": null,
    "output_cost_per_reasoning_token": null,
    "output_cost_per_token_above_128k_tokens": null,
    "output_cost_per_character_above_128k_tokens": null,
    "output_cost_per_token_above_200k_tokens": null,
    "output_cost_per_second": null,
    "output_cost_per_image": null,
    "output_vector_size": null,
    "litellm_provider": "openai",
    "supports_system_messages": null,
    "supports_response_schema": null,
    "supports_vision": false,
    "supports_function_calling": false,
    "supports_tool_choice": false,
    "supports_assistant_prefill": false,
    "supports_prompt_caching": false,
    "supports_audio_input": false,
    "supports_audio_output": false,
    "supports_pdf_input": false,
    "supports_embedding_image_input": false,
    "supports_native_streaming": null,
    "supports_web_search": false,
    "supports_reasoning": false,
    "supports_computer_use": false,
    "search_context_cost_per_query": null,
    "tpm": null,
    "rpm": null,
    "supported_openai_params": [
      "frequency_penalty",
      "logit_bias",
      "logprobs",
      "top_logprobs",
      "max_tokens",
      "max_completion_tokens",
      "modalities",
      "prediction",
      "n",
      "presence_penalty",
      "seed",
      "stop",
      "stream",
      "stream_options",
      "temperature",
      "top_p",
      "tools",
      "tool_choice",
      "function_call",
      "functions",
      "max_retries",
      "extra_headers",
      "parallel_tool_calls",
      "audio",
      "web_search_options",
      "response_format"
    ]
  },
  "provider": "openai",
  "input_cost": "1.25",
  "output_cost": "5.00",
  "litellm_model_name": "openai/gpt-4o-mini-transcribe",
  "max_tokens": null,
  "max_input_tokens": 16000,
  "cleanedLitellmParams": {
    "use_in_pass_through": false,
    "use_litellm_proxy": false,
    "merge_reasoning_content_in_choices": false
  }
}

Specific audio transcription request log showing the model mapping:

12:25:35 - LiteLLM Router:DEBUG: router.py:1975 - Inside _atranscription()- model: openai-gpt-4o-mini-transcribe; kwargs: {...}
...
"model_map_information": {
  "model_map_key": "whisper-1",
  "model_map_value": {
    "key": "whisper-1",
    // ... details about whisper-1 ...
    "litellm_provider": "openai",
    "mode": "audio_transcription",
    // ...
  }
},
...
INFO:     192.168.86.19:37846 - "POST /audio/transcriptions HTTP/1.1" 200 OK
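The mismatch is mechanical to spot: the configured litellm_params.model ends in gpt-4o-mini-transcribe, while the router's model_map_key is whisper-1. A small check along these lines (the helper is illustrative, not LiteLLM code; field names are taken from the log snippet above):

```python
def model_map_mismatch(configured_model: str, model_map_information: dict) -> bool:
    """True when the router's resolved map key disagrees with the configured model.

    configured_model is the litellm_params.model value from config.yaml,
    e.g. "openai/gpt-4o-mini-transcribe"; model_map_information is the dict
    emitted in the debug log.
    """
    mapped_key = model_map_information["model_map_key"]  # e.g. "whisper-1"
    # Compare against the model name with the provider prefix stripped.
    return configured_model.split("/")[-1] != mapped_key
```

Applied to the log above, model_map_mismatch("openai/gpt-4o-mini-transcribe", {"model_map_key": "whisper-1"}) reports the mismatch.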

Expected Behavior:

When requesting the model name openai-gpt-4o-mini-transcribe or openai-gpt-4o-transcribe from the LiteLLM proxy for an audio transcription task, the underlying OpenAI model used should be the one specified in the configuration's litellm_params.model field, both of which are supported by the OpenAI API.

Observed Behavior:

The LiteLLM logs indicate that the request is being mapped to the openai/whisper-1 model, even though openai/gpt-4o-mini-transcribe or openai/gpt-4o-transcribe is specified in the configuration and both are valid OpenAI transcription models.

Impact:

This prevents users from using the newer, potentially higher-quality gpt-4o-mini-transcribe and gpt-4o-transcribe models through the LiteLLM proxy, and forces the older whisper-1 model for any request directed at openai-gpt-4o-mini-transcribe or openai-gpt-4o-transcribe.

Steps to Reproduce:

  1. Configure the LiteLLM proxy with the config.yaml above, including the openai-gpt-4o-mini-transcribe and openai-gpt-4o-transcribe model entries.
  2. Make an audio transcription request to the LiteLLM proxy specifying the model name openai-gpt-4o-mini-transcribe.
  3. Examine the LiteLLM logs (with verbose logging enabled) to observe the model_map_information and the underlying model used for the OpenAI API call.

Environment:

  • LiteLLM Version: 1.71.1

Possible Cause:

It appears LiteLLM's internal model mapping logic may be defaulting to whisper-1 for audio transcription tasks, overriding the explicit model setting in config.yaml for the newer GPT-4o transcription models.

Suggestion:

Investigate the internal mapping logic for OpenAI audio transcription models to ensure that gpt-4o-mini-transcribe and gpt-4o-transcribe are correctly recognized and routed when specified in the configuration.
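Purely as an illustration of the expected precedence (this is not LiteLLM's actual code, and the names below are hypothetical), the resolution step should only fall back to a default transcription model when the deployment does not name one:

```python
DEFAULT_TRANSCRIPTION_MODEL = "openai/whisper-1"  # hypothetical fallback


def resolve_transcription_model(deployment: dict) -> str:
    """Prefer the deployment's configured litellm_params.model; fall back only if unset."""
    configured = deployment.get("litellm_params", {}).get("model")
    return configured or DEFAULT_TRANSCRIPTION_MODEL
```

Under this precedence, a deployment configured with model: openai/gpt-4o-mini-transcribe would never be rewritten to whisper-1.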

Are you a ML Ops Team?

No

What LiteLLM version are you on ?

v1.71.1

Labels

bug (Something isn't working)