Vertex AI Gemini generateContent w/ audio input doesn't work #14745

@BoyuanLong

Description

Hi,

I'm using the Vertex AI generateContent API to call Gemini (2.0 Flash, Flash-Lite, etc.) with audio input, and it sometimes fails. It throws an internalError that is hard to work around in application code.

My code is something like this:

    var parts: [PartsRepresentable] = []
    parts.append(InlineDataPart(data: audio, mimeType: "audio/mp3"))
    ...
    try await model.generateContent(prompt, parts)

And it will throw something like this most of the time:

internalError(underlying: Swift.DecodingError.keyNotFound(CodingKeys(stringValue: "tokenCount", intValue: nil), Swift.DecodingError.Context(codingPath: [CodingKeys(stringValue: "usageMetadata", intValue: nil), CodingKeys(stringValue: "promptTokensDetails", intValue: nil), _CodingKey(stringValue: "Index 0", intValue: 0)], debugDescription: "No value associated with key CodingKeys(stringValue: \"tokenCount\", intValue: nil) (\"tokenCount\").", underlyingError: nil)))

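I believe there's an issue in how the server counts audio tokens: the tokenCount field for the AUDIO modality is sometimes missing from the response (see the log output below, where the AUDIO entry in promptTokensDetails has no tokenCount), and the SDK's decoder then fails with the error above.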
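To illustrate the suspected failure mode, here is a minimal standalone sketch using only Foundation. The two structs are hypothetical stand-ins, not the SDK's actual types, and treating a missing count as 0 is an assumption about what a tolerant decoder might do. The JSON is the promptTokensDetails array from the log output in this report.

```swift
import Foundation

// Hypothetical stand-in for a strict model: tokenCount is required,
// so decoding throws keyNotFound when the server omits it.
struct StrictModalityTokenCount: Decodable {
    let modality: String
    let tokenCount: Int
}

// Hypothetical tolerant variant: a missing tokenCount falls back to 0.
// (Assumption: 0 is an acceptable default for an absent count.)
struct LenientModalityTokenCount: Decodable {
    let modality: String
    let tokenCount: Int

    enum CodingKeys: String, CodingKey {
        case modality, tokenCount
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        modality = try container.decode(String.self, forKey: .modality)
        tokenCount = try container.decodeIfPresent(Int.self, forKey: .tokenCount) ?? 0
    }
}

// promptTokensDetails as it appears in the logged response.
let json = Data("""
[
  { "modality": "AUDIO" },
  { "modality": "TEXT", "tokenCount": 675 },
  { "modality": "IMAGE", "tokenCount": 2322 }
]
""".utf8)

let decoder = JSONDecoder()

// Strict decoding fails on the AUDIO entry with keyNotFound("tokenCount").
var strictFailed = false
do {
    _ = try decoder.decode([StrictModalityTokenCount].self, from: json)
} catch DecodingError.keyNotFound(let key, _) {
    strictFailed = (key.stringValue == "tokenCount")
} catch {
    strictFailed = false
}

// Lenient decoding succeeds and substitutes 0 for the missing count.
let lenient = try decoder.decode([LenientModalityTokenCount].self, from: json)
let counts = lenient.map { $0.tokenCount }
print(strictFailed, counts)
```

This only reproduces the decoding behavior in isolation. It does not change how the SDK parses responses, so it is not a workaround on its own.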

Thank you for taking a look, and please let me know if there's anything we could do as a short term fix :)

Reproducing the issue

No response

Firebase SDK Version

11.11

Xcode Version

16.3

Installation Method

Swift Package Manager

Firebase Product(s)

VertexAI

Targeted Platforms

iOS

Relevant Log Output

11.11.0 - [FirebaseVertexAI][I-VTX003000] JSON response: {
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "{\n  \"has_animal\": true,\n  \"thoughts\": \"Dog: 'Ugh, more human legs in my face. A dog deserves a better view. 🙄'\"\n}"
          }
        ]
      },
      "finishReason": "STOP",
      "avgLogprobs": -0.56113096383901739
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 2997,
    "candidatesTokenCount": 39,
    "totalTokenCount": 3036,
    "trafficType": "ON_DEMAND",
    "promptTokensDetails": [
      {
        "modality": "AUDIO"
      },
      {
        "modality": "TEXT",
        "tokenCount": 675
      },
      {
        "modality": "IMAGE",
        "tokenCount": 2322
      }
    ],
    "candidatesTokensDetails": [
      {
        "modality": "TEXT",
        "tokenCount": 39
      }
    ]
  },
  "modelVersion": "gemini-2.0-flash",
  "createTime": "2025-04-21T05:23:14.199357Z",
  "responseId": "QtYFaL2VDImThMIPgdDpqAQ"
}
