Error when transcribing an audio file #5380

Open
@pmarini-nc

Description

LocalAI version:
2.29.0 Binary

Environment, CPU architecture, OS, and Version:
LXD container, Ubuntu 24.04, x86-64

Describe the bug
Audio transcription fails with: {"error":{"code":500,"message":"there is no uploaded file associated with the given key","type":""}}

To Reproduce

root@localai:~# ls -lhrt /root/test/gb1.ogg 
-rw-r--r-- 1 root root 1.6M Oct  4  2013 /root/test/gb1.ogg
root@localai:~# ls -lhrt /etc/local-ai/models/
total 2.0G
-rw-r--r-- 1 root root 1.9G May 12 16:14 Falcon3-3B-Instruct-Q4_K_M.gguf
-rw------- 1 root root 1.2K May 12 16:14 falcon3-3b-instruct.yaml
-rw-r--r-- 1 root root 142M May 16 16:18 ggml-whisper-base.bin
-rw------- 1 root root   76 May 16 16:18 whisper-1.yaml
root@localai:~# cat /etc/local-ai/models/whisper-1.yaml 
backend: whisper
name: whisper-1
parameters:
  model: ggml-whisper-base.bin
root@localai:~# systemctl stop local-ai
root@localai:~# LOCALAI_MODELS_PATH=/etc/local-ai/models /opt/local-ai/2.28.0/local-ai 
4:26PM INF Setting logging to info
4:26PM INF Starting LocalAI using 14 threads, with models path: /etc/local-ai/models
4:26PM INF LocalAI version: v2.28.0 (56f44d448ca585642de4ca13f0521c3096268f1f)
4:26PM INF Preloading models from /etc/local-ai/models

  Model name: falcon3-3b-instruct
  Model name: whisper-1

4:26PM INF core/startup process completed!
4:26PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
4:27PM ERR Server error error="there is no uploaded file associated with the given key" ip=127.0.0.1 latency=1.376121ms method=POST status=500 url=/v1/audio/transcriptions

The last error line is generated by running locally:

curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file=/root/test/gb1.ogg -F model=whisper-1
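
Note: in curl's -F syntax the file path needs a leading @, otherwise curl sends the literal string "/root/test/gb1.ogg" as an ordinary text field and no file part named "file" ever reaches the server; the 500 message matches the error Go's fasthttp returns when a handler requests a multipart file part that isn't there. A sketch of the corrected call (same paths as above):

# The @ makes curl attach the file contents as a multipart file part.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file=@/root/test/gb1.ogg \
  -F model=whisper-1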

Expected behavior
The audio is transcribed.

Logs

root@localai:~# DEBUG=true LOCALAI_MODELS_PATH=/etc/local-ai/models /opt/local-ai/2.28.0/local-ai 
4:31PM DBG Setting logging to debug
4:31PM INF Starting LocalAI using 14 threads, with models path: /etc/local-ai/models
4:31PM INF LocalAI version: v2.28.0 (56f44d448ca585642de4ca13f0521c3096268f1f)
4:31PM DBG CPU capabilities: [3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_perfmon avx avx2 bmi1 bmi2 bts cat_l3 cdp_l3 clflush cmov constant_tsc cpuid cpuid_fault cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cx16 cx8 dca de ds_cpl dtes64 dtherm dts epb ept ept_ad erms est f16c flexpriority flush_l1d fma fpu fsgsbase fxsr hle ht ibpb ibrs ida intel_pt invpcid lahf_lm lm mca mce md_clear mmx monitor movbe msr mtrr nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln pni popcnt pse pse36 pti pts rdrand rdseed rdt_a rdtscp rep_good rtm sdbg sep smap smep smx ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow tsc tsc_adjust tsc_deadline_timer vme vmx vnmi vpid x2apic xsave xsaveopt xtopology xtpr]
4:31PM DBG GPU count: 1
4:31PM DBG GPU: card #0  [affined to NUMA node 0]@0000:0b:00.0 -> driver: 'mgag200' class: 'Display controller' vendor: 'Matrox Electronics Systems Ltd.' product: 'G200eR2'
4:31PM DBG guessDefaultsFromFile: template already set name=falcon3-3b-instruct
4:31PM INF Preloading models from /etc/local-ai/models

  Model name: falcon3-3b-instruct
  Model name: whisper-1

4:31PM DBG Model: falcon3-3b-instruct (config: {PredictionOptions:{BasicModelRequest:{Model:Falcon3-3B-Instruct-Q4_K_M.gguf} Language: Translate:false N:0 TopP:0xc00147a768 TopK:0xc00147a770 Temperature:0xc00147a778 Maxtokens:0xc00147a7a8 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc00147a7a0 TypicalP:0xc00147a798 Seed:0xc00147a7c0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:falcon3-3b-instruct F16:0xc00147a5e0 Threads:0xc00147a748 Debug:0xc00147a7b8 Roles:map[] Embeddings:0xc00147a7b9 Backend: TemplateConfig:{Chat:{{.Input }}
<|im_start|>assistant
 ChatMessage:<|{{ .RoleName }}|>
{{ if .FunctionCall -}}
Function call:
{{ else if eq .RoleName "tool" -}}
Function response:
{{ end -}}
{{ if .Content -}}
{{.Content }}
{{ end -}}
{{ if .FunctionCall -}}
{{toJson .FunctionCall}}
{{ end -}}
{{ if eq .RoleName "assistant" }}<|endoftext|>{{ end }}
 Completion:{{.Input}}
 Edit: Functions:<|system|>
You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
{{range .Functions}}
{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
{{end}}
For each function call return a json object with function name and arguments
{{.Input }}
<|im_start|>assistant
 UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_CHAT FLAG_COMPLETION FLAG_ANY] KnownUsecases:<nil> PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc00147a790 MirostatTAU:0xc00147a788 Mirostat:0xc00147a780 NGPULayers:0xc00147a7b0 MMap:0xc00147a5e1 MMlock:0xc00147a7b9 LowVRAM:0xc00147a7b9 Grammar: StopWords:[<|endoftext|> <dummy32000> </s>] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc00147a5d0 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[]})
4:31PM DBG Model: whisper-1 (config: {PredictionOptions:{BasicModelRequest:{Model:ggml-whisper-base.bin} Language: Translate:false N:0 TopP:0xc000d468a0 TopK:0xc000d468a8 Temperature:0xc000d46900 Maxtokens:0xc000d46a20 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc000d469a8 TypicalP:0xc000d469a0 Seed:0xc000d46ab8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:whisper-1 F16:0xc000d46868 Threads:0xc000d46860 Debug:0xc000d46ab0 Roles:map[] Embeddings:0xc000d46ab1 Backend:whisper TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_TRANSCRIPT FLAG_ANY] KnownUsecases:<nil> PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc000d46958 MirostatTAU:0xc000d46950 Mirostat:0xc000d46908 NGPULayers:0xc000d46a28 MMap:0xc000d46ab0 MMlock:0xc000d46ab1 LowVRAM:0xc000d46ab1 Grammar: StopWords:[] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc000d46af0 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[]})

4:31PM DBG Extracting backend assets files to /tmp/localai/backend_data
4:31PM DBG processing api keys runtime update
4:31PM DBG processing external_backends.json
4:31PM DBG external backends loaded from external_backends.json
4:31PM INF core/startup process completed!
4:31PM DBG No configuration file found at /tmp/localai/upload/uploadedFiles.json
4:31PM DBG No configuration file found at /tmp/localai/config/assistants.json
4:31PM DBG No configuration file found at /tmp/localai/config/assistantsFile.json
4:31PM INF LocalAI API is listening! Please connect to the endpoint for API documentation. endpoint=http://0.0.0.0:8080
4:31PM DBG context local model name not found, setting to the first model first model name=whisper-1
4:31PM ERR Server error error="there is no uploaded file associated with the given key" ip=127.0.0.1 latency=1.400659ms method=POST status=500 url=/v1/audio/transcriptions
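
The debug line just above shows the request falling back to the first model, so whisper-1 is being selected; the 500 is raised while fetching the multipart file, before any transcription work starts. As a sanity check that the model is registered at all (not part of the tutorial; this is the standard OpenAI-compatible listing LocalAI exposes):

# whisper-1 should appear in the returned model list.
curl -s http://localhost:8080/v1/models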

Additional context
This error appears when following the tutorial at https://localai.io/features/audio-to-text/.
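
If I read the tutorial correctly, its example quotes the path with a leading @, along these lines (exact wording assumed from the page):

curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@$PWD/gb1.ogg" -F model="whisper-1"

so the failing reproduction above differs from it only in the dropped @.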
