Description
Scope check
- This is core LLM communication (not application logic)
- This benefits most users (not just my use case)
- This can't be solved in application code with current RubyLLM
- I read the Contributing Guide
Due diligence
- I searched existing issues
- I checked the documentation
What problem does this solve?
I’d like to add support for streaming transcription events in RubyLLM for OpenAI transcription models (especially diarization models), so users can process transcript updates in real time while still receiving a final RubyLLM::Transcription result.
Today, transcription is handled only as a single final response (link). Streaming the events would cut down perceived latency for transcriptions, since callers can start processing text before the full result arrives.
Proposed solution
Allow RubyLLM.transcribe(...) to accept a block and stream normalized transcription events to the caller.
Proposed behavior:
- If no block is provided: existing behavior unchanged.
- If a block is provided (OpenAI provider): stream SSE transcription events to it as RubyLLM::TranscriptionChunk objects.
- Continue returning a final RubyLLM::Transcription object at the end.
Proposed API
```ruby
transcription = RubyLLM.transcribe(
  "meeting.wav",
  model: "gpt-4o-transcribe-diarize",
  provider: :openai
) do |chunk|
  case chunk.type
  when "transcript.text.delta"
    print chunk.delta
  when "transcript.text.segment"
    puts "#{chunk.segment['speaker']}: #{chunk.segment['text']}"
  when "transcript.text.done"
    puts "[done]"
  end
end

puts transcription.text
```

Why this belongs in RubyLLM
It belongs in RubyLLM because streaming transcription is core provider communication behavior, not app-specific logic, and enables reusable real-time workflows for all users across CLI, web, and backend jobs.
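As a rough sketch of the provider-side work, the OpenAI SSE payloads could be normalized into the chunk objects used in the proposed API. Both the `TranscriptionChunk` shape and the `parse_sse_event` helper below are hypothetical illustrations, not existing RubyLLM API:

```ruby
require "json"

# Hypothetical normalized chunk object; mirrors the fields used in the
# proposed API above (type, delta, segment).
TranscriptionChunk = Struct.new(:type, :delta, :segment, keyword_init: true)

# Sketch: turn one SSE "data:" payload from the OpenAI transcription
# stream into a TranscriptionChunk. Event names follow the example above;
# fields not present in a given event are simply nil.
def parse_sse_event(data_payload)
  event = JSON.parse(data_payload)
  TranscriptionChunk.new(
    type: event["type"],
    delta: event["delta"],      # present on transcript.text.delta events
    segment: event["segment"]   # present on transcript.text.segment events
  )
end

# Example: a delta event yields a chunk the caller's block can print.
chunk = parse_sse_event('{"type":"transcript.text.delta","delta":"Hello"}')
```

A normalization layer like this would keep the block interface identical across providers if other transcription backends later gain streaming support.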