[FEATURE] Streaming support for RubyLLM.transcribe #628

@patvice

Scope check

  • This is core LLM communication (not application logic)
  • This benefits most users (not just my use case)
  • This can't be solved in application code with current RubyLLM
  • I read the Contributing Guide

Due diligence

  • I searched existing issues
  • I checked the documentation

What problem does this solve?

I’d like to add support for streaming transcription events in RubyLLM for OpenAI transcription models (especially diarization models), so users can process transcript updates in real time while still receiving a final RubyLLM::Transcription result.

Today, transcription is handled primarily as a single final response (link). Ideally, the streaming functionality would let us cut down latency for transcriptions, since callers could start processing text before the full result is ready.

Proposed solution

Allow RubyLLM.transcribe(...) to accept a block and stream normalized transcription events to the caller.

Proposed behavior:

  • If no block is provided: existing behavior is unchanged.
  • If a block is provided (OpenAI provider): stream SSE transcription events to the block as RubyLLM::TranscriptionChunk objects.
  • In both cases, return a final RubyLLM::Transcription object at the end.
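
To make the proposed behavior concrete, here is a minimal sketch of how SSE events could be normalized into chunks and yielded to an optional block. It is not RubyLLM's implementation: the `TranscriptionChunk` and `Transcription` structs and the `transcribe_from_sse` helper are hypothetical stand-ins, and the sketch assumes each event arrives as a `data: {json}` line carrying a `type` field plus `delta` or `text`.

```ruby
require "json"

# Hypothetical stand-ins; the real RubyLLM classes may look different.
TranscriptionChunk = Struct.new(:type, :delta, :segment, :text, keyword_init: true)
Transcription = Struct.new(:text, keyword_init: true)

# Parses a raw SSE body (assumed to contain "data: {json}" lines) and
# yields one chunk per event when a block is given. Always returns a
# final Transcription, so callers without a block see no difference.
def transcribe_from_sse(sse_body)
  done_text = nil
  deltas = +""

  sse_body.each_line do |line|
    line = line.strip
    next unless line.start_with?("data:")

    payload = line.delete_prefix("data:").strip
    next if payload.empty? || payload == "[DONE]"

    event = JSON.parse(payload)
    type = event["type"]
    chunk = TranscriptionChunk.new(
      type: type,
      delta: event["delta"],
      # Assumption: a segment event carries its fields at the top level.
      segment: type == "transcript.text.segment" ? event : nil,
      text: event["text"]
    )

    deltas << chunk.delta if type == "transcript.text.delta" && chunk.delta
    done_text = chunk.text if type == "transcript.text.done"

    yield chunk if block_given?
  end

  # Prefer the text from the done event; fall back to the joined deltas.
  Transcription.new(text: done_text || deltas)
end
```

The `block_given?` check is what keeps the no-block path unchanged: the same parsing runs either way, and only the yield is conditional.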

Proposed API

transcription = RubyLLM.transcribe(
  "meeting.wav",
  model: "gpt-4o-transcribe-diarize",
  provider: :openai
) do |chunk|
  case chunk.type
  when "transcript.text.delta"
    print chunk.delta
  when "transcript.text.segment"
    puts "#{chunk.segment['speaker']}: #{chunk.segment['text']}"
  when "transcript.text.done"
    puts "[done]"
  end
end

puts transcription.text
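
The block in the proposed API could also be a reusable handler object rather than an inline `case`. The sketch below assumes chunks expose `type` and a `segment` hash with `"speaker"` and `"text"` keys, as in the example above. `StubChunk` and `SpeakerTurns` are hypothetical names used only so the handler can run without a live API call.

```ruby
# Hypothetical stub standing in for the proposed RubyLLM::TranscriptionChunk.
StubChunk = Struct.new(:type, :delta, :segment, keyword_init: true)

# Collects diarized segments into "speaker: text" lines as chunks arrive.
class SpeakerTurns
  attr_reader :lines

  def initialize
    @lines = []
  end

  # Ignores every event type except diarized segments.
  def call(chunk)
    return unless chunk.type == "transcript.text.segment"

    @lines << "#{chunk.segment['speaker']}: #{chunk.segment['text']}"
  end

  # Lets an instance be passed wherever a block is expected, e.g. `&turns`.
  def to_proc
    method(:call).to_proc
  end
end

turns = SpeakerTurns.new
[
  StubChunk.new(type: "transcript.text.delta", delta: "Hi"),
  StubChunk.new(type: "transcript.text.segment",
                segment: { "speaker" => "A", "text" => "Hi there" }),
  StubChunk.new(type: "transcript.text.done")
].each(&turns)

puts turns.lines
```

If the proposal lands as described, the same object could presumably be passed as `RubyLLM.transcribe("meeting.wav", ..., &turns)`.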

Why this belongs in RubyLLM

It belongs in RubyLLM because streaming transcription is core provider communication behavior, not app-specific logic, and enables reusable real-time workflows for all users across CLI, web, and backend jobs.

Labels: enhancement (New feature or request)