Use the moq libraries instead of hand-rolling the wire codec #4557

Draft
kixelated wants to merge 2 commits into pipecat-ai:vp-moq-vibe from kixelated:vp-moq-vibe

@kixelated

This is a pass on top of the POC that swaps the hand-rolled wire codec on both sides for the upstream moq-rs (Python) and @moq/publish (browser) libraries. Net change: roughly −1,500 lines vs. the POC.

Benefits

  • Opus-encoded audio on the wire. No more raw 16-bit PCM streaming. The bot's audio drops from ~384 kbps (24 kHz × 16-bit mono PCM) to ~32 kbps, roughly a 12× reduction. libopus also handles packet loss concealment, so dropouts sound like brief artifacts instead of clicks.

  • Battle-tested Rust QUIC stack. moq-rs wraps moq-native, which provides built-in caching, fan-out, prioritization, congestion control, and WebTransport ALPN negotiation. We were re-implementing slivers of all of these in Python on top of aioquic; that code is gone.

  • max_latency_ms SLA on the subscriber. Default bumped to 500 ms (was 100 ms in the previous draft). Under healthy networks frames flow with no buffering; under congestion the consumer waits up to 500 ms for late groups to fill, then skips ahead. This is the structural win over WebRTC: instead of choosing between audio loss and a stall, you set a deadline and the library obeys it. Tunable via MOQParams.audio_in_max_latency_ms.

  • No moq-relay process for local dev. New --moq-serve flag binds the bot to a local UDP socket via moq.Server. The bot is the relay. scripts/moq-dev-setup.sh is deleted. One terminal, not two.

  • Self-signed certs minted in-process. tls_generate=["localhost"] produces a cert on startup; server.cert_fingerprints() exposes the SHA-256, the runner threads it back through /start, and /api/config hands it to the browser for WebTransport pinning. No PEM files, no symlinks across repos.

  • @moq/publish does browser audio capture. Replaces ~250 lines of AudioWorklet + Int16 PCM + framing with Publish.Broadcast({ audio: { source: mic.source } }). We get Opus encoding via WebCodecs, catalog publishing, and the subscribe-fulfilment loop for free. AudioWorklet code now lives entirely upstream.

  • WebSocket fallback for free. @moq/net races WebTransport and WebSocket — Firefox (no WebTransport yet) just works without the demo knowing.

  • Catalog-driven track discovery. The bot writes a catalog from publish_audio's AudioEncoderInput; the browser reads it from subscribe("catalog.json") and uses whatever rate/channel count it advertises. Track names aren't pinned in the config, so adding a video track or a screen-share later is a one-liner.

  • Better resampling. moq-rs uses rubato inside publish_audio / subscribe_audio, so you can hand it any sample rate and it resamples with sinc interpolation to and from libopus's native rates. The earlier draft used a hand-rolled 4-point linear interpolator that was fine for speech and dubious for anything else.

  • Fewer dependencies. The pyproject.toml [moq] extra is now just moq-rs>=0.2.13 and cryptography. aioquic and opuslib are both gone; libopus is statically bundled into the moq-rs wheel, so there's no brew install opus / apt-get install libopus0 step.

  • Clean separation of concerns. transport.py is down to ~700 lines — basically a thin glue layer between the pipecat Frame pipeline and the moq library. No varint codec, no stream-state machine, no codec helpers.
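
To make the max_latency_ms bullet concrete, here is a toy model of the "wait up to a deadline, then skip ahead" behaviour. It is not moq-rs code and does not call the library; the function name plan_playout and the exact rule (an older group gets at most max_latency_ms after the first newer group arrives) are assumptions chosen for illustration.

```python
"""Toy model of a subscriber latency deadline (illustrative, not moq-rs code)."""


def plan_playout(arrivals_ms, max_latency_ms=500.0):
    """Decide, per group, whether it is played or skipped.

    arrivals_ms maps group sequence number -> local arrival time in ms.
    A sequence number missing from the dict never arrived.
    Assumed rule: once any newer group has arrived, an older group gets
    at most max_latency_ms more before the consumer skips past it.
    """
    decisions = []
    for seq in range(min(arrivals_ms), max(arrivals_ms) + 1):
        arrival = arrivals_ms.get(seq)
        # Arrival times of every group newer than the one we are waiting on.
        newer = [t for s, t in arrivals_ms.items() if s > seq]
        # No newer group means nothing to skip ahead to, so no deadline.
        deadline = min(newer) + max_latency_ms if newer else float("inf")
        on_time = arrival is not None and arrival <= deadline
        decisions.append((seq, "played" if on_time else "skipped"))
    return decisions


# Healthy network: groups arrive in order, nothing is skipped.
healthy = {0: 0.0, 1: 20.0, 2: 40.0, 3: 60.0}
# Congestion: group 2 is stuck for 900 ms while groups 3 and 4 arrive.
congested = {0: 0.0, 1: 20.0, 2: 900.0, 3: 60.0, 4: 80.0}

print(plan_playout(healthy))
print(plan_playout(congested))
```

In this model, raising max_latency_ms to 1000 lets the late group through at the cost of a longer stall, which is the trade-off the knob exposes.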

What's still hand-rolled

The browser-side playback loop — @moq/hang's Container.Consumer + WebCodecs AudioDecoder + Web Audio scheduling, about 60 lines. Could swap for @moq/watch.Audio.Decoder later, but that pulls in Sync/Source/jitter-buffer machinery that's overkill for a single-track playback path.
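
For readers unfamiliar with the "Web Audio scheduling" part of that loop, the core idea can be sketched in a few lines. The real code is browser-side and is not reproduced here; this is a Python model of the usual gapless-playback pattern, where each decoded chunk starts at the later of "now" and the end of the previous chunk. In the browser those values would presumably come from AudioContext.currentTime and be passed to AudioBufferSourceNode.start(when); the function name schedule_chunks is hypothetical.

```python
def schedule_chunks(chunks):
    """Return the start time (seconds) for each decoded audio chunk.

    chunks is an iterable of (ready_at_s, duration_s) pairs, in decode order.
    A chunk starts when the previous one ends, unless it was decoded too
    late, in which case it starts as soon as it is ready (an audible gap).
    """
    playhead = 0.0  # time at which the previously scheduled chunk ends
    starts = []
    for ready_at, duration in chunks:
        start = max(playhead, ready_at)
        starts.append(start)
        playhead = start + duration
    return starts


# Three 20 ms chunks; the third is decoded late and leaves a gap.
print(schedule_chunks([(0.00, 0.02), (0.01, 0.02), (0.10, 0.02)]))
```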

Things to know before merging

  1. The pipecat Client SDK transport plugin (@pipecat-ai/moq-transport) is still a future item, per moq_prebuilt/PLAN-moq-transport-package.md. The browser demo here is still ad-hoc.

  2. End-to-end testing requires moq-rs 0.2.13 installed (auto via uv sync --extra moq). I haven't run the live mic-to-speaker round-trip in this branch — the smoke test is uv run python examples/transports/transports-moq.py -t moq --moq-serve and opening localhost:7860.

  3. The browser loads @moq/net, @moq/publish, @moq/hang, @moq/signals from esm.sh. If that bothers you for production, the path is a tiny Vite build dropped into moq_prebuilt/client/. For dev / demo it works as-is.

kixelated added 2 commits May 24, 2026 19:42
- Python: depend on moq-rs 0.2.13 from PyPI; transport.py drives
  moq.Client (relay mode) or moq.Server (serve mode) through a shared
  OriginProducer, so the only difference between dialing a remote
  relay and being the relay is the context-manager type. Audio uses
  the 0.2.13 publish_audio / subscribe_audio API — the library
  handles Opus encode/decode + rubato resample, and the
  AudioDecoderOutput.latency_max_ms knob bounds how long the consumer
  waits for late frames (MoQ's congestion-control SLA). Transcript
  (RTVI JSON) rides on a raw byte track in the same broadcast.

- Runner: new --moq-serve flag puts the bot in server mode using a
  generated self-signed cert; /api/config hands the fingerprint to
  the browser so cert pinning works with no PEM files on disk.
  scripts/moq-dev-setup.sh is gone — no separate moq-relay process,
  no cert juggling.

- Browser: @moq/publish (Publish.Broadcast + Source.Microphone)
  handles mic capture, Opus encoding, catalog publishing, and serving
  subscribe requests. Bot-audio playback uses @moq/hang's
  Container.Consumer + WebCodecs AudioDecoder, ~60 lines instead of
  the 800-line hand-rolled moq-lite-02 implementation. Catalog
  discovery instead of pinned track names.

Net diff vs. the original POC: ~−1500 lines.
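
As a rough illustration of the fingerprint hand-off described in the Runner note, the value being pinned is a SHA-256 digest over the DER-encoded certificate. The sketch below uses only hashlib and placeholder bytes; it does not call moq-rs, and the config_payload helper and its certFingerprint key are hypothetical stand-ins for whatever /api/config actually returns.

```python
import hashlib


def fingerprint_hex(der_cert: bytes) -> str:
    """SHA-256 of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def config_payload(der_cert: bytes) -> dict:
    """Hypothetical JSON body a config endpoint could hand to the browser."""
    return {"certFingerprint": fingerprint_hex(der_cert)}


# Placeholder bytes standing in for a real DER certificate.
fake_der = b"not a real certificate, just example bytes"
payload = config_payload(fake_der)

# The browser side needs the raw 32-byte digest rather than hex text.
digest = bytes.fromhex(payload["certFingerprint"])
print(payload["certFingerprint"], len(digest))
```

On the browser side, the digest would then be supplied to WebTransport's serverCertificateHashes option with algorithm "sha-256".
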
kixelated marked this pull request as draft May 25, 2026 14:46