Gemini Realtime produces zero audio output when used with Simli avatar plugin (works with Hedra)

### Environment

- **livekit-agents:** 1.3.11 (also confirmed on 1.3.12 plugin source)
- **livekit-plugins-simli:** 1.3.11
- **livekit-plugins-hedra:** 1.3.11
- **Python:** 3.11
- **OS:** macOS (LiveKit Cloud deployment)
- **LLM:** Google Gemini Native Audio (`gemini-2.5-flash-native-audio-preview-12-2025`)

### Description

Google Gemini Realtime (Native Audio) produces **zero audio output** when routed through the Simli avatar plugin's `DataStreamAudioOutput`. The same Gemini model works correctly with Hedra's `DataStreamAudioOutput`, and OpenAI Realtime works with Simli, so the failure is specific to the Gemini + Simli combination rather than a general `DataStreamAudioOutput` or Gemini problem.

### Steps to Reproduce

1. Create an `AgentSession` with `google.beta.realtime.RealtimeModel(modalities=["AUDIO"])`
2. Create a `simli.AvatarSession` with a valid `face_id` and `api_key`
3. Call `avatar_session.start(session, room)`, then `session.start(agent, room)`
4. Call `session.generate_reply()` or wait for user audio input
5. Observe: Gemini creates generations but produces **zero `model_turn` content** — no `inline_data` (audio), no `text`, no `output_transcription`

### Expected Behavior

Gemini should produce audio output that flows through Simli's `DataStreamAudioOutput` to the Simli avatar worker, identical to how it works with Hedra.

### Actual Behavior

Gemini acknowledges requests (generation lifecycle events fire) but produces empty generations:

```jsonc
// Response #1: generation_complete with no model_turn
{
  "has_model_turn": false,
  "turn_complete": null,
  "generation_complete": true
}

// Response #2: turn_complete with no model_turn
{
  "has_model_turn": false,
  "turn_complete": true,
  "generation_complete": null
}
```

The audio routing chain is healthy — `DataStreamAudioOutput._started=True`, `audio_enabled=True` — but zero frames arrive because Gemini produces nothing upstream.
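
For anyone trying to capture the same summaries, here is a minimal sketch of a logging helper. It assumes the raw Gemini server message exposes a `server_content` object with `model_turn`, `turn_complete`, and `generation_complete` attributes, as the logged field names suggest. `summarize_response` is a hypothetical helper, not part of livekit-agents or the Google SDK, and the `SimpleNamespace` object is only a stand-in for a real message.

```python
from types import SimpleNamespace
from typing import Any


def summarize_response(message: Any) -> dict:
    """Reduce a server message to the three fields shown in the logs above.

    Hypothetical helper: the attribute names are assumed from the logged output.
    """
    content = getattr(message, "server_content", None)
    return {
        "has_model_turn": getattr(content, "model_turn", None) is not None,
        "turn_complete": getattr(content, "turn_complete", None),
        "generation_complete": getattr(content, "generation_complete", None),
    }


# Stand-in for response #1: generation_complete with no model_turn.
empty_generation = SimpleNamespace(
    server_content=SimpleNamespace(
        model_turn=None, turn_complete=None, generation_complete=True
    )
)
print(summarize_response(empty_generation))
```

Using `getattr` with a default keeps the helper from raising when a message carries no `server_content` at all.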

### Root Cause Analysis

The Simli plugin is the **only** avatar plugin that omits `wait_remote_track` when creating `DataStreamAudioOutput`:

| Plugin | `wait_remote_track` | Result with Gemini |
|--------|---------------------|--------------------|
| **Hedra** | `rtc.TrackKind.KIND_VIDEO` | **Works** |
| **Simli** | `None` (omitted) | **Fails** |
| Bey | `rtc.TrackKind.KIND_VIDEO` | Untested |
| Tavus | `rtc.TrackKind.KIND_VIDEO` | Untested |
| Anam | `rtc.TrackKind.KIND_VIDEO` | Untested |
| Avatario | `rtc.TrackKind.KIND_VIDEO` | Untested |
| LemonSlice | `rtc.TrackKind.KIND_VIDEO` | Untested |

**Attempted fixes (all failed):**

1. **Adding `wait_remote_track=KIND_VIDEO` after `avatar_session.start()`:** Replacing Simli's `DataStreamAudioOutput` with one that includes `wait_remote_track=KIND_VIDEO` causes the session to **deadlock** — Simli's worker does not publish a video track that can be awaited.

2. **Updating plugin version:** Simli plugin source is identical across 1.3.11, 1.3.12, and 1.4.0rc2 — no changes to `DataStreamAudioOutput` construction.

3. **Adding startup delay:** 3-second `asyncio.sleep()` before `generate_reply()` for Simli sessions — no effect.
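
The deadlock in attempt 1 comes from waiting on a video track with no upper bound. When probing this, bounding the wait turns the hang into a visible failure. The following is a minimal sketch using only `asyncio`. `video_track_seen` is a hypothetical `asyncio.Event` that a room track callback would set; the wiring to the LiveKit room is omitted and would need to be added.

```python
import asyncio


async def wait_for_video_track(video_track_seen: asyncio.Event, timeout: float) -> bool:
    """Return True if the event is set within `timeout` seconds, else False."""
    try:
        await asyncio.wait_for(video_track_seen.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple[bool, bool]:
    never_set = asyncio.Event()    # models a worker that never publishes video
    already_set = asyncio.Event()  # models a worker that has published video
    already_set.set()
    return (
        await wait_for_video_track(never_set, timeout=0.05),
        await wait_for_video_track(already_set, timeout=0.05),
    )


result = asyncio.run(demo())
print(result)
```

A `False` result for the Simli worker would be consistent with the observation that it publishes no awaitable video track.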

### Key Observations

- Simli is the only avatar plugin that omits `wait_remote_track` in its `DataStreamAudioOutput` constructor
- Simli's worker does **not** publish a video track (adding `wait_remote_track=KIND_VIDEO` deadlocks)
- OpenAI Realtime + Simli works, so Simli alone is not the cause
- Gemini + Hedra works, so neither Gemini nor `DataStreamAudioOutput` alone is the cause; only the Gemini + Simli combination fails
- This may be related to #3353 (Simli + LLM second connection failure), which also reports Simli-specific issues

### Minimal Reproduction

```python
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import google, simli


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Gemini native-audio realtime model with audio-only output.
    session = AgentSession(
        llm=google.beta.realtime.RealtimeModel(
            model="gemini-2.5-flash-native-audio-preview-12-2025",
            modalities=["AUDIO"],
            voice="Puck",
        ),
        resume_false_interruption=False,
    )

    # Start the avatar first; it installs its own DataStreamAudioOutput
    # as the session's audio output.
    avatar = simli.AvatarSession(
        simli_config=simli.SimliConfig(
            api_key="YOUR_SIMLI_API_KEY",
            face_id="YOUR_FACE_ID",
        ),
    )
    await avatar.start(session, ctx.room)

    await session.start(
        agent=Agent(instructions="Greet the user."),
        room=ctx.room,
    )

    # This produces zero audio output with Simli.
    # Replace simli with hedra.AvatarSession and it works.
    await session.generate_reply(instructions="Say hello")


if __name__ == "__main__":
    # Standard worker entry point so the snippet can be run directly.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

### Suggested Investigation

The architectural difference (no `wait_remote_track`, no video track publication) suggests Simli's `DataStreamAudioOutput` enters `_started=True` earlier than other plugins. This may cause a race condition where Gemini's realtime session sees the audio output as "ready" but Simli's downstream worker isn't actually prepared to receive audio, causing Gemini to silently produce empty generations. OpenAI's realtime model may be more resilient to this timing issue.
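
One way to test the timing hypothesis is to record when the first audio frame appears at each stage and compare that against when the output reports ready. Below is a minimal sketch of a pass-through tap for an async frame stream. `FrameTap` and `fake_frames` are hypothetical names introduced for illustration; where to attach the tap inside livekit-agents is not shown and is left as an assumption.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, TypeVar

T = TypeVar("T")


class FrameTap:
    """Pass items through unchanged while recording count and first-arrival time."""

    def __init__(self) -> None:
        self.count = 0
        self.first_seen: Optional[float] = None  # time.monotonic() of first item

    async def wrap(self, source: AsyncIterator[T]) -> AsyncIterator[T]:
        async for item in source:
            if self.first_seen is None:
                self.first_seen = time.monotonic()
            self.count += 1
            yield item


async def fake_frames(n: int) -> AsyncIterator[bytes]:
    # Stand-in for an audio frame stream; real frames would come from the model.
    for _ in range(n):
        await asyncio.sleep(0)
        yield b"\x00" * 4


async def demo() -> tuple[int, int]:
    empty_tap, busy_tap = FrameTap(), FrameTap()
    async for _ in empty_tap.wrap(fake_frames(0)):
        pass
    async for _ in busy_tap.wrap(fake_frames(3)):
        pass
    return empty_tap.count, busy_tap.count


counts = asyncio.run(demo())
print(counts)
```

A tap whose `count` stays at zero and whose `first_seen` stays `None` marks the earliest stage at which frames are missing.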

Gemini Realtime produces zero audio output when used with Simli avatar plugin (works with Hedra) #4648