Conversation

@hiroshihorie
Member

No description provided.

@hiroshihorie hiroshihorie merged commit b553e3e into main Nov 25, 2025
16 checks passed
@hiroshihorie hiroshihorie deleted the hiroshi/fix-pending-tracks branch November 25, 2025 13:43
rokk4 added a commit to rokk4/client-sdk-flutter that referenced this pull request Dec 2, 2025
Fixes a race condition in which tracks arriving before participant metadata
were permanently dropped from the pending queue after the timeout, causing
10-60 second delays or complete failures when participants rejoin.

Changes:
1. Retry transient failures: Modified _flushPendingTracks() to differentiate
   between transient (notTrackMetadataFound) and permanent failures. Transient
   failures now keep tracks in queue for retry instead of removing them.

2. Additional flush trigger: Added listener to flush pending tracks when
   SignalParticipantUpdateEvent contains track publications, ensuring tracks
   are subscribed once metadata becomes available.

3. Improved logging: Transient failures logged at fine level to reduce noise,
   permanent failures at severe level for visibility.
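A minimal sketch of the retry behaviour described in changes 1 and 3, written in Python for illustration only; it is not the SDK's Dart code. The `FailReason` enum, `PendingTrack`, and `try_subscribe` callback are hypothetical names. Only `notTrackMetadataFound` comes from the commit message, and the "fine" and "severe" log levels are approximated here with `debug` and `error`.

```python
import logging
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional

log = logging.getLogger("pending_tracks")


class FailReason(Enum):
    # Only NOT_TRACK_METADATA_FOUND mirrors the commit message;
    # the second member is an illustrative stand-in for "any other failure".
    NOT_TRACK_METADATA_FOUND = auto()  # transient: metadata may still arrive
    UNSUPPORTED_TRACK_TYPE = auto()    # permanent: retrying cannot help


@dataclass
class PendingTrack:
    track_sid: str


def flush_pending_tracks(
    queue: List[PendingTrack],
    try_subscribe: Callable[[PendingTrack], Optional[FailReason]],
) -> List[PendingTrack]:
    """Try every queued track once and return the tracks that stay queued.

    try_subscribe returns None on success or a FailReason on failure.
    """
    remaining: List[PendingTrack] = []
    for item in queue:
        reason = try_subscribe(item)
        if reason is None:
            continue  # subscribed, drop from the queue
        if reason is FailReason.NOT_TRACK_METADATA_FOUND:
            # Transient: keep the track so a later flush can retry it.
            log.debug("track %s has no metadata yet, keeping queued", item.track_sid)
            remaining.append(item)
        else:
            # Permanent: drop the track and make the failure visible.
            log.error("track %s failed permanently: %s", item.track_sid, reason.name)
    return remaining
```

Returning the surviving items, rather than mutating the queue while iterating, keeps each flush pass simple and makes it safe to call again from any trigger.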

The fix maintains the existing timeout configuration from connectOptions
while enabling retry logic that resolves the race condition where:
- WebRTC track arrives first → queued
- ParticipantInfo arrives → participant created → flush fails (no publications)
- TrackPublishedResponse arrives later → second flush succeeds
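The three-step ordering above can be replayed with a small simulation. This is a sketch under stated assumptions, in Python rather than the SDK's Dart: `RoomSketch` and its handler names are hypothetical and only model the order of events, not the real signaling types.

```python
from typing import Dict, List, Set, Tuple


class RoomSketch:
    """Hypothetical model of a room that queues tracks until metadata exists."""

    def __init__(self) -> None:
        # participant sid -> track sids for which publication metadata is known
        self.publications: Dict[str, Set[str]] = {}
        # (participant sid, track sid) pairs waiting for metadata
        self.pending: List[Tuple[str, str]] = []
        self.subscribed: List[str] = []

    def on_track_arrived(self, participant_sid: str, track_sid: str) -> None:
        # Step 1: the media track shows up first and is queued.
        self.pending.append((participant_sid, track_sid))
        self._flush()

    def on_participant_info(self, participant_sid: str) -> None:
        # Step 2: the participant now exists but has no publications yet.
        self.publications.setdefault(participant_sid, set())
        self._flush()

    def on_publications_update(self, participant_sid: str, track_sids: List[str]) -> None:
        # Step 3: publication metadata arrives and triggers another flush.
        self.publications.setdefault(participant_sid, set()).update(track_sids)
        self._flush()

    def _flush(self) -> None:
        still_pending: List[Tuple[str, str]] = []
        for participant_sid, track_sid in self.pending:
            if track_sid in self.publications.get(participant_sid, set()):
                self.subscribed.append(track_sid)
            else:
                # Metadata missing: keep the entry instead of dropping it.
                still_pending.append((participant_sid, track_sid))
        self.pending = still_pending


room = RoomSketch()
room.on_track_arrived("PA_1", "TR_1")          # queued
room.on_participant_info("PA_1")               # flush runs, track stays queued
room.on_publications_update("PA_1", ["TR_1"])  # second flush subscribes
```

In this model the track is never lost between steps 2 and 3, which is the property the fix relies on.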

This reduces track subscription latency after rejoin from 10-60s to <1s
and improves reliability on slower devices where the race condition was
more pronounced.

Related: livekit#928
rokk4 added a commit to rokk4/client-sdk-flutter that referenced this pull request Dec 2, 2025
… logic

Combines defensive and reactive approaches to fix a race condition in which
tracks arriving before participant metadata caused 10-60s delays or failures
on rejoin.

Root Cause:
When a participant rejoins, WebRTC tracks can arrive before signaling metadata.
The previous logic had three critical gaps:
1. Tracks queued but dropped on timeout (no retry)
2. Missing flush triggers when metadata finally arrives
3. Insufficient deferral check (only participant existence, not publication)

Solution - Three-Layer Defense:

1. PREVENTIVE: Enhanced deferral logic (NEW)
   Check not just participant existence, but also publication metadata:
   - connectionState != connected (pre-connection tracks)
   - participant == null (tracks before participant)
   - publication == null (tracks before metadata) ← NEW CHECK

   This prevents premature subscription attempts that would time out.

2. REACTIVE: Retry transient failures
   Modified _flushPendingTracks() to differentiate failure types:
   - notTrackMetadataFound → return false (keep in queue, retry)
   - Other failures → return true (remove from queue)

   Handles micro-timing races where the flush happens before metadata is processed.

3. AGGRESSIVE: Additional flush trigger
   Added SignalParticipantUpdateEvent listener to flush when track
   publications arrive, ensuring queued tracks are processed promptly.
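The preventive layer (item 1) boils down to a single predicate evaluated when a track arrives. The sketch below is illustrative Python, not the SDK's Dart code; `ConnectionState`, `Participant`, and `should_defer_track` are hypothetical names chosen to mirror the three checks listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class ConnectionState(Enum):
    CONNECTING = "connecting"
    CONNECTED = "connected"


@dataclass
class Participant:
    # track sid -> publication metadata (opaque in this sketch)
    publications: Dict[str, object] = field(default_factory=dict)


def should_defer_track(
    state: ConnectionState,
    participant: Optional[Participant],
    track_sid: str,
) -> bool:
    """Return True when an arriving track should be queued instead of subscribed."""
    if state is not ConnectionState.CONNECTED:
        return True  # track arrived before the connection was established
    if participant is None:
        return True  # track arrived before the participant was known
    if track_sid not in participant.publications:
        return True  # track arrived before its publication metadata (the new check)
    return False
```

Checking the publication up front means a track that cannot be matched yet goes straight to the queue, where the retry and flush triggers in items 2 and 3 can pick it up.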

Impact:
- Reduces rejoin latency from 10-60s to <1s
- Eliminates frozen frames on rejoin
- More robust on slower devices (reduced CPU-dependent timing sensitivity)
- Maintains configurable timeout from connectOptions

The combined approach is superior because:
- Prevention reduces unnecessary timeout waits
- Retry ensures recovery from edge cases
- Aggressive flush ensures timely processing
- Event-driven design scales better than polling

Related: livekit#928