Description
Linked with #160, #153, #152, #39, #34, #30
Also read: m1k1o/neko#371
With the v1.6.0 release, we are much more confident in our performance optimizations to the WebRTC stack.
We have found a way to eliminate jitter buffer latency in the WebRTC decoder using playout-delay and jitterBufferTarget, along with many other measures that stabilize and improve the video and input (DataChannel) stack.
We have also switched to smaller frames for the Opus codec to see whether latency improves (tracked in #153), although NetEQ in Chrome mostly adapts on its own.
Several further interventions may still push this WebRTC stack to its best possible performance.
Backend:
- Correctly implement YUV 4:4:4 color
https://issues.chromium.org/issues/40198264
This is possible in WebRTC; Nutanix Frame implemented YUV 4:4:4 within Chromium quite some time ago.
However, color accuracy in YUV 4:2:0 (#160) should be solved first, as there is no legitimate reason for YUV 4:2:0 color to differ from the original source by more than ±1.
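One way to put a number on the ±1 target is to compare a reference frame against a frame read back from the decoded video (for example by drawing it to a canvas and calling getImageData()). The helper below is a hypothetical sketch of that comparison on RGBA buffers; the function name is an assumption. It is most meaningful on flat-colored test patterns, because 4:2:0 shares one chroma sample per 2x2 pixel block, so sharp color edges are expected to blur.

```javascript
// Hypothetical helper: compare two RGBA frames of equal size (e.g. a reference
// screenshot and a frame captured from the decoded <video> via canvas getImageData())
// and report the largest per-channel difference, ignoring alpha.
function maxColorDeviation(reference, decoded) {
  if (reference.length !== decoded.length || reference.length % 4 !== 0) {
    throw new Error("frames must be RGBA buffers of equal size");
  }
  let maxDiff = 0;
  let worstIndex = -1; // index of the first pixel reaching maxDiff, -1 if identical
  for (let i = 0; i < reference.length; i += 4) {
    for (let c = 0; c < 3; c++) { // R, G, B; alpha at offset 3 is skipped
      const diff = Math.abs(reference[i + c] - decoded[i + c]);
      if (diff > maxDiff) {
        maxDiff = diff;
        worstIndex = i / 4;
      }
    }
  }
  return { maxDiff, worstPixel: worstIndex };
}

// Usage: a flat-colored 2x1 frame where one green value drifted by 3.
const ref = new Uint8ClampedArray([200, 30, 30, 255, 200, 30, 30, 255]);
const dec = new Uint8ClampedArray([201, 30, 29, 255, 200, 33, 30, 255]);
const { maxDiff, worstPixel } = maxColorDeviation(ref, dec);
console.log(maxDiff <= 1 ? "within ±1" : `off by ${maxDiff} at pixel ${worstPixel}`);
```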
- Find the sweet spot for the video encoder's minimum and maximum QP parameters
https://multi.app/blog/making-illegible-slow-webrtc-screenshare-legible-and-fast
https://multi.app/blog/measuring-shared-control-latency
- Investigate using queues in front of GStreamer RTP payloaders
Currently, the Opus queue is commented out. However, queues may have useful features.
Along with re-investigating the effectiveness of the Opus queue and its role in latency, queues in front of the video RTP payloader may (or may not) also help during congestion, where latency spikes can persist for 5-15 seconds or more because the WebRTC decoder scrambles to decode very late frames instead of simply dropping them.
Alternatively, a yet-unknown web browser configuration may eliminate this situation entirely.
Any solution must work well with infinite keyframe interval (GOP) configurations and with NACK/PLI using RTX.
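To reason about the trade-off, the sketch below models the drop-oldest policy that GStreamer's queue element provides with leaky=downstream (bounded by properties such as max-size-buffers). It is an illustration in JavaScript only, not pipeline code; the class name and frame objects are hypothetical, and whether such a queue actually helps with the late-frame behavior described above is exactly what needs investigating.

```javascript
// Illustration only: a bounded FIFO that drops the OLDEST item when full,
// mirroring the drop-old-buffers policy of GStreamer's queue with leaky=downstream.
// The class name and frame objects are hypothetical.
class LeakyQueue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.items = [];
    this.dropped = 0;
  }

  push(item) {
    if (this.items.length >= this.maxSize) {
      this.items.shift(); // discard the stalest frame instead of blocking the producer
      this.dropped++;
    }
    this.items.push(item);
  }

  pop() {
    return this.items.shift(); // undefined when empty
  }
}

// Usage: a congested consumer that only drains after 5 frames have arrived.
const q = new LeakyQueue(2);
for (let pts = 0; pts < 5; pts++) q.push({ pts });
console.log(q.pop().pts, q.pop().pts, q.dropped); // 3 4 3
```

The consumer only ever sees the two newest frames, which is the behavior one would want from a pipeline that prefers freshness over completeness; with infinite GOP, dropped frames would still need a PLI-triggered keyframe to recover.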
- Compress DataChannel messages using gzip
Nestri appears to have seen noticeable input latency reductions with this.
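A minimal browser-side sketch, assuming both peers agree to gzip every DataChannel message; the helper names are hypothetical, and the server side would need a matching decompression step. It uses the standard CompressionStream/DecompressionStream APIs, which exist in modern browsers and in Node 18+.

```javascript
// Sketch, assuming every message on the channel is gzip-compressed (hypothetical framing).

// Compress a string into gzip bytes.
async function gzip(text) {
  const stream = new Blob([text]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Decompress gzip bytes back into a string.
async function gunzip(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream("gzip"));
  return new Response(stream).text();
}

// Sending side: compress, then send the binary payload.
async function sendCompressed(channel, message) {
  channel.send(await gzip(message));
}

// Receiving side: binaryType makes event.data arrive as an ArrayBuffer.
function onCompressedMessage(channel, handler) {
  channel.binaryType = "arraybuffer";
  channel.addEventListener("message", async (event) => {
    handler(await gunzip(new Uint8Array(event.data)));
  });
}
```

For very short input events, gzip's fixed header and trailer (18 bytes) can make messages larger than the originals, so compression may only pay off for larger or batched payloads; this is worth measuring before adopting it.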
Frontend:
- Override system power settings (especially Chromium on Windows) to decode full frames
https://web.dev/articles/requestvideoframecallback-rvfc
When the system is in a power-saving mode, video decoding appears not to keep up, as in the example from the article above. This increases perceived latency because frames are not painted as often as they should be.
Some settings in WebRTC or
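To check whether power settings actually throttle painting, one option is to count presented frames with HTMLVideoElement.requestVideoFrameCallback(), the API covered in the web.dev article above, and compare the result against the stream's intended frame rate while toggling the OS power mode. The sketch below assumes that approach; the function names and the one-second sampling window are hypothetical choices.

```javascript
// Sketch: measure how many frames are actually presented for compositing.
// metadata.presentedFrames is a running counter, so comparing two samples
// reveals frames that were never painted.

// Pure helper: presented frames per second between two callback samples.
function presentedFps(prev, curr) {
  const seconds = (curr.now - prev.now) / 1000; // `now` is a high-resolution timestamp in ms
  return seconds > 0 ? (curr.presentedFrames - prev.presentedFrames) / seconds : 0;
}

// Browser-only: log the painted frame rate roughly once per second.
function monitorPaintedFrames(video, log = console.log) {
  let prev = null;
  const onFrame = (now, metadata) => {
    const sample = { now, presentedFrames: metadata.presentedFrames };
    if (prev === null) {
      prev = sample;
    } else if (now - prev.now >= 1000) {
      log(`painted ${presentedFps(prev, sample).toFixed(1)} fps`);
      prev = sample;
    }
    video.requestVideoFrameCallback(onFrame); // callbacks are one-shot; re-register
  };
  video.requestVideoFrameCallback(onFrame);
}
```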
- Moreover, jitterBufferTarget/jitterBufferDelayHint/playoutDelayHint are not well understood. Find out where these and other hidden WebRTC settings can improve upon the current approach.
Current configuration (based on https://groups.google.com/g/discuss-webrtc/c/wtuhQu6c1KY/m/Usq84y0mAQAJ; a bit of a CPU hog but acceptable with async; it could be optimized further, and its actual effect in web browsers still needs to be assessed):
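// Repeatedly emit the minimum latency target on every receiver.
// jitterBufferTarget is the standard attribute (milliseconds);
// jitterBufferDelayHint and playoutDelayHint are older Chromium-specific attributes (seconds).
webrtc.peerConnection.getReceivers().forEach((receiver) => {
    const intervalLoop = setInterval(async () => {
        // Stop once the track has ended or the DTLS transport is missing or no longer connected.
        if (receiver.track.readyState !== "live" || receiver.transport?.state !== "connected") {
            clearInterval(intervalLoop);
            return;
        }
        receiver.jitterBufferTarget = receiver.jitterBufferDelayHint = receiver.playoutDelayHint = 0;
    }, 15);
});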
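To assess whether the loop above has any effect, the standard inbound-rtp statistics expose jitterBufferDelay (cumulative seconds) and jitterBufferEmittedCount, whose ratio is the average time a frame or audio sample spent in the jitter buffer. The sketch below assumes that metric is a reasonable proxy; the function names and one-second sampling interval are hypothetical. Comparing the logged values with and without the interval loop running would show whether the hints take effect.

```javascript
// Sketch: average jitter buffer delay per emitted frame/sample, computed from
// two snapshots of the same inbound-rtp stats report.

// Pure helper: average delay in ms between two snapshots, or null if nothing was emitted.
function avgJitterBufferMs(prev, curr) {
  const emitted = curr.jitterBufferEmittedCount - prev.jitterBufferEmittedCount;
  if (emitted <= 0) return null;
  return ((curr.jitterBufferDelay - prev.jitterBufferDelay) / emitted) * 1000;
}

// Browser-only: sample every second and log per-kind averages.
// Returns the interval id so the caller can stop monitoring with clearInterval().
function monitorJitterBuffer(peerConnection, log = console.log) {
  const previous = new Map(); // stats id -> last inbound-rtp report
  return setInterval(async () => {
    const report = await peerConnection.getStats();
    report.forEach((stats) => {
      if (stats.type !== "inbound-rtp" || stats.jitterBufferEmittedCount === undefined) return;
      const prev = previous.get(stats.id);
      if (prev) {
        const ms = avgJitterBufferMs(prev, stats);
        if (ms !== null) log(`${stats.kind}: ${ms.toFixed(1)} ms jitter buffer delay`);
      }
      previous.set(stats.id, stats);
    });
  }, 1000);
}
```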
WebRTC:
- Check whether merging webrtcbin back into one session is feasible: it seems that video-delay could have reduced the video latency without needing two separate sessions.
- Merge the two separate WebRTC sessions into one with multiple independent streams:
Use a=rtcp-mux, a=group:BUNDLE 0 1 2 3 ... and a=mid:0, a=mid:1, ... to establish one SDP session, but with independent streams for audio, video, DataChannel (m=application x UDP/DTLS/SCTP webrtc-datachannel), microphone, webcam, and other stream types that neither interfere with each other nor perform audio/video sync.
For example:
v=0
o=- 2 IN IP4 1.1.1.1
t=0 0
a=group:BUNDLE 0 1 2 3
a=fingerprint:sha-256
a=setup:actpass
m=audio x UDP/TLS/RTP/SAVPF 111 63
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0
a=mid:0
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=sendonly
a=msid:id audio
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:63 red/48000/2
a=rtcp-fb:63 transport-cc
a=fmtp:63 111/111
a=ptime:10
m=video x UDP/TLS/RTP/SAVPF 96 97 101 102 98
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0
a=mid:1
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:12 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendonly
a=msid:id video
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:101 H264/90000
a=rtcp-fb:101 transport-cc
a=rtcp-fb:101 ccm fir
a=rtcp-fb:101 nack
a=rtcp-fb:101 nack pli
a=fmtp:101 level-asymmetry-allowed=1;packetization-mode=1;sps-pps-idr-in-keyframe=1;profile-level-id=42e01f
a=rtpmap:102 rtx/90000
a=fmtp:102 apt=101;rtx-time=125
m=application x UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=mid:2
a=sctp-port:5000
a=max-message-size:262144
m=audio x UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtcp:x IN IP4 0.0.0.0
The main purpose is to keep the streams isolated, so that there is no audio/video sync at all (which inevitably adds latency), and to improve DataChannel performance by keeping it as an independent stream separate from the video, while handling all of them through one TURN relay port (or another single WebRTC port) in a single SDP.
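When experimenting with such an SDP, a quick sanity check is that every m= section has an a=mid listed in the a=group:BUNDLE line and that every RTP section carries a=rtcp-mux. The sketch below assumes those two rules are the ones worth checking; the function name is hypothetical and it is not a full SDP parser. The abbreviated example above would be flagged as well, since its last (microphone) m=audio section is truncated and has no a=mid yet.

```javascript
// Sketch: check that an SDP is "one session, many independent streams".
// Rules checked (assumed sufficient for a quick test, not a full validator):
//  - every m= section has an a=mid that appears in a=group:BUNDLE
//  - every RTP (non-application) section has a=rtcp-mux
function checkBundle(sdp) {
  const lines = sdp.split(/\r?\n/).map((l) => l.trim());
  const bundleLine = lines.find((l) => l.startsWith("a=group:BUNDLE"));
  const bundled = new Set(bundleLine ? bundleLine.split(" ").slice(1) : []);
  const sections = [];
  for (const line of lines) {
    if (line.startsWith("m=")) {
      sections.push({ media: line.slice(2).split(" ")[0], mid: null, rtcpMux: false });
    } else if (sections.length === 0) {
      continue; // session-level line
    } else if (line.startsWith("a=mid:")) {
      sections[sections.length - 1].mid = line.slice(6);
    } else if (line === "a=rtcp-mux") {
      sections[sections.length - 1].rtcpMux = true;
    }
  }
  const problems = [];
  sections.forEach((s, i) => {
    if (s.mid === null) problems.push(`m-section ${i} (${s.media}) has no a=mid`);
    else if (!bundled.has(s.mid)) problems.push(`mid ${s.mid} is not in a=group:BUNDLE`);
    if (s.media !== "application" && !s.rtcpMux) problems.push(`m-section ${i} (${s.media}) lacks a=rtcp-mux`);
  });
  return problems;
}

// Usage: a microphone section added without updating BUNDLE or rtcp-mux.
const example = [
  "a=group:BUNDLE 0 1 2",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111", "a=mid:0", "a=rtcp-mux",
  "m=video 9 UDP/TLS/RTP/SAVPF 101", "a=mid:1", "a=rtcp-mux",
  "m=application 9 UDP/DTLS/SCTP webrtc-datachannel", "a=mid:2",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111", "a=mid:3",
].join("\r\n");
const problems = checkBundle(example);
// returns ["mid 3 is not in a=group:BUNDLE", "m-section 3 (audio) lacks a=rtcp-mux"]
```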
- RTP header extensions and other browser-side and server-side WebRTC settings to implement and improve:
https://www.rtcbits.com/2023/05/webrtc-header-extensions.html
a=extmap:1 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:2 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/video-timing
a=extmap:4 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
Note: http://www.webrtc.org/experiments/rtp-hdrext/color-space causes the Chrome WebRTC decoder to skip the hardware decoder and go straight to the software FFmpeg decoder.
The above RTP header extensions are known to help control latency and timing. They can be implemented in GStreamer so that RTP payloaders can emit them.
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3549
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3550
https://gstreamer.freedesktop.org/documentation/rtpmanager/rtphdrextclientaudiolevel.html
https://gstreamer.freedesktop.org/documentation/rtpmanager/rtphdrextmid.html
SDP support in web browsers: https://codepen.io/kwst/full/yLaaxRy
draft-holmer-rmcat-transport-wide-cc-extensions-01 is enabled for video and audio when rtpgccbwe is active. abs-send-time and video-timing are not available in GStreamer. playout-delay has been implemented in a very restricted, temporary form in gstwebrtc_app.py, where only zero values can be sent (which is what we need anyway).
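Alongside the CodePen table linked above, the browser itself can report which header extensions its receiver supports through RTCRtpReceiver.getCapabilities(kind).headerExtensions. The sketch below assumes the four URIs listed above are the ones of interest; the constant and function names are hypothetical, and the helper is kept pure so it can be checked without a browser.

```javascript
// Sketch: which of the desired RTP header extensions does this browser's receiver support?
const WANTED_EXTENSIONS = [
  "http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time",
  "http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01",
  "http://www.webrtc.org/experiments/rtp-hdrext/video-timing",
  "http://www.webrtc.org/experiments/rtp-hdrext/playout-delay",
];

// Pure helper: split the wanted URIs into supported and missing.
// getCapabilities() returns null for an unsupported kind, hence the fallbacks.
function splitSupported(capabilities, wanted = WANTED_EXTENSIONS) {
  const offered = new Set((capabilities?.headerExtensions ?? []).map((ext) => ext.uri));
  return {
    supported: wanted.filter((uri) => offered.has(uri)),
    missing: wanted.filter((uri) => !offered.has(uri)),
  };
}

// Browser-only usage:
// console.log(splitSupported(RTCRtpReceiver.getCapabilities("video")));
```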
- Investigate imageattr and flexfec in video:
a=imageattr:96 send [x=[1280:1920],y=[720:1080],fps=[30:60]]
a=imageattr:97 send [x=[1280:1920],y=[720:1080],fps=[30:60]]
a=rtpmap:98 flexfec-03/90000
a=rtcp-fb:98 transport-cc
a=fmtp:98 repair-window=10000000
a=ssrc-group:FEC-FR
- Larger DataChannels:
a=max-message-size:262144
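On the browser side, the limit actually negotiated from a=max-message-size is exposed as RTCSctpTransport.maxMessageSize (via peerConnection.sctp), so a sender can check it before relying on larger messages. The sketch below splits a payload to fit that limit; the function names and the 65536-byte fallback used before the SCTP transport exists are assumptions, and a receiver would additionally need some framing (not shown) to reassemble the chunks.

```javascript
// Sketch: respect the negotiated SCTP message size limit when sending.

// Split a Uint8Array into views no larger than maxMessageSize bytes.
function splitForChannel(bytes, maxMessageSize) {
  const chunks = [];
  for (let offset = 0; offset < bytes.length; offset += maxMessageSize) {
    chunks.push(bytes.subarray(offset, offset + maxMessageSize));
  }
  return chunks;
}

// Send using the negotiated limit; 65536 is an assumed fallback when sctp is null.
function sendLarge(peerConnection, channel, bytes) {
  const limit = peerConnection.sctp?.maxMessageSize || 65536;
  for (const chunk of splitForChannel(bytes, limit)) channel.send(chunk);
}
```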
- Understand the effects of b=AS: and x-google-max-bitrate (on the receiving side only, not the sending side or browser-to-browser):
nextcloud/spreed#6739
https://groups.google.com/g/discuss-webrtc/c/u7k1_hASS4Q
https://stackoverflow.com/questions/57653899/how-to-increase-the-bitrate-of-webrtc
https://groups.google.com/g/discuss-webrtc/c/udyHHPnrQMo
pion/webrtc#1827
https://ekobit.com/blog/diving-deeper-into-webrtc-advanced-options-and-possibilities/
https://chromium.googlesource.com/external/webrtc/+/a6b99448eec51527eca0bc59f6da71061d02e807/webrtc/media/base/mediaconstants.cc
https://groups.google.com/g/discuss-webrtc/c/ORJdeoFAaBE
https://webrtc.googlesource.com/src/+/refs/heads/main/docs/native-code/sdp-ext/fmtp-x-google-per-layer-pli.md
The above links may contain irrelevant information (about controlling the sender bitrate), because webrtcbin is the sender and it does not use libwebrtc.
b=AS:300000
a=fmtp:96 sps-pps-idr-in-keyframe=1;x-google-max-bitrate=300000;x-google-min-bitrate=0;x-google-start-bitrate=12000
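For experimenting with the lines above on the receiving browser, one option is to munge the local answer before setLocalDescription(): add b=AS to the video section and append the x-google-* parameters to one payload type's a=fmtp line. The sketch below assumes that approach; the function name is hypothetical, all values are in kbps, and whether webrtcbin honors either hint is exactly the open question.

```javascript
// Sketch, for experimentation only: add receive-side bitrate hints to an SDP.
// Assumes CRLF line endings and an existing a=fmtp line for payloadType;
// does not remove a pre-existing b= line in the video section.
function addBitrateHints(sdp, payloadType, { asKbps, maxKbps, minKbps, startKbps }) {
  const out = [];
  let inVideo = false;
  for (const line of sdp.split("\r\n")) {
    if (line.startsWith("m=")) inVideo = line.startsWith("m=video");
    if (line.startsWith(`a=fmtp:${payloadType} `)) {
      out.push(`${line};x-google-max-bitrate=${maxKbps};x-google-min-bitrate=${minKbps};x-google-start-bitrate=${startKbps}`);
      continue;
    }
    out.push(line);
    // b= lines follow c= within a media section (SDP field ordering).
    if (inVideo && line.startsWith("c=")) out.push(`b=AS:${asKbps}`);
  }
  return out.join("\r\n");
}

// Browser-only usage (hypothetical flow):
// const answer = await pc.createAnswer();
// await pc.setLocalDescription({
//   type: answer.type,
//   sdp: addBitrateHints(answer.sdp, 96, { asKbps: 300000, maxKbps: 300000, minKbps: 0, startKbps: 12000 }),
// });
```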
- Protocol topologies beyond TURN and STUN
https://neko.m1k1o.net/#/getting-started/configuration?id=webrtc
Pion provides various WebRTC configurations and protocols, including EPR, UDPMUX, TCPMUX, NAT1TO1, ICE-LITE, and ICE-TCP. These techniques allow more deployment flexibility beyond TURN/STUN, such as limiting port ranges or using a single port for many connections. This should also be implemented with GStreamer's webrtcbin.
https://www.w3.org/2021/03/media-production-workshop/talks/slides/sergio-garcia-murillo-whip.pdf
https://groups.google.com/g/discuss-webrtc/c/wtuhQu6c1KY
https://henbos.github.io/webrtc-timing/
https://github.com/jakearchibald/web-platform-tests/blob/master/webrtc-extensions/RTCRtpReceiver-playoutDelayHint.html
https://mediasoup.discourse.group/t/webrtc-playout-delay-extension/2067
https://issues.chromium.org/issues/324276557
https://bugzilla.mozilla.org/show_bug.cgi?id=1592988
https://groups.google.com/a/chromium.org/g/blink-dev/c/4W4orKqA3Rs
https://www.reddit.com/r/WebRTC/comments/ipewaq/disable_use_of_jitter_buffer/?rdt=58693