Description
🚀 The feature
Currently, both audio and video decoding depend on knowing the stream's total number of frames and its duration. As a consequence, we can't decode live-streamed video or audio, since neither value is knowable for a live stream.
This feature would enable decoding of live media. Because the APIs for VideoDecoder and AudioDecoder assume the ability to seek to arbitrary places in the stream, I suspect we might need a new top-level public API specifically for live streams.
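As a rough illustration of what such an API could look like, here is a minimal sketch. Everything below is hypothetical: the class name `LiveVideoDecoder`, the `Frame` container, and the iterator-based interface are made up for discussion and are not existing APIs. The key idea is that a live decoder only reads forward and yields frames as they arrive, so it never needs the total frame count, the duration, or seeking.

```python
# Hypothetical API sketch -- none of these names exist today.
from dataclasses import dataclass
from typing import Iterator

import torch


@dataclass
class Frame:
    data: torch.Tensor   # decoded frame, e.g. (C, H, W) uint8
    pts_seconds: float   # presentation timestamp relative to stream start


class LiveVideoDecoder:
    """Hypothetical decoder for streams with unknown duration and frame count."""

    def __init__(self, source: str):
        # `source` could be, say, an RTSP or HLS URL. We only ever read
        # forward, so no seeking and no total-frame metadata is required.
        self.source = source

    def __iter__(self) -> Iterator[Frame]:
        # Would yield frames as they become available from the stream.
        raise NotImplementedError("illustrative sketch only")


# Hypothetical usage for inference on a live stream:
#
#   decoder = LiveVideoDecoder("rtsp://camera.local/stream")
#   for frame in decoder:
#       prediction = model(frame.data.unsqueeze(0))
```

The important contrast with the existing decoders is access pattern: instead of indexing or seeking by timestamp, the consumer pulls frames in order as they are produced.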
Motivation, pitch
On the training side, I suspect this feature will not be used much: training tends to happen on large corpora of pre-existing media, and I doubt there's much demand to train on a live stream. But running inference on a live stream seems like a common use case, and we can't currently support it.