Fix buffering handling in progressive playback #1556


Draft: wants to merge 3 commits into base wpe-2.46

Conversation


@filipe-norte-red filipe-norte-red commented Aug 6, 2025

When performing a seek operation during playback, the pipeline switches to the PAUSED state and starts buffering data. As this transition is asynchronous, we may reach the 100% buffer level before the state change completes. This may cause updateBufferingStatus() to stop getting called while the async transition is still in progress. Because m_isBuffering was being set back to the m_wasBuffering value in the async-state handling of updateStates(), once the state change completed updateStates() might not detect that we were no longer buffering, since m_isBuffering held the wrong value.
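
To illustrate the failure mode, here is a minimal, self-contained model of the behaviour described above. The member names mirror MediaPlayerPrivateGStreamer, but the logic is deliberately simplified and is only a sketch of how the flag can get stuck, not the actual WebKit code:

```cpp
#include <cstdio>

struct BufferingModel {
    bool m_isBuffering { false };
    bool m_wasBuffering { false };
    bool m_asyncStateChangeInProgress { false };

    // Simplified stand-in for the real method, called on every GST_MESSAGE_BUFFERING.
    void updateBufferingStatus(int percent, bool resetHistory = false)
    {
        m_wasBuffering = resetHistory ? true : m_isBuffering;
        m_isBuffering = percent < 100;
        updateStates();
    }

    // Simplified async-state handling: while the state change is in flight,
    // the old code reverted m_isBuffering to m_wasBuffering to delay the transition.
    void updateStates()
    {
        if (m_asyncStateChangeInProgress)
            m_isBuffering = m_wasBuffering;
    }
};

int main()
{
    BufferingModel player;
    player.updateBufferingStatus(0, true);       // seek: history reset, buffering restarts
    player.m_asyncStateChangeInProgress = true;  // change to PAUSED is asynchronous
    player.updateBufferingStatus(100);           // 100% reached mid-transition, reverted
    player.m_asyncStateChangeInProgress = false; // state change completes...
    player.updateStates();                       // ...but no new buffering message arrives
    std::printf("still buffering? %s\n", player.m_isBuffering ? "yes" : "no"); // prints "yes"
}
```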

4fcc021

Build-Tests Layout-Tests
✅ 🛠 wpe-246-amd64-build ❌ 🧪 wpe-246-amd64-layout
✅ 🛠 wpe-246-arm32-build ❌ 🧪 wpe-246-arm32-layout

@eocanha
Member

eocanha commented Aug 6, 2025

At first sight, this looks like a good cleanup, but I would have to check whether it causes regressions in the upstream layout tests; it was a bit hard to get all the moving parts in place (no regressions) when introducing the buffering hysteresis patch back in the day (there's even a blog post about it).

To better understand the code paths that trigger the problem, I would need to know whether you're using playbin2 (I assume you are; see the pipeline dumps for that) or playbin3, and whether the specific video triggering the issue is using in-memory buffering (I assume it is if you're using a set-top box; you can check the gst logs for that) or on-disk buffering.

So, assuming playbin2 and in-memory buffering, what would happen on a seek would be:

  1. doSeek() resets the buffering level to zero and clears the history without calling updateStates(). This sets m_isBuffering = true (or leaves it like that if it already had that value).
  2. (Optional) Before the gst_element_seek() call returns (while the seek event is propagating through the pipeline), the buffering level may rise via GST_MESSAGE_BUFFERING messages. That ends up calling updateBufferingStatus(). The buffering level might reach 100% and trigger a true --> false transition in m_isBuffering. This might trigger changes in updateStates() in that cycle, but updateStates() knows that we're in an async change and would undo the transition. Except the pipeline might not yet be aware of being in an async change, because the gst_element_seek() call hasn't returned yet. Is this the case you're trying to address?
  3. The gst_element_seek() call in doSeek() has returned, so the pipeline is now in async change state.
  4. The buffering level may rise via GST_MESSAGE_BUFFERING messages, reach 100%, trigger transition and transition being undone, so m_isBuffering is still true.
  5. The seek completes asynchronously, so GST_MESSAGE_ASYNC_DONE is handled (but this is inconsequential) and the state change is reported and handled, triggering updateStates()... without updateBufferingStatus() having been called (oops!! we're depending on a new buffering message to trigger the m_isBuffering true --> false transition, and that buffering message might not come).
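
For reference on where the buffering levels in steps 2 and 4 come from, here is a minimal sketch of watching GST_MESSAGE_BUFFERING on the bus of a plain playbin pipeline, using only standard GStreamer API (the URI is a placeholder, and the handler is not WebKit's actual code, which funnels the percentage into updateBufferingStatus()):

```cpp
#include <gst/gst.h>

static gboolean onBusMessage(GstBus*, GstMessage* message, gpointer)
{
    if (GST_MESSAGE_TYPE(message) == GST_MESSAGE_BUFFERING) {
        gint percent = 0;
        gst_message_parse_buffering(message, &percent);
        // In WebKit this percentage would drive the m_isBuffering hysteresis.
        g_print("buffering level: %d%%\n", percent);
    }
    return TRUE; // keep the bus watch installed
}

int main(int argc, char** argv)
{
    gst_init(&argc, &argv);
    GstElement* pipeline = gst_element_factory_make("playbin", nullptr);
    g_object_set(pipeline, "uri", "https://example.com/progressive.mp4", nullptr); // placeholder URI
    GstBus* bus = gst_element_get_bus(pipeline);
    gst_bus_add_watch(bus, onBusMessage, nullptr);
    gst_element_set_state(pipeline, GST_STATE_PAUSED);
    g_main_loop_run(g_main_loop_new(nullptr, FALSE));
    return 0;
}
```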

To make it easier to locate the different function calls, I'm attaching a "call hierarchy" capture from my IDE (tree nodes call their parent node):

[image: call hierarchy capture]

Which of the two cases, "(2) the pipeline might not yet be aware of being in an async change because the gst_element_seek() call hasn't returned yet" or "(5) that buffering message might not come", is your patch trying to solve? Can you elaborate a bit more on the timeline of events that the patch tries to address, so that I can understand it completely?

A full GStreamer log (with the interesting places highlighted) would be helpful to understand the specific case.

@filipe-norte-red
Author

Thanks for the prompt reply, @eocanha. I appreciate the extra validation, as this was not a straightforward change considering all the different execution paths.

The issue happens with playbin (GStreamer 1.18) and in-memory buffering (the media disk cache is disabled). My use case is "(5) that buffering message might not come".

You can find the logs in the attached logs.zip file.

The execution sequence is as follows:

  • Pipeline is switched from PLAYING to PAUSED state
  • Seek is triggered (timestamp 0:00:50.154692187)
  • updateBufferingStatus() is called with the "reset history" flag set.
    • This sets m_isBuffering to true and m_wasBuffering to true due to the "reset history" flag
  • Async state change is in progress
  • updateBufferingStatus() is called a few times with increasing buffering levels, eventually reaching 100%
    • This sets m_isBuffering to false while m_wasBuffering is true (timestamp 0:00:51.206861128), BUT
    • updateStates() is called, which sets m_isBuffering to true again to delay the true -> false transition (timestamp 0:00:51.206954836, and others)
  • updateBufferingStatus() stops being called while we are still in the async state transition
  • updateStates() is called with the state change to PAUSED now completed (timestamp 0:00:52.274080382), but m_isBuffering is still true and is not reset afterwards (there are no further calls to updateBufferingStatus()), so the player believes we are still buffering

As on-disk buffering is disabled, the "MediaPlayerPrivateGStreamer::fillTimerFired()" won't get called to update the buffering status and recover from this situation.
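
For context, the fillTimerFired() path mentioned above polls the pipeline with a buffering query. A minimal sketch of such a poll in plain GStreamer terms (the helper name is made up; this is only a hint of the mechanism, not the WebKit code itself):

```cpp
#include <gst/gst.h>

// Returns the pipeline's buffering level in percent, or -1 if the query fails.
static int queryBufferingPercent(GstElement* pipeline)
{
    GstQuery* query = gst_query_new_buffering(GST_FORMAT_PERCENT);
    if (!gst_element_query(pipeline, query)) {
        gst_query_unref(query);
        return -1;
    }
    gboolean isBusy = FALSE;
    gint percent = 0;
    gst_query_parse_buffering_percent(query, &isBusy, &percent);
    gst_query_unref(query);
    return percent;
}
```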

Let me know if anything remains unclear.

Thanks

@eocanha
Member

eocanha commented Aug 6, 2025

updateBufferingStatus() stops being called while we are still in the async state transition

That's not completely true. It gets called as long as there are GST_MESSAGE_BUFFERING messages (there are a lot of updateBufferingStatus() calls between 0:00:51.206954836 and 0:00:51.498112047, the last one before the seek finishes). The problem is that in all those cases any m_isBuffering true --> false transition is reverted, and there's no spurious GST_MESSAGE_BUFFERING after the finished seek to unlock the blockage.

Now I'm trying to see how your solution would actually solve this problem, considering the 100% level reached at 0:00:51.498112047. Thinking aloud here:

Let's suppose that 0:00:51.498112047 (the one that ultimately causes the seek "hang") is the case to be fixed, and let's also suppose, for the sake of simplicity, that no other 100% level was reached before that in these logs. In that case, we can assume that m_wasBuffering is true and m_isBuffering will now be false, triggering the true --> false transition and calling updateStates(). So far so good.

But... what will trigger more updateBufferingStatus() calls if not a new GST_MESSAGE_BUFFERING that will never come?

Maybe a better and more reliable solution to this "no more GST_MESSAGE_BUFFERING messages before async change completion" situation would be to reevaluate the buffering level in some way on async change completion, here.

@filipe-norte-red
Author

filipe-norte-red commented Aug 7, 2025

I may have over-simplified the execution steps. I didn't mean to say that updateBufferingStatus() was never called again; it is called a few times due to GST_MESSAGE_BUFFERING messages, as you say. I meant that at some point, after a few of those calls, there are no more GST_MESSAGE_BUFFERING messages and consequently no further calls to updateBufferingStatus(), and this happens while the async transition is still in progress.

But... what will trigger more updateBufferingStatus() calls if not a new GST_MESSAGE_BUFFERING that will never come?

If we have already had the m_isBuffering true --> false transition, then we would not need any more buffering messages, as long as we don't reset that state as is currently done here. If we keep resetting it, then something else is needed (which goes to your next point).

Maybe a better and more reliable solution to this "no more GST_MESSAGE_BUFFERING messages before async change completion" situation would be to reevaluate the buffering level in some way on async change completion, here.

In my first approach (which also worked for this use case without apparent side effects during smoke testing), I did precisely that: I queried the buffering status and called updateBufferingStatus() (similar to the fillTimerFired() implementation). I did it before the updateStates() call here, although updateBufferingStatus() was calling updateStates() already anyway.

I also agree that evaluating the buffering status again at a key point would be a more robust approach, as it would not depend on some corner-case execution sequence that may not have been identified yet. However, I had some concerns about whether the query operation is safe from within the message handler, without the risk of causing blocks or deadlocks (even though it is done after the state transition completed), although I could not pinpoint a specific case. There would also be an extra delay in handling the state transition completion, although that is probably irrelevant, as it should be small enough (I did not measure it) and acceptable in favor of a more robust solution.

I'm happy to change the solution if there is a more robust and safe way to do it.

@eocanha
Member

eocanha commented Aug 7, 2025

I also agree that evaluating the buffering status again at a key point would be a more robust approach, as it would not depend on some corner-case execution sequence that may not have been identified yet. However, I had some concerns about whether the query operation is safe from within the message handler, without the risk of causing blocks or deadlocks (even though it is done after the state transition completed), although I could not pinpoint a specific case. There would also be an extra delay in handling the state transition completion, although that is probably irrelevant, as it should be small enough (I did not measure it) and acceptable in favor of a more robust solution.
I'm happy to change the solution if there is a more robust and safe way to do it.

I think that evaluating the buffering status on seek completion would be more robust.

Yesterday I looked again at your code and the implications aren't clear to me, since I realized that what you're doing is something different from "resetting to the previous state". What you're doing is "eating up one intermediate state", like this:

Old code: updateBufferingStatus() shifted state history to the past for iteration n and updateStates() "eats up" state n-2

  • Before calling updateBufferingStatus() on iteration n: m_isBuffering = <state n-1>, m_wasBuffering = <state n-2>
  • After calling updateBufferingStatus() on iteration n: m_isBuffering = <state n>, m_wasBuffering = <state n-1>
  • After calling updateStates() when GST_STATE_CHANGE_ASYNC on iteration n: m_isBuffering = <state n-1>, m_wasBuffering = <state n-1>

Your proposed code: updateBufferingStatus() "eats up" state n-1 for iteration n when GST_STATE_CHANGE_ASYNC:

  • Before calling updateBufferingStatus() on iteration n: m_isBuffering = <state n-1>, m_wasBuffering = <state n-2>
  • After calling updateBufferingStatus() on iteration n when GST_STATE_CHANGE_ASYNC: m_isBuffering = <state n>, m_wasBuffering = <state n-2>

...and I don't have the implications of "eating an intermediate state" very clear.
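
To make the difference concrete, here is a hedged, self-contained illustration of the two bookkeeping policies sketched in the bullets above, using a simplified stand-in for the real members; the function names and call sites are assumptions, and only the update order reflects the discussion:

```cpp
#include <cstdio>

struct Flags { bool isBuffering; bool wasBuffering; };

// Old code: shift the history, then let updateStates() revert during GST_STATE_CHANGE_ASYNC.
static Flags oldPolicy(Flags f, bool stateN, bool asyncInProgress)
{
    f.wasBuffering = f.isBuffering;     // m_wasBuffering <- state n-1
    f.isBuffering = stateN;             // m_isBuffering  <- state n
    if (asyncInProgress)
        f.isBuffering = f.wasBuffering; // revert: both now hold state n-1
    return f;
}

// Proposed code: during the async change, update m_isBuffering only and keep m_wasBuffering as it was.
static Flags proposedPolicy(Flags f, bool stateN, bool asyncInProgress)
{
    if (!asyncInProgress)
        f.wasBuffering = f.isBuffering; // history only shifts outside async changes
    f.isBuffering = stateN;             // m_isBuffering <- state n; m_wasBuffering keeps state n-2
    return f;
}

int main()
{
    Flags start { true /* state n-1 */, true /* state n-2 */ };
    Flags o = oldPolicy(start, false /* state n: 100% reached */, true /* async change */);
    Flags p = proposedPolicy(start, false, true);
    std::printf("old:      isBuffering=%d wasBuffering=%d\n", o.isBuffering, o.wasBuffering);
    std::printf("proposed: isBuffering=%d wasBuffering=%d\n", p.isBuffering, p.wasBuffering);
}
```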

@filipe-norte-red
Author

Actually, it is not eating up only one intermediate state. Instead, it keeps m_wasBuffering at whatever state it had before the async state change started. This may mean it holds either the very first value it got when the seek was initiated (the initial value, which should get set to true as buffering starts) or the last value of m_wasBuffering set by updateBufferingStatus() before the async state change got in progress, which should be true (unless I missed something; should it become false, it would do so while the async state change is already in progress and therefore not affect m_wasBuffering).

This means that it should keep m_wasBuffering = true while the async state change is in progress. The method updateBufferingStatus() might get called multiple times, causing m_isBuffering to always hold the latest value that resulted from the "buffering" message.
Should m_isBuffering become false while the async state change is in progress, it will get "analyzed" once the async state change finishes via the updateStates() call, completing the state change (m_wasBuffering would still be true at that point).
Should m_isBuffering become false after the async state change completes, it will get "analyzed" in the updateStates() call done by updateBufferingStatus(), completing the state change.

As for having a more robust solution:

I reverted these changes and updated the pull request with the approach of updating the buffering status when the async state change completes (I kept the previous commits for now to allow referencing back to them while we evaluate our options). However, what I noticed is that when playback resumes, the first video frame may remain frozen for 1-2 seconds (this happens quite often with this approach). It still needs confirmation, but it would seem that there is not enough video data buffered yet. This is because queryBufferingPercentage() (an existing method) will preferably get the buffering level from the audiosink only (if not overridden by any quirk), which may fill up quicker than video. As the state change has completed, since we now call updateBufferingStatus(), it may cause a too-early transition to the playing state because buffering has "completed" (for audio; for video it might not have). In contrast, the calls to updateBufferingStatus() due to the "buffering" message take into account both audio and video data, as it comes from the queue2 element (well, it really depends on how the different stream types are multiplexed, but assuming a balanced distribution across the provided buffers).

That said, we do have another use case where we need to patch the buffering code, but that patch is not upstreamable. It also solves the use case I tried to address here, even without any of my changes. While that solves the issue for us without the changes here, it would still make sense to fix this, as I'd expect someone else to run into this issue sooner or later.

@eocanha eocanha self-requested a review August 13, 2025 16:31
@eocanha
Member

eocanha commented Aug 13, 2025

I reverted these changes and updated the pull request with the approach of updating the buffering status when the async state change completes (I kept the previous commits for now to allow referencing back to them while we evaluate our options). However, what I noticed is that when playback resumes, the first video frame may remain frozen for 1-2 seconds (this happens quite often with this approach). It still needs confirmation, but it would seem that there is not enough video data buffered yet. This is because queryBufferingPercentage() (an existing method) will preferably get the buffering level from the audiosink only (if not overridden by any quirk), which may fill up quicker than video. As the state change has completed, since we now call updateBufferingStatus(), it may cause a too-early transition to the playing state because buffering has "completed" (for audio; for video it might not have). In contrast, the calls to updateBufferingStatus() due to the "buffering" message take into account both audio and video data, as it comes from the queue2 element (well, it really depends on how the different stream types are multiplexed, but assuming a balanced distribution across the provided buffers).

That said, we do have another use case where we need to patch the buffering code, but that patch is not upstreamable. It also solves the use case I tried to address here, even without any of my changes. While that solves the issue for us without the changes here, it would still make sense to fix this, as I'd expect someone else to run into this issue sooner or later.

I noticed that manually querying the buffering status might not be the best way to get the actual buffering level. Those queries are meant for the case when a GstDownloadBuffer is used (on-disk buffering) and also for some special cases such as file:/// or mediastream URLs. Also, using GST_BUFFERING_DOWNLOAD here feels wrong, because the problem we're trying to solve only happens on GST_BUFFERING_STREAM.
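
As a side note, the buffering mode can be read straight from the buffering message itself, which is what distinguishes the in-memory (stream) case being discussed here from the download case. A small sketch of that check, using standard GStreamer API (the helper name is made up):

```cpp
#include <gst/gst.h>

// Returns TRUE when the buffering message describes in-memory (stream) buffering.
static gboolean isStreamBuffering(GstMessage* message)
{
    GstBufferingMode mode;
    gint averageIn = 0, averageOut = 0;
    gint64 bufferingLeft = 0;
    gst_message_parse_buffering_stats(message, &mode, &averageIn, &averageOut, &bufferingLeft);
    return mode == GST_BUFFERING_STREAM;
}
```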

The approach I would suggest would be to store the last buffering GstMessage received here in a GRefPtr<GstMessage> m_lastBufferingMessage private field in the player private. That will only happen for buffering messages, so the on-disk buffering and file/mediastream use cases wouldn't be affected, as intended. That field would then be reset to nullptr on seeks after this line, as well as in changePipelineState() here when the change result variable says that the change has been GST_STATE_CHANGE_ASYNC. In general, we can be aggressive about resetting the field even in cases where the buffering mode is not GST_BUFFERING_STREAM, since the field should already be null in those cases.

Then, when the async state change finishes here, and only if there's a stored buffering GstMessage, you can call processBufferingStats(m_lastBufferingMessage.get()) and set the field to null after that. This would trigger exactly the same buffering reaction as originally intended. You don't need your version of the method with no parameters in this case.
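
A rough sketch of that suggestion, using plain GstMessage reference counting as a stand-in for WebKit's GRefPtr<GstMessage>; apart from processBufferingStats(), which is mentioned above, the function names are placeholders for the hook points described in the previous paragraphs:

```cpp
#include <gst/gst.h>

static GstMessage* s_lastBufferingMessage; // in the player private this would be GRefPtr<GstMessage> m_lastBufferingMessage

// Stand-in for the real processBufferingStats(); the real method feeds the level into updateBufferingStatus().
static void processBufferingStats(GstMessage* message)
{
    gint percent = 0;
    gst_message_parse_buffering(message, &percent);
    g_print("replayed buffering level: %d%%\n", percent);
}

// Called from the GST_MESSAGE_BUFFERING handling path.
static void rememberBufferingMessage(GstMessage* message)
{
    gst_message_replace(&s_lastBufferingMessage, message);
}

// Called on seeks, and from changePipelineState() when the result is GST_STATE_CHANGE_ASYNC.
static void forgetBufferingMessage()
{
    gst_message_replace(&s_lastBufferingMessage, nullptr);
}

// Called when the async state change finishes: replay the last buffering message (if any)
// so the usual buffering reaction runs even if no new GST_MESSAGE_BUFFERING ever arrives.
static void onAsyncStateChangeCompleted()
{
    if (!s_lastBufferingMessage)
        return;
    processBufferingStats(s_lastBufferingMessage);
    forgetBufferingMessage();
}
```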
