Skip to content

Infinite retries even on CRITICAL severity #830

Closed
@bhh1988

Description

@bhh1988

Have you read the FAQ and checked for duplicate issues:
Yes, I'm filing a more general issue, which is the root cause of #762.

What version of Shaka Player are you using:
This issue exists in both 2.0.x and 2.1.x, and not before.

Can you reproduce the issue with our latest release version:
Yes

Can you reproduce the issue with the latest code from master:
Yes

Are you using the demo app or your own custom app:
Yes

If custom app, can you reproduce the issue using our demo app:
Yes

What browser and OS are you using:
Chrome, Mac (but should be reproducible in others).

What are the manifest and license server URIs:
(you can send the URIs to shaka-player-issues@google.com instead, but please use GitHub and the template for the rest)
Irrelevant. Only thing that matters is that segments return non 2XX http code. Even 401 or 403 reproduces.

What did you do?
Host segments on my local server, program server to return 401/403, or anything non 2XX for segments (manifest still served fine).

What did you expect to happen?
For 401/403, these are CRITICAL severity since #620 so I expect no retries to occur. For other non-2XX codes, I expect only up to "maxAttempts" retries to occur.

What actually happened?
Retries forever. This is counter-intuitive since it basically doesn't respect the maxAttempts in the retryParameters. Not to mention it can lead to unintentional DDOS'ing of the host...

I've debugged through the code, and I think I understand the issue and possible solutions. #390 was fixed by introducing another layer of retries in StreamingEngine (see 0d77ddf). However, the logic introduced there, by calling handleNetworkError_(), effectively leads to infinite retries. This double-layering of retries (both StreamingEngine and NetworkingEngine) is confusing and seems to defeat the point of having a retry-policy with the RetryParameters. I could be wrong, but it seems to me that #390 would have been better addressed differently, by telling the user to configuring shaka's RetryParameters appropriately to be robust to network failures, and closing the issue as "by-design". Or at the minimum, it seems that 0d77ddf is catching too many cases in the if-statement by including BAD_HTTP_STATUS, when #390 was really intended for bad network conditions (which would be HTTP_ERROR or TIMEOUT).

Here are the solutions I see for this in order of preference:

  1. Remove the logic for calling handleNetworkError_(). This would eliminate the double-layering of retries. remove the handleNetworkError_() function altogether, after lifting out some logic for handling text-stream failures.
  2. Add a config variable like "doNotRetryInfinitely", and make the above logic conditional on this config variable.
  3. Continue allowing infinite retries but only for HTTP_ERROR or TIMEOUT.

Let me know if I'm overlooking something...

Metadata

Metadata

Assignees

No one assigned

    Labels

    status: archivedArchived and locked; will not be updatedtype: bugSomething isn't working correctly

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions