Skip to content

Conversation

absurdlylongusername
Copy link
Member

  • I carefully read the contribution guidelines and agree to them.
  • I have tested the API against NewPipe.
  • I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.

Validate HTTP response codes for Soundcloud

Production code that was using Downloader does not do any checking on the response code: therefore errors and exceptions thrown due to error response code give false information (e.g. tests failing due to 404 were giving error about clientId when the error was 404).

I added a method in Response.java and a utility method for validating response codes and used it in places for soundcloud extraction. It will throw exception with information about the error code, which will propagate upwards to the call site which will print a more useful stack trace for debugging.

Fix failing tests in SoundcloudChannelTabExtractorTest

  • Fixed test failing due to 404
  • IDE refactoring to remove IDEA warnings

@absurdlylongusername absurdlylongusername changed the title Soundcloud validate http response [SoundCloud] Validate HTTP response codes Jul 8, 2025
@TobiGr TobiGr added bug Issue is related to a bug soundcloud service, https://soundcloud.com/ tests Issues and PR related to unit tests labels Jul 8, 2025
Copy link
Contributor

@TobiGr TobiGr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thaank you. Changes LGTM, but conflicts need to be resolved first.

Copy link
Member

@litetex litetex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this can be merged in the current state:

  • There are Todo comments
  • The changes in the test will be irrelevant after #1332
  • Response codes that are ok are 2XX response codes
  • I'm really not sure about another HttpUtils method and a new Exception class

I think the most simple approach would be to have a boolean Response#isOk() method and then use this method to control the underlying use cases accordingly.

@absurdlylongusername
Copy link
Member Author

@litetex

  • There are Todo comments

The comment was kind of intentionally left in there in the hopes that whoever reviewed it would provide an explanation and then I would remove it.

  • Response codes that are ok are 2XX response codes

Not exactly sure what you mean by this? Do you mean that 2xx is the only "success" code range all network calls throughout the codebase?

  • I'm really not sure about another HttpUtils method and a new Exception class
    I think the most simple approach would be to have a boolean Response#isOk() method and then use this method to control the underlying use cases accordingly.

Do you mean another as in having two validateResponseCode methods, one in Response and one in HttpUtils, or do you mean that there's another HttpUtils class somewhere (because I can't seem to find one)?

And does this also mean we shouldn't have a HttpResponseException? If so, why? I think having a specific exception for Http errors is better than throwing a generic IOException, and also makes for better logging and diagnostics, because right now our code does not give any useful information at all for what causes an error if the cause is an invalid response.


My idea was to have a generic method we can use to check for expected response codes instead of just checking it's 2xx or not 2xx, because I was of the assumption that:

  1. We are definitely not validating response codes anywhere in the codebase (I can't see anywhere we do this)
  2. There may be scenarios where we want to treat a specific 3xx or 4xx as not an error and it's not efficient to catch an exception in these cases and then check the response code (and then throw another exception with info about the http error that will get propagated upwards), as I believe we should never throw exception if we don't have to, and exceptions should only be thrown for unexpected behaviour, or in the same scenarios we already throw exceptions.

The end goal is that if a network call ever fails, we want to know why, using code that is:

  • As small as possible
  • Easy to understand
  • Easy to use
  • Easy to extend

If there's only one place in code where non-2xx codes shouldn't be considered an error then we could handle that case explicitly at that call site, but if it's at least 3 times then I think we should have more than a simple isOk method.
Could perhaps have other methods like Response#isRedirect for 3xx, Response#isClientError for 4xx, Response#isServerError instead of a single generic method.

Since I am almost certain 1. is true, I am more than happy to trawl through the entire codebase for every http call we make (not jusr for get, but for post and head too) and add response code validation and see which approach makes the most sense.

I could update this PR with these changes and/or split it into multiple PRs if it is too big.

Thoughts?

Copy link
Member

@Stypox Stypox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! We do need to check response codes more often. For example, PeerTube sometimes throws 429 Too Many Requests (TeamNewPipe/NewPipe#2979), but that shouldn't be treated as a recaptcha by the Downloader (while that's what happens now). It should be the PeerTube service's job to throw a normal ExtractorException for 429s, and instead the YouTube service's job to throw a RecaptchaException (along with a proper URL that users can use to solve the captcha, instead of having the Downloader guess that URL).

It would be good if you could take the above paragraph into consideration for this PR, and properly validate the HTTP response codes of all requests made by all services in the extractor, so we don't have to change the design once again later.

* or when {@code validResponseCodes} is empty and the code is a 4xx or 5xx error.
*/
// CHECKSTYLE:ON
public static void validateResponseCode(final Response response,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would inline this directly in Response, especially because it's first argument is a Response

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah changes in this file should be reverted as they were superseded

import java.io.IOException;
import org.schabi.newpipe.extractor.downloader.Response;

public class HttpResponseException extends IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an ExtractorException, since the extractor received an unexpected response. There was no error in data input/output, since the request went through just fine at the network layer.

Copy link
Member Author

@absurdlylongusername absurdlylongusername Oct 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, problem with this. If this is changed to ExtractionException, then that means any code that validates HttpResponseException will also now need to have throws ExtractionException.

Is this something we want or not?


return null;
}
// CHECKSTYLE:OFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why checkstyle off?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, checkstyle was complaining about line being too long in the comment below, so I had to disable it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is related to a bug soundcloud service, https://soundcloud.com/ tests Issues and PR related to unit tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants