Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace requests with httpx #2041

Merged
merged 5 commits into from
Oct 20, 2024

Conversation

sarayourfriend
Copy link
Contributor

Fixes #2039

This was surprisingly more straightforward than I anticipated, but there are a few quirks worth explaining in the PR, which I'll leave as inline comments.

This PR definitely supersedes #2040, so I will close that PR in favour of this one. In which case, this also fixes #1960.

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@@ -85,7 +85,7 @@ dependencies = [
"platformdirs >= 2.6, < 5.0",
"psutil >= 5.9, < 7.0",
"python-dateutil >= 2.9.0.post0", # transitive dependency (beeware/briefcase#1428)
"requests >= 2.28, < 3.0",
"httpx >= 0.20, < 1.0",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really wasn't sure what version range to select here. I chose 0.20 by searching httpx's changelog and finding the last time they made any noted change to streaming. Other notable API usages are response.url as httpx.URL, which at least pre-dates 0.20 based on changelog notes that mention it receiving bug fixes in 0.18.0, as does the introduction of retries to httpx.HTTPTransport (0.15.0).

Aside from that, 0.20.0 is kind of arbitrary. It might be possible to go back even further like to 0.15.0, but 0.20.0 is 3 years old at this point, and I don't really know how to validate that any older version is actually supported (to be fair, I haven't done anything to validate that 0.20.0 is either, other than check feature availability).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.20.0 seems like a completely reasonable arbitrary version to me :-)

@@ -90,9 +90,12 @@ def __str__(self):


class NetworkFailure(BriefcaseCommandError):
def __init__(self, action):
DEFAULT_HINT = "is your computer offline?"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default hint is meant to prevent needing to modify other code paths that already relied on NetworkFailure. Doing the change this way prevents those areas from needing to change or duplicate the generic hint.

create_command.tools.requests.get = mock.MagicMock(
side_effect=requests_exceptions.ConnectionError
stream_mock = create_command.tools.httpx.stream = mock.MagicMock()
stream_mock.return_value.__enter__.side_effect = httpx.TransportError(
Copy link
Contributor Author

@sarayourfriend sarayourfriend Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

httpx.stream returns a context manager, so it's necessary to mock __enter__ with either the mocked response or exception. This pattern repeats throughout the PR, especially in test_File__download.py.

sess.mount("https://", adapter)

sess.head(url).raise_for_status()
transport = httpx.HTTPTransport(
Copy link
Contributor Author

@sarayourfriend sarayourfriend Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

httpx has no equivalent to urllib3's Retry and the requests adapter pattern.

Generally speaking, httpx's transports replace the adapter pattern, and in this case, they handle retrying in the case of network connectivity issues.

There is nothing like Retry and its status_forcelist, so the while loop reimplements that aspect.

I arbitrarily chose 3 retries. Retry uses 10 retries by default, so I could increase retries and bad_response_retries to 5 to match, but the even split between networking errors and status code errors would still be arbitrary and wouldn't match requests behaviour anyway. There's currently no way to implement a combined retry count in httpx without overriding private methods deep in httpcore's connection handling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, 3 seems like a completely reasonable arbitrary number of retries. If you've had 3 network failures, I don't think there's any particular reason to think attempts 4 through 10 will be better; and it's definitely not worth doing any major dentistry to replicate a requests-specific behavior. Ideally, networks won't fail; but over here in the real world, accommodating a couple of failures before giving up seems completely reasonable.

@sarayourfriend
Copy link
Contributor Author

Hmmm, I'm not understanding where the coverage issue is coming from. When I run the test locally I can reproduce the same issue with 223->230... but if I put a breakpoint, the TooManyRedirects branch is definitely exercised in the test that mocks that exception (test_get_connection_error).

Maybe I'm misunderstanding what 223->230 means in this failure 🤔

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - they weren't kidding when they said that the httpx API is mostly requests-compatible... I'm surprised it's this compatible.

Regarding the coverage failure - the case it's reporting is the "bucket else" - when the exception type isn't any of the three cases that have been explicitly reported. I'm not sure if it's actually possible to have an exception other than those three types; but regardless, there's no way for coverage to do that reasoning because exceptions aren't type described by Python code.

My usual approach to getting coverage in a case like this is to make the "default" case (i.e., TransportError in this case) the else case. If you can't do that because the order of the isinstance checks is significant (e.g., because of the specific class hierarchy), then you can put an else: pass and annotate it with #pragma: no cover (with a comment describing why that branch can't happen).

Once the coverage issue is resolved, this looks pretty much good to go. Nice work!

@@ -85,7 +85,7 @@ dependencies = [
"platformdirs >= 2.6, < 5.0",
"psutil >= 5.9, < 7.0",
"python-dateutil >= 2.9.0.post0", # transitive dependency (beeware/briefcase#1428)
"requests >= 2.28, < 3.0",
"httpx >= 0.20, < 1.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.20.0 seems like a completely reasonable arbitrary version to me :-)

sess.mount("https://", adapter)

sess.head(url).raise_for_status()
transport = httpx.HTTPTransport(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, 3 seems like a completely reasonable arbitrary number of retries. If you've had 3 network failures, I don't think there's any particular reason to think attempts 4 through 10 will be better; and it's definitely not worth doing any major dentistry to replicate a requests-specific behavior. Ideally, networks won't fail; but over here in the real world, accommodating a couple of failures before giving up seems completely reasonable.

Swapping the "else" case to the most general (generic, default hint) makes this safe. If httpx introduces a fourth RequestError, it will not be obvious in the code, but at least the hint will be generic enough to not confuse the user, and vague enough to perhaps prompt them to check the logs and find out what the underlying error was.
@sarayourfriend
Copy link
Contributor Author

Wow - they weren't kidding when they said that the httpx API is mostly requests-compatible... I'm surprised it's this compatible.

It's pretty nice!

Regarding the coverage failure - the case it's reporting is the "bucket else" - when the exception type isn't any of the three cases that have been explicitly reported. I'm not sure if it's actually possible to have an exception other than those three types; but regardless, there's no way for coverage to do that reasoning because exceptions aren't type described by Python code.

Aha! Thank you for the explanation.

The isinstance check order does not matter in this case, so I've pushed a change to match your suggestion with TransportError as the default else case.


As an aside, if httpx introduces a fourth kind of RequestError, that else branch starts to handle more than just the TransportError. Ideally, the check would be exhaustive, and report when the technically possible but practically (currently) impossible case occurs. Is the better thing to do to add an else that re-raises the unknown error type instead of swallowing it into the logs, or transforms into a Briefcase error that explicitly indicates an unknown error occurred and to file a bug report with logs attached? Not sure how defensive you prefer to be in these kinds of situations. The stakes in this particular case are pretty low, as are the chances it ever matters, and the error is generic enough that a user might check the logs anyway, so truly exhaustive checks is maybe going overboard.

@sarayourfriend
Copy link
Contributor Author

Looks like the failing test found an unhandled edge case: https://github.com/beeware/briefcase/actions/runs/11411470307/job/31756731325?pr=2041#step:27:511

It should be fixed with this patch, I'll push that, but only after revisiting the tests to figure out how to accurately exercise that case:

diff --git a/src/briefcase/integrations/file.py b/src/briefcase/integrations/file.py
index 81922290..002bd4f2 100644
--- a/src/briefcase/integrations/file.py
+++ b/src/briefcase/integrations/file.py
@@ -256,6 +256,7 @@ class File(Tool):
             with temp_file:
                 total = response.headers.get("content-length")
                 if total is None:
+                    response.read()
                     temp_file.write(response.content)
                 else:
                     progress_bar = self.tools.input.progress_bar()

The current tests don't mock httpx.Response correctly, so never catch this case. They mock response.content directly with bytes, but it's a property in the real httpx.Response class. This gets to a limitation of mocking httpx in this way that I was a little worried about when refactoring, and which pytest-httpx or pook might be better suited to handle, they let the testing code get away from mocking the request/response cycle and act more like integration tests (but eliminating the outbound network requests).

However, a less invasive change would be to use a real response object and only mock (with wraps) the particular methods that need asserting on. I'll try to make that work.

@sarayourfriend
Copy link
Contributor Author

Okay, 5ffe0f1 refactors one of the file__download tests to reduce the extent of mocking and expose the bug. The following commit, 568644d, addresses the bug.

I'm going to convert this to a draft because I'd like to refactor the rest of the tests in test_File__download.py to match the more thinly mocked approach of the test I refactored in 5ffe0f1, and make sure to look for other tests that would benefit from the same treatment with respect to HTTP client mocking.

To clarify what I said before about pook and pytest-httpx, they significantly improve the ergonomics of writing tests like this. Compare the setup of the refactored test in 5ffe0f1 to the pook version, (which is very similar to the pytest-httpx but slightly more comprehensive in its requirements by default):

# one shot
pook.get("https://example.com/support?useful=Yes").reply(200).body(b"all content")

# chunked
pook.get("https://example.com/support?useful=Yes").reply(200).body([b"chunk 1;", b"chunk 2;", b"chunk 3;"], chunked=True)

Briefcase's usage is so minimal, that it doesn't feel worth pulling in a test dependency for just this test_File__download.py suite, but something to keep in mind if the need to mock HTTP request/response cycles grows.

Anyway, I'll work on the rest of this later this weekend or early next week 🙂

@sarayourfriend sarayourfriend marked this pull request as draft October 18, 2024 23:44
@freakboy3742
Copy link
Member

Not sure how defensive you prefer to be in these kinds of situations. The stakes in this particular case are pretty low, as are the chances it ever matters, and the error is generic enough that a user might check the logs anyway, so truly exhaustive checks is maybe going overboard.

In this case, the error message in the default case is already framed in a "maybe...?", so if the eventuality you've flagged ever happens, the error won't be wrong, just not as helpful as it could be under that specific failure mode, and we can easily adapt. If the error message pointed at a specific cause and mitigation, I'd probably prefer explicit and distinct bucket handling.

Briefcase's usage is so minimal, that it doesn't feel worth pulling in a test dependency for just this test_File__download.py suite, but something to keep in mind if the need to mock HTTP request/response cycles grows.

Good call. I'm not opposed to introducing a dependency if it's going to radically improve testing; but given we've already got 98% of the testing of downloads that we need without the dependency, and we've pretty much hit the limit of the downloading that Briefcase will need to do, if we can live without the extra dependency, that would be my preference for now.

@sarayourfriend sarayourfriend marked this pull request as ready for review October 19, 2024 22:36
Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another awesome contribution - thanks!

# ``response.read()``, httpx will still consume the ``stream`` response
# content internally. This allows testing both the non-streaming and
# streaming download paths without needing to complicate the response params
stream=_IteratorByteSteam(stream),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a second to work out what was going on here - but once I did get it, it's a very nice simplification.

@freakboy3742 freakboy3742 merged commit a95d336 into beeware:main Oct 20, 2024
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace requests with httpx Handle Requests' ChunkedEncodingError exceptions when downloading files
2 participants