Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch errors while handling redirects #1944

Merged
merged 4 commits into from
May 12, 2014
Merged

Catch errors while handling redirects #1944

merged 4 commits into from
May 12, 2014

Conversation

schlamar
Copy link
Contributor

This is my proposal which fixes #1939 completely.

@sigmavirus24 missed one possible exception in #1940 (ChunkedEncodingError) and didn't handle RuntimeError correctly (because this means that the content is already consumed).

This also addresses the issue that a decoding error is already thrown in Adapter.send by moving the content reading part at the end of Session.send.

@maxcountryman
Copy link
Contributor

👍

@schlamar
Copy link
Contributor Author

@maxcountryman Does this fix your issue (urllib3/urllib3#206)?

@maxcountryman
Copy link
Contributor

@schlamar it looks like it should. I can't test it at the moment. A little later on I'll see if I have time to give it a proper run with Photobucket.

@Lukasa
Copy link
Member

Lukasa commented Mar 10, 2014

This looks reasonable to me. I'll let @sigmavirus24 take a look too.

@sigmavirus24
Copy link
Contributor

and didn't handle RuntimeError correctly (because this means that the content is already consumed)

In my tests it did not block and I prefer to do something as opposed to using pass in an except block. How was the handling incorrect?

@schlamar
Copy link
Contributor Author

How was the handling incorrect?

It is just not logical to read from the response in case of RuntimeError because this exception is thrown if the content is already consumed (so a read is not necessary).

However, I'm willing to handle this as in your PR if you prefer it.

@sigmavirus24
Copy link
Contributor

a read is not necessary

While the read is not necessary, it is also not harmful.

It's my own stylistic preference that except blocks not be a simple "pass". (If there's no action to take then perhaps the exception should be handled somewhere else (or not thrown at all).) It's up to Kenneth and Cory if they like it as is. I have no strenuous objections to it, I just have a bit of a pet peeve about this particular style.

@smallcode
Copy link

👍

@zackw
Copy link
Contributor

zackw commented Mar 13, 2014

In addition to #1939, I believe this also fixes #1952 (which I just filed - sorry about that).

@schlamar Could you please add some test cases for these bugs as well? A test case for #1952 can be dug out of the (now-scrapped) pull request #1919.

@sigmavirus24 Regarding "except RuntimeError: pass # already decoded", I sympathize with your "should this exception have been thrown at all?" reaction, but it makes sense for Response.content to throw an exception when all the content has already been consumed; it happens that in this case we don't care since we are only accessing .content for its side effects. (Would it make you more comfortable if a more specific exception were thrown?)

@schlamar
Copy link
Contributor Author

Could you please add some test cases for these bugs as well?

Very interesting. I guess this should be possible (while a test case for the original issue is actually hard, we would need a misbehaving web service for that).

should this exception have been thrown at all

I don't think this was his point. I guess you mixed two things together. My interpretation of @sigmavirus24 comments is:

  1. Instead of a RuntimeError there should be a more explicit exception if the content is already consumed
  2. Don't handle the RuntimeError separately in resolve_redirects, just do a r.raw.read(...) in all exception cases.

@zackw
Copy link
Contributor

zackw commented Mar 13, 2014

Here's the test case for #1952 (from #1919). Add to the big RequestsTestCase class in test_requests.py.

    def test_manual_redirect_with_partial_body_read(self):
        s = requests.Session()
        r1 = s.get(httpbin('redirect/2'), allow_redirects=False, stream=True)
        assert r1.is_redirect
        rg = s.resolve_redirects(r1, r1.request, stream=True)

        # read only the first eight bytes of the response body,
        # then follow the redirect
        r1.iter_content(8)
        r2 = next(rg)
        assert r2.is_redirect

        # read all of the response via iter_content,
        # then follow the redirect
        for _ in r2.iter_content(): pass
        r3 = next(rg)
        assert not r3.is_redirect

I agree #1939 is not feasible to test without adding stuff to httpbin. Maybe that should wait for #1166? A bunch of the other redirection bugs I just filed (and more to come) are going to be hard to test without adding stuff to httpbin, too.

@schlamar
Copy link
Contributor Author

Thanks, I added this test.

Maybe we can add a gzipped_redirect to httpbin to test this?

@sigmavirus24
Copy link
Contributor

My interpretation of @sigmavirus24 comments [snip]

Your interpretation is correct. I've discussed this with several (far more experienced and knowledgeable Python developers, including core PyPy and CPython developers). RuntimeError exceptions should never be raised by any library ever. Yes they're there but that does not mean you should use them.

This also works with your overall goal @zackw: If the error were instead a child of a RequestException we could name it well and have:

try:
    requests.get(...)
except requests.RequestException:
    #...

Work in every case. In the redirect case, before this PR, it wouldn't. I know that there are several ideas of "good" Python code that I disagree with others on (including Kenneth), so I don't push those issues normally.

I haven't found a good way to preserve backwards compatibility though for those catching the RuntimeError and allowing for a RequestException. I don't think creating a new one that inherits from both is a good idea in the slightest. I'd love if either of you had the solution. In fact, I'd probably send you 🍰 :)

@schlamar
Copy link
Contributor Author

While this is really interesting (never thought about how you would handle backwards compatibility when changing an exception type), it is actually a bit off topic here =) I think we should focus on this PR and maybe move this discussion into a new issue?

Points standing out:

  • Should I remove handling RuntimeError separately (I'm +0 on this)?
  • How to test against the original issue (Why decode the response body of a redirect? #1939)? Would a gzipped_redirect on httpbin a valid option? I don't know how Kenneth feels about such a "feature"...

@sigmavirus24
Copy link
Contributor

@schlamar it took me several months to get a feature merged and deployed on HTTPBin in order to test a bug fix. I'm not certain we should bother waiting that long. Do you have sufficient confidence that you could fake out a gzipped response from a redirect? You could take a similar approach to my test in #1963. Unfortunately, when I planned that test code, I was planning for the simplest case (mine). If there's a good way to adapt it to this PR, that would be awesome.

@schlamar
Copy link
Contributor Author

@sigmavirus24 @zackw Added a test. What do you think?

@schlamar
Copy link
Contributor Author

@sigmavirus24 Should I change

try:
    resp.content  # Consume socket so it can be released
except (ChunkedEncodingError, ContentDecodingError):
    resp.raw.read(decode_content=False)
except RuntimeError:
    pass  # already consumed

to

try:
    resp.content  # Consume socket so it can be released
except (ChunkedEncodingError, ContentDecodingError, RuntimeError):
    resp.raw.read(decode_content=False)

@schlamar
Copy link
Contributor Author

Now looking at this in comparison I'm +1 for the latter one =)

@schlamar
Copy link
Contributor Author

Updated and rebased against master.

@sigmavirus24
Copy link
Contributor

Now looking at this in comparison I'm +1 for the latter one =)

Me too ;)

@sigmavirus24
Copy link
Contributor

Gave this one more look over. LGTM. :shipit: 👍

@schlamar
Copy link
Contributor Author

I prefer to do something as opposed to using pass in an except block.

@sigmavirus24 I guess you'll like the new contextlib.suppress in Python 3.4. =)

@sigmavirus24
Copy link
Contributor

I really should stop using Python 2 and only use 3. ;)

@Lukasa
Copy link
Member

Lukasa commented Mar 18, 2014

That's what I've done with hyper. It's very freeing. =)

@kennethreitz
Copy link
Contributor

#1939 terrifies me. When Requests was handling decoding of transfer-encodings itself, it It was explicitly handling the case of when servers are misinforming about the transfer-encoding. This is one of the major reasons I want to be very wary of coupling too closely to urllib3.

Pehaps, akin to the mutable r.encoding, there should be a r.transfer_encoding. This would be, very, very nice.

This would unify the API we have today — and gives us the same behavior for character encodings (which was my intention, I'm not sure how this bug got introduced :P).

@Lukasa
Copy link
Member

Lukasa commented Mar 24, 2014

I suspect that r.transfer_encoding would be very useful for pip. I seem to recall that they've had some trouble with servers sending .tar.gz files with Content-Encoding: gzip set, causing r.content to save the ungzipped tar file. @dstufft, does that ring a bell?

@dstufft
Copy link
Contributor

dstufft commented Mar 24, 2014

Yes that's right. I had to reimplement parts of requests in order to disable the decoding.

@Lukasa
Copy link
Member

Lukasa commented Mar 24, 2014

@kennethreitz If .transfer_encoding is something you're interested in I'll take a crack at it.

@schlamar
Copy link
Contributor Author

When Requests was handling decoding of transfer-encodings itself, it It was explicitly handling the case of when servers are misinforming about the transfer-encoding.

But this was broken because it dropped at least one byte from the original response: https://github.com/kennethreitz/requests/issues/1249 To be able to recover gracefully from a decoding error and having the original bytes available afterwards you need basically cache the complete response because decoding could theoretically fail on the last byte (which makes streaming responses basically useless).

there should be a r.transfer_encoding

I don't see any other use case than disabling content decoding. And this can already be done with r.raw.decode_content = False on streaming responses.

@kennethreitz
Copy link
Contributor

@Lukasa by all means!

@kennethreitz
Copy link
Contributor

@schlamar or specifying a transfer encoding other than what the server provided. It'll just give is a more flexible architecture around this, in general.

@kennethreitz
Copy link
Contributor

The question is, if we implement this, is this pull request still necessary?

@schlamar
Copy link
Contributor Author

schlamar commented Apr 3, 2014

The question is, if we implement this, is this pull request still necessary?

The question is, how should the proposed feature help resolving this issue =)

@schlamar
Copy link
Contributor Author

schlamar commented Apr 3, 2014

The question is, how should the proposed feature help resolving this issue =)

Ah, I see:

This would unify the API we have today — and gives us the same behavior for character encodings

Character encoding does a replace in case of an error.

But in this case it would only make sense if stream=False as you cannot recover gracefully from a decoding error while you are iterating over the content (gracefully means keeping the original bytes from the response).

So a rough sketch how this could be done:

  • set raw.decode_content to False in case of stream=False
  • do decoding after consuming content (here)

However, in case of stream=True we will still see this error so this PR is necessary.

@kennethreitz
Copy link
Contributor

@schlamar if you perform a rebase this will be merged and released today!

@schlamar
Copy link
Contributor Author

Done.

kennethreitz added a commit that referenced this pull request May 12, 2014
Catch errors while handling redirects
@kennethreitz kennethreitz merged commit 739d153 into psf:master May 12, 2014
@kennethreitz
Copy link
Contributor

✨ 🍰 ✨

@schlamar schlamar deleted the redirect-decoding branch May 12, 2014 20:50
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why decode the response body of a redirect?
8 participants