Catch errors while handling redirects #1944

schlamar · 2014-03-10T07:24:49Z

This is my proposal which fixes #1939 completely.

@sigmavirus24 missed one possible exception in #1940 (ChunkedEncodingError) and didn't handle RuntimeError correctly (because this means that the content is already consumed).

This also addresses the issue that a decoding error is already thrown in Adapter.send by moving the content reading part at the end of Session.send.

maxcountryman · 2014-03-10T14:30:05Z

👍

schlamar · 2014-03-10T14:41:12Z

@maxcountryman Does this fix your issue (urllib3/urllib3#206)?

maxcountryman · 2014-03-10T14:48:09Z

@schlamar it looks like it should. I can't test it at the moment. A little later on I'll see if I have time to give it a proper run with Photobucket.

Lukasa · 2014-03-10T15:54:51Z

This looks reasonable to me. I'll let @sigmavirus24 take a look too.

sigmavirus24 · 2014-03-11T13:20:34Z

and didn't handle RuntimeError correctly (because this means that the content is already consumed)

In my tests it did not block and I prefer to do something as opposed to using pass in an except block. How was the handling incorrect?

schlamar · 2014-03-11T13:30:54Z

How was the handling incorrect?

It is just not logical to read from the response in case of RuntimeError because this exception is thrown if the content is already consumed (so a read is not necessary).

However, I'm willing to handle this as in your PR if you prefer it.

sigmavirus24 · 2014-03-11T13:41:16Z

a read is not necessary

While the read is not necessary, it is also not harmful.

It's my own stylistic preference that except blocks not be a simple "pass". (If there's no action to take then perhaps the exception should be handled somewhere else (or not thrown at all).) It's up to Kenneth and Cory if they like it as is. I have no strenuous objections to it, I just have a bit of a pet peeve about this particular style.

smallcode · 2014-03-11T18:14:55Z

👍

zackw · 2014-03-13T15:56:30Z

In addition to #1939, I believe this also fixes #1952 (which I just filed - sorry about that).

@schlamar Could you please add some test cases for these bugs as well? A test case for #1952 can be dug out of the (now-scrapped) pull request #1919.

@sigmavirus24 Regarding "except RuntimeError: pass # already decoded", I sympathize with your "should this exception have been thrown at all?" reaction, but it makes sense for Response.content to throw an exception when all the content has already been consumed; it happens that in this case we don't care since we are only accessing .content for its side effects. (Would it make you more comfortable if a more specific exception were thrown?)

schlamar · 2014-03-13T16:39:43Z

Could you please add some test cases for these bugs as well?

Very interesting. I guess this should be possible (while a test case for the original issue is actually hard, we would need a misbehaving web service for that).

should this exception have been thrown at all

I don't think this was his point. I guess you mixed two things together. My interpretation of @sigmavirus24 comments is:

Instead of a RuntimeError there should be a more explicit exception if the content is already consumed
Don't handle the RuntimeError separately in resolve_redirects, just do a r.raw.read(...) in all exception cases.

zackw · 2014-03-13T17:22:02Z

Here's the test case for #1952 (from #1919). Add to the big RequestsTestCase class in test_requests.py.

    def test_manual_redirect_with_partial_body_read(self):
        s = requests.Session()
        r1 = s.get(httpbin('redirect/2'), allow_redirects=False, stream=True)
        assert r1.is_redirect
        rg = s.resolve_redirects(r1, r1.request, stream=True)

        # read only the first eight bytes of the response body,
        # then follow the redirect
        r1.iter_content(8)
        r2 = next(rg)
        assert r2.is_redirect

        # read all of the response via iter_content,
        # then follow the redirect
        for _ in r2.iter_content(): pass
        r3 = next(rg)
        assert not r3.is_redirect

I agree #1939 is not feasible to test without adding stuff to httpbin. Maybe that should wait for #1166? A bunch of the other redirection bugs I just filed (and more to come) are going to be hard to test without adding stuff to httpbin, too.

schlamar · 2014-03-13T17:33:21Z

Thanks, I added this test.

Maybe we can add a gzipped_redirect to httpbin to test this?

sigmavirus24 · 2014-03-14T13:43:41Z

My interpretation of @sigmavirus24 comments [snip]

Your interpretation is correct. I've discussed this with several (far more experienced and knowledgeable Python developers, including core PyPy and CPython developers). RuntimeError exceptions should never be raised by any library ever. Yes they're there but that does not mean you should use them.

This also works with your overall goal @zackw: If the error were instead a child of a RequestException we could name it well and have:

try:
    requests.get(...)
except requests.RequestException:
    #...

Work in every case. In the redirect case, before this PR, it wouldn't. I know that there are several ideas of "good" Python code that I disagree with others on (including Kenneth), so I don't push those issues normally.

I haven't found a good way to preserve backwards compatibility though for those catching the RuntimeError and allowing for a RequestException. I don't think creating a new one that inherits from both is a good idea in the slightest. I'd love if either of you had the solution. In fact, I'd probably send you 🍰 :)

schlamar · 2014-03-14T13:55:59Z

While this is really interesting (never thought about how you would handle backwards compatibility when changing an exception type), it is actually a bit off topic here =) I think we should focus on this PR and maybe move this discussion into a new issue?

Points standing out:

Should I remove handling RuntimeError separately (I'm +0 on this)?
How to test against the original issue (Why decode the response body of a redirect? #1939)? Would a gzipped_redirect on httpbin a valid option? I don't know how Kenneth feels about such a "feature"...

sigmavirus24 · 2014-03-15T18:01:41Z

@schlamar it took me several months to get a feature merged and deployed on HTTPBin in order to test a bug fix. I'm not certain we should bother waiting that long. Do you have sufficient confidence that you could fake out a gzipped response from a redirect? You could take a similar approach to my test in #1963. Unfortunately, when I planned that test code, I was planning for the simplest case (mine). If there's a good way to adapt it to this PR, that would be awesome.

schlamar · 2014-03-15T20:35:19Z

@sigmavirus24 @zackw Added a test. What do you think?

schlamar · 2014-03-16T17:47:13Z

@sigmavirus24 Should I change

try:
    resp.content  # Consume socket so it can be released
except (ChunkedEncodingError, ContentDecodingError):
    resp.raw.read(decode_content=False)
except RuntimeError:
    pass  # already consumed

to

try:
    resp.content  # Consume socket so it can be released
except (ChunkedEncodingError, ContentDecodingError, RuntimeError):
    resp.raw.read(decode_content=False)

schlamar · 2014-03-16T17:47:53Z

Now looking at this in comparison I'm +1 for the latter one =)

schlamar · 2014-03-16T17:53:31Z

Updated and rebased against master.

sigmavirus24 · 2014-03-16T17:57:36Z

Now looking at this in comparison I'm +1 for the latter one =)

Me too ;)

sigmavirus24 · 2014-03-16T17:58:35Z

Gave this one more look over. LGTM. 👍

schlamar · 2014-03-18T11:10:26Z

I prefer to do something as opposed to using pass in an except block.

@sigmavirus24 I guess you'll like the new contextlib.suppress in Python 3.4. =)

sigmavirus24 · 2014-03-18T13:01:20Z

I really should stop using Python 2 and only use 3. ;)

Lukasa · 2014-03-18T14:05:59Z

That's what I've done with hyper. It's very freeing. =)

kennethreitz · 2014-03-24T15:54:14Z

#1939 terrifies me. When Requests was handling decoding of transfer-encodings itself, it It was explicitly handling the case of when servers are misinforming about the transfer-encoding. This is one of the major reasons I want to be very wary of coupling too closely to urllib3.

Pehaps, akin to the mutable r.encoding, there should be a r.transfer_encoding. This would be, very, very nice.

This would unify the API we have today — and gives us the same behavior for character encodings (which was my intention, I'm not sure how this bug got introduced :P).

Lukasa · 2014-03-24T16:26:40Z

I suspect that r.transfer_encoding would be very useful for pip. I seem to recall that they've had some trouble with servers sending .tar.gz files with Content-Encoding: gzip set, causing r.content to save the ungzipped tar file. @dstufft, does that ring a bell?

dstufft · 2014-03-24T16:28:40Z

Yes that's right. I had to reimplement parts of requests in order to disable the decoding.

Lukasa · 2014-03-24T17:00:25Z

@kennethreitz If .transfer_encoding is something you're interested in I'll take a crack at it.

schlamar · 2014-03-24T17:33:52Z

When Requests was handling decoding of transfer-encodings itself, it It was explicitly handling the case of when servers are misinforming about the transfer-encoding.

But this was broken because it dropped at least one byte from the original response: https://github.com/kennethreitz/requests/issues/1249 To be able to recover gracefully from a decoding error and having the original bytes available afterwards you need basically cache the complete response because decoding could theoretically fail on the last byte (which makes streaming responses basically useless).

there should be a r.transfer_encoding

I don't see any other use case than disabling content decoding. And this can already be done with r.raw.decode_content = False on streaming responses.

kennethreitz · 2014-03-31T14:31:38Z

@Lukasa by all means!

kennethreitz · 2014-03-31T14:32:36Z

@schlamar or specifying a transfer encoding other than what the server provided. It'll just give is a more flexible architecture around this, in general.

kennethreitz · 2014-03-31T14:33:01Z

The question is, if we implement this, is this pull request still necessary?

schlamar · 2014-04-03T06:50:52Z

The question is, if we implement this, is this pull request still necessary?

The question is, how should the proposed feature help resolving this issue =)

schlamar · 2014-04-03T07:35:53Z

The question is, how should the proposed feature help resolving this issue =)

Ah, I see:

This would unify the API we have today — and gives us the same behavior for character encodings

Character encoding does a replace in case of an error.

But in this case it would only make sense if stream=False as you cannot recover gracefully from a decoding error while you are iterating over the content (gracefully means keeping the original bytes from the response).

So a rough sketch how this could be done:

set raw.decode_content to False in case of stream=False
do decoding after consuming content (here)

However, in case of stream=True we will still see this error so this PR is necessary.

kennethreitz · 2014-05-12T19:22:05Z

@schlamar if you perform a rebase this will be merged and released today!

@zackw

Credits go to @zackw.

schlamar · 2014-05-12T20:48:09Z

Done.

Catch errors while handling redirects

kennethreitz · 2014-05-12T20:50:38Z

✨ 🍰 ✨

schlamar mentioned this pull request Mar 10, 2014

GZip decoding regression urllib3/urllib3#206

Closed

zackw mentioned this pull request Mar 13, 2014

In stream=True mode, redirect-following crashes when body already consumed #1952

Closed

zackw mentioned this pull request Mar 15, 2014

Fix #1955: Do not use original request in redirect #1963

Merged

sigmavirus24 assigned kennethreitz Mar 16, 2014

schlamar added 4 commits May 12, 2014 22:42

Catch possible exceptions while consuming content of redirect responses.

c2a1f28

Read content in Session.send instead of Adapter.send.

59c8d81

Added test case to handle response of streamed redirects.

5e860c0

Credits go to @zackw.

Added test for redirect with wrong gzipped header.

1645991

kennethreitz added a commit that referenced this pull request May 12, 2014

Merge pull request #1944 from schlamar/redirect-decoding

739d153

Catch errors while handling redirects

kennethreitz merged commit 739d153 into psf:master May 12, 2014

schlamar deleted the redirect-decoding branch May 12, 2014 20:50

sigmavirus24 mentioned this pull request Sep 7, 2014

Download time of response body is not included in Response.elapsed #2209

Closed

xjsender mentioned this pull request Feb 6, 2015

Can't decompress .gzip file when creating new project xjsender/haoide#4

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021

Catch errors while handling redirects #1944

Catch errors while handling redirects #1944

Conversation

schlamar commented Mar 10, 2014

maxcountryman commented Mar 10, 2014

schlamar commented Mar 10, 2014

maxcountryman commented Mar 10, 2014

Lukasa commented Mar 10, 2014

sigmavirus24 commented Mar 11, 2014

schlamar commented Mar 11, 2014

sigmavirus24 commented Mar 11, 2014

smallcode commented Mar 11, 2014

zackw commented Mar 13, 2014

schlamar commented Mar 13, 2014

zackw commented Mar 13, 2014

schlamar commented Mar 13, 2014

sigmavirus24 commented Mar 14, 2014

schlamar commented Mar 14, 2014

sigmavirus24 commented Mar 15, 2014

schlamar commented Mar 15, 2014

schlamar commented Mar 16, 2014

schlamar commented Mar 16, 2014

schlamar commented Mar 16, 2014

sigmavirus24 commented Mar 16, 2014

sigmavirus24 commented Mar 16, 2014

schlamar commented Mar 18, 2014

sigmavirus24 commented Mar 18, 2014

Lukasa commented Mar 18, 2014

kennethreitz commented Mar 24, 2014

Lukasa commented Mar 24, 2014

dstufft commented Mar 24, 2014

Lukasa commented Mar 24, 2014

schlamar commented Mar 24, 2014

kennethreitz commented Mar 31, 2014

kennethreitz commented Mar 31, 2014

kennethreitz commented Mar 31, 2014

schlamar commented Apr 3, 2014

schlamar commented Apr 3, 2014

kennethreitz commented May 12, 2014

schlamar commented May 12, 2014

kennethreitz commented May 12, 2014