-
Notifications
You must be signed in to change notification settings - Fork 30
Description
Hi @peterbe,
the code at …
Lines 67 to 72 in a64f778
location, _ = cgi.parse_header(r.headers.get("location", "")) | |
if not location: | |
raise PackageError( | |
"No 'Location' header on {0} ({1})".format(url, status_code) | |
) | |
return _download(location) |
…caught my attention with regard to two things:
The first is that return _download(location)
should probably be return _download(location, binary=binary)
or that choice will just be lost.
The second is that while in the past the location of the redirect URI was defined and guaranteed to be absolute, in the sense that…
absolute URIs always begin with a scheme name followed by a colon
…and hence recursing back into _download
would work (if we ignore endless loops for a second) but HTTP was changed in 2014 to allow URI references e.g. Location: /People.html#tim
and then to be able to recurse back into _download
we would now either (1) need to do URI resolution to always end up with a scheme and a domain name — an absolute URI — or (2) make urlopen
follow redirects or (3) use something else that does.
It turns out that urlopen
already handles redirects for us:
# wget -O/dev/null https://httpbin.org/status/302
--2024-10-28 16:14:58-- https://httpbin.org/status/302
Resolving httpbin.org... 54.84.141.5, 34.228.248.173
Connecting to httpbin.org|54.84.141.5|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: /redirect/1 [following]
--2024-10-28 16:15:00-- https://httpbin.org/redirect/1
Reusing existing connection to httpbin.org:443.
HTTP request sent, awaiting response... 302 FOUND
Location: /get [following]
--2024-10-28 16:15:00-- https://httpbin.org/get
Reusing existing connection to httpbin.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 291 [application/json]
Saving to: ‘/dev/null’
/dev/null 100%[=====================================================================================>] 291 --.-KB/s in 0s
2024-10-28 16:15:01 (5.29 MB/s) - ‘/dev/null’ saved [291/291]
# python3 -c 'from urllib.request import urlopen; print(urlopen("https://httpbin.org/status/302").getcode())'
200
So maybe that's dead code then and we can just rip the broken redirect handling out. @peterbe what do you think? Should I make a pull request?
PS: This started out at #195 (review) .