-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retries to file download to avoid throwing an exception in the case of normal retry errors #1002
Conversation
Release 2.1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreiberkyle Excellent! I'm requesting that two client features (directory assurance and progress bar) be moved up a layer.
planet/http.py
Outdated
new = response.num_bytes_downloaded - previous | ||
progress.update(new - previous) | ||
previous = new | ||
progress.update() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreiberkyle do you mean to put this statement within the chunk iteration? Unless you do, I don't see how progress is shown.
planet/http.py
Outdated
directory: Path = Path('.'), | ||
overwrite: bool = False, | ||
progress_bar: bool = False) -> Path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreiberkyle this is not the right level for these features. Let's allow the Session class to work in a simple space with no progress bars and no directories. Just requests and responses and bytes going in and out. If we do, I think we'll find it easier to refactor and maintain over time. Principles of least concern and strong boundaries are what I'm invoking.
Concretely, let's give the download methods the responsibility of ensuring that directories exist and that we have a valid destination filename. Also, those methods should create progress bars and pass them to Session.write, which should just do progress.update()
(dependency injection).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on separating concerns. I'm getting stuck on initiating the progress bar, which requires a total
which can only be read by the response. I will look deeper into how to make this work with the client download functions handling creating of the progress bar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this time, retry is managed by Session
. For Session
to manage retry, it has to submit the original request for the download (so it can retry that request). When it submits the request, the response is not available to the client download function. So, the download function cannot access the total
value in the response headers. There are two ways I can think of to allow download function to create the progress bar:
- pass two callbacks to
Session.write
, one for update and one for setting the total - have the download function submit the original request to get the total value, then have
Session.write
submit the original request again as a part of the write process
I tend to think more than one callback is complex and so prefer option 2 even though it will be less efficient communication-wise, but I am not sure my aversion to multiple callbacks is warrented. @sgillies what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like tqdm really supports setting the total after the progress bar has been initialized, so I am going to go with option 2: have the download function submit the original request to get the total, then have session.write submit it again to perform the download.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreiberkyle option 2 is better, I agree. Can the double requests be avoided by making a decorator form of the retrier and re-trying not the HTTP requests, but download_asset()
? Like this
@retry
async def download_asset(...):
...
async with self._session._client.stream, method='GET', url=location) as resp:
content_length = resp.headers['Content-Length']
dl_path = Path(directory, filename or parsed_from_content_disposition_header)
dl_path.parent.mkdir(exist_ok=True, parents=True)
return await self._session._write_response(
resp,
dl_path,
overwrite=overwrite,
progress_bar=tqdm(total=content_length, ...)
)
Maybe for the next iteration? I think double requests are tolerable for now. Would it be best to make sure the first one gets closed/finalized before the second one is made? That's something to look into.
It also occurs to me that file sizes are in the order manifest.json file, so there's another way to get them. Also making another request, but only one extra per order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decorating the client method has an example here: https://github.com/hynek/stamina#production-grade-retries-made-easy. I bet you could turn the SDK's retrier into such a decorator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed that we can generalize the retry function and make it work for retrying download_asset()
. In that case, we may want to change Session.request
so that it does not also retry (thereby squaring the effective MAX_RETRIES), and then change all the functions calling Session.request
so they DO retry. At any rate, it would require some more thought. So, I went with option 2 - two calls.
…bar management to clients
@sgillies merging to prepare a release. You will get a chance to review before the release as you will be the release manager =) |
Related Issue(s):
Closes #974
Proposed Changes:
For inclusion in changelog (if applicable):
models.StreamingBody
(which has been removed) tohttp.Session
Side note: Adding functionality while deleting a net of 59 lines seems like a win =)
Things I didn't do but believe would be a good idea:
StreamingBody
Not intended for changelog:
models.Response
now has newfilename
andlength
properties which areNone
unless the response is from a GET request for a downloadable resource.Diff of User Interface
Old behavior:
Given a list of activated order ids stored in
oids.txt
, the following script would result in some downloads failing due to http errors:New behavior:
All downloads are successful
PR Checklist:
(Optional) @mentions for Notifications: