Closes #2108. New boto session for each thread #2136

Merged: 8 commits merged into master on Mar 11, 2020

@lauralorenz lauralorenz commented Mar 9, 2020

Thanks for contributing to Prefect!

Please describe your work and make sure your PR:

  • adds new tests (if appropriate)
  • updates CHANGELOG.md (if appropriate)
  • updates docstrings for any new functions or function arguments, including docs/outline.toml for API reference docs (if appropriate)

Note that your PR will not be reviewed unless all three boxes are checked.

What does this PR change?

Instantiates a new boto session (and thus a new boto client) whenever we think we are in a new thread. Closes #2108, which has a lot of context for this. In the end (see the commit history for how I got there) I went with memoizing the boto client in prefect.context rather than on the result handler instance as before. Anecdotally, initializing a new client on every access of S3ResultHandler.client, à la 0852c2c, made the dask example flow twice as slow when run with two local dask workers using --nprocs 1 --nthreads 3.
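For readers skimming the thread, here is a rough sketch of the per-thread memoization idea described above. It is not the code from this PR: `_context` is a plain `threading.local` standing in for prefect.context (assumed here to behave like thread-local storage), and `make_client`, `get_client`, and `clients_seen_by_threads` are hypothetical names. The placeholder factory returns a bare object so the sketch runs without boto3 installed.

```python
import threading

# Stand-in for prefect.context (assumption: it acts like thread-local storage).
_context = threading.local()


def make_client():
    """Hypothetical factory. In a real handler this would be something like
    boto3.session.Session().client("s3"), i.e. a fresh session per thread."""
    return object()


def get_client():
    """Return the client memoized for the current thread, creating it on first use."""
    client = getattr(_context, "boto3client", None)
    if client is None:
        client = make_client()
        _context.boto3client = client
    return client


def clients_seen_by_threads(n_threads=3):
    """Ask for the client twice in each of n_threads threads and collect the results."""
    results = {}

    def worker(index):
        # Both lookups happen in the same thread, so they should hit the same cache slot.
        results[index] = (get_client(), get_client())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Within one thread, repeated calls to `get_client()` return the same object, so the cost of building a client is paid once per thread rather than once per access. Across threads, each one gets its own client, which is the property the multithreaded Dask workers need.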

Why is this PR important?

Tasks on multithreaded executors can now use the S3ResultHandler (previously, such tasks would fail when initializing the boto client).

NOTE ❓: Should I add an integration test covering this edge case between the Dask executor, mapped flows, and S3?

lauralorenz added 3 commits March 9, 2020 17:37
Naive solution that always initializes a new boto3 session and client whenever accessed, regardless of what thread we are in
Utilize the task context to initialize a new boto3 session and client if this task didn't make one already
In retrospect, I probably don't need to get in the cache key but can just go on context directly.
cicdw previously approved these changes Mar 10, 2020

codecov bot commented Mar 10, 2020

Codecov Report

Merging #2136 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@cicdw cicdw merged commit 7f12d0a into master Mar 11, 2020
@cicdw cicdw deleted the issue-2108-new-boto-session-for-each-thread branch March 11, 2020 21:04
zanieb pushed a commit that referenced this pull request Jul 8, 2022
Add a system block that pulls its value from env var
Development

Successfully merging this pull request may close these issues.

Flow S3ResultHandler Fails for Dask Worker with nthreads > 1