-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In FlyteRemote.upload_file
, pass the file object directly rather than the entire bytes buffer
#2641
In FlyteRemote.upload_file
, pass the file object directly rather than the entire bytes buffer
#2641
Conversation
Signed-off-by: Reda Oulbacha <reda@artera.ai>
9f60643
to
f1ed53d
Compare
any recommendations on how this change might be tested in a more systematic way? |
@redartera Thank you so much for fixing it.
We can register a task with a large file in the integration test. flytekit/tests/flytekit/integration/remote/test_remote.py Lines 31 to 49 in ea6fa0d
btw, this is awesome. I didn't know |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2641 +/- ##
==========================================
- Coverage 78.77% 78.47% -0.31%
==========================================
Files 187 187
Lines 19149 19151 +2
Branches 3993 3994 +1
==========================================
- Hits 15085 15029 -56
- Misses 3374 3436 +62
+ Partials 690 686 -4 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Reda Oulbacha <reda@artera.ai>
Signed-off-by: Reda Oulbacha <reda@artera.ai>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
sounds good - I added a test which writes a large file to a module then runs |
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
5f68f36
to
618c067
Compare
seems like the GH runners are running out of disk space, I reduced the file size to 2 and 3gb while adding the |
Signed-off-by: Kevin Su <pingsutw@apache.org>
What if we added a separate pytest marker to run just this test? |
…flyteremote-forked
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
I like this approach - will find a cleaner way to adopt it |
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
@eapolinario @pingsutw I made a few major changes to both the implementation and the integration test:
|
Signed-off-by: redartera <120470035+redartera@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
…an the entire bytes buffer (flyteorg#2641) * pass the local file directly for streaming in FlyteRemote.upload_file Signed-off-by: Reda Oulbacha <reda@artera.ai> * ruff format Signed-off-by: Reda Oulbacha <reda@artera.ai> * add an integration test Signed-off-by: Reda Oulbacha <reda@artera.ai> * remove unnecessary len Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * redo registration in the integration test Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * fix misspel Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * run the integration test serially Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * disable agent Signed-off-by: Kevin Su <pingsutw@apache.org> * use os.stat instead of os.seek to determine content_length Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * rewrite tests only uploda a file, use a separate marker Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * parametrize integration test makefile cmd Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * add workflow_dispatch for debugging Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * replace trigger with push Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * remove trailing whitespaces Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * remove agent disabling Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * remove trailing debug CI trigger Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> * clean up botocore imports Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> --------- Signed-off-by: Reda Oulbacha <reda@artera.ai> Signed-off-by: redartera <120470035+redartera@users.noreply.github.com> Signed-off-by: Kevin Su <pingsutw@apache.org> Co-authored-by: Kevin Su <pingsutw@apache.org> Signed-off-by: mao3267 <chenvincent610@gmail.com>
Why are the changes needed?
In the
upload_file
function, we read the entire file's content and hold onto it in memory - this is problematic when dealing with large files. Moreover, there is an issue whererequests.put
can raise anOverflowError
exception when dealing with large byte buffers.For a demo replicating the issue and demonstrating the fix, please refer to this comment.
Setup: Python 3.9, flytekit 1.12.3 and Ubuntu 22.04
What changes were proposed in this pull request?
The proposed changes follow the recommendation from the official requests documentation
Additionally, we add an integration test for
FlyteRemote.upload_file
that is run separately from the rest of our integration tests, using a newlftransfers
pytest marker. Changes were made to ourMakefile
, ourpythonbuild.yml
as well as the markers section ofpyproject.toml
accordingly.Check all the applicable boxes
Related PRs