
fix(remote): handle SSLError on file upload #2634

Closed
wants to merge 1 commit into from

Conversation

@pingsutw (Member) commented Aug 1, 2024

Tracking issue

Why are the changes needed?

What changes were proposed in this pull request?

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Kevin Su <pingsutw@apache.org>
@redartera (Contributor) commented Aug 1, 2024

A program of mine used to crash with an out-of-memory error, and as I was debugging it this morning, I narrowed the issue down to the same piece of code this PR is targeting. I was using flytekit 1.12.3 on Python 3.9 and trying to upload many files, each >2 GB. Because of the local_file.read() call, every file was loaded into RAM in full, and that was the root cause.

A fix that worked for me was to use streaming uploads. I tested the same fix with the latest flytekit, and it made the SSL OverflowError disappear as well.

        with open(str(to_upload), "rb") as local_file:
            # Seek to the end to determine the file size without reading it into memory
            local_file.seek(0, os.SEEK_END)
            content_length = local_file.tell()
            # Rewind so requests can stream the file from the beginning
            local_file.seek(0)
            headers = {"Content-Length": str(content_length), "Content-MD5": encoded_md5}
            headers.update(extra_headers)
            rsp = requests.put(
                upload_location.signed_url,
                data=local_file,  # passing the file object makes requests stream it
                headers=headers,
                verify=False
                if self._config.platform.insecure_skip_verify is True
                else self._config.platform.ca_cert_file_path,
            )

@pingsutw I was going to open a PR for this, but I just saw you working on it. How would you like to move forward with the streaming upload approach?

P.S.: For more context on the OverflowError I saw, please see psf/requests#2717.
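For context on why streaming sidesteps the overflow: requests will iterate any file-like object (or generator) passed as data and send it in pieces, instead of holding the single huge bytes object that local_file.read() produces. A tiny sketch of the chunked-read pattern (the helper name and chunk size are illustrative, not flytekit or requests API):

```python
import io

def read_in_chunks(fileobj, chunk_size=8 * 1024 * 1024):
    """Yield successive chunks so at most chunk_size bytes are in memory at once."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # b"" signals EOF
            return
        yield chunk

# Reassembling the chunks recovers the original payload exactly
data = b"abc" * 1000
reassembled = b"".join(read_in_chunks(io.BytesIO(data), chunk_size=1024))
print(reassembled == data)  # True
```

A generator like this could be passed directly as the data argument to requests.put, which sends it with chunked transfer encoding; passing the open file object, as the fix above does, achieves the same effect with a known Content-Length.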

@pingsutw (Member, Author) commented Aug 1, 2024

@redartera I'm not sure how to do a streaming upload. Do you want to hop on a call? I was trying to reproduce the error, but I could upload a 10 GB file without any issue.

@pingsutw (Member, Author) commented Aug 1, 2024

Feel free to open a PR as well.

@redartera (Contributor) commented

> @redartera I'm not sure how to do a streaming upload. Do you want to hop on a call? I was trying to reproduce the error, but I could upload a 10 GB file without any issue.

@pingsutw I uploaded a video here where I demonstrate two runs: one that fails, and one that succeeds after applying the streaming upload fix.

  • First run: I try to upload a 10 GB file using upload_file, and you can see memory usage climb quickly before I get an OverflowError exception. If the file were 20 GB, I'd run out of memory.

  • Second run: I try to upload a 4 GB file after applying the streaming upload fix, and everything works with very little memory overhead. I couldn't upload 10 GB because AWS S3 pre-signed URL uploads only allow objects up to 5 GB per single PUT (see screenshot below).

[Screenshot, 2024-08-01 at 7:18:29 PM]

@pingsutw (Member, Author) commented Aug 6, 2024

Fixed by #2641.

@pingsutw pingsutw closed this Aug 6, 2024