Skip to content

Conversation

@sfc-gh-zyao
Copy link
Contributor

@sfc-gh-zyao sfc-gh-zyao commented Mar 5, 2025

…ream methods to the public connector

Description

NO-CHANGELOG-UPDATES

  • The change appears large, but it is basically porting the _upload / _download / _upload_stream / _download_stream methods (along with direct_file_operation_utils.py, which provides needed utils for the implementation of the 4 methods) from SP connector to public connector (reference: a similar change on SP connector: https://github.com/snowflakedb/Stored-Proc-Python-Connector/pull/194). The same implementation has been adopted in SP connector and porting it back to public connector is needed based on this decision
    • These methods (_upload / _download / _upload_stream / _download_stream) are necessary to run Stored Proc with public connector (this is how Stored Proc connector implements these methods. The implementation of these methods for public Python connector and SP connector should be exactly the same. It appears a bit different at the moment only because we haven't cleaned up the parameter protection yet)
    • They are helpful for non-SP sessions as well
    • The Python Stored Proc has its own logic of file operation parser and stream downloader, so we abstract away the detailed implementation of FileOperationParser and StreamDownloader on the server side
    • We do not prioritize file operation parser and stream downloader implementation for non-stored-proc here, because it is a non-goal for stored proc fork removal project (this is why the parse_file_operation and download_as_stream methods are set to raise NotImplemented error

Testing

  • existing unit tests and integ tests to make sure the new change does not break them
  • a new sanity-check test to assert that the needed file utils are present (i.e. parse_file_operation and download_as_stream)
  • local testing to show that the new code works for Python SP connection (we cannot test it formally in a integ test as the entire project is still WIP)
    1. testing _upload and _download_stream, no compression
upload-then-download-stream
  1. testing _upload and _download_stream, with compression
compress-upload-then-download-stream
  1. testing _upload_stream and _download, no compression
upload-stream-then-download
  1. testing _upload_stream and _download, with compression
compress-upload-stream-then-download

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #NNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am modifying authorization mechanisms
    • I am adding new credentials
    • I am modifying OCSP code
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

  4. (Optional) PR for stored-proc connector:

…ream methods to the public connector

### Description
- The same implementation has been adopted in SP connector and porting it back to public connector is needed based on [this decision](https://docs.google.com/document/d/1KegQLzyxB5G2Viti5Tr8fD24IPjIKU3cCZtRUUWqm4k)
  - These methods (_upload / _download / _upload_stream / _download_stream) are necessary to run Stored Proc with public connector ([this](https://github.com/snowflakedb/Stored-Proc-Python-Connector/blob/1e2519f3ddc71547642b007b712aae59ef00dfff/src/snowflake/connector/cursor.py#L808-L833) is how Stored Proc connector implements these methods. The implementation of these methods for public Python connector and SP connector should be exactly the same. It appears a bit different at the moment only because we haven't cleaned up the parameter protection yet)
  - They are helpful for non-SP sessions as well
  - The Python Stored Proc has its own logic of file operation parser and stream downloader, so we abstract away the detailed implementation of FileOperationParser and StreamDownloader on the server side
  - We do not prioritize file operation parser and stream downloader implementation for non-stored-proc here, because it is a non-goal for stored proc fork removal project (this is why the `parse_file_operation` and `download_as_stream` methods are set to raise NotImplemented error

 ### Testing
 - existing unit tests and integ tests to make sure the new change does not break them
 - local testing to show that the new code works for Python SP connection
 - a new sanity-check test to assert that the needed file utils are present (i.e. parse_file_operation and download_as_stream)
@sfc-gh-zyao sfc-gh-zyao requested a review from sfc-gh-sfan March 5, 2025 17:41
# check SNOW-1218851 for long term improvement plan to refactor ocsp code
atexit.register(self._close_at_exit)

# Set up the file operation parser and stream downloader.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reviewers (especially @sfc-gh-sfan ) SP connector will also need to set connection.rso_id attribute, in order for SP file operation to work as intended (reference: SP connector code). I decided to do the attribute setting on the server side (something like this PoC code), since rso_id is not a general concept for non-SP connections.

The upside of this approach is that we can abstract away unnecessary/SP-specific concept from public connector

The downside (which I don't think will really happen) is that, it seems as if rso_id attribute does not exist in connection (if we only read public connector code). So let's say, if other future developer tries to introduce a seemingly-new attribute rso_id on connection, they may think, "okay, rso_id is not an existing attribute, so I can add and use it freely, without worrying about side effects". But in fact, rso_id would be an existing attribute of connection by that time, and messing the attribute up will affect some Python SP behaviors. But I think the chance is very very slim, because I would be very surprised if any future dev needs to introduce another attribute, and it happens to be literally rso_id

DESCRIPTION.md Outdated

# Release Notes
- v3.14.1(TBD)
- introduce several file operation utilities for easier internal dev
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need release note given the added functions are all internal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, reverted the changelog

use_s3_regional_url=self._use_s3_regional_url,
unsafe_file_write=self._unsafe_file_write,
)
elif self._stage_location_type == STORED_PROC_FS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some override mechanism internally so that we don't have to expose this? We could potentially override _create_file_transfer_client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested and yes, we can override _create_file_transfer_client on the server side for Stored Proc. Does this look good 088b553? It stills defines the STORED_PROC_FS label though. We need a file system category to bypass the check here

if self._stage_location_type not in VALID_STORAGE:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I can actually append STORED_PROC_FS to VALID_STORAGE list on the server side. This way we don't need to expose the tag STORED_PROC_FS at all. Let me go with this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sfc-gh-zyao sfc-gh-zyao requested a review from sfc-gh-sfan March 14, 2025 00:39
…we can do the insertion to `VALID_STORAGE` on the fly on the server side
…m/_download_stream methods

  - the unit tests are more like sanity check of the implementation, to make sure certain methods are hit vs certain methods are not
Copy link
Contributor

@sfc-gh-mmishchenko sfc-gh-mmishchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sfc-gh-zyao sfc-gh-zyao added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Mar 21, 2025
@sfc-gh-zyao sfc-gh-zyao merged commit a3229c3 into main Mar 21, 2025
95 of 96 checks passed
@sfc-gh-zyao sfc-gh-zyao deleted the zyao-SNOW-1963078-port-download-upload-methods branch March 21, 2025 19:58
@github-actions github-actions bot locked and limited conversation to collaborators Mar 21, 2025
@sfc-gh-zyao sfc-gh-zyao added the python-sproc-fork-removal PRs related to Python Sproc fork removal label Apr 25, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md python-sproc-fork-removal PRs related to Python Sproc fork removal

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants