-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#5221] feat(python-client): Support OSS for fileset python client #5225
Conversation
Please rebase this PR. |
@xloya |
|
||
@unittest.skip("This test require oss service account") | ||
class TestGvfsWithOSS(TestGvfsWithHDFS): | ||
# Before running this test, please set the make sure aws-bundle-x.jar has been |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws-bundle-x.jar
-> aliyun-bundle-x.jar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
@@ -567,6 +569,9 @@ def _convert_actual_path( | |||
or storage_location.startswith(f"{StorageType.S3A.value}://") | |||
): | |||
actual_prefix = infer_storage_options(storage_location)["path"] | |||
elif storage_location.startswith(f"{StorageType.OSS.value}:/"): | |||
ops = infer_storage_options(storage_location) | |||
actual_prefix = ops["host"] + ops["path"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better to check the ops
whether has host
and path
attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
@@ -762,6 +769,13 @@ def _strip_storage_protocol(storage_type: StorageType, path: str): | |||
if storage_type == StorageType.LOCAL: | |||
return path[len(f"{StorageType.LOCAL.value}:") :] | |||
|
|||
# OSS has different behavior than S3 and GCS, if we do not remove the | |||
# protocol, it will always return an empty array. | |||
if storage_type == StorageType.OSS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better add the uri
example in the method annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, please explain more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
@@ -22,7 +22,7 @@ black==24.4.2 | |||
twine==5.1.1 | |||
coverage==7.5.1 | |||
pandas==2.0.3 | |||
pyarrow==15.0.2 | |||
pyarrow==17.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need upgrade pyarrow version here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unexpected change and I have reverted it.
@@ -756,12 +768,42 @@ def _strip_storage_protocol(storage_type: StorageType, path: str): | |||
:param storage_type: The storage type | |||
:param path: The path | |||
:return: The stripped path | |||
|
|||
We will handle OSS differently from S3 and GCS, because OSS has different behavior than S3 and GCS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd better explain the difference behavior clearly, otherwise someone may not be able to spot the difference at first glance when looking at the examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add sentence
Please take a look at the above example: if we do not remove the protocol (starts with oss://),
it will always return an empty array when we call `oss.ls`, however, if we remove the protocol,
it will return the correct result as expected.
Overall LGTM, but a minor annotation issue. |
…5225) ### What changes were proposed in this pull request? Add support for Aliyun OSS python client. ### Why are the changes needed? It's a need Fix: #5221 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Test locally. Do the following change to `test_gvfs_with_oss.py` and run `./gradlew :clients:client-python:test -PskipDockerTests=false` <img width="1261" alt="image" src="https://github.com/user-attachments/assets/a57a3f2e-178b-4b50-8219-5a95d0329cd1"> <img width="1211" alt="image" src="https://github.com/user-attachments/assets/b2675a44-d9db-4d11-a505-8e098ef3173f">
What changes were proposed in this pull request?
Add support for Aliyun OSS python client.
Why are the changes needed?
It's a need
Fix: #5221
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
Test locally. Do the following change to
test_gvfs_with_oss.py
and run./gradlew :clients:client-python:test -PskipDockerTests=false