Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5221] feat(python-client): Support OSS for fileset python client #5225

Merged
merged 155 commits into from
Oct 24, 2024

Conversation

yuqi1129
Copy link
Contributor

@yuqi1129 yuqi1129 commented Oct 23, 2024

What changes were proposed in this pull request?

Add support for Aliyun OSS python client.

Why are the changes needed?

It's a need

Fix: #5221

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Test locally. Do the following change to test_gvfs_with_oss.py and run ./gradlew :clients:client-python:test -PskipDockerTests=false

image image

@jerryshao
Copy link
Contributor

Please rebase this PR.

@yuqi1129 yuqi1129 requested a review from xloya October 24, 2024 06:05
@yuqi1129
Copy link
Contributor Author

@xloya
Please help review this PR, thanks.

@yuqi1129 yuqi1129 added the 0.7.0 Release v0.7.0 label Oct 24, 2024
@yuqi1129 yuqi1129 changed the title [#5211] feat(python-client): Support OSS for fileset python client [#5221] feat(python-client): Support OSS for fileset python client Oct 24, 2024
@jerryshao jerryshao added branch-0.7 Automatically cherry-pick commit to branch-0.7 and removed 0.7.0 Release v0.7.0 labels Oct 24, 2024
@yuqi1129 yuqi1129 added branch-0.7 Automatically cherry-pick commit to branch-0.7 and removed branch-0.7 Automatically cherry-pick commit to branch-0.7 labels Oct 24, 2024
@jerryshao jerryshao added the need backport Issues that need to backport to another branch label Oct 24, 2024

@unittest.skip("This test require oss service account")
class TestGvfsWithOSS(TestGvfsWithHDFS):
# Before running this test, please set the make sure aws-bundle-x.jar has been
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aws-bundle-x.jar -> aliyun-bundle-x.jar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -567,6 +569,9 @@ def _convert_actual_path(
or storage_location.startswith(f"{StorageType.S3A.value}://")
):
actual_prefix = infer_storage_options(storage_location)["path"]
elif storage_location.startswith(f"{StorageType.OSS.value}:/"):
ops = infer_storage_options(storage_location)
actual_prefix = ops["host"] + ops["path"]
Copy link
Collaborator

@xloya xloya Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to check the ops whether has host and path attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -762,6 +769,13 @@ def _strip_storage_protocol(storage_type: StorageType, path: str):
if storage_type == StorageType.LOCAL:
return path[len(f"{StorageType.LOCAL.value}:") :]

# OSS has different behavior than S3 and GCS, if we do not remove the
# protocol, it will always return an empty array.
if storage_type == StorageType.OSS:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better add the uri example in the method annotation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, please explain more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -22,7 +22,7 @@ black==24.4.2
twine==5.1.1
coverage==7.5.1
pandas==2.0.3
pyarrow==15.0.2
pyarrow==17.0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need upgrade pyarrow version here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unexpected change and I have reverted it.

@@ -756,12 +768,42 @@ def _strip_storage_protocol(storage_type: StorageType, path: str):
:param storage_type: The storage type
:param path: The path
:return: The stripped path

We will handle OSS differently from S3 and GCS, because OSS has different behavior than S3 and GCS.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd better explain the difference behavior clearly, otherwise someone may not be able to spot the difference at first glance when looking at the examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add sentence

Please take a look at the above example: if we do not remove the protocol (starts with oss://),
it will always return an empty array when we call `oss.ls`, however, if we remove the protocol,
it will return the correct result as expected.

@xloya
Copy link
Collaborator

xloya commented Oct 24, 2024

Overall LGTM, but a minor annotation issue.

@jerryshao jerryshao merged commit 13c4625 into apache:main Oct 24, 2024
21 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 24, 2024
…5225)

### What changes were proposed in this pull request?

Add support for Aliyun OSS python client.

### Why are the changes needed?

It's a need

Fix: #5221

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Test locally. Do the following change to `test_gvfs_with_oss.py` and run
`./gradlew :clients:client-python:test -PskipDockerTests=false`


<img width="1261" alt="image"
src="https://github.com/user-attachments/assets/a57a3f2e-178b-4b50-8219-5a95d0329cd1">
<img width="1211" alt="image"
src="https://github.com/user-attachments/assets/b2675a44-d9db-4d11-a505-8e098ef3173f">
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-0.7 Automatically cherry-pick commit to branch-0.7 need backport Issues that need to backport to another branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support OSS fileset python client
3 participants