
refactor(controller): add object storage for dataset #1993

Merged
merged 6 commits into from
Mar 29, 2023


anda-ren
Member

@anda-ren anda-ren commented Mar 22, 2023

Description

AS-IS:

  1. BLOB files in a dataset are stored under the path of the dataset version
  2. There is no constraint or convention for naming BLOB files

AFTER THIS PR:

  1. BLOB files in a dataset are stored under the path of the dataset, shared by all of its versions
  2. By convention, each BLOB file is named after the hash of its content

Why:

Once dataset versioning is in place, the metadata of all versions of a dataset is saved in a single datastore table. The BLOB files of all versions of a dataset should likewise share one storage location, because versions often differ only slightly and would otherwise store many identical files more than once.
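A minimal sketch of the two layouts and of why the new one avoids duplicate files across versions. It assumes a POSIX-style storage root and SHA-256 as the content hash; the PR does not name the hash algorithm or the exact directory structure, so `old_blob_path`, `new_blob_path`, `HashNamedStore`, and the `/data/...` paths are hypothetical.

```python
import hashlib
import posixpath
from typing import Dict


def old_blob_path(root: str, project: str, dataset: str, version: str, name: str) -> str:
    # AS-IS (hypothetical layout): blob sits under the version, with an arbitrary name.
    return posixpath.join(root, project, "dataset", dataset, version, name)


def new_blob_path(root: str, project: str, dataset: str, content: bytes) -> str:
    # AFTER (hypothetical layout): blob sits under the dataset, named by its content hash.
    # SHA-256 is an assumption; the PR only says "the hash of the files".
    digest = hashlib.sha256(content).hexdigest()
    return posixpath.join(root, project, "dataset", dataset, digest)


class HashNamedStore:
    """Hypothetical in-memory stand-in for a dataset-scoped, hash-named store."""

    def __init__(self, root: str, project: str, dataset: str) -> None:
        self.root = root
        self.project = project
        self.dataset = dataset
        self.objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Identical content always maps to the same path, so rewriting it is a no-op.
        path = new_blob_path(self.root, self.project, self.dataset, content)
        self.objects.setdefault(path, content)
        return path  # absolute path of the stored blob


# Two versions of the same dataset that differ in one file.
v1_files = {"a.png": b"img-0", "b.png": b"img-1"}
v2_files = {"a.png": b"img-0", "b.png": b"img-2"}

old_paths = {
    old_blob_path("/data", "proj", "mnist", ver, name)
    for ver, files in (("v1", v1_files), ("v2", v2_files))
    for name in files
}
store = HashNamedStore("/data", "proj", "mnist")
for files in (v1_files, v2_files):
    for content in files.values():
        store.put(content)

print(len(old_paths), len(store.objects))  # 4 3
```

The unchanged `a.png` occupies two objects under the version-scoped layout but only one under the dataset-scoped, hash-named layout.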

Side effect

4 APIs are added:

  1. POST /project/{projectName}/dataset/{datasetName}/hashedBlob/{hash}
  2. HEAD /project/{projectName}/dataset/{datasetName}/hashedBlob/{hash}
  3. GET /project/{projectName}/dataset/{datasetName}/uri
  4. POST /project/{projectName}/dataset/{datasetName}/uri/sign-links
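The two `hashedBlob` routes suggest a check-then-upload flow: the client hashes a blob, asks with `HEAD` whether that hash already exists under the dataset, and sends the bytes with `POST` only when it does not. A minimal sketch, assuming `HEAD` answers 200 for an existing blob and 404 otherwise and that SHA-256 is the hash; the transport is passed in as two hypothetical callables (`head`, `post`), so no real HTTP client or Starwhale SDK API is implied.

```python
import hashlib
from typing import Callable, Dict, Tuple
from urllib.parse import quote


def hashed_blob_path(project: str, dataset: str, digest: str) -> str:
    # Route shape from the PR; percent-encoding the names is an assumption.
    return (
        f"/project/{quote(project, safe='')}"
        f"/dataset/{quote(dataset, safe='')}"
        f"/hashedBlob/{digest}"
    )


def upload_if_absent(
    project: str,
    dataset: str,
    content: bytes,
    head: Callable[[str], int],
    post: Callable[[str, bytes], None],
) -> Tuple[str, bool]:
    """Return (digest, uploaded). Assumes HEAD gives 200 if present, 404 if absent."""
    digest = hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption
    path = hashed_blob_path(project, dataset, digest)
    status = head(path)
    if status == 200:
        return digest, False  # already stored, e.g. by an earlier version
    if status != 404:
        raise RuntimeError(f"unexpected HEAD status {status} for {path}")
    post(path, content)
    return digest, True


# Fake in-memory server standing in for the controller (hypothetical).
server: Dict[str, bytes] = {}


def fake_head(path: str) -> int:
    return 200 if path in server else 404


def fake_post(path: str, body: bytes) -> None:
    server[path] = body


first = upload_if_absent("proj", "mnist", b"img-0", fake_head, fake_post)
second = upload_if_absent("proj", "mnist", b"img-0", fake_head, fake_post)
print(first[1], second[1], len(server))  # True False 1
```

Because the path is scoped to the dataset rather than a version, a blob uploaded for one version is found again by the `HEAD` check when a later version contains the same bytes.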

2 APIs are deleted:

  1. GET /project/{projectUrl}/dataset/{datasetUrl}/version/{versionUrl}/link
  2. POST /project/{projectName}/dataset/{datasetName}/version/{version}/sign-links

Verification

(asciicast recording and screenshot, not reproduced)

TODO

Return the absolute path after saving a hashed blob

Modules

  • UI
  • Controller
  • Agent
  • Client
  • Python-SDK
  • Others

Checklist

  • run code format and lint check
  • add unit test
  • add necessary doc

@tianweidut
Member

#1940

@anda-ren anda-ren changed the title from "WIP: refactor(dataset) add object storage for dataset" to "WIP: refactor(controller): add object storage for dataset" Mar 22, 2023
@codecov

codecov bot commented Mar 22, 2023

Codecov Report

Merging #1993 (30232ba) into main (939d121) will decrease coverage by 0.13%.
The diff coverage is 62.36%.

@@             Coverage Diff              @@
##               main    #1993      +/-   ##
============================================
- Coverage     82.21%   82.09%   -0.13%     
- Complexity     2220     2226       +6     
============================================
  Files           410      412       +2     
  Lines         21586    21612      +26     
  Branches       1207     1211       +4     
============================================
- Hits          17748    17742       -6     
- Misses         3291     3322      +31     
- Partials        547      548       +1     
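The figures in this diff are consistent with each other: lines = hits + misses + partials, and coverage = hits / lines, since partially covered lines do not count as hits. A quick check, assuming codecov's default of rounding down to two decimal places:

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits: int, misses: int, partials: int) -> Decimal:
    # Partials count as not covered, so the denominator is every tracked line.
    lines = hits + misses + partials
    pct = Decimal(hits) / Decimal(lines) * 100
    # Round down to two decimals (assumed codecov default precision and rounding).
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(17748, 3291, 547)  # main (939d121), 21586 lines
head = coverage_pct(17742, 3322, 548)  # #1993 (30232ba), 21612 lines
print(base, head)  # 82.21 82.09
```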
Flag Coverage Δ
controller 73.45% <61.95%> (-0.14%) ⬇️
standalone 90.56% <100.00%> (-0.12%) ⬇️
unittests 90.56% <100.00%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
client/starwhale/api/_impl/dataset/loader.py 100.00% <ø> (ø)
client/starwhale/core/dataset/store.py 94.91% <ø> (-0.47%) ⬇️
...bjectstore/HashNamedDatasetObjectStoreFactory.java 0.00% <0.00%> (ø)
...e/mlops/domain/dataset/upload/DatasetUploader.java 70.99% <38.88%> (-6.65%) ⬇️
...ai/starwhale/mlops/domain/storage/UriAccessor.java 60.97% <80.00%> (ø)
...ale/mlops/domain/storage/HashNamedObjectStore.java 84.21% <84.21%> (ø)
...java/ai/starwhale/mlops/api/DatasetController.java 81.02% <88.88%> (-0.23%) ⬇️
client/starwhale/core/dataset/view.py 60.13% <100.00%> (ø)
...starwhale/mlops/domain/dataset/DatasetService.java 83.46% <100.00%> (+0.78%) ⬆️
...aset/index/datastore/DataStoreTableNameHelper.java 100.00% <100.00%> (ø)
... and 1 more

... and 25 files with indirect coverage changes


@anda-ren anda-ren changed the title from "WIP: refactor(controller): add object storage for dataset" to "refactor(controller): add object storage for dataset" Mar 27, 2023
@anda-ren anda-ren marked this pull request as ready for review March 27, 2023 06:32
Member

@tianweidut tianweidut left a comment


LGTM

@tianweidut tianweidut merged commit 9b50838 into star-whale:main Mar 29, 2023
dreamlandliu pushed a commit to dreamlandliu/starwhale that referenced this pull request Mar 29, 2023
@anda-ren anda-ren deleted the ds-obj-store branch August 11, 2023 11:06

2 participants