Unified the dataset table name when copy #1731

Closed
goldenxinxing opened this issue Jan 28, 2023 · 3 comments · Fixed by #1772

@goldenxinxing
Contributor

goldenxinxing commented Jan 28, 2023

Describe the bug
Copying a dataset to the cloud produces different table names depending on how the project is addressed in the target URI:

  • swcli dataset copy ucf101/version/latest cloud://e2e0128/project/1 (table name is 'project/1/...')
  • swcli dataset copy ucf101/version/latest cloud://e2e0128/project/starwhale (table name is 'project/starwhale/...')
    On the cloud side, however, the name is unified to 'project/starwhale/...' in both cases.
    This causes an error when the dataset loads data: ai.starwhale.mlops.exception.SwValidationException: invalid request on subject Starwhale Internal DataStore invalid table name project/starwhale/dataset/ucf101/7m/7mguzpokk7ez3xssi4l3rk4zhesd6we3dvyj4aot/meta. The reason is that the real table name is 'project/1/dataset/ucf101/7m/7mguzpokk7ez3xssi4l3rk4zhesd6we3dvyj4aot/meta'.
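
The mismatch described above can be illustrated with a minimal sketch. The helper `meta_table_name` is hypothetical and is not Starwhale code; its path layout is only inferred from the two table names quoted in the error message.

```python
def meta_table_name(project: str, dataset: str, version: str) -> str:
    """Hypothetical helper: build a meta table name in the layout seen in the error message."""
    # Layout assumed from the report: project/<project>/dataset/<name>/<first 2 chars>/<version>/meta
    return f"project/{project}/dataset/{dataset}/{version[:2]}/{version}/meta"


VERSION = "7mguzpokk7ez3xssi4l3rk4zhesd6we3dvyj4aot"

# The client writes using whatever project segment appeared in the target URI (here the id "1").
written = meta_table_name("1", "ucf101", VERSION)

# The server looks the table up under the project name.
looked_up = meta_table_name("starwhale", "ucf101", VERSION)

print(written)
print(looked_up)
print(written == looked_up)  # the two names differ, so the lookup fails
```

Because the two strings differ only in the project segment, any lookup keyed on the project name misses a table that was written under the project id.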



@goldenxinxing goldenxinxing added the bug 🐛 Something isn't working label Jan 28, 2023
@goldenxinxing goldenxinxing changed the title Unified the dataset tableName Unified the dataset tableName when copy Jan 28, 2023
@goldenxinxing goldenxinxing changed the title Unified the dataset tableName when copy Unified the dataset table name when copy Jan 28, 2023
@goldenxinxing
Contributor Author

goldenxinxing commented Feb 1, 2023

The simple way to fix it is to change the upload logic from datasetVersionEntity = from(projectEntity.getProjectName(), datasetEntity, manifest); to datasetVersionEntity = from(uploadRequest.getProject(), datasetEntity, manifest);
But this introduces a new problem when the user addresses the project by projectName again (because the client uses projectName to update the datastore).

@anda-ren
Member

anda-ren commented Feb 1, 2023


This looks like a client bug. I don't think changing the server to work around it is the right fix.

@goldenxinxing
Contributor Author


Yes. Another way is to return the table name in the response to the first upload request, so that the client's dataset tabular code receives the table name instead of the project. cc @tianweidut
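
A rough sketch of that idea, under stated assumptions: `UploadResponse`, `server_first_upload`, `canonical_project`, and the `PROJECT_IDS` mapping are all hypothetical names made up for illustration, not the actual Starwhale API. The point is only that the server picks the table name once and the client reuses it verbatim.

```python
from dataclasses import dataclass

# Hypothetical project registry: project name -> project id.
PROJECT_IDS = {"starwhale": "1"}


@dataclass
class UploadResponse:
    # Hypothetical field carrying the table name chosen by the server.
    table_name: str


def canonical_project(project_ref: str) -> str:
    """Resolve a project id or name to the project name (assumed canonical form)."""
    for name, project_id in PROJECT_IDS.items():
        if project_ref in (name, project_id):
            return name
    raise KeyError(f"unknown project: {project_ref}")


def server_first_upload(project_ref: str, dataset: str, version: str) -> UploadResponse:
    """Simulate the first upload call: the server decides the table name and returns it."""
    project = canonical_project(project_ref)
    # Path layout inferred from the error message in the report.
    table = f"project/{project}/dataset/{dataset}/{version[:2]}/{version}/meta"
    return UploadResponse(table_name=table)


VERSION = "7mguzpokk7ez3xssi4l3rk4zhesd6we3dvyj4aot"

# The client no longer derives the table name from the URI; it uses the returned one.
by_id = server_first_upload("1", "ucf101", VERSION).table_name
by_name = server_first_upload("starwhale", "ucf101", VERSION).table_name

print(by_id)
print(by_id == by_name)
```

With this shape, both copy commands end up writing to the same table regardless of whether the URI used the project id or the project name. The sketch ignores the case where one project's name equals another project's id.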
