[#2695] feat(doc): Add docs for fileset catalog #2781
Conversation
@coolderli @xloya would you please help to review? Thanks.
@jerryshao Do we need to describe how to use Fileset with the Spark engine? In addition, I have already tested TensorFlow and submitted an MR: tensorflow/io#1970. Once #2779 is resolved, we can support TensorFlow. I think we can add a doc like https://help.aliyun.com/zh/hdfs/using-tensorflow-on?spm=a2c4g.11186623.0.i6. What do you think?
I would suggest having a separate doc about gvfs and adding the Spark- and TF-related material there.
FilesetCatalog filesetCatalog = catalog.asFilesetCatalog();
NameIdentifier[] identifiers =
    filesetCatalog.listFilesets(Namespace.ofFileset("metalake", "catalog", "schema"));
I think there's another issue: the `metalake` in Namespace seems redundant, because we have already declared the name of the current metalake when creating the GravitinoClient. It is not related to this MR, though, so never mind.
Yeah, it is unrelated, and we are already working on the client refactoring.
@jerryshao Left some comments. Overall, it looks good to me.
@shaofengshi would you please also check the java client part? Thanks.
LGTM
tabular data and others in Gravitino with a unified way.

After fileset is created, users can easily access, manage the files/directories through
Fileset's identifier, without needing to know the physical path of the managed datasets. Also, with
Maybe `of the managed datasets` is not necessary.
I feel like it is still necessary. It means that the dataset is managed by Gravitino, so users don't need to know the physical path. For an unmanaged dataset, users still need to know the physical path before accessing the dataset.
Got it.
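To illustrate the managed-dataset point above, here is a minimal, self-contained sketch of how an identifier can stand in for a physical path. It is not Gravitino's API: `FilesetPathSketch`, `FilesetRegistry`, `register`, `resolve`, the identifier `catalog.schema.events`, and the HDFS location are all hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration only; this is NOT Gravitino's actual API. */
public class FilesetPathSketch {

    /** Toy registry mapping a fileset identifier to its storage location. */
    public static final class FilesetRegistry {
        private final Map<String, String> locations = new HashMap<>();

        /** Records where a managed fileset physically lives. */
        public void register(String identifier, String storageLocation) {
            locations.put(identifier, storageLocation);
        }

        /**
         * Resolves a path relative to the fileset into a physical path.
         * Returns empty when the identifier is unknown (an unmanaged dataset),
         * in which case the caller would have to know the physical path itself.
         */
        public Optional<String> resolve(String identifier, String relativePath) {
            String base = locations.get(identifier);
            if (base == null) {
                return Optional.empty();
            }
            // Normalize so that exactly one '/' joins the two parts.
            String trimmedBase =
                base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
            String trimmedRel =
                relativePath.startsWith("/") ? relativePath.substring(1) : relativePath;
            return Optional.of(trimmedBase + "/" + trimmedRel);
        }
    }

    public static void main(String[] args) {
        FilesetRegistry registry = new FilesetRegistry();
        // Assumed example location; any storage URI would work the same way.
        registry.register("catalog.schema.events", "hdfs://namenode:9000/warehouse/events");

        // The caller only supplies the identifier and a relative path.
        System.out.println(
            registry.resolve("catalog.schema.events", "2024/01/part-0.parquet")
                .orElse("<unknown fileset>"));
        System.out.println(
            registry.resolve("catalog.schema.missing", "a.txt")
                .orElse("<unknown fileset>"));
    }
}
```

In this sketch the caller never handles the storage location for a registered fileset, which is the property the phrase "of the managed datasets" is meant to convey. For an unregistered identifier, `resolve` returns empty and the physical path would have to be supplied by the user.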
LGTM.
What changes were proposed in this pull request?
This PR proposes to add docs for fileset catalog.
Why are the changes needed?
Fix: #2695
Does this PR introduce any user-facing change?
No.
How was this patch tested?
No.