[#2695] feat(doc): Add docs for fileset catalog #2781

jerryshao · 2024-04-02T12:01:11Z

What changes were proposed in this pull request?

This PR proposes to add docs for fileset catalog.

Why are the changes needed?

Fix: #2695

Does this PR introduce any user-facing change?

No.

How was this patch tested?

No.

jerryshao · 2024-04-02T13:27:28Z

@coolderli @xloya would you please help to review? Thanks.

coolderli · 2024-04-02T14:34:25Z

@jerryshao Do we need to describe how to use a Fileset in the Spark engine? In addition, I have already tested TensorFlow and submitted an MR: tensorflow/io#1970. After #2779 is resolved, we can support TensorFlow. I think we can add a doc like https://help.aliyun.com/zh/hdfs/using-tensorflow-on?spm=a2c4g.11186623.0.i6. What do you think?

jerryshao · 2024-04-02T14:38:12Z

> @jerryshao Do we need to describe how to use a Fileset in the Spark engine? In addition, I have already tested TensorFlow and submitted an MR: tensorflow/io#1970. After #2779 is resolved, we can support TensorFlow. I think we can add a doc like https://help.aliyun.com/zh/hdfs/using-tensorflow-on?spm=a2c4g.11186623.0.i6. What do you think?

I would suggest having a separate doc about gvfs and adding the Spark- and TF-related content there.

coolderli · 2024-04-03T03:50:30Z

docs/manage-fileset-metadata-using-gravitino.md

+
+FilesetCatalog filesetCatalog = catalog.asFilesetCatalog();
+NameIdentifier[] identifiers =
+    filesetCatalog.listFilesets(Namespace.ofFileset("metalake", "catalog", "schema"));


I think there's another issue: the metalake in the Namespace seems redundant, because with the new GravitinoClient we have already declared the name of the current metalake. It is not related to this MR, so never mind.

Yeah, it is unrelated, and we are working on the client refactoring.

coolderli · 2024-04-03T03:53:49Z

@jerryshao Left some comments. Overall, it looks good to me.

jerryshao · 2024-04-03T06:02:36Z

@shaofengshi would you please also check the java client part? Thanks.

shaofengshi

LGTM

xloya · 2024-04-03T07:03:45Z

docs/manage-fileset-metadata-using-gravitino.md

+tabular data and others in Gravitino with a unified way.
+
+After fileset is created, users can easily access, manage the files/directories through
+Fileset's identifier, without needing to know the physical path of the managed datasets. Also, with


Maybe "of the managed datasets" is not necessary.

I feel it is still necessary. It means that the dataset is managed by Gravitino, so users don't need to know the physical path. For an unmanaged dataset, users still need to know the physical path before accessing the dataset.

qqqttt123

LGTM.

jerryshao self-assigned this Apr 2, 2024

jerryshao requested review from yuqi1129, mchades and qqqttt123 April 2, 2024 13:26

mchades reviewed Apr 2, 2024

qqqttt123 reviewed Apr 2, 2024

shaofengshi force-pushed the issue-2695 branch from 788c002 to cb9f036 Compare April 3, 2024 02:28

jerryshao force-pushed the issue-2695 branch from cb9f036 to d0dd632 Compare April 3, 2024 03:17

qqqttt123 reviewed Apr 3, 2024

coolderli reviewed Apr 3, 2024

jerryshao added 4 commits April 3, 2024 12:13

Add docs for fileset catalog

4e79f50

Polish the code

2b4dff7

Polish the code

1a06753

Polish the code

874d655

jerryshao force-pushed the issue-2695 branch from d0dd632 to 874d655 Compare April 3, 2024 04:13

jerryshao requested a review from shaofengshi April 3, 2024 06:02

qqqttt123 reviewed Apr 3, 2024

qqqttt123 reviewed Apr 3, 2024

shaofengshi previously approved these changes Apr 3, 2024

qqqttt123 mentioned this pull request Apr 3, 2024

[#2640] feat(docs): Add the doc for gvfs #2791

Merged

Address the comments

601bfe6

jerryshao dismissed shaofengshi’s stale review via 601bfe6 April 3, 2024 06:54

qqqttt123 reviewed Apr 3, 2024

qqqttt123 reviewed Apr 3, 2024

xloya reviewed Apr 3, 2024

Address the comments

b2bcd1f

qqqttt123 approved these changes Apr 3, 2024

jerryshao merged commit f119d90 into apache:main Apr 3, 2024
7 checks passed
