
[FEATURE] Add Gravitino Hadoop File System #1616

Closed
Tracked by #1241
xloya opened this issue Jan 19, 2024 · 7 comments · Fixed by #1700

xloya commented Jan 19, 2024

Describe the feature

Once Gravitino supports managing filesets, we can provide a Gravitino File System for storage systems that can be read and written through the Hadoop File System interface. This file system would access the Gravitino server to obtain the actual storage location and act as a proxy file system in front of those storage systems. That way, engines such as Spark can easily access and use filesets.
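For illustration only, here is a rough sketch of what such a proxy could look like on top of Hadoop's FilterFileSystem, which already delegates every I/O call to a wrapped file system. The Gravitino lookup is stubbed with a hard-coded location, and the class name and hdfs:// path are hypothetical, not the eventual design.

```java
// Rough sketch only: a proxy file system built on Hadoop's FilterFileSystem so
// that every I/O call is delegated to the storage that really holds the data.
// The Gravitino lookup is stubbed with a hard-coded location; the class name
// and the hdfs:// path are hypothetical.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;

public class GravitinoProxyFileSystemSketch extends FilterFileSystem {

  @Override
  public String getScheme() {
    // The virtual scheme that engines such as Spark would use in paths.
    return "gvfs";
  }

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    // In the real feature this would call the Gravitino server to resolve the
    // fileset's actual storage location; here it is just a placeholder value.
    URI actualLocation = URI.create("hdfs://namenode:8020/warehouse/my_fileset");

    // Delegate all FileSystem operations (open, create, listStatus, ...) to the
    // underlying storage file system.
    this.fs = FileSystem.get(actualLocation, conf);
    super.initialize(name, conf);
  }
}
```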

Motivation

No response

Describe the solution

No response

Additional context

No response

xloya commented Jan 19, 2024

@jerryshao What do you think about this issue? Would it be possible to implement a File System framework first, and then link it with the server later?

@jerryshao

I think we can have a prototype/skeleton first, but I'm afraid it cannot work until the fileset REST API is ready.

jerryshao commented Jan 19, 2024

You can put together a basic design (so we can discuss it in more detail) if you want to take a shot 😄.

xloya commented Jan 19, 2024

Sure, I can open a prototype patch next week; please take a look then.

jerryshao added this to the Gravitino 0.5.0 milestone Feb 4, 2024
jerryshao pushed a commit that referenced this issue Mar 22, 2024
### What changes were proposed in this pull request?

This PR proposes to add the code skeleton for the Gravitino Hadoop file system, which proxies the HDFS file system.

### Why are the changes needed?

Fix: #1616 

### How was this patch tested?

Add unit tests to cover the main interface methods.

---------

Co-authored-by: xiaojiebao <xiaojiebao@xiaomi.com>
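As a hedged illustration of how a Hadoop-compatible file system like this is typically picked up by clients: Hadoop's generic fs.<scheme>.impl property binds a scheme to an implementation class, after which the standard FileSystem API works against virtual paths. The class name and the catalog/schema/fileset path layout below are hypothetical.

```java
// Illustration of the generic Hadoop wiring for a custom scheme: fs.gvfs.impl
// binds "gvfs" to an implementation class, and the standard FileSystem API is
// then used with virtual paths. Class name and path layout are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GvfsSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Bind the gvfs scheme to the proxy implementation (hypothetical class name).
    conf.set("fs.gvfs.impl", "com.example.gravitino.GravitinoProxyFileSystemSketch");

    // A virtual path; the proxy resolves it to the fileset's physical location.
    Path virtualPath = new Path("gvfs://fileset/my_catalog/my_schema/my_fileset/");

    FileSystem gvfs = virtualPath.getFileSystem(conf);
    for (FileStatus status : gvfs.listStatus(virtualPath)) {
      System.out.println(status.getPath());
    }
  }
}
```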
coolderli pushed a commit to coolderli/gravitino that referenced this issue Apr 2, 2024 (apache#1700)
zuston commented Apr 30, 2024

From the doc, I don't find real cases showing how this gvfs is used. Is this a virtual unified namespace for different remote filesystems? And if using a managed fileset (is that a local file on the Gravitino server?), how can Spark on YARN access it?

@jerryshao

@zuston the usage of gvfs is exactly the same as using HDFS; the major difference is that the path is a virtual path, not a physical path. You can use it from Spark: it is a Hadoop-compatible filesystem, just like hdfs, s3, oss, and others.

Here's the doc; it has more details on how to use it with Spark: https://datastrato.ai/docs/0.5.0/how-to-use-gvfs.
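To make "use it from Spark like hdfs" concrete, here is a minimal sketch, assuming the Gravitino file system jar and its Hadoop configuration (server address, gvfs scheme binding, and so on) are already set up as described in that doc; the gvfs path layout shown is illustrative, not authoritative.

```java
// Minimal sketch of reading a fileset through gvfs from Spark, assuming the
// Gravitino file system jar and its Hadoop configuration are already in place.
// The gvfs path layout below is illustrative only.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadFilesetWithSpark {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("gvfs-read-example")
        .getOrCreate();

    // The virtual path is used exactly where an hdfs:// or s3a:// path would go.
    Dataset<Row> df = spark.read()
        .parquet("gvfs://fileset/my_catalog/my_schema/my_fileset/");

    df.show();
    spark.stop();
  }
}
```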

zuston commented Apr 30, 2024

Yes, I have seen the doc and understand the HCFS interface that gvfs uses. I want to know the design motivation behind gvfs because I'm evaluating whether it could be used as a unified namespace. From the current doc, this feature only provides a one-to-one mapping to a remote filesystem.
