# [FEATURE] Add Gravitino Hadoop File System #1616
@jerryshao What do you think about this issue? Could we implement the File System framework first, and then link it with the server later?
I think we can have a prototype/skeleton first, but I'm afraid it cannot work until the fileset REST API is ready.
You can put together a basic design (so we can discuss it further) if you want to take a shot 😄 .
Sure, I'll open a prototype patch next week; please take a look then.
### What changes were proposed in this pull request?

This PR proposes to add the code skeleton for the Gravitino Hadoop file system, which proxies the HDFS file system.

### Why are the changes needed?

Fix: #1616

### How was this patch tested?

Added unit tests to cover the main interface methods.

Co-authored-by: xiaojiebao <xiaojiebao@xiaomi.com>
(apache#1700)
From the doc, I don't see real-world cases showing the usage of this gvfs. Is this a virtual unified namespace for different remote filesystems? And if using a managed fileset (is that a local file on the Gravitino server?), how can Spark on YARN access it?
@zuston the usage of gvfs is exactly the same as using HDFS; the major difference is that the path is a virtual path, not a physical path. You can use it from Spark: it is a Hadoop Compatible File System, so you use it the same way you use HDFS, S3, OSS and others. Here's the doc with more details on how to use it with Spark: https://datastrato.ai/docs/0.5.0/how-to-use-gvfs.
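For reference, enabling gvfs is done through Hadoop configuration, along these lines (property names and class names are taken from the 0.5.0 docs linked above; the server URI and metalake name below are illustrative placeholders):

```xml
<!-- core-site.xml: register gvfs as a Hadoop Compatible File System -->
<property>
  <name>fs.AbstractFileSystem.gvfs.impl</name>
  <value>com.datastrato.gravitino.filesystem.hadoop.Gvfs</value>
</property>
<property>
  <name>fs.gvfs.impl</name>
  <value>com.datastrato.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem</value>
</property>
<!-- Where the Gravitino server lives and which metalake to use (example values) -->
<property>
  <name>fs.gravitino.server.uri</name>
  <value>http://localhost:8090</value>
</property>
<property>
  <name>fs.gravitino.client.metalake</name>
  <value>test_metalake</value>
</property>
```

With this in place, a virtual path such as `gvfs://fileset/{catalog}/{schema}/{fileset}/...` can be read from Spark just like any `hdfs://` path.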
Yes, I have seen the doc and understand the HCFS interface that gvfs uses. I want to know the gvfs design motivation; I'm evaluating whether this could be used as a unified namespace. From the current doc, this feature only provides a one-to-one mapping to a remote filesystem.
### Describe the feature

When Gravitino supports managing filesets, then for storage systems that support reading and writing through the Hadoop File System interface we can provide a Gravitino File System. This file system would access the Gravitino server to obtain the actual storage location and act as a proxy file system for accessing those storage systems. In this way, engines such as Spark can easily access and use filesets.
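The core of the proposal is path translation: the file system resolves a virtual fileset path to its physical storage location (obtained from the Gravitino server) and delegates I/O to the underlying file system. Below is a minimal, dependency-free sketch of that proxy idea; a real implementation would extend `org.apache.hadoop.fs.FileSystem` and query the fileset REST API, whereas here the catalog lookup is faked with an in-memory map and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the virtual-to-physical path translation behind a Gravitino
// virtual file system. The in-memory map stands in for the Gravitino
// server's fileset catalog (all names below are hypothetical).
public class GvfsSketch {
    // Maps a virtual fileset prefix to its physical storage location.
    private final Map<String, String> filesetLocations = new HashMap<>();

    public void registerFileset(String virtualPrefix, String physicalLocation) {
        filesetLocations.put(virtualPrefix, physicalLocation);
    }

    // Translates a virtual path like gvfs://fileset/catalog/schema/fileset/a.txt
    // into the corresponding path on the underlying storage system.
    public String resolve(String virtualPath) {
        for (Map.Entry<String, String> e : filesetLocations.entrySet()) {
            if (virtualPath.startsWith(e.getKey())) {
                return e.getValue() + virtualPath.substring(e.getKey().length());
            }
        }
        throw new IllegalArgumentException("Unknown fileset: " + virtualPath);
    }

    public static void main(String[] args) {
        GvfsSketch fs = new GvfsSketch();
        fs.registerFileset("gvfs://fileset/c1/s1/fileset1",
                           "hdfs://namenode:8020/user/warehouse/fileset1");
        // Prints the physical HDFS path for the virtual path.
        System.out.println(fs.resolve("gvfs://fileset/c1/s1/fileset1/part-0.txt"));
    }
}
```

In the real proxy, `resolve` would be backed by a call to the server's fileset API, and every `FileSystem` operation (open, create, listStatus, ...) would first translate the path and then forward to the file system instance for the physical scheme.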
### Motivation

No response

### Describe the solution

No response

### Additional context

No response