@simonwo Yes, I was thinking of building a general interface for all downloaders to use, similar to how publishers work now. The only difference between them should be how they fetch their data; the outputs should remain the same (directory output structure, etc.). As far as downloading from Estuary goes, it is possible to download data from Estuary as an unauthenticated user. Providing an HTTP downloader is a great idea, I think that's the way to go 👍🏻
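Roughly what I have in mind for the shared interface, just as a sketch (the package and names here are placeholders, not existing code):

```go
package downloader

import (
	"context"

	"github.com/filecoin-project/bacalhau/pkg/model"
)

// Downloader fetches the contents of a single published result into a
// local directory. How the bytes are fetched (IPFS, HTTP, ...) should be
// the only thing implementations differ on.
type Downloader interface {
	// FetchResult downloads the result described by spec into targetDir.
	FetchResult(ctx context.Context, spec model.StorageSpec, targetDir string) error
}
```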
Continuing the discussion that you started in #1514, @pyropy.
Yes, this does sound like a good idea if we're going to add specialised kinds of downloaders.
There is a downloader in the `pkg/ipfs` package which would fit naturally in there. You should also be able to re-use its tests for all downloaders.

These three constants should probably move to somewhere in `model`, as they are used by nearly everything and are not just about downloading: https://github.com/filecoin-project/bacalhau/blob/e39c840695e2700025f974299dc1761828a9a66b/pkg/ipfs/downloader.go#L23-L25

The IPFS downloader is a big mix of IPFS-specific things and non-specific things like how the results are output on disk. I'm thinking that new downloaders should probably all output results in the same directory structure, unless we encounter a good reason not to enforce that? So maybe it would be good to extract the directory structuring code from the IPFS downloader and have it as a general algorithm that can work with many different types of downloader.
Have you thought about how the client will know what to download? At the moment, results from Estuary are populated with a CID, but Estuary also supplies HTTP retrieval URLs: https://github.com/application-research/estuary/blob/baec8c4d827a7e13229db63cfbefb7fdfb9be763/util/content.go#L33-L39. You could store one of these in the `URL` field of the `model.StorageSpec` and then use that on the client for downloading.

One nice thing about doing it that way is that maybe then you are not writing an Estuary downloader, you are writing an HTTP downloader! And if we add more publishers in the future that only supply HTTP URLs (e.g. Amazon S3), then the downloader will just automatically work for them. What do you think?