Thinking through Motion deal preparation #24
hannahhoward started this conversation in General
Problem Overview
For the Motion MVP, we will receive byte data sent to us via POST request.
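For concreteness, the ingress side could be as small as a single handler. This sketch uses only the standard library; the route, response shape, and store function are placeholders, not Motion's actual API:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// handlePost accepts raw bytes from a client and hands them to
// whichever preparation backend (Singularity or RIBS) is configured.
func handlePost(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	data, err := io.ReadAll(r.Body) // MVP: buffer the whole blob in memory
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	id := store(data) // hypothetical hand-off to the prep pipeline
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprintf(w, "{\"id\":%q}\n", id)
}

// store is a stand-in for pushing data into the chosen backend.
func store(data []byte) string { return fmt.Sprintf("item-%d", len(data)) }

func main() {
	http.HandleFunc("/v0/data", handlePost)
	http.ListenAndServe(":8080", nil)
}
```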
Data preparation options:
Data Preparation Through Singularity
Singularity has a well-developed pipeline for data preparation. An overview is found here (please read this first, as the rest of this solution relies on understanding how it works).
To prepare data for Singularity, Motion would first create a single Dataset and Source for the entire Motion instance. Then, for each POST request, we could push a new Item into the created Source, and return the Item's primary key as a dataID. Assuming we ran one or more dataset workers, Singularity would automatically create CAR(s) for each Item and its associated ItemParts. We could then use the deal making system in Singularity (which needs to be built out further) to store the CAR files on Filecoin.
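A sketch of that flow, assuming a thin client over Singularity's admin API. The CreateDataset/CreateSource/PushItem method names are my invention, not Singularity's real client surface:

```go
package motion

import (
	"context"
	"io"
)

// SingularityClient is a hypothetical wrapper over Singularity's
// admin API; every method name here is an assumption.
type SingularityClient interface {
	CreateDataset(ctx context.Context, name string) (datasetID uint64, err error)
	CreateSource(ctx context.Context, datasetID uint64, path string) (sourceID uint64, err error)
	PushItem(ctx context.Context, sourceID uint64, data io.Reader) (itemID uint64, err error)
}

// setup runs once per Motion instance: one Dataset, one Source.
func setup(ctx context.Context, c SingularityClient) (sourceID uint64, err error) {
	dsID, err := c.CreateDataset(ctx, "motion")
	if err != nil {
		return 0, err
	}
	return c.CreateSource(ctx, dsID, "/motion/incoming")
}

// ingest pushes one POST body into the Motion-wide Source and
// returns the Item's primary key as the caller-facing dataID.
func ingest(ctx context.Context, c SingularityClient, sourceID uint64, body io.Reader) (uint64, error) {
	return c.PushItem(ctx, sourceID, body)
}
```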
In terms of storage, Singularity has two options for CAR files.
For status and retrieval, we'd need to build a relational query that maps Item -> ItemParts -> Chunks -> CARs -> StorageDeals.
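Something like the following, where every table and column name is a guess at the schema rather than what Singularity actually stores:

```go
package motion

import "database/sql"

// statusQuery walks Item -> ItemParts -> Chunks -> CARs -> Deals.
// All identifiers below are illustrative guesses; Singularity's real
// schema sits behind its GORM models and should be checked.
const statusQuery = `
SELECT ip.part_offset, ip.part_length, car.piece_cid, deal.deal_id, deal.state
FROM items i
JOIN item_parts ip   ON ip.item_id = i.id
JOIN chunks c        ON c.id = ip.chunk_id
JOIN cars car        ON car.chunk_id = c.id
LEFT JOIN deals deal ON deal.piece_cid = car.piece_cid
WHERE i.id = ?`

// itemStatus returns one row per (ItemPart, deal) pair for an item.
func itemStatus(db *sql.DB, itemID uint64) (*sql.Rows, error) {
	return db.Query(statusQuery, itemID)
}
```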
A retrieval would probably look like retrieving each ItemPart for an Item and stitching them back together. Since each ItemPart is ultimately always stored in a single piece, we could execute Lassie retrievals for each ItemPart individually, making use of all supported protocols to get an entire ItemPart in a single request (1GB). We can also support range requests, because the byte range of each ItemPart within the original Item is stored in the database. (I'm going to explore retrieval paths in more depth in another issue.)
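A minimal sketch of the stitching logic, with the ItemPart fields and the fetchPart call standing in for the real database rows and Lassie retrievals:

```go
package motion

import (
	"context"
	"fmt"
	"io"
)

// ItemPart mirrors the database rows described above: each part is a
// contiguous byte range of the original Item, stored in one piece.
type ItemPart struct {
	RootCID string // root of this part's DAG, retrievable via Lassie
	Offset  int64  // byte offset within the original Item
	Length  int64
}

// fetchPart is a stand-in for a Lassie retrieval of one ItemPart.
func fetchPart(ctx context.Context, rootCID string) ([]byte, error) {
	return nil, fmt.Errorf("not implemented: lassie fetch of %s", rootCID)
}

// retrieveRange serves bytes [start, end) of an Item by fetching only
// the ItemParts that overlap the requested range and trimming each
// one. Parts must be sorted by Offset.
func retrieveRange(ctx context.Context, w io.Writer, parts []ItemPart, start, end int64) error {
	for _, p := range parts {
		if p.Offset+p.Length <= start || p.Offset >= end {
			continue // part lies entirely outside the requested range
		}
		data, err := fetchPart(ctx, p.RootCID)
		if err != nil {
			return err
		}
		lo, hi := int64(0), p.Length
		if start > p.Offset {
			lo = start - p.Offset
		}
		if end < p.Offset+p.Length {
			hi = end - p.Offset
		}
		if _, err := w.Write(data[lo:hi]); err != nil {
			return err
		}
	}
	return nil
}
```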
A few things to consider about this system:
Data Preparation Through RIBS
RIBS offers a blockstore interface that offloads portions of the store to Filecoin deals. An overview of their approach is found here.
Since RIBS assumes a blockstore model, for the RIBS case we would need to convert each POST request of data into an IPLD UnixFS DAG. We could do this by running Kubo or a more lightweight DAG generator. We'd then put every block into the RIBS blockstore, and RIBS would generate CAR files for us as we filled up its staging area.
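A sketch of that conversion using the go-ipfs UnixFS importer libraries (package paths as they stood before the boxo consolidation; the in-memory blockstore here stands in for the one RIBS exposes):

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/ipfs/go-blockservice"
	"github.com/ipfs/go-datastore"
	dssync "github.com/ipfs/go-datastore/sync"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
	chunker "github.com/ipfs/go-ipfs-chunker"
	offline "github.com/ipfs/go-ipfs-exchange-offline"
	"github.com/ipfs/go-merkledag"
	"github.com/ipfs/go-unixfs/importer"
)

func main() {
	// In Motion this would be the blockstore RIBS exposes; an
	// in-memory store keeps the example self-contained.
	bs := blockstore.NewBlockstore(dssync.MutexWrap(datastore.NewMapDatastore()))
	dagSvc := merkledag.NewDAGService(blockservice.New(bs, offline.Exchange(bs)))

	// Stand-in for one POST body.
	body := bytes.NewReader([]byte("bytes received via POST"))

	// Chunk the bytes and build a UnixFS DAG; every block lands in bs.
	root, err := importer.BuildDagFromReader(dagSvc, chunker.DefaultSplitter(body))
	if err != nil {
		panic(err)
	}
	fmt.Println("root CID:", root.Cid()) // a natural dataID to return
}
```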
Most likely, rather than using RIBS for deal making, we'd take the assembled CAR files and put them directly into Singularity's deal making engine. We can hook this up by providing RIBS with a custom ExternalStorageProvider (its interface for offloading to Filecoin) and by using Singularity's ability to add assembled CARs directly to its deal making engine.
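Roughly, assuming RIBS's offload interface looks something like the below (the method shape is an assumption and should be checked against the RIBS source):

```go
package motion

import "context"

// ExternalStorageProvider approximates the shape of RIBS's offload
// hook; the method below is an assumption, not RIBS's real API.
type ExternalStorageProvider interface {
	// Offload hands a finalized CAR group to external deal making.
	Offload(ctx context.Context, groupID int64, carPath string) error
}

// dealMaker is a hypothetical hook into Singularity's deal engine.
type dealMaker interface {
	AddAssembledCAR(ctx context.Context, carPath string) error
}

// singularityOffloader bridges the two systems: RIBS assembles the
// CAR, Singularity makes the deals for it.
type singularityOffloader struct{ deals dealMaker }

func (s *singularityOffloader) Offload(ctx context.Context, groupID int64, carPath string) error {
	return s.deals.AddAssembledCAR(ctx, carPath)
}
```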
For storage, RIBS can stage data locally, and it already has prototype S3 support. RIBS would simply manage staging storage for us.
RIBS already supports retrieval through Lassie. It retrieves individual blocks, transparently switching between those it has locally and retrieval through Filecoin. Currently, it retrieves blocks individually from Filecoin, which might not scale super well for large amounts of data. Retrieval through RIBS would simply look like executing a DAG traversal for a piece of data backed by the RIBS blockstore.
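A retrieval could then be as simple as reading a UnixFS file out of a DAGService backed by the RIBS blockstore, wired up the same way as in the import sketch above:

```go
package motion

import (
	"context"
	"fmt"
	"io"

	"github.com/ipfs/go-cid"
	files "github.com/ipfs/go-ipfs-files"
	ipld "github.com/ipfs/go-ipld-format"
	unixfile "github.com/ipfs/go-unixfs/file"
)

// readItem re-materializes one stored item by walking its UnixFS DAG.
// dagSvc is assumed to sit on top of the RIBS blockstore, which serves
// each block locally or via a Filecoin retrieval, transparently.
func readItem(ctx context.Context, dagSvc ipld.DAGService, root cid.Cid, w io.Writer) error {
	nd, err := dagSvc.Get(ctx, root)
	if err != nil {
		return err
	}
	f, err := unixfile.NewUnixfsFile(ctx, dagSvc, nd)
	if err != nil {
		return err
	}
	file, ok := f.(files.File)
	if !ok {
		return fmt.Errorf("root %s is not a unixfs file", root)
	}
	defer file.Close()
	_, err = io.Copy(w, file) // the traversal pulls blocks on demand
	return err
}
```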
Status through RIBS might be a bit harder, since RIBS queries are almost entirely block based. RIBS provides a facility to map a set of block hashes to their per-CAR groups. We could run this query with all of an item's hashes immediately after adding it, and then store the associated CAR groups for that item. We could query the status of a Group in RIBS, and then the status of actual deal making through Singularity.
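A sketch of that bookkeeping, with the hash-to-group lookup shape assumed rather than taken from RIBS:

```go
package motion

import (
	"context"

	"github.com/ipfs/go-cid"
	"github.com/multiformats/go-multihash"
)

// groupMapper stands in for RIBS's hash-to-group lookup; the method
// name and signature are assumptions, not RIBS's real API.
type groupMapper interface {
	GetGroupsForBlocks(ctx context.Context, hashes []multihash.Multihash) (map[int64][]multihash.Multihash, error)
}

// recordGroups runs once, right after an item's blocks land in the
// blockstore, so later status checks only need the stored group IDs.
func recordGroups(ctx context.Context, m groupMapper, blocks []cid.Cid) ([]int64, error) {
	hashes := make([]multihash.Multihash, len(blocks))
	for i, c := range blocks {
		hashes[i] = c.Hash()
	}
	byGroup, err := m.GetGroupsForBlocks(ctx, hashes)
	if err != nil {
		return nil, err
	}
	groups := make([]int64, 0, len(byGroup))
	for g := range byGroup {
		groups = append(groups, g)
	}
	return groups, nil // persist these alongside the item's dataID
}
```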
A few things to consider about this system:
Proposal: Aim to use Singularity only; use RIBS as needed to get to MVP faster
My gut feeling is that ultimately, Singularity's data prep is more robust, if more complex. There are some additional features to write for it, but writing those features is an investment in Singularity development, which seems like a win for the ecosystem.
That said, for an MVP, RIBS seems like a solution we can use as a prototype so we can focus on building Singularity's deal making features, which make up the majority of the technical complexity in the MVP.
-
Reply:
It sounds like we're hearing pretty strongly that most SPs will want to pull data from the client. That will mean either exposing an HTTP server at the ISP for them to pull from directly, or providing that interface by: