Data Locality Module (DLM) support (an alternative to LOCKSS?) #3403
On Friday @joehand presented Dat ( http://dat-data.com ) to me, @djbrooke @scolapasta @landreev @sekmiller @kcondon and @bsilverstein . Joe knows all about LOCKSS and explained the vision for using Dat to replicate data to various data centers. I showed Joe https://data.sbgrid.org/dataset/1/ (the system we are migrating to Dataverse) and how there are rsync URLs to sites in the US, Sweden, Uruguay, and China. Joe's reaction was that you could replace all those rsync links with a single Dat link and have the data in multiple data centers. While this is interesting, it's not quite what we have in mind. The technology we plan to use to replicate data from one site to others is Globus/GridFTP. (I still owe @pameyer feedback on his DLM write-up at https://docs.google.com/document/d/1VCblZjSnC71MuX78GBKDuPk0HyweswNkJxbzinMq2Kw/edit?ts=57f5026a ). The next logical step in my mind is to open source the DLM code so developers like Joe can see how it works. Note that #3249 is the issue tracking what the end user sees on a dataset page in terms of how to download the files (such as rsync). There's a big difference between pushing data around between data centers and the download mechanisms available to end users for ultimately downloading the data from their data center of choice (probably the one geographically closest to them). Anyway, great presentation by Joe. Dat is a very interesting and promising new technology.
Maybe this would be of interest to @axfelix over at Compute Canada
Yup, we've done some work with Globus. It would be nice if Dataverse somehow implemented the Globus APIs in a way that facilitated easy transfer of data from a Dataverse instance to a Globus Endpoint.
Similar to work Dataverse did with Open Science Framework.
@axfelix - Globus is the first protocol we're working on. This is more 'move a dataset from a Globus Endpoint in the same installation as Dataverse to another Endpoint' than 'Dataverse with no endpoint into Globus'. I'm not fully up to speed on the Open Science Framework, but I imagine @pdurbin could bring me up to speed if necessary.
Related: #4396
@pameyer I think visuals help, so I'm attaching slide 7 from the slides you used during the Biomedical Dataverse: Structural Biology and Beyond talk over the summer during the 2018 Dataverse Community Meeting (slide attached).
Overall issue for supporting a Data Locality Module (DLM) within Dataverse. Why would a user want to install/configure a DLM? To replicate datasets to remote storage sites, for preservation, for facilitating remote access, for facilitating local access at the remote sites, and for keeping data close to compute resources.
Similar to the DCM, this will be another separate component coupled to the Dataverse application, communicating via HTTP API calls and sharing a filesystem. This will most likely require more changes on the Dataverse side, since Dataverse's "model of the world" will need to be expanded (to include "remote sites" and "storage locations for datasets"), and there will be more administrator- and user-level interactions.
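As a rough sketch of the kind of HTTP interaction this implies, here is a minimal Python example of building a replication request a Dataverse instance might send to a DLM. Note that the endpoint path, field names, and helper function below are all hypothetical illustrations, not part of any existing Dataverse or DLM API:

```python
import json

def build_replication_request(dataset_id, site_ids, shared_path):
    """Build the JSON body for a hypothetical POST /api/replicate call.

    All field names here are illustrative assumptions:
    - dataset_id: persistent identifier of the dataset to replicate
    - site_ids: remote storage sites the DLM should push to
    - shared_path: where the files are staged on the shared filesystem
    """
    return json.dumps({
        "datasetId": dataset_id,
        "remoteSites": site_ids,
        "sharedFilesystemPath": shared_path,
    })

# Example usage (values are made up for illustration):
body = build_replication_request(
    "doi:10.5072/FK2/EXAMPLE",
    ["us-east", "se-uppsala"],
    "/hold-for-dlm/doi-10.5072-FK2-EXAMPLE",
)
print(body)
```

The point of the sketch is the division of labor described above: Dataverse only needs to know *that* remote sites and storage locations exist (to build and track requests like this), while the DLM owns the actual movement of bytes between sites.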
We should be getting this more specified next week.