Add/Remove download options for files uploaded using rsync #3350
Comments
During the 2016-09-08 SBGrid Sprint Planning meeting ( https://docs.google.com/document/d/1wWSdKUOGA1L7UqFsgF3aOs8_9uyjnVpsPAxk7FObOOI/edit ) this issue was given an effort level of "8". The system that @bmckinney and @pameyer are migrating from ( https://data.sbgrid.org ) only permits files to enter and leave the system via rsync because the datasets are relatively big (55 GB or so, I believe). In contrast, files in Dataverse can be downloaded via HTTP either one at a time or in batches (including "all files") as a zip. We are concerned that Glassfish or some other part of the innards of Dataverse will fall over (due to memory pressure or what have you) if someone tries to download all the files in a large dataset as a zip, so we are considering disabling that feature based on whether the dataset supports rsync, or perhaps based on the total amount of storage used by that dataset. It's pretty easy to turn off "download all as zip" based on whether a dataset supports rsync. I don't think we currently have any mechanism for determining the total storage used by a dataset (this would be useful for implementing quotas, by the way). @scolapasta has argued a few times that it shouldn't matter how data gets into Dataverse. We should strive to support a scenario where the author uploads files via rsync and a researcher later downloads them via zip (as long as the files aren't too big). I definitely agree with this aspiration, but I think this issue needs to be scoped properly to make it into a release. What did we mean by "8"? What are we trying to achieve at this time? I'm going to assign this issue to @djbrooke @scolapasta @bmckinney and myself to discuss further. |
If a user currently tries to download a dataset that's too large, the user gets a zip file containing some of the files, plus an additional file explaining that the files were too large to be downloaded successfully. Instead of this experience, it was suggested that the download button be disabled, with a message that files can be downloaded individually. This provides a consistent experience for files uploaded using rsync and files uploaded through other methods. |
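For discussion, here is a minimal sketch of what "total storage used by a dataset" could look like if it is just the sum of the per-file sizes. The class `DatasetStorage` and its methods are hypothetical and not taken from the Dataverse code base; the sizes are assumed to be available as plain byte counts.

```java
import java.util.List;

// Hypothetical helper, not part of Dataverse: illustrates summing file sizes.
class DatasetStorage {

    // Returns the total number of bytes used by a dataset, given the size of
    // each of its files in bytes. Math.addExact throws ArithmeticException on
    // overflow instead of silently wrapping around.
    static long totalBytes(List<Long> fileSizesBytes) {
        long total = 0L;
        for (long size : fileSizesBytes) {
            total = Math.addExact(total, size);
        }
        return total;
    }

    // Assumed quota check: true when the dataset uses more than quotaBytes.
    static boolean exceedsQuota(List<Long> fileSizesBytes, long quotaBytes) {
        return totalBytes(fileSizesBytes) > quotaBytes;
    }
}
```

In a real deployment the sizes would presumably come from the database rather than an in-memory list, so the sum would more likely be a single aggregate query.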
Moving to the backlog - this is not a blocker for the SBGrid folks. |
We noticed this in the code and @bmckinney is going to play around with setting this to "-1":
|
Right. This issue is related (error message saved as a file within the zip): #2060 |
On the theme of "not caring how files get in", it would make sense to hide this UI option (and disable it through the API as appropriate) if the total dataset size is larger than an admin-configurable threshold (i.e., return "-1" from the above method). |
This also intersects with "package files", which currently show non-functional download links (intentional, given their current "experimental" status). |
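A rough sketch of that threshold idea, under stated assumptions: the class `ZipDownloadPolicy`, its method names, and its parameters are all hypothetical, and "-1" is treated as "zip download disabled", which is my reading of this thread rather than something confirmed against the Dataverse code.

```java
// Hypothetical policy class, not part of Dataverse.
class ZipDownloadPolicy {

    // Assumed sentinel meaning "zip download is disabled for this dataset".
    static final long ZIP_DISABLED = -1L;

    // Returns the zip limit to apply to a dataset: the normally configured
    // limit, or ZIP_DISABLED when the dataset's total size is larger than the
    // admin-configured threshold. How the files got in (rsync, HTTP upload)
    // is deliberately not an input.
    static long effectiveZipLimit(long datasetTotalBytes,
                                  long adminThresholdBytes,
                                  long configuredZipLimitBytes) {
        if (datasetTotalBytes > adminThresholdBytes) {
            return ZIP_DISABLED;
        }
        return configuredZipLimitBytes;
    }

    // The UI (and API) would offer "download all as zip" only when enabled.
    static boolean showDownloadAllButton(long effectiveLimit) {
        return effectiveLimit != ZIP_DISABLED;
    }
}
```

The same check could back both the UI and the API, so the two cannot disagree about whether zip download is offered.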
The necessary adjustments to the download process for SBGrid are covered in #3348. Once we have the case of users uploading via rsync and wanting to download individual files, we'll create a new issue. |