Support partially unzipped archival bags #12144
Open
qqmyers wants to merge 48 commits into IQSS:develop from
Spec doesn't allow empty lines; dropping whitespace-only lines seems reasonable as well (users can't see from the Dataverse display whether an empty line would appear in bag-info.txt if we allow whitespace-only lines, or whitespace extending beyond the 78-char wrap limit)
affects manifest and pid-mapping files as well as data file placement
Added unit tests for multilineWrap
This reverts commit 884b81b.
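The wrapping and blank-line behaviour described in the commits above can be illustrated with a minimal Java sketch. It is not Dataverse's `multilineWrap`: the class and method names (`BagInfoWrapSketch`, `wrapTag`) are hypothetical, it assumes the 78-character limit mentioned above, and it simplifies by treating the value as a stream of whitespace-separated words, so empty and whitespace-only input lines simply disappear. Continuation lines start with a single space, which is how BagIt (RFC 8493) lets a long bag-info.txt value span several lines.

```java
import java.util.ArrayList;
import java.util.List;

public class BagInfoWrapSketch {

    // Assumed wrap width, taken from the 78-character limit mentioned in the commit message.
    static final int MAX_LINE = 78;

    /**
     * Hypothetical helper (not Dataverse's multilineWrap): renders "label: value"
     * as bag-info.txt lines. Words are packed greedily up to MAX_LINE characters
     * and continuation lines begin with one space. Because the value is split on
     * whitespace, empty and whitespace-only input lines never reach the output.
     * A single word longer than the limit is not split.
     */
    static List<String> wrapTag(String label, String value) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder(label).append(':');
        boolean lineHasWord = false;
        for (String word : value.strip().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the whole value is blank
            }
            // Each word is preceded by one space; start a continuation line if it won't fit.
            if (lineHasWord && current.length() + 1 + word.length() > MAX_LINE) {
                lines.add(current.toString());
                current = new StringBuilder();
                lineHasWord = false;
            }
            current.append(' ').append(word);
            lineHasWord = true;
        }
        lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) {
        String description = "First paragraph of a long description.\n   \n"
                + "Second paragraph, long enough that the value has to be continued "
                + "on at least one more line of bag-info.txt.";
        for (String line : wrapTag("External-Description", description)) {
            System.out.println(line);
        }
    }
}
```

Splitting on whitespace also strips trailing spaces, so no continuation line can consist only of whitespace; a real implementation might instead preserve the user's own line breaks and only wrap within them.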
What this PR does / why we need it: Extending #12133, this PR builds on the logic there that allowed setting per-file and per-dataset size limits above which data files are only listed in the bag's fetch.txt file (creating a "holey" bag). This PR adds an option to send the oversized files anyway and place them such that, when the zipped bag is unzipped, you have a complete bag (perhaps an "un-holey" one?). This lets admins avoid arbitrarily large zipped bags (and the temp space needed to create them) without requiring some active process on the archival platform to retrieve additional (possibly restricted, etc.) files before the bag is complete.
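The placement decision described above can be sketched in Java. All names here (`BagPlacementSketch`, `PayloadFile`, `Placement`, `place`) are hypothetical, and the greedy, in-order treatment of the per-dataset limit is an assumption, not necessarily the PR's actual algorithm. Only the fetch.txt line format (URL, length in octets, path relative to the bag root) comes from the BagIt specification, RFC 8493.

```java
import java.util.ArrayList;
import java.util.List;

public class BagPlacementSketch {

    // Hypothetical view of a payload file: path inside the bag, size in bytes, download URL.
    record PayloadFile(String bagPath, long size, String url) {}

    // Hypothetical outcomes for each file.
    enum Placement {
        IN_ZIP,        // stored inside the zipped bag
        FETCH_TXT,     // only listed in fetch.txt ("holey" bag)
        SEPARATE_FILE  // sent alongside the zip, at a location that completes the bag once unzipped
    }

    /**
     * Assumed policy: walk the files in order; a file goes into the zip if it is within
     * the per-file limit and keeps the zip within the per-dataset limit. Everything else
     * is either listed in fetch.txt or, when sendOversized is true, sent separately.
     */
    static List<Placement> place(List<PayloadFile> files, long maxFileSize,
                                 long maxZipTotal, boolean sendOversized) {
        Placement overflow = sendOversized ? Placement.SEPARATE_FILE : Placement.FETCH_TXT;
        List<Placement> result = new ArrayList<>();
        long zippedBytes = 0; // running total of payload bytes placed in the zip
        for (PayloadFile f : files) {
            boolean fits = f.size() <= maxFileSize && zippedBytes + f.size() <= maxZipTotal;
            if (fits) {
                zippedBytes += f.size();
                result.add(Placement.IN_ZIP);
            } else {
                result.add(overflow);
            }
        }
        return result;
    }

    // One fetch.txt line per RFC 8493: URL, length in octets, path relative to the bag root.
    static String fetchLine(PayloadFile f) {
        return f.url() + " " + f.size() + " " + f.bagPath();
    }

    public static void main(String[] args) {
        List<PayloadFile> files = List.of(
                new PayloadFile("data/small.csv", 10, "https://example.org/small"),
                new PayloadFile("data/huge.bin", 500, "https://example.org/huge"));
        System.out.println(place(files, 100, 100, false)); // holey bag
        System.out.println(place(files, 100, 100, true));  // complete once unzipped
        System.out.println(fetchLine(files.get(1)));
    }
}
```

The same size limits drive both modes; only the fate of the overflow files changes, which matches the PR's point that the renamed limits no longer apply just to holey bags.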
Which issue(s) this PR closes:
Special notes for your reviewer: Since this is intended to go out with 6.10, I've gone ahead and changed the .holey.* config options from #12133 (since they don't just apply to holey bags now) and will just document the final names.
Suggestions on how to test this: As this nominally affects/works with any/all archivers, one could test all the variants we have. I'd suggest Harvard do a basic test of the local file archiver and probably the DRS one that was created specifically for Harvard use (it has no changes itself, but it uses the S3 archiver, which has been updated). (Nominally one could just regression test the DRS one if there's no interest in using the bag size limits or the holey bag concept.)
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: