
Support partially unzipped archival bags #12144

Open
qqmyers wants to merge 48 commits into IQSS:develop from GlobalDataverseCommunityConsortium:DANS-2157_holey_bags3

qqmyers (Member) commented Feb 4, 2026

What this PR does / why we need it: Extending #12133, this PR builds on the logic there, which allowed setting per-file and per-dataset size limits above which datafiles are only listed in the bag's fetch.txt file (creating a "holey" bag). This PR adds an option to send the oversized files anyway and place them such that unzipping the zipped bag yields a complete bag (perhaps an "un-holey" one?). This lets admins avoid arbitrarily large zipped bags (and the temp space needed to create them) without requiring an active process on the archival platform to retrieve the additional (possibly restricted, etc.) files before the bag is complete.
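
The size-limit behavior described above can be sketched as a simple partition. This is a minimal, hypothetical Java sketch, not the code in this PR or in #12133: the class, record and method names, the greedy first-come ordering, the example URLs, and the `<bagName>/data/<path>` target layout for separately sent files are all assumptions. Files within both limits go into the zip; the rest are either listed in fetch.txt (the holey-bag mode, using the RFC 8493 `URL LENGTH FILEPATH` line format) or sent on their own to the path they would occupy in the unzipped bag.

```java
import java.util.ArrayList;
import java.util.List;

public class BagPlanSketch {

    // Hypothetical view of one datafile: path inside data/, size in bytes, download URL.
    record DataFile(String path, long size, String url) {}

    // Result of the partition: files written into the zip vs. files kept out of it.
    record Plan(List<DataFile> zipped, List<DataFile> oversized) {}

    // Greedy partition (assumed ordering): a file is zipped only if it is within the
    // per-file limit AND keeps the running zipped total within the per-dataset limit.
    static Plan partition(List<DataFile> files, long maxFileSize, long maxDataSize) {
        List<DataFile> zipped = new ArrayList<>();
        List<DataFile> oversized = new ArrayList<>();
        long zippedTotal = 0;
        for (DataFile f : files) {
            if (f.size() <= maxFileSize && zippedTotal + f.size() <= maxDataSize) {
                zipped.add(f);
                zippedTotal += f.size();
            } else {
                oversized.add(f);
            }
        }
        return new Plan(zipped, oversized);
    }

    // Holey-bag mode: one fetch.txt line per oversized file, "URL LENGTH FILEPATH"
    // per RFC 8493 (assumes the paths need no percent-encoding).
    static List<String> fetchLines(List<DataFile> oversized) {
        List<String> lines = new ArrayList<>();
        for (DataFile f : oversized) {
            lines.add(f.url() + " " + f.size() + " data/" + f.path());
        }
        return lines;
    }

    // Alternative mode: send each oversized file on its own, to the path it would
    // occupy once the zipped bag is unzipped (assumed layout: <bagName>/data/<path>).
    static List<String> separateTargets(String bagName, List<DataFile> oversized) {
        List<String> targets = new ArrayList<>();
        for (DataFile f : oversized) {
            targets.add(bagName + "/data/" + f.path());
        }
        return targets;
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
                new DataFile("small.csv", 10, "https://example.org/files/1"),
                new DataFile("huge.tif", 500, "https://example.org/files/2"),
                new DataFile("medium.txt", 80, "https://example.org/files/3"));
        // Per-file limit 100 bytes, per-dataset limit 60 bytes.
        Plan plan = partition(files, 100, 60);
        System.out.println("zipped: " + plan.zipped().size());
        fetchLines(plan.oversized()).forEach(System.out::println);
        separateTargets("mybag", plan.oversized()).forEach(System.out::println);
    }
}
```

With these limits only small.csv is zipped: huge.tif exceeds the per-file limit, and medium.txt fits the per-file limit but would push the zipped total (10 + 80) past the per-dataset limit of 60.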

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer: Since this is intended to go out with 6.10, I've gone ahead and renamed the .holey.* config options from #12133 (since they no longer apply only to holey bags) and will document only the final names.

Suggestions on how to test this: Since this nominally affects all archivers, one could test every variant we have. I'd suggest Harvard do a basic test of the local file archiver and probably the DRS archiver, which was created specifically for Harvard use (it has no changes itself, but it uses the S3 archiver, which has been updated). If there's no interest in using the bag size limits or the holey bag concept, a regression test of the DRS archiver alone would nominally suffice.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

qqmyers and others added 30 commits December 6, 2025 18:26
The spec doesn't allow empty lines; dropping whitespace-only lines seems
reasonable as well (users can't see from the Dataverse display whether
an empty line would appear in bag-info.txt if we allowed whitespace-only
lines, or whitespace beyond the 78-char wrap limit)
affects manifest and pid-mapping files as well as data file placement
Added unit tests for multilineWrap
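
The bag-info.txt rules in the commit message above (no empty lines, whitespace-only lines dropped, wrapping at 78 characters) can be illustrated with a small wrapping helper. This is a hypothetical Java sketch, not the PR's `multilineWrap`: the `wrapTag` name, its signature, and the single-space continuation indent are assumptions (RFC 8493 only requires that continuation lines begin with one or more spaces or tabs).

```java
import java.util.ArrayList;
import java.util.List;

public class TagWrapSketch {

    // Hypothetical helper: formats "Label: value" for a tag file such as bag-info.txt.
    // - empty and whitespace-only source lines are dropped (no blank output lines)
    // - words are wrapped so lines stay within `limit` characters where possible
    //   (a single word longer than the limit is left unbroken)
    // - continuation lines begin with a single space (leading whitespace marks them)
    static List<String> wrapTag(String label, String value, int limit) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder(label).append(':');
        int wordsOnLine = 0;
        boolean firstSourceLine = true;
        for (String raw : value.split("\\R")) {
            String text = raw.strip();
            if (text.isEmpty()) {
                continue; // drop empty or whitespace-only lines
            }
            if (!firstSourceLine) {
                // Each further source line starts a new continuation line.
                out.add(cur.toString());
                cur = new StringBuilder();
                wordsOnLine = 0;
            }
            firstSourceLine = false;
            for (String word : text.split("\\s+")) {
                if (wordsOnLine > 0 && cur.length() + 1 + word.length() > limit) {
                    out.add(cur.toString());
                    cur = new StringBuilder();
                    wordsOnLine = 0;
                }
                // The space is either a word separator or the continuation indent.
                cur.append(' ').append(word);
                wordsOnLine++;
            }
        }
        out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        String value = "First paragraph of a long description\n   \nSecond paragraph";
        for (String line : wrapTag("External-Description", value, 78)) {
            System.out.println(line);
        }
    }
}
```

Because no trailing separator is ever appended, output lines never end in whitespace, and the whitespace-only middle line of the sample value disappears instead of becoming an empty line.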
qqmyers added this to the 6.10 milestone Feb 4, 2026
qqmyers added the Size: 10 (A percentage of a sprint. 7 hours.) label Feb 4, 2026
coveralls commented Feb 4, 2026

coverage: 24.493% (+0.2%) from 24.334%
when pulling 5c82ab8 on GlobalDataverseCommunityConsortium:DANS-2157_holey_bags3
into 180aa55 on IQSS:develop.

qqmyers marked this pull request as ready for review February 4, 2026 20:11

Labels

Size: 10 A percentage of a sprint. 7 hours.

Projects

Status: No status

3 participants