feat(scripts): Add script for hiding S3 objects that do not appear in latest tag #3773

Open
effigies wants to merge 3 commits into OpenNeuroOrg:master from effigies:script/hide-old-files.py

@effigies
Contributor

Will allow us to efficiently address #3709.

@codecov

codecov bot commented Feb 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.87%. Comparing base (fda81e0) to head (6a064df).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3773      +/-   ##
==========================================
+ Coverage   42.60%   44.87%   +2.26%     
==========================================
  Files         642      642              
  Lines       34488    34497       +9     
  Branches     1557     1653      +96     
==========================================
+ Hits        14695    15480     +785     
+ Misses      19646    18880     -766     
+ Partials      147      137      -10     

effigies force-pushed the script/hide-old-files.py branch from ef63be2 to 351f12f on February 11, 2026 15:04
effigies force-pushed the script/hide-old-files.py branch from 351f12f to f5da737 on February 11, 2026 15:17
@effigies
Contributor Author

@nellh Figured it'd be best for you to review this before I start running it. I included a dry-run mode for verification:

git clone https://github.com/openneurodatasets/ds000248 /tmp/ds000248
uv run scripts/s3-hide-old-files.py --dry-run /tmp/ds000248 [--config PATH/TO/secrets-production.yaml]
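The selection step named in the PR title amounts to a set difference: object keys under the dataset's S3 prefix, minus the files present in the latest tag. The sketch below illustrates that idea only and is not the PR's implementation. `plan_hides` is a hypothetical helper, and it assumes the keys have already been listed from the bucket and the tag's file list obtained separately (for example with `git ls-tree -r --name-only <tag>`).

```python
from collections.abc import Iterable


def plan_hides(s3_keys: Iterable[str], tag_files: set[str], prefix: str) -> list[str]:
    """Return dataset-relative paths that exist in S3 but not in the latest tag.

    Hypothetical helper for illustration:
      s3_keys   -- full object keys, e.g. "ds000006/annex-uuid"
      tag_files -- paths relative to the dataset root in the latest tag
      prefix    -- the dataset's key prefix, e.g. "ds000006/"
    """
    to_hide = []
    for key in s3_keys:
        if not key.startswith(prefix):
            continue  # object belongs to another dataset; never touch it
        relpath = key[len(prefix):]
        # Skip the bare prefix itself; flag anything the tag does not contain.
        if relpath and relpath not in tag_files:
            to_hide.append(relpath)
    return sorted(to_hide)


if __name__ == "__main__":
    keys = [
        "ds000006/annex-uuid",
        "ds000006/dataset_description.json",
        "ds000006/sub-01/anat/sub-01_T1w.nii.gz",
    ]
    tag = {"dataset_description.json", "sub-01/anat/sub-01_T1w.nii.gz"}
    # Dry run: report what would be hidden without touching the bucket.
    for relpath in plan_hides(keys, tag, "ds000006/"):
        print(f"HIDE filename={relpath}")
```

Run as a script, this prints `HIDE filename=annex-uuid`. The `annex-uuid` object is written to the remote by git-annex rather than being part of the dataset tree, so a pure tree-vs-bucket comparison flags it.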

@effigies
Contributor Author

Just checking: should annex-uuid be visible, or does that need to be special-cased?

worker-0:/srv# uv run https://raw.githubusercontent.com/effigies/openneuro/f5da7373a06ec6a2a23c47b29d5b31a99279db95/scripts/s3-hide-old-files.py --dry-run /datasets/ds000006/
2026-02-11 17:14:06 [info     ] Loaded repository              dataset=ds000006 tag=57fed7e1cce88d000bc175df
2026-02-11 17:14:07 [info     ] S3 bucket loaded               bucket=openneuro.org prefix=ds000006/
2026-02-11 17:14:07 [info     ] HIDE                           filename=annex-uuid

@nellh
Contributor

nellh commented Feb 11, 2026

> Just checking: should annex-uuid be visible, or does that need to be special-cased?

This should be fine to hide.

Contributor

@nellh nellh left a comment

Tested this on a few examples; it looks good to me as long as it is run on datasets where the latest snapshot is fully exported. My main concern is that it may apply delete markers to object versions that belong to a correctly exported previous snapshot, and git-annex may then fail to unexport those files when the newest snapshot is exported after the delete markers were added.
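The delete-marker concern follows from how S3 versioning behaves: a DeleteObject request without a `VersionId` on a versioned bucket removes no data. It adds a delete marker that becomes the current version, so ordinary GETs return 404 while earlier versions stay addressable by `VersionId`, and deleting the marker itself by its `VersionId` makes the object visible again. The sketch below assumes the script hides objects this way; `hide_objects` and `unhide_objects` are hypothetical helpers, and `client` is assumed to be anything with a boto3-style `delete_object` method. Recording the marker version IDs gives a way to back out if git-annex later has trouble unexporting a hidden path.

```python
from typing import Any


def hide_objects(client: Any, bucket: str, keys: list[str]) -> dict[str, str]:
    """Place a delete marker on each key and return {key: marker VersionId}.

    `client` is assumed to behave like a boto3 S3 client on a versioned bucket.
    """
    markers: dict[str, str] = {}
    for key in keys:
        # No VersionId: on a versioned bucket S3 inserts a delete marker
        # instead of removing data.
        resp = client.delete_object(Bucket=bucket, Key=key)
        # Only record a marker if S3 reports one (an unversioned bucket
        # would really delete the object and return no marker).
        if resp.get("DeleteMarker"):
            markers[key] = resp["VersionId"]
    return markers


def unhide_objects(client: Any, bucket: str, markers: dict[str, str]) -> None:
    """Reverse hide_objects by deleting the delete markers themselves."""
    for key, marker_version in markers.items():
        # Deleting the specific version that is a delete marker removes the
        # marker, so the previous object version becomes current again.
        client.delete_object(Bucket=bucket, Key=key, VersionId=marker_version)
```

With a real bucket, `client` would be something like `boto3.client("s3")`; persisting the returned mapping (for example to a JSON file per dataset) would make the operation reversible after the fact.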

@effigies
Contributor Author

To be clear, this means we should run check-github-sync and make sure S3/GitHub are up to date first? Or what else would you want to check?

@nellh
Contributor

nellh commented Feb 11, 2026

> To be clear, this means we should run check-github-sync and make sure S3/GitHub are up to date first? Or what else would you want to check?

Just that.

2 participants