Summary
The local area H5 publishing workflow was failing with httpx.ReadTimeout errors during HuggingFace uploads after ~57 minutes. Additionally, partial failures left production in an inconsistent state with a mix of old and new files.
What Was Implemented (db-work branch)
1. Parallel Build Workers
- Modal Volume staging for persistent cache across runs
- Configurable number of parallel workers (default: 8)
- Build time reduced from ~57 minutes to ~12 minutes
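The fan-out across workers can be sketched with the standard library alone. This is an illustration, not the code in `modal_app/local_area.py`: the names `partition`, `build_all`, and `build_one` are hypothetical, and a thread pool stands in for the Modal workers and worker subprocesses that the branch uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(items: Sequence[T], num_workers: int = 8) -> List[List[T]]:
    """Split items round-robin into at most num_workers non-empty chunks."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    chunks = [list(items[i::num_workers]) for i in range(num_workers)]
    return [chunk for chunk in chunks if chunk]


def build_all(
    items: Sequence[T],
    build_one: Callable[[T], R],  # hypothetical per-area build function
    num_workers: int = 8,
) -> List[R]:
    """Run build_one over every item, one chunk per worker."""
    chunks = partition(items, num_workers)

    def run_chunk(chunk: List[T]) -> List[R]:
        return [build_one(item) for item in chunk]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves chunk order; exceptions surface when consumed.
        results = list(pool.map(run_chunk, chunks))
    return [result for chunk in results for result in chunk]
```

Round-robin chunking keeps chunk sizes within one item of each other, which matters when the per-area build time is roughly uniform.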
2. Retry Logic
- Added `tenacity` for exponential backoff on HuggingFace uploads
- Retries on `httpx.ReadTimeout`, `httpx.ConnectTimeout`, `ConnectionError`
- Max 5 retries with 30-300 second waits
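The retry policy above can be illustrated without any third-party dependency. The branch uses `tenacity`; the sketch below is a hand-rolled stand-in so the timing is easy to inspect. It assumes that "max 5 retries" means five retries after the first attempt and that waits double from 30 seconds up to a 300 second cap. The names `backoff_delays` and `call_with_retries` are hypothetical, and the built-in `ConnectionError` and `TimeoutError` stand in for the `httpx` exception types.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(
    max_retries: int = 5, min_wait: float = 30.0, max_wait: float = 300.0
) -> List[float]:
    """Wait before each retry: doubles from min_wait, capped at max_wait."""
    return [min(min_wait * 2**i, max_wait) for i in range(max_retries)]


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_retries: int = 5,
    min_wait: float = 30.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call fn, retrying on the listed exceptions with exponential backoff."""
    delays = backoff_delays(max_retries, min_wait, max_wait)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Under these assumptions the waits are 30, 60, 120, 240, and 300 seconds, so a fully failing upload gives up after roughly 12.5 minutes of waiting on top of the time spent in the attempts themselves.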
3. Atomic Staging Workflow
- Files upload to `staging/` folder first
- After all uploads succeed, atomic promotion via `CommitOperationCopy`
- Single commit replaces all production files
- Cleanup removes staging folder
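The promotion step can be thought of as building one list of copy pairs and refusing to proceed unless it is complete. The sketch below covers only that planning logic in plain Python and makes no calls to `huggingface_hub`. The function names and the example file paths are hypothetical; in the real workflow each `(src, dst)` pair would presumably become one `CommitOperationCopy` inside a single commit.

```python
from typing import Iterable, List, Tuple


def plan_promotion(
    staged_paths: Iterable[str], staging_prefix: str = "staging/"
) -> List[Tuple[str, str]]:
    """Map each staged path to its production path by stripping the prefix."""
    plan = []
    for src in sorted(staged_paths):
        if not src.startswith(staging_prefix):
            raise ValueError(f"not a staged path: {src}")
        plan.append((src, src[len(staging_prefix):]))
    return plan


def promote_if_complete(
    expected: Iterable[str],
    staged_paths: Iterable[str],
    staging_prefix: str = "staging/",
) -> List[Tuple[str, str]]:
    """Return the copy plan only if every expected production file is staged."""
    plan = plan_promotion(staged_paths, staging_prefix)
    missing = set(expected) - {dst for _, dst in plan}
    if missing:
        # Abort before touching production so old files stay consistent.
        raise RuntimeError(f"staging incomplete, missing: {sorted(missing)}")
    return plan


# Hypothetical file names, for illustration only.
plan = promote_if_complete(
    expected=["states/CA.h5", "states/NY.h5"],
    staged_paths=["staging/states/NY.h5", "staging/states/CA.h5"],
)
```

Checking completeness before the commit is what prevents the mixed old/new state described in the summary: either every file is promoted together or production is left untouched.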
Current State
- Code: `db-work` branch has the new staging workflow
- Files: v1.56.0 files were manually migrated to production paths on both GCS and HuggingFace
- First successful run: 27 minutes (vs 57+ minute failures before)
TODO
- Test the new staging workflow end-to-end with a real run
- Consider if we need the `latest.json`/`manifest.json` files (currently not used by downstream code)
- Merge `db-work` to `main` after testing
- Clean up old versioned upload functions that are no longer used
Files Changed
- `modal_app/local_area.py` - Parallel workers + staging upload flow
- `modal_app/worker_script.py` - Worker subprocess script (new)
- `policyengine_us_data/utils/data_upload.py` - Staging upload/promote/cleanup functions
- `policyengine_us_data/utils/manifest.py` - Checksum validation (new)
- `.github/workflows/local_area_publish.yaml` - Added `num_workers` parameter
- `scripts/migrate_versioned_to_production.py` - One-time migration script (new)
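For readers unfamiliar with checksum validation, here is a minimal sketch of the idea. It is not the contents of `manifest.py`: SHA-256 as the hash, a manifest shaped as a mapping from relative path to hex digest, and the names `sha256_of` and `find_mismatches` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large H5 files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_mismatches(root: Union[str, Path], manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose digest differs."""
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = Path(root) / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(rel_path)
    return bad
```

An empty result from `find_mismatches` would be a reasonable precondition for promoting staged files.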
Related
- Original failure: https://github.com/PolicyEngine/policyengine-us-data/actions/runs/21462786864
- Successful run with new code: https://github.com/PolicyEngine/policyengine-us-data/actions/runs/21489808887