Local Area H5 Publishing: Implemented Atomic Staging Workflow #490

@baogorek

Summary

The local area H5 publishing workflow was failing with httpx.ReadTimeout errors during HuggingFace uploads after ~57 minutes. Additionally, partial failures left production in an inconsistent state with a mix of old and new files.

What Was Implemented (db-work branch)

1. Parallel Build Workers

  • Modal Volume staging for persistent cache across runs
  • Configurable number of parallel workers (default: 8)
  • Build time reduced from ~57 minutes to ~12 minutes
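
As a rough illustration of the fan-out, the sketch below splits a list of work items across a configurable number of workers. It is not the code on the db-work branch: the real workflow runs on Modal with a worker subprocess script, whereas this uses a local thread pool as a stand-in, and `partition`, `build_all`, and `build_one` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_workers=8):
    """Split items into at most num_workers round-robin chunks (empty chunks dropped)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    chunks = [items[i::num_workers] for i in range(num_workers)]
    return [chunk for chunk in chunks if chunk]


def build_all(items, build_one, num_workers=8):
    """Run build_one over every item, one chunk per worker.

    A thread pool stands in for the Modal workers here (assumption).
    Results come back grouped by chunk, not in the original item order.
    """
    chunks = partition(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_chunk = pool.map(lambda chunk: [build_one(x) for x in chunk], chunks)
        return [result for chunk in per_chunk for result in chunk]
```

Round-robin chunking keeps the chunks within one item of each other in size, which matters when total build time is bounded by the slowest worker.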

2. Retry Logic

  • Added tenacity for exponential backoff on HuggingFace uploads
  • Retries on httpx.ReadTimeout, httpx.ConnectTimeout, ConnectionError
  • Max 5 retries with 30-300 second waits
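
The retry policy above can be sketched without any dependencies. This is a standard-library stand-in for what tenacity provides, not the actual implementation: the built-in `TimeoutError` and `ConnectionError` substitute for the httpx exceptions, "max 5 retries" is read as five retries after the first attempt (an assumption), and `backoff_waits` and `upload_with_retry` are hypothetical names.

```python
import time

# Stand-ins for httpx.ReadTimeout / httpx.ConnectTimeout / ConnectionError (assumption).
RETRYABLE = (TimeoutError, ConnectionError)


def backoff_waits(max_retries=5, min_wait=30, max_wait=300):
    """Exponential wait schedule in seconds, doubling from min_wait and capped at max_wait."""
    return [min(max_wait, min_wait * 2**i) for i in range(max_retries)]


def upload_with_retry(upload, max_retries=5, min_wait=30, max_wait=300, sleep=time.sleep):
    """Call upload(), retrying on RETRYABLE errors with exponential backoff.

    The last error is re-raised once max_retries retries have been used up.
    `sleep` is injectable so the schedule can be checked without waiting.
    """
    waits = backoff_waits(max_retries, min_wait, max_wait)
    for attempt in range(max_retries + 1):
        try:
            return upload()
        except RETRYABLE:
            if attempt == max_retries:
                raise
            sleep(waits[attempt])
```

With the defaults, `backoff_waits()` yields 30, 60, 120, 240, and 300 seconds, so a single upload that keeps timing out waits at most 12.5 minutes in total before failing.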

3. Atomic Staging Workflow

  • Files upload to staging/ folder first
  • After all uploads succeed, atomic promotion via CommitOperationCopy
  • Single commit replaces all production files
  • Cleanup removes staging folder
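
The promotion step amounts to mapping each staged path to its production path and applying all copies in one commit. The sketch below only builds that mapping. `plan_promotion` and the example paths are hypothetical, and the assumption is that production paths are the staged paths with the `staging/` prefix removed. With huggingface_hub, each pair would presumably become one `CommitOperationCopy(src_path_in_repo=src, path_in_repo=dst)`, with the whole list passed to a single `create_commit` call.

```python
def plan_promotion(staged_paths, staging_prefix="staging/"):
    """Return (src, dst) pairs mapping each staged file to its production path.

    Raises ValueError if any path is outside the staging prefix, so a bad
    input aborts the plan before anything is promoted.
    """
    plan = []
    for src in staged_paths:
        if not src.startswith(staging_prefix):
            raise ValueError(f"not a staged path: {src!r}")
        dst = src[len(staging_prefix):]
        if not dst:
            raise ValueError(f"staged path has no file name: {src!r}")
        plan.append((src, dst))
    return plan


# Hypothetical file names, for illustration only.
plan = plan_promotion(["staging/states/CA.h5", "staging/states/NY.h5"])
for src, dst in plan:
    print(f"{src} -> {dst}")
```

Building the full plan before committing is what keeps the promotion all-or-nothing: either every production file is replaced in the one commit, or none is.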

Current State

  • Code: db-work branch has the new staging workflow
  • Files: v1.56.0 files were manually migrated to production paths on both GCS and HuggingFace
  • First successful run: 27 minutes (vs 57+ minute failures before)

TODO

  • Test the new staging workflow end-to-end with a real run
  • Decide whether to keep the latest.json / manifest.json files (currently not used by downstream code)
  • Merge db-work to main after testing
  • Clean up old versioned upload functions that are no longer used

Files Changed

  • modal_app/local_area.py - Parallel workers + staging upload flow
  • modal_app/worker_script.py - Worker subprocess script (new)
  • policyengine_us_data/utils/data_upload.py - Staging upload/promote/cleanup functions
  • policyengine_us_data/utils/manifest.py - Checksum validation (new)
  • .github/workflows/local_area_publish.yaml - Added num_workers parameter
  • scripts/migrate_versioned_to_production.py - One-time migration script (new)
