Optimize daily_package_downloads with partitioning, clustering, and bootstrap migration #22
The `daily_package_downloads` table lacks partitioning and clustering, causing 153GB scans for 14-day test queries. This adds ~$20/week in unnecessary costs and will worsen as the table grows.

Changes

New optimized model `daily_package_downloads_optimised.sql` with:
- `PARTITION BY download_date` for temporal pruning
- `CLUSTER BY package, package_version` for package-specific queries

```sql
{% if is_incremental() %}
  -- Standard incremental: new data only
  SELECT ... FROM {{ ref('file_downloads') }}
  WHERE download_date >= '{{ latest_partition_date }}'
{% else %}
  -- Bootstrap: copy existing + new data
  SELECT * FROM {{ ref('daily_package_downloads') }}
  UNION ALL
  SELECT ... FROM {{ ref('file_downloads') }}
  WHERE download_date > '{{ old_table_latest_date }}'
{% endif %}
```

Test optimization
- `where` config (153GB → 2.5GB)
- `downloads_and_vulnerabilities.sql` and test files

Documentation
- `MIGRATION_STRATEGY.md`: Bootstrap rationale, deployment steps, rollback path
- `PIPELINE_REFACTORING_ANALYSIS.md`: Recommends separate daily pipeline in same repo for fresher data at similar cost

Expected impact

Original table remains unchanged for safe rollback.
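
For readers unfamiliar with how partitioning and clustering are declared in a dbt model, here is a minimal sketch of a config block that could sit at the top of `daily_package_downloads_optimised.sql`. It assumes the dbt-bigquery adapter and an incremental materialization. The PR description does not show the actual config, so treat the `materialized` and `granularity` values as illustrative assumptions; only the column names come from the description above.

```sql
-- Sketch only: assumes the dbt-bigquery adapter. The real config in this PR may differ.
{{
  config(
    materialized='incremental',          -- assumption: implied by the is_incremental() branch
    partition_by={
      'field': 'download_date',          -- partition column named in the PR description
      'data_type': 'date',
      'granularity': 'day'               -- assumption: daily partitions
    },
    cluster_by=['package', 'package_version']  -- cluster columns named in the PR description
  )
}}
```

With a configuration along these lines, queries that filter on `download_date` can prune partitions, and queries that filter on `package` benefit from clustering, which is the mechanism behind the reduced scan sizes described above.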